In the world of financial services, Know-Your-Customer (KYC) and Anti-Money Laundering (AML) are critical lines of defense against illicit activities. KYC is naturally modelled as a graph problem, where customers, accounts, transactions, IP addresses, devices, and locations are all interconnected nodes in a vast network of relationships. Investigators sift through these complex webs of connections, trying to connect seemingly disparate dots to uncover fraud, sanctions violations, and money laundering rings.
This is a great use case for AI grounded by a knowledge graph (GraphRAG). The intricate web of connections requires capabilities beyond standard document-based RAG (typically based on vector similarity search and reranking techniques).
Disclosure
I am a Senior Product Manager for AI at Neo4j, the graph database featured in this post. Although the snippets focus on Neo4j, the same patterns can be applied with any graph database. My main aim is to share practical guidance on building GraphRAG agents with the AI/ML community. All code in the linked repository is open-source and free for you to explore, experiment with, and adapt.
All images in this blog post were created by the author.
A GraphRAG KYC Agent
This blog post provides a hands-on guide for AI engineers and developers on how to build an initial KYC agent prototype with the OpenAI Agents SDK. We’ll explore how to equip our agent with a suite of tools to uncover and investigate potential fraud patterns.
The diagram below illustrates the agent processing pipeline to answer questions raised during a KYC investigation.
Let’s walk through the major components:
- The KYC Agent: It leverages the OpenAI Agents SDK and acts as the “brain,” deciding which tool to use based on the user’s query and the conversation history. It plays the role of MCP Host and MCP client to the Neo4j MCP Cypher Server. Most importantly, it runs a very simple loop that takes a question from the user, invokes the agent, and processes the results, while keeping the conversation history.
- The Toolset. A collection of tools available to the agent.
- GraphRAG Tools: These are Graph data retrieval functions that wrap a very specific Cypher query. For example:
- Get Customer Details: A graph retrieval tool that, given a Customer ID, retrieves information about a customer, including their accounts and recent transaction history.
- Neo4j MCP Server: A Neo4j MCP Cypher Server exposing tools to interact with a Neo4j database. It provides three essential tools:
- Get Schema from the Database.
- Run a READ Cypher query against the database.
- Run a WRITE Cypher query against the database.
- A Text-To-Cypher tool: A python function wrapping a fine-tuned Gemma3-4B model running locally via Ollama. The tool translates natural language questions into Cypher graph queries.
- A Memory Creation tool: This tool enables investigators to document their findings directly in the knowledge graph. It creates a “memory” (of an investigation) in the knowledge graph and links it to all relevant customers, transactions, and accounts. Over time, this helps build an invaluable knowledge base for future investigations.
- A KYC Knowledge Graph: A Neo4j database storing a knowledge graph of 8,000 fictitious customers, their accounts, transactions, devices and IP addresses. It is also used as the agent’s long-term memory store.
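The simple loop mentioned in the first bullet can be sketched roughly as follows. This is a minimal sketch, not the code from the repository: `run_agent` is a hypothetical stand-in for the real agent call, and the message format is an assumption. With the OpenAI Agents SDK, that call would typically be `Runner.run(agent, history)`, with the next history taken from `result.to_input_list()`.

```python
from typing import Callable, Iterable

# A message is a dict with "role" and "content" keys, as in common chat APIs.
Message = dict[str, str]


def conversation_loop(
    questions: Iterable[str],
    run_agent: Callable[[list[Message]], str],
) -> list[Message]:
    """Feed each question to the agent while carrying the history forward."""
    history: list[Message] = []
    for question in questions:
        history.append({"role": "user", "content": question})
        answer = run_agent(history)  # the agent sees the whole conversation so far
        history.append({"role": "assistant", "content": answer})
    return history


def echo_agent(history: list[Message]) -> str:
    """Hypothetical stand-in agent: reports how many user turns it has seen."""
    turns = sum(1 for m in history if m["role"] == "user")
    return f"answer {turns} to: {history[-1]['content']}"


if __name__ == "__main__":
    for message in conversation_loop(["Get me the schema", "Find rings"], echo_agent):
        print(f"{message['role']}: {message['content']}")
```

Keeping the history outside the agent call is what lets a later question ("for each of these customers...") refer back to earlier answers.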
Want to try out the agent now? Just follow the instructions on the project repo. You can come back and read how the agent was built later.
Why GraphRAG for KYC?
Traditional RAG systems focus on finding information within large bodies of text that are chunked up into fragments. KYC investigations rely on finding interesting patterns in a complex web of interconnected data – customers linked to accounts, accounts connected through transactions, transactions tied to IP addresses and devices, and customers associated with personal and employer addresses.
Understanding these relationships is key to uncovering sophisticated fraud patterns.
- “Does this customer share an IP address with someone on a watchlist?”
- “Is this transaction part of a circular payment loop designed to obscure the source of funds?”
- “Are multiple new accounts being opened by individuals working for the same, newly-registered, shell company?”
These are questions of connectivity. A knowledge graph, where customers, accounts, transactions, and devices are nodes and their relationships are explicit edges, is the ideal data structure for this task. GraphRAG (data retrieval) tools make it simple to identify unusual patterns of activity.
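To make the idea of a "question of connectivity" concrete, here is a minimal sketch of the first question as a two-hop traversal (customer to IP address to customer) over a toy in-memory graph. The data, the edge list format, and the function name are assumptions made for illustration; in the real agent this kind of traversal is expressed in Cypher against Neo4j.

```python
from collections import defaultdict

# Toy data (hypothetical): (customer_id, ip_address) edges, "customer used this IP".
USED_IP = [
    ("CUST_A", "10.0.0.1"),
    ("CUST_B", "10.0.0.1"),
    ("CUST_B", "10.0.0.2"),
    ("CUST_C", "10.0.0.3"),
]
WATCHLIST = {"CUST_A"}


def shares_ip_with_watchlist(customer_id, edges, watchlist):
    """Return watchlisted customers reachable via customer -> IP -> customer."""
    customers_by_ip = defaultdict(set)
    ips_by_customer = defaultdict(set)
    for cust, ip in edges:
        customers_by_ip[ip].add(cust)
        ips_by_customer[cust].add(ip)

    hits = set()
    for ip in ips_by_customer[customer_id]:      # first hop: customer -> IP
        for other in customers_by_ip[ip]:        # second hop: IP -> customer
            if other != customer_id and other in watchlist:
                hits.add(other)
    return hits


if __name__ == "__main__":
    print(shares_ip_with_watchlist("CUST_B", USED_IP, WATCHLIST))
```

The answer depends entirely on following relationships, which is why similarity search over text chunks is a poor fit for this kind of question.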

A Synthetic KYC Dataset
For the purposes of this blog, I have created a synthetic dataset with 8,000 fictitious customers and their accounts, transactions, registered addresses, devices and IP addresses.
The image below shows the “schema” of the database after the dataset is loaded into Neo4j. In Neo4j, a schema describes the type of entities and relationships stored in the database. In our case, the main entities are: Customer, Address, Accounts, Device, IP Address, Transactions. The main relationships amongst them are as illustrated below.

The dataset contains a few anomalies. Some customers are involved in suspicious transaction rings. There are a few isolated devices and IP addresses (not linked to any customer or account). There are some addresses shared by a large number of customers. Feel free to explore the synthetic dataset generation script, if you want to understand or modify the dataset to your requirements.
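Two of these anomalies, heavily shared addresses and isolated devices, come down to counting relationships per node. The sketch below shows that idea on toy data; the edge lists, the threshold, and the function names are assumptions for illustration and are not taken from the dataset generation script.

```python
from collections import Counter


def shared_addresses(lives_at, min_customers=3):
    """Addresses linked to at least `min_customers` distinct customers.

    `lives_at` is a list of (customer_id, address_id) pairs.
    """
    counts = Counter(address for _, address in set(lives_at))  # set() drops duplicate edges
    return {addr: n for addr, n in counts.items() if n >= min_customers}


def isolated_devices(devices, used_by):
    """Devices that no customer is linked to.

    `used_by` is a list of (customer_id, device_id) pairs.
    """
    linked = {device for _, device in used_by}
    return sorted(set(devices) - linked)


if __name__ == "__main__":
    edges = [("C1", "A1"), ("C2", "A1"), ("C3", "A1"), ("C4", "A2")]
    print(shared_addresses(edges))
    print(isolated_devices(["D1", "D2", "D3"], [("C1", "D1"), ("C2", "D3")]))
```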
A Basic Agent with OpenAI Agents SDK
Let’s walk through the key parts of our KYC Agent.
The implementation is mostly within kyc_agent.py. The full source code and step-by-step instructions on how to run the agent are available on GitHub.
First, let’s define the agent’s core identity with suitable instructions.
import os
from agents import Agent, Runner, function_tool
# ... other imports
# Define the instructions for the agent
instructions = """You are a KYC analyst with access to a knowledge graph. Use the tools to answer questions about customers, accounts, and suspicious patterns.
You are also a Neo4j expert and can use the Neo4j MCP server to query the graph.
If you get a question about the KYC database that you can not answer with GraphRAG tools, you should
- use the Neo4j MCP server to fetch the schema of the graph (if needed)
- use the generate_cypher tool to generate a Cypher query from question and the schema
- use the Neo4j MCP server to query the graph to answer the question
"""
The instructions are crucial. They set the agent’s persona and provide a high-level strategy for how to approach problems, especially when a pre-defined tool doesn’t fit the user’s request.
Now, let’s start with a minimal agent. No tools. Just the instructions.
# Agent Definition, we will add tools later.
kyc_agent = Agent(
    name="KYC Analyst",
    instructions=instructions,
    tools=[...],        # We will populate this list
    mcp_servers=[...],  # And this one
)
Let’s add some tools to our KYC Agent
An agent is only as good as its tools. Let’s examine five tools we’re giving our KYC analyst.
Tool 1 & 2: Pre-defined Cypher Queries
For common and critical queries, it’s best to have optimized, pre-written Cypher queries wrapped in Python functions. You can use the @function_tool decorator from the OpenAI Agents SDK to make these functions available to the agent.
Tool 1: `find_customer_rings`
This tool is designed to detect recursive patterns characteristic of money laundering, specifically ‘circular transactions’ where funds cycle through multiple accounts to disguise their origin.
In a KYC graph, this translates directly to finding cycles, or paths that return to or near their starting point, within a directed transaction graph. Implementing such detection involves complex graph traversal algorithms, often using variable-length paths to explore connections up to a certain ‘hop’ distance.
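As an illustration of bounded cycle detection outside Cypher, here is a minimal depth-first search sketch in plain Python. It is not the agent's implementation: the toy graph collapses each transaction into a direct account-to-account edge, which is a simplification, and the function name and `max_hops` parameter are assumptions.

```python
def find_cycles(transfers, max_hops=6):
    """Find directed cycles of at most `max_hops` transfers.

    `transfers` is a list of (from_account, to_account) pairs. Each cycle is
    reported once, starting from its smallest account ID.
    """
    graph = {}
    for src, dst in transfers:
        graph.setdefault(src, set()).add(dst)

    cycles = []

    def dfs(start, node, path):
        for nxt in sorted(graph.get(node, ())):
            if nxt == start:
                cycles.append(path + [start])  # closed the loop
            elif nxt > start and nxt not in path and len(path) < max_hops:
                # Only visit accounts "greater" than the start so that each
                # cycle is found from exactly one starting account.
                dfs(start, nxt, path + [nxt])

    for start in sorted(graph):
        dfs(start, start, [start])
    return cycles


if __name__ == "__main__":
    toy = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]
    print(find_cycles(toy))
```

The hop limit matters for the same reason it does in a variable-length Cypher pattern: without it, the number of paths to explore grows very quickly on a dense transaction graph.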
The code snippet below shows a find_customer_rings function that executes a Cypher query against the KYC database and returns up to 10 potential customer rings. For each ring, it returns the customers, accounts, and transactions involved.
@function_tool
def find_customer_rings(max_number_rings: int = 10, customer_in_watchlist: bool = True, ...):
    """
    Detects circular transaction patterns (up to 6 hops) involving high-risk customers.
    Finds account cycles where the accounts are owned by customers matching specified
    risk criteria (watchlisted and/or PEP status).

    Args:
        max_number_rings: Maximum rings to return (default: 10)
        customer_in_watchlist: Filter for watchlisted customers (default: True)
        customer_is_pep: Filter for PEP customers (default: False)
        customer_id: Specific customer to focus on (not implemented)

    Returns:
        dict: Contains ring paths and associated high-risk customers
    """
    logger.info("TOOL: FIND_CUSTOMER_RINGS")
    with driver.session() as session:
        result = session.run(
            f"""
            MATCH p=(a:Account)-[:FROM|TO*6]->(a:Account)
            WITH p, [n IN nodes(p) WHERE n:Account] AS accounts
            UNWIND accounts AS acct
            MATCH (cust:Customer)-[r:OWNS]->(acct)
            WHERE cust.on_watchlist = $customer_in_watchlist
            // ... more Cypher to collect results ...
            """,
            max_number_rings=max_number_rings,
            customer_in_watchlist=customer_in_watchlist,
        )
        # ... Python code to process and return results ...
It is worth noting that the OpenAI Agents SDK automatically uses the function’s documentation string (docstring) as the tool description. Good Python function documentation pays off!
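To see why the docstring matters, here is a minimal sketch of how a framework could derive a tool description from a plain Python function using the standard `inspect` module. This is not the SDK's actual implementation; `describe_tool` and `get_customer` are hypothetical names used only for illustration.

```python
import inspect


def describe_tool(func):
    """Build a tool description from a function's name, signature and docstring."""
    sig = inspect.signature(func)
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": {
            name: {
                "required": p.default is inspect.Parameter.empty,
                "default": None if p.default is inspect.Parameter.empty else p.default,
            }
            for name, p in sig.parameters.items()
        },
    }


def get_customer(customer_id: str, tx_limit: int = 5):
    """Retrieve a customer, their accounts and recent transactions."""
    return {"id": customer_id, "tx_limit": tx_limit}


if __name__ == "__main__":
    print(describe_tool(get_customer))
```

Whatever text ends up in the description is what the model reads when deciding which tool to call, so a vague docstring tends to produce vague tool selection.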
Tool 2: `get_customer_and_accounts`
A simple, yet essential, tool for retrieving a customer’s profile, including their accounts and most recent transactions. This is the bread-and-butter of any investigation. The code is similar to our previous tool – a function that takes a customer ID and wraps around a simple Cypher query.
Once again, the function is decorated with @function_tool to make it available to the agent.
The Cypher query wrapped by this Python function is shown below.
result = session.run(
    """
    MATCH (c:Customer {id: $customer_id})-[o:OWNS]->(a:Account)
    WITH c, a
    CALL (c,a) {
        MATCH (a)-[b:TO|FROM]->(t:Transaction)
        ORDER BY t.timestamp DESC
        LIMIT $tx_limit
        RETURN collect(t) as transactions
    }
    RETURN c as customer, a as account, transactions
    """,
    customer_id=input.customer_id
    # The query also expects a $tx_limit parameter, which must be supplied here.
)
A notable aspect of this tool’s design is the use of Pydantic to specify the function’s output. The OpenAI Agents SDK uses Pydantic models returned by the function to automatically generate a text description of the output parameters.
If you look carefully, the function returns
return CustomerAccountsOutput(
    customer=CustomerModel(**customer),
    accounts=[AccountModel(**a) for a in accounts],
)
The CustomerModel and AccountModel include each of the properties returned for each Customer, its accounts and a list of recent transactions. You can see their definition in schemas.py.
Tools 3 & 4: Where Neo4j MCP Server meets Text-To-Cypher
This is where our KYC agent gets some more interesting powers.
A significant challenge in building versatile AI agents is enabling them to interact dynamically with complex data sources, beyond pre-defined, static functions. Agents need the ability to perform general-purpose querying where new insights might require spontaneous data exploration without requiring a priori Python wrappers for every possible action.
This section explores a common architectural pattern to address this: a tool that translates natural language questions into Cypher, coupled with another tool that allows dynamic query execution.
We demonstrate this mechanism using the Neo4j MCP Server to expose dynamic graph query execution and a fine-tuned Google Gemma3-4B model for Text-to-Cypher translation.
Tool 3: Adding the Neo4j MCP server toolset
For a robust agent to operate effectively with a knowledge graph, it needs to understand the graph’s structure and to execute Cypher queries. These capabilities enable the agent to introspect the data and execute dynamic ad-hoc queries.
The MCP Neo4j Cypher server provides the basic tools: get-neo4j-schema (to retrieve the graph schema dynamically), read-neo4j-cypher (for executing arbitrary read queries), and write-neo4j-cypher (for create, update, and delete queries).
Fortunately, the OpenAI Agents SDK has support for MCP. The code snippet below shows how easy it is to add the Neo4j MCP Server to our KYC Agent.
from agents.mcp import MCPServerStdio

# Tool 3: Neo4j MCP server setup
neo4j_mcp_server = MCPServerStdio(
    params={
        "command": "uvx",
        "args": ["mcp-neo4j-cypher@<version>"],  # pin the server version you have tested
        "env": {
            "NEO4J_URI": NEO4J_URI,
            "NEO4J_USERNAME": NEO4J_USER,
            "NEO4J_PASSWORD": NEO4J_PASSWORD,
            "NEO4J_DATABASE": NEO4J_DATABASE,
        },
    },
    cache_tools_list=True,
    name="Neo4j MCP Server",
)
You can learn more about how MCP is supported in OpenAI Agents SDK here.
Tool 4: A Text-To-Cypher Tool
The ability to dynamically translate natural language into powerful graph queries often relies on specialized Large Language Models (LLMs) – finetuned with schema-aware query generation.
We can use open-weights, publicly available Text-to-Cypher models from Hugging Face, such as neo4j/text-to-cypher-Gemma-3-4B-Instruct-2025.04.0. This model was specifically finetuned to generate accurate Cypher queries from a user question and a schema.
To run this model on a local device, we can turn to Ollama. Using Llama.cpp, it is relatively straightforward to convert a Hugging Face model to the GGUF format, which is required to run a model in Ollama. Using the ‘convert-hf-to-GGUF’ Python script, I generated a GGUF version of the finetuned Gemma3-4B model and uploaded it to Ollama.
If you are an Ollama user, you can download this model to your local device with:
ollama pull ed-neo4j/t2c-gemma3-4b-it-q8_0-35k
What happens when a user asks a question that doesn’t match any of our pre-defined tools?
For example, “For customer CUST_00001, find his addresses and check if they are shared with other customers”
Instead of failing, our agent can generate a Cypher query on the fly…
@function_tool
async def generate_cypher(request: GenerateCypherRequest) -> str:
    """
    Generate a Cypher query from natural language using a local finetuned text2cypher Ollama model
    """
    USER_INSTRUCTION = """..."""  # Detailed prompt instructions
    user_message = USER_INSTRUCTION.format(
        schema=request.database_schema,
        question=request.question
    )
    # Generate Cypher query using the text2cypher model.
    # `chat` is awaited, so it must be an async callable (for example, ollama.AsyncClient().chat).
    model: str = "ed-neo4j/t2c-gemma3-4b-it-q8_0-35k"
    response = await chat(
        model=model,
        messages=[{"role": "user", "content": user_message}]
    )
    return response['message']['content']
The generate_cypher tool addresses the challenge of Cypher query generation, but how does the agent know when to use this tool? The answer lies in the agent instructions.
You may remember that at the start of the blog, we defined the instructions for the agent as follows:
instructions = """You are a KYC analyst with access to a knowledge graph. Use the tools to answer questions about customers, accounts, and suspicious patterns.
You are also a Neo4j expert and can use the Neo4j MCP server to query the graph.
If you get a question about the KYC database that you can not answer with GraphRAG tools, you should
- use the Neo4j MCP server to get the schema of the graph (if needed)
- use the generate_cypher tool to generate a Cypher query from question and the schema
- use the Neo4j MCP server to query the graph to answer the question
"""
This time, note the specific instructions for handling ad-hoc queries that cannot be answered by the graph retrieval tools.
When the agent goes down this path, it goes through the following steps:
- The agent gets a novel question.
- It first calls `neo4j-mcp-server.get-neo4j-schema` to get the schema of the database.
- It then feeds the schema and the user’s question to the `generate_cypher` tool. This will generate a Cypher query.
- Finally, it takes the generated Cypher query and runs it using `neo4j-mcp-server.read-neo4j-cypher`.
If there are errors in either the Cypher generation or the execution of the query, the agent tries again: it regenerates the Cypher and reruns it.
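The steps above can be sketched as an explicit control flow. In the agent itself, the LLM drives these steps by following its instructions, so the code below is only an illustration under stated assumptions: `get_schema`, `generate_cypher`, and `run_read_query` are hypothetical callables standing in for the three tools, and feeding the previous error back into the prompt is one possible retry strategy rather than what the agent is guaranteed to do.

```python
def answer_with_text2cypher(question, get_schema, generate_cypher, run_read_query,
                            max_attempts=3):
    """Schema -> Cypher generation -> execution, retrying when a step fails."""
    schema = get_schema()
    last_error = None
    for attempt in range(1, max_attempts + 1):
        prompt = question
        if last_error is not None:
            # Feed the previous failure back so the next attempt can correct it.
            prompt = f"{question}\nPrevious attempt failed with: {last_error}"
        try:
            cypher = generate_cypher(prompt, schema)
            rows = run_read_query(cypher)
            return {"cypher": cypher, "rows": rows, "attempts": attempt}
        except Exception as exc:  # broad on purpose: any failure triggers a retry
            last_error = exc
    raise RuntimeError(f"No valid query after {max_attempts} attempts") from last_error
```

Capping the number of attempts keeps a persistently wrong query from looping forever; at that point it is usually better to report the failure to the investigator.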
As you can see, the above approach is not bullet-proof. It relies heavily on the Text-To-Cypher model to produce valid and correct Cypher. In most cases, it works. However, in cases where it doesn’t, you should consider:
- Defining explicit Cypher retrieval tools for this type of question.
- Adding some form of end-user feedback (thumbs up / down) in your UI/UX. This will help flag questions that the agent is struggling with. You can then decide on the best approach to handle this class of questions (e.g., a Cypher retrieval tool, better instructions, improvements to the text2cypher model, guardrails, or just getting your agent to politely decline to answer the question).
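As one example of a lightweight guardrail, generated Cypher can be screened before it is executed. The sketch below is a naive keyword heuristic, not a parser: the keyword list is an assumption and is not exhaustive, it can reject harmless queries (for instance, one that reads a property literally named `set`), and it does not detect procedures that write data. It is best treated as a complement to database-level read-only access, not a replacement.

```python
import re

# Clauses that modify data or schema. This list is an assumption and is not exhaustive.
WRITE_KEYWORDS = ("CREATE", "MERGE", "DELETE", "DETACH", "SET", "REMOVE", "DROP", "LOAD CSV")


def looks_read_only(cypher: str) -> bool:
    """Heuristic check: True if no write keyword appears outside string literals."""
    # Drop quoted strings so that a value such as 'SET' does not trigger a match.
    without_strings = re.sub(r"'[^']*'|\"[^\"]*\"", "", cypher)
    pattern = r"\b(" + "|".join(re.escape(k) for k in WRITE_KEYWORDS) + r")\b"
    return re.search(pattern, without_strings, flags=re.IGNORECASE) is None


if __name__ == "__main__":
    print(looks_read_only("MATCH (c:Customer) RETURN c.id LIMIT 5"))
    print(looks_read_only("MATCH (c:Customer) DETACH DELETE c"))
```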
Tool 5 – Adding Memory to the KYC Agent
The topic of agent memory is getting lots of attention lately.
While agents inherently manage short-term memory through conversational history, complex, multi-session tasks like financial investigations demand a more persistent and evolving long-term memory.
This long-term memory isn’t just a log of past interactions; it’s a dynamic knowledge base that can accumulate insights, track ongoing investigations, and provide context across different sessions and even different agents.
The create_memory tool implements a form of explicit knowledge graph memory, where summaries of investigations are stored as dedicated nodes and explicitly linked to relevant entities (customers, accounts, transactions).
@function_tool
def create_memory(content: str, customer_ids: list[str] = [], account_ids: list[str] = [], transaction_ids: list[str] = []) -> str:
    """
    Create a Memory node and link it to specified customers, accounts, and transactions
    """
    logger.info("TOOL: CREATE_MEMORY")
    with driver.session() as session:
        result = session.run(
            """
            // Caveat: UNWIND over an empty list yields no rows, which stops the
            // rest of the query, so each ID list should be non-empty.
            CREATE (m:Memory {content: $content, created_at: datetime()})
            WITH m
            UNWIND $customer_ids as cid
            MATCH (c:Customer {id: cid})
            MERGE (m)-[:FOR_CUSTOMER]->(c)
            WITH m
            UNWIND $account_ids as aid
            MATCH (a:Account {id: aid})
            MERGE (m)-[:FOR_ACCOUNT]->(a)
            WITH m
            UNWIND $transaction_ids as tid
            MATCH (t:Transaction {id: tid})
            MERGE (m)-[:FOR_TRANSACTION]->(t)
            RETURN m.content as content
            """,
            content=content,
            customer_ids=customer_ids,
            account_ids=account_ids,
            transaction_ids=transaction_ids
            # ...
        )
Additional considerations for implementing “agent memory” include:
- Memory Architectures: Exploring different types of memory (episodic, semantic, procedural) and their common implementations (vector databases for semantic search, relational databases, or knowledge graphs for structured insights).
- Contextualization: How the knowledge graph structure allows for rich contextualization of memories, enabling powerful retrieval based on relationships and patterns, rather than just keyword matching.
- Update and Retrieval Strategies: How memories are updated over time (e.g., appended, summarized, refined) and how they are retrieved by the agent (e.g., through graph traversal, semantic similarity, or fixed rules).
- Challenges: The complexities of managing memory consistency, handling conflicting information, preventing ‘hallucinations’ in memory retrieval, and ensuring the memory remains relevant and up-to-date without becoming overly large or noisy.
This is an area of active development that is evolving rapidly, with many frameworks addressing some of the considerations above.
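As a counterpart to create_memory, retrieval by graph traversal could look roughly like the sketch below. It is not part of the agent described here: `get_memories_for_customer` and `FakeSession` are hypothetical, and the query only assumes the `Memory` nodes and `FOR_CUSTOMER` relationships that create_memory writes. The fake session exists solely so the sketch runs without a database; with Neo4j you would pass a real driver session instead.

```python
MEMORIES_FOR_CUSTOMER = """
MATCH (m:Memory)-[:FOR_CUSTOMER]->(c:Customer {id: $customer_id})
RETURN m.content AS content, m.created_at AS created_at
ORDER BY m.created_at DESC
LIMIT $limit
"""


def get_memories_for_customer(session, customer_id, limit=5):
    """Return the most recent memories linked to a customer, newest first."""
    result = session.run(MEMORIES_FOR_CUSTOMER, customer_id=customer_id, limit=limit)
    return [{"content": r["content"], "created_at": r["created_at"]} for r in result]


class FakeSession:
    """Stand-in for a Neo4j session so the sketch runs without a database."""

    def __init__(self, rows):
        self.rows = rows
        self.last_call = None

    def run(self, query, **params):
        self.last_call = (query, params)
        return self.rows[: params["limit"]]


if __name__ == "__main__":
    session = FakeSession([{"content": "Shared address with CUST_00042", "created_at": "2025-01-02"}])
    print(get_memories_for_customer(session, "CUST_00001"))
```

Because memories are linked to entities, an investigator who opens a customer months later can start from what was already found rather than from scratch.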
Putting it all together – An Example Investigation
Let’s see how our agent handles a typical workflow. You can run this yourself, or follow along with the step-by-step instructions in the KYC agent GitHub repo.
1. “Get me the schema of the database”
- Agent Action: The agent identifies this as a schema query and uses the Neo4j MCP Server’s `get-neo4j-schema` tool.
2. “Show me 5 watchlisted customers involved in suspicious rings”
- Agent Action: This directly matches the purpose of our custom tool. The agent calls `find_customer_rings` with `customer_in_watchlist=True`.
3. “For each of these customers, find their addresses and find out if they are shared with other customers”
- Agent Action: This is a question that can’t be answered with any of the GraphRAG tools. The agent should follow its instructions:
- It already has the schema (from our first interaction above).
- It calls `generate_cypher` with the question and schema. The tool returns a Cypher query that tries to answer the investigator’s question.
- It executes this Cypher query using the Neo4j MCP Cypher Server `read-neo4j-cypher` tool.
4. “For the customer whose address is shared, can you get me more details?”
- Agent Action: The agent determines that the `get_customer_and_accounts` tool is the perfect fit and calls it with the customer’s ID.
5. “Write a 300-word summary of this investigation. Store it as a memory. Make sure to link it to every account and transaction belonging to this customer”
- Agent Action: The agent first uses its internal LLM capabilities to generate the summary. Then, it calls the `create_memory` tool, passing the summary text and the list of all customer, account, and transaction IDs it has encountered during the conversation.
Key Takeaways
If you got this far, I hope you enjoyed the journey of getting familiar with a basic implementation of a KYC GraphRAG Agent. Lots of cool technologies here: OpenAI Agents SDK, MCP, Neo4j, Ollama and a finetuned Gemma3-4B Text-To-Cypher model!
I hope you gained some appreciation for:
- GraphRAG, or more specifically graph-powered data retrieval, as an essential technique for connected-data problems. It allows agents to answer questions on heavily connected data that would be difficult or impossible to answer with standard RAG.
- The importance of a balanced toolkit. Combine MCP Server tools with your own optimized tools.
- MCP Servers are a game-changer. They allow you to connect your agents to an increasing set of MCP servers.
- Experiment with more MCP Servers so you get a better sense of the possibilities.
- Agents should be able to write back to your data store in a controlled way.
- In our example, we saw how an analyst can persist their findings (e.g., by adding Memory nodes to the knowledge graph), creating a virtuous cycle where the agent improves the underlying knowledge base for entire teams of investigators.
- The agent adds information to the knowledge graph and it never updates or deletes existing information.
The patterns and tools discussed here are not limited to KYC. They can be applied to supply chain analysis, digital twin management, drug discovery, and any other domain where the relationships between data points are as important as the data itself.
The era of graph-aware AI agents is here.
What’s Next?
You have built a simple AI agent on top of OpenAI Agents SDK with MCP, Neo4j and a Text-to-Cypher model. All running on a single device.
While this initial agent provides a strong foundation, transitioning to a production-level system involves addressing several additional requirements, such as:
- Agent UI/UX: This is the central part for your users to interact with your agent. This will ultimately be a key driver of the adoption and success of your agent.
- Long-running tasks and multi-agent systems: Some tasks are valuable but take a significant amount of time to run. In these cases, agents should be able to offload parts of their workload to other agents.
- OpenAI does provide some support for handing off to subagents, but it might not be suitable for long-running agents.
- Agent Guardrails – OpenAI Agents SDK provides some support for Guardrails.
- Agent Hosting – It exposes your agent to your users.
- Securing comms to your agent – End user authentication and authorization to your agent.
- Database access controls – Managing access control to the data stored in the KYC Knowledge Graph.
- Conversation History.
- Agent Observability.
- Agent Memory.
- Agent Evaluation – What is the impact of changing the agent instructions or adding/removing a tool?
- And more…
In the meantime, I hope this has inspired you to keep learning and experimenting!