MAS is all you need: supercharge your RAG with a Multi-Agent System

AG2 (formerly known as AutoGen) is an open-source programming framework designed to facilitate the development of AI agents and to enhance collaboration among multiple agents on complex tasks. Its primary goal is to simplify the creation and study of agentic AI. While the official AG2 website claims that the framework is ready to “build production-ready multi-agent systems in minutes,” I personally believe some work is still needed before it can be considered fully production-ready. It is undeniable, however, that AG2 provides a very user-friendly environment for research experiments. It is also worth emphasizing that many other frameworks exist for creating multi-agent systems, for example Letta, LangGraph, and CrewAI.

In this tutorial we are going to implement a MAS with:

  • Human → a proxy for human input.
  • Agent Ingestion → responsible for ingesting information from text files or directly from text inputs.
  • Agent Retrieve → responsible for extracting relevant information from the internal database to assist other agents in answering user questions.
  • Agent Answer → responsible for providing answers to user queries using information retrieved by the Agent Ingestion.
  • Agent Router → responsible for facilitating communication between the human user and other agents.

The Human will interact only with the Agent Router, which is responsible for an internal chat group that includes Agent Retrieve, Agent Answer, and Agent Ingestion. The agents inside the chat group combine their knowledge and tools to provide the best answer possible.

Agents' Topology

Human <-> Agent Router <-> [Agent Ingestion, Agent Retrieve, Agent Answer]

The complete code for the MA-RAG (Multi-Agent Retrieval-Augmented Generation) system can be found in the mas.py file. In this section, we will discuss some key components and features of the code that are particularly noteworthy.

Agents Definition

To define an agent in AG2, we use the ConversableAgent() class. For instance, to define the Agent Ingestion:

agent_ingestion = ConversableAgent(
    name="agent_ingestion",
    system_message=SYSTEM_PROMPT_AGENT_INGESTION,
    description=DESCRIPTION_AGENT_INGESTION,
    llm_config=llm_config,
    human_input_mode="NEVER",
    silent=False,
)

We specify:

  • a name (agent_ingestion);
  • the system prompt that defines the agent (SYSTEM_PROMPT_AGENT_INGESTION is a variable defined in prompts.py);
SYSTEM_PROMPT_AGENT_INGESTION = '''

You are the **Ingestion Agent** tasked with acquiring new knowledge from various sources. Your primary responsibility is to ingest information from text files or directly from text inputs.

### Key Guidelines:
- **No New Information**: You do not contribute new information to conversations; your role is strictly to ingest and store knowledge.
- **Evaluation of Information**: Before ingesting any new knowledge, carefully assess whether the information provided is genuinely novel and relevant.
- **Step-by-Step Approach**: Take a moment to reflect and approach each task methodically. Breathe deeply and focus on the process.

### Tools Available:
1. **`path_to_db()`**: Use this tool to ingest knowledge from a specified text file.
2. **`text_to_db()`**: Utilize this tool to ingest knowledge directly from provided text.

Your mission is to enhance the database with accurate and relevant information while ensuring that you adhere to the guidelines above.

'''

  • the description that will help during the routing of messages (DESCRIPTION_AGENT_INGESTION is a variable defined in prompts.py);
DESCRIPTION_AGENT_INGESTION = '''

I am the **Ingestion Agent** responsible for acquiring new knowledge from text files or directly from user-provided text.

'''

  • the configuration for the LLM;
llm_config = {
    "config_list": [
        {
            "model": "gpt-4o-mini",
            "api_key": os.environ["OPENAI_API_KEY"],
            "temperature": 0.7,
        }
    ]
}
  • whether to ask for human input every time a message is received (with human_input_mode="NEVER" the agent will never prompt for human input);
  • whether to suppress printing of the messages sent (silent=False keeps them visible).

Similarly, we can define all other agents (human, agent_retrieve, agent_answer, agent_router).

Adding Tools

So far, we have defined various agents; however, as they are currently configured, these agents can only receive text inputs and respond with text outputs. They are not equipped to perform more complex tasks that require specific tools. For instance, an agent in its current state cannot access the database we created in the first part of this tutorial to conduct searches.


To enable this functionality, we need to “tell” the agent that it has access to a tool capable of performing certain tasks. Our preference for implementing a tool deterministically, rather than asking the agent to figure it out on its own, is based on efficiency and reliability. A deterministic approach reduces the likelihood of errors, as the process can be clearly defined and coded. Nevertheless, we will still give the agent the responsibility and autonomy to select which tool to use, determine the parameters for its use, and decide how to combine multiple tools to address complex requests. This balance between guidance and autonomy will enhance the agent’s capabilities while maintaining a structured approach.
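Since AG2's function registration derives the tool schema from the Python signature and docstring, a deterministic tool is simply a well-typed function. As a minimal illustration (count_words is a hypothetical tool, not part of this tutorial's codebase):

```python
def count_words(path: str) -> int:
    """Count the words in a text file at the given path."""
    with open(path, encoding="utf-8") as f:
        return len(f.read().split())
```

The type hints and docstring are what the agent's LLM sees when deciding whether, and with which arguments, to call the tool; the body itself runs deterministically outside the model.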

I hope it is clear by now that, contrary to the claims made by many non-experts who suggest that agents are “so intelligent” that they can effortlessly handle complex tasks, there is actually a significant amount of work happening behind the scenes. The foundational tools that agents rely on require careful study, implementation, and testing. Nothing occurs “automagically,” even in the realm of generative AI. Understanding this distinction is crucial for appreciating the complexity and effort involved in developing effective AI systems. While these agents can perform impressive tasks, their capabilities are the result of meticulous engineering and thoughtful design rather than innate intelligence.

Remember the functions text_to_db() and path_to_db() we created before for the ingestion? We can “register” them to Agent Ingestion in this way:

register_function(
    path_to_db,
    caller=agent_ingestion,
    executor=agent_ingestion,
    name="path_to_db",
    description="Ingest new knowledge from a text file given its path.",
)

register_function(
    text_to_db,
    caller=agent_ingestion,
    executor=agent_ingestion,
    name="text_to_db",
    description="Ingest new knowledge from a piece of conversation.",
)

Similarly, we can add the retrieve tool to Agent Retrieve:

register_function(
    retrieve_str,
    caller=agent_retrieve,
    executor=agent_retrieve,
    name="retrieve_str",
    description="Retrieve useful information from internal DB.",
)
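For readers who skipped the first part: retrieve_str takes a query string and returns the most relevant chunks from the internal database. A toy in-memory stand-in (keyword overlap instead of the embedding similarity used in part one, purely to show the expected interface):

```python
# Toy stand-in for retrieve_str: ranks stored chunks by word overlap
# with the query. The real tool from part one queries a vector DB.
DB = [
    "The University of Pennsylvania was chartered before 1776.",
    "UC Berkeley was founded in 1868.",
]

def retrieve_str(query: str, top_k: int = 1) -> str:
    q = set(query.lower().split())
    ranked = sorted(DB, key=lambda c: len(q & set(c.lower().split())), reverse=True)
    return "\n".join(ranked[:top_k])
```

Whatever the backing store, the contract the other agents rely on is the same: query string in, concatenated text chunks out.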

MAS Topology

So far, we have defined each agent, their roles, and the tools they can utilize. What remains is how these agents are organized and how they communicate with one another. We aim to create a topology in which the Human interacts with the Agent Router, which then participates in a nested chat group with other agents. This group collaborates to address the human query, autonomously determining the order of operations, selecting the appropriate tools, and formulating responses. In this setup, the Agent Router acts as a central coordinator that directs the flow of information among the agents (Agent Ingestion, Agent Retrieve, and Agent Answer). Each agent has a specific function: Agent Ingestion processes incoming data, Agent Retrieve accesses relevant information from the database, and Agent Answer proposes the final response based on the gathered insights.

To create a group chat, we can use the GroupChat() class.

group_chat = GroupChat(
    agents=[
        agent_router,
        agent_ingestion,
        agent_retrieve,
        agent_answer,
    ],
    messages=[],
    send_introductions=False,
    max_round=10,
    speaker_selection_method="auto",
    speaker_transitions_type="allowed",
    allowed_or_disallowed_speaker_transitions={
        agent_router: [agent_ingestion, agent_retrieve, agent_answer],
        agent_ingestion: [agent_router],
        agent_retrieve: [agent_answer],
        agent_answer: [agent_router],
    },
)

In this instantiation, we list the agents that will be part of the group (agents), decide that they don’t need to introduce themselves at the beginning of the chat (send_introductions), set the max rounds of conversation to 10 (max_round), delegate the selection of the speaker at each round to the chat manager (speaker_selection_method), and constrain the conversation transitions to a particular scheme (allowed_or_disallowed_speaker_transitions, interpreted as allowed transitions because speaker_transitions_type is set to "allowed").

With the group created, we need a group chat manager that manages the order of conversation:

group_chat_manager = GroupChatManager(
    groupchat=group_chat,
    llm_config=llm_config,
    silent=False,
    is_termination_msg=lambda msg: "(to human)" in msg["content"].lower(),
)

It is important to note the lambda function used for the is_termination_msg parameter. This function determines when the chat should terminate by checking if the last message contains the substring “(to human).” This mechanism is crucial because, in the system prompt for the Agent Router, it specifies: “Clearly indicate your message’s intended recipient. For example, use (to human) when addressing the user.” This approach provides a clear signal for when to exit the nested chat and return a response to the human user.
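The predicate itself is easy to verify in isolation:

```python
# Same predicate passed to GroupChatManager: the nested chat ends as soon
# as a message is explicitly addressed to the human (case-insensitive).
is_termination_msg = lambda msg: "(to human)" in msg["content"].lower()

assert is_termination_msg({"content": "(To Human) Here is your answer."})
assert not is_termination_msg({"content": "(to agent_answer) please draft a reply"})
```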

Now, we need to turn the group chat we have just created into a nested chat that starts from Agent Router.

nested_chats = [
    {
        "recipient": group_chat_manager,
        "summary_method": "last_msg",
    }
]

agent_router.register_nested_chats(
    nested_chats,
    trigger=lambda sender: sender in [human],
)

By leveraging a structured communication framework and predefined transitions between agents, we ensure efficient collaboration while at the same time allowing flexibility in decision-making.

Let’s start chatting

We are really ready now. To start chatting with Agent Router:

chat_results = human.initiate_chat(
    agent_router,
    message=input("Ciao! How can I assist you today? 😊 : "),
    max_turns=100,
)

Or, we can run the Python script directly with:

python mas.py

and you should see something like

$ python mas.py
Ciao! How can I assist you today? 😊 :

Just type your message and you will receive the answer from Agent Router.

Here you can find an example of conversation. The chat started with the question

[Human]

Is there a university in Philadelphia, and was it established before the University of California, Berkeley?

and the answer given by Agent Router was

[Agent Router]

Yes, there is a university in Philadelphia known as the **University of Pennsylvania** (commonly referred to as Penn or UPenn). It is a private Ivy League research university and was chartered prior to the U.S. Declaration of Independence, making it one of the oldest institutions of higher education in the United States. The university first convened its board of trustees in 1749, which positions it as the fourth-oldest institution, though some argue it may be the fifth-oldest due to the establishment dates of other institutions. 📚

On the other hand, the **University of California, Berkeley** was founded in **1868**.

Given these dates, the University of Pennsylvania was established significantly earlier than the University of California, Berkeley. 🎓

**Sources**:
1. University of Pennsylvania Overview (last updated: 12 January 2025)
2. University of California, Berkeley Overview (last updated: 12 January 2025)

Note that the direct answer to the initial question was not provided in the text files we ingested at the beginning. In fact, if we look at the log of the nested chat at line 74, we can see that inside the nested chat Agent Retrieve autonomously makes two different queries with the retrieve tool, one about the university in Philadelphia and one about UC Berkeley, and then Agent Answer proposes the solution based on the retrieved chunks.

A similar mechanism is involved also when we add a new data source (remember the file about Turin we left out previously?):

[Human]

What do you know about the University of Turin?

[Agent Router]

🤔 It seems I couldn’t find specific information about the University of Turin. The details I retrieved were mainly about other universities like the University of Barcelona and UC Berkeley. If there’s a particular aspect of the University of Turin you’re curious about, or if you’d like information on something else, just let me know! 😊

[Human]

Information about Turin can be found in kb/cities/europe/turin.txt

[Agent Router]

Here is some information about Turin:

### Overview

**Location**: Turin (Italian: Torino) is a city in Northern Italy, …

[…]

In the background, what happened was that during the first interaction in the nested chat, Agent Answer realized that none of the retrieved information pertained to Turin (line 163). However, the second time, when the agents detected a new source of information, Agent Ingestion was called upon (line 179) to process the new data and save it in the database. This dynamic illustrates the collaborative nature of the agents within the system. Initially, Agent Answer’s inability to find relevant information highlights the importance of thorough data retrieval processes. Once a new source was identified, Agent Ingestion swiftly stepped in to ensure that valuable information was incorporated into the database, thereby enhancing the agents’ ability to respond effectively in future interactions.

You can get more examples in the log file where we tested how the system is resilient to external contradictory statements (line 34) and how new information directly from the conversation is saved by Agent Ingestion (lines 54, 352).

Beyond Toy MA-RAG

We have explored how to build a RAG system based on a Multi-Agent paradigm. What we presented is, of course, a simplification of how such a system needs to function in a production environment. We intentionally left out many important aspects (such as guardrails, token consumption, chat interface design, authentication, etc.) and there are numerous areas that require significant improvement. For instance, a complete pipeline for data ingestion and knowledge base updates is essential, as well as enhancing information retrieval methods that could leverage graph-based approaches rather than relying solely on embedding similarity. Moreover, the topology of the agents can be as complex as desired. For example, multiple chat groups could be created, each specialized in a particular aspect of the overall pipeline. Additionally, we could introduce oversight/judge roles to critically assess proposed plans and solutions. The possibilities are virtually limitless, and finding the right solution for a specific use case is often a form of art itself.

The rapid rise in popularity of MAS certainly has elements of a bubble, but it is also driven by the potential of such systems to tackle complex tasks that were previously unimaginable. Currently, we are still in a preliminary phase of this technology, even though platforms are emerging to facilitate the creation of MAS. Reflecting on this tutorial, it is evident that, in addition to the capabilities of LLMs, the management of the knowledge base is fundamentally important for a RAG system, even when enhanced by a MAS.

Moreover, while MAS unlocks new capabilities, it also introduces complexities in programming such systems. As we increase the number of agents linearly, the number of interactions between them can potentially grow quadratically. With each interaction comes the risk of ambiguities and inefficiencies that may propagate into subsequent interactions. In summary, there are numerous opportunities but also significant new risks. What we can do is strive to understand these systems deeply to be prepared for their challenges and possibilities.
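The combinatorics are easy to check: with n agents there are n(n-1)/2 possible pairwise communication channels, so doubling the number of agents roughly quadruples the interactions to reason about.

```python
def pairwise_channels(n: int) -> int:
    """Number of possible pairwise communication channels among n agents."""
    return n * (n - 1) // 2

print(pairwise_channels(4))  # the 4 agents in this tutorial -> 6
print(pairwise_channels(8))  # doubling the agents -> 28
```

This is exactly why constraining transitions with allowed_or_disallowed_speaker_transitions, as we did above, matters: it prunes the channels the chat manager must consider.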
