of abstraction built on top of fundamentally simple ideas, some agent framework devs seem to believe complexity is a virtue.
I tend to go along with Einstein’s maxim, “Everything should be made as simple as possible, but not simpler”. So, let me show you a framework that is easy to use and easy to understand.
OpenAI takes a refreshingly different approach from other framework developers: they don't try to be clever, they try to be clear.
In this article, I’ll show how you can build multi-agent apps using OpenAI’s open-source SDK.
We’ll see how to construct a simple single-agent app and then go on to explore multi-agent configurations. We’ll cover tool-calling, linear and hierarchical configurations, handoffs from one agent to another and using agents as tools.
Specifically, we will see the following examples:
- A simple call to an agent
- A tool-using agent
- Handoffs from one agent to another
- Handoffs to multiple agents
- Using agents as tools
- Hierarchical agent orchestration using agents as tools
The Agents SDK
The Agents SDK is based on a handful of concepts essential to agentic and multi-agent systems and builds a framework around them. It replaces Swarm, an educational framework from OpenAI in which those concepts were first identified and implemented, and it expands on Swarm while maintaining its founding principles of being lightweight and simple.
Simple it may be, but you can construct sophisticated agent-based systems with this framework where agents use tools (which can be other agents), hand off to other agents, and can be orchestrated in any number of clever ways.
Installation is via pip, or your preferred package management tool, and the package is called openai-agents. I favour uv, so to start a new project, I would do something like the following.
uv init agentTest
cd agentTest
uv add openai-agents
A simple call to an agent
A simple agent call is shown in the diagram below.
This is a data flow diagram that shows the running agent as a process with data flowing in and out. The flow that starts the process is the user prompt; the agent makes one or more calls to the LLM and receives responses. When it has completed its task, it outputs the agent response.
Below we see the code for a basic program that uses the SDK to implement this flow. It instantiates an agent, gives it a name and some instructions; it then runs it and prints the result. It’s similar to the first example from OpenAI’s documentation, but here we will create a Streamlit app.
First, we import the libraries.
import streamlit as st
import asyncio
from agents import Agent, Runner
We need the Streamlit package, of course, and `asyncio` because we will use its functionality to wait for the agent to complete before proceeding. Next, we import the minimum from the agents package: `Agent` (to create an agent) and `Runner` (to run the agent).
Below, we define the code to create and run the agent.
agent = Agent(name="Assistant", instructions="You are a helpful assistant")

async def run_agent(input_string):
    result = await Runner.run(agent, input_string)
    return result.final_output
This code uses the default model from OpenAI (and assumes you have a valid API key set as an environment variable). You will, of course, be charged, but for our purposes here, it won't be much; I've only spent a few tens of cents on this.
First, we instantiate an agent called “Assistant” with some simple instructions, then we define an asynchronous function that will run it with a string (the query) provided by the user.
The `run` function is asynchronous; we need to wait for the LLM to complete before we continue, and so we will run the function using `asyncio`.
We define the user interface with Streamlit functions.
st.title("Simple Agent SDK Query")
user_input = st.text_input("Enter a query and press 'Send':")
st.write("Response:")
response_container = st.container(height=300, border=True)
if st.button("Send"):
    response = asyncio.run(run_agent(user_input))
    with response_container:
        st.markdown(response)
This is mostly self-explanatory. The user is prompted to enter a query and press the 'Send' button. When the button is pressed, `run_agent` is run via a call to `asyncio.run`. The result is displayed in a scrollable container. Below is a screenshot of a sample run.

Your result may differ (LLMs are renowned for not giving the same answer twice).
To define an agent, give it a name and some instructions. Running is also straightforward; pass in the agent and a query. Running the agent starts a loop that completes when a final answer is reached. This example is simple and does not need to run through the loop more than once, but an agent that calls tools might need to go through several iterations before an answer is finalised.
The result is easily displayed. As we can see, it is the `final_output` attribute of the value that is returned from the `Runner`.
This program uses default values for several parameters that could be set manually, such as the model name and the temperature setting for the LLM. The Agent SDK also uses the Responses API by default. That’s an OpenAI-only API (so far, at least), so if you need to use the SDK with another LLM, you have to switch to the more widely supported Chat Completions API.
from agents import set_default_openai_api
set_default_openai_api("chat_completions")
Initially, and for simplicity, we'll use the default Responses API.
A tool-using agent
Agents can use tools, and the agent, in conjunction with the LLM, decides which tools, if any, it needs to use.
Here is a data flow diagram that shows a tool-using agent.

It is similar to the simple agent, but we can see an additional process, the tool, that the agent utilises. When the agent makes a call to the LLM, the response will indicate whether or not a tool needs to be used. If it does, then the agent will make that call and submit the result back to the LLM. Again, the response from the LLM will indicate whether another tool call is necessary. The agent will continue this loop until the LLM no longer requires the input from a tool. At this point, the agent can respond to the user.
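The loop described above can be sketched in plain Python. This is only an illustration of the control flow, not the SDK's actual implementation; `agent_loop`, `fake_llm` and the message shapes are all invented for the example.

```python
# Illustrative sketch of an agent's tool loop -- not the SDK's real code.
# The "LLM" is modelled as a function returning either a tool call or a
# final answer; the message shapes are invented for the example.

def agent_loop(llm, tools, prompt):
    messages = [{"role": "user", "content": prompt}]
    while True:
        reply = llm(messages)
        if reply["type"] == "tool_call":
            # Run the requested tool and feed its result back to the LLM
            result = tools[reply["name"]](**reply["args"])
            messages.append({"role": "tool", "content": result})
        else:
            # No further tool calls required: return the final answer
            return reply["content"]

# A stub LLM that asks for one lookup, then answers
def fake_llm(messages):
    if not any(m["role"] == "tool" for m in messages):
        return {"type": "tool_call", "name": "lookup", "args": {"q": "Eiffel Tower"}}
    return {"type": "answer", "content": "The Eiffel Tower is 330 m tall."}

tools = {"lookup": lambda q: f"Summary of {q}"}
print(agent_loop(fake_llm, tools, "How tall is the Eiffel Tower?"))
# -> The Eiffel Tower is 330 m tall.
```

The SDK runs exactly this kind of loop for you; all you supply is the agent, its tools and the prompt.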
Below is the code for a single agent using a single tool.
The program consists of four parts:
- The imports from the Agents library and `wikipedia` (which will be used as a tool).
- The definition of a tool — this is simply a function with the `@function_tool` decorator.
- The definition of the agent that uses the tool.
- Running the agent and printing the result in a Streamlit app, as before.
import streamlit as st
import asyncio
from agents import Agent, Runner, function_tool
import wikipedia

@function_tool
def wikipedia_lookup(q: str) -> str:
    """Look up a query in Wikipedia and return the result"""
    return wikipedia.page(q).summary

research_agent = Agent(
    name="Research agent",
    instructions="""You research topics using Wikipedia and report on
                    the results.""",
    model="o4-mini",
    tools=[wikipedia_lookup],
)

async def run_agent(input_string):
    result = await Runner.run(research_agent, input_string)
    return result.final_output

# Streamlit UI
st.title("Simple Tool-using Agent")
st.write("This agent uses Wikipedia to look up information.")

user_input = st.text_input("Enter a query and press 'Send':")

st.write("Response:")
response_container = st.container(height=300, border=True)

if st.button("Send"):
    response = asyncio.run(run_agent(user_input))
    with response_container:
        st.markdown(response)
The tool looks up a Wikipedia page and returns a summary via a standard call to a library function. Note that we’ve used type hints and a docstring to describe the function so the agent can work out how to use it.
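Python makes this possible because a function's signature and docstring are available at runtime. The real `@function_tool` decorator builds a full JSON schema from them, but the idea can be sketched with the standard library alone (the `tool_spec` shape below is invented for illustration):

```python
import inspect

def wikipedia_lookup(q: str) -> str:
    """Look up a query in Wikipedia and return the result"""
    return f"(summary of {q})"  # stand-in body for the illustration

# Everything an LLM needs to call the tool is discoverable at runtime:
sig = inspect.signature(wikipedia_lookup)
tool_spec = {
    "name": wikipedia_lookup.__name__,
    "description": inspect.getdoc(wikipedia_lookup),
    "parameters": {name: param.annotation.__name__
                   for name, param in sig.parameters.items()},
}
print(tool_spec["name"], tool_spec["parameters"])
# -> wikipedia_lookup {'q': 'str'}
```

This is why the type hints and docstring matter: without them, the agent has no description of the tool to pass to the LLM.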
Next is the definition of the agent, and here we see that there are more parameters than earlier: we specify the model that we want to use and a list of tools (there’s only one in this list).
Running and printing the result is as before, and it dutifully returns an answer (the height of the Eiffel Tower).

That is a simple test of the tool-using agent, which only requires a single lookup. A more complex query may use a tool more than once to collect the information.
For example, I asked, “Find the name of the famous tower in Paris, find its height and then find the date of birth of its creator“. This required two tool calls, one to get information about the Eiffel Tower and the second to find when Gustave Eiffel was born.
This process is not reflected in the final output, but we can see the stages that the agent went through by viewing the raw responses in the agent's result. I printed `result.raw_responses` for the query above, and the result is shown below.
[
0:"ModelResponse(output=[ResponseReasoningItem(id='rs_6849968a438081a2b2fda44aa5bc775e073e3026529570c1', summary=[], type='reasoning', status=None),
ResponseFunctionToolCall(arguments='{"q":"Eiffel Tower"}', call_id='call_w1iL6fHcVqbPFE1kAuCGPFok', name='wikipedia_lookup', type='function_call', id='fc_6849968c0c4481a29a1b6c0ad80fba54073e3026529570c1', status='completed')], usage=Usage(requests=1, input_tokens=111, output_tokens=214, total_tokens=325),
response_id='resp_68499689c60881a2af6411d137c13d82073e3026529570c1')"
1:"ModelResponse(output=[ResponseReasoningItem(id='rs_6849968e00ec81a280bf53dcd30842b1073e3026529570c1', summary=[], type='reasoning', status=None),
ResponseFunctionToolCall(arguments='{"q":"Gustave Eiffel"}', call_id='call_DfYTuEjjBMulsRNeCZaqvV8w', name='wikipedia_lookup', type='function_call', id='fc_6849968e74ac81a298dc17d8be4012a7073e3026529570c1', status='completed')], usage=Usage(requests=1, input_tokens=940, output_tokens=23, total_tokens=963),
response_id='resp_6849968d7c3081a2acd7b837cfee5672073e3026529570c1')"
2:"ModelResponse(output=[ResponseReasoningItem(id='rs_68499690e33c81a2b0bda68a99380840073e3026529570c1', summary=[], type='reasoning', status=None),
ResponseOutputMessage(id='msg_6849969221a081a28ede4c52ea34aa54073e3026529570c1', content=[ResponseOutputText(annotations=[], text='The famous tower in Paris is the Eiffel Tower. \n• Height: 330 metres (1,083 ft) tall \n• Creator: Alexandre Gustave Eiffel, born 15 December 1832', type='output_text')], role='assistant', status='completed', type='message')], usage=Usage(requests=1, input_tokens=1190, output_tokens=178, total_tokens=1368),
response_id='resp_6849968ff15481a292939a6eed683216073e3026529570c1')"
]
You can see that there are three responses: the first two are the result of the two tool calls, and the last is the final output, which is generated from the information derived from the tool calls.
We’ll see tools again shortly when we use agents as tools, but now we are going to consider how we can use multiple agents that cooperate.
Multiple agents
Many agent applications only require a single agent, and these are already a long step beyond simple chat completions that you find in the LLM chat interfaces, such as ChatGPT. Agents run in loops and can use tools, making even a single agent pretty powerful. However, multiple agents working together can achieve even more complex behaviours.
In keeping with its simple philosophy, OpenAI doesn’t attempt to incorporate agent orchestration abstractions like some other frameworks. But despite its simple design, it supports the construction of both simple and complex configurations.
First, we’ll look at handoffs where one agent passes control to another. After that, we’ll see how agents can be combined hierarchically.
Handoffs
When an agent decides that it has completed its task and passes information to another agent for further work, that is termed a handoff.
There are two fundamental ways of achieving a handoff: with an agentic handoff, the entire message history is passed from one agent to another. It’s a bit like when you call the bank but the person you first speak to doesn’t know your particular circumstances, and so passes you on to someone who does. The difference is that, in the case of the AI agent, the new agent has a record of all that was said to the previous one.
The second method is a programmatic handoff. This is where only the required information provided by one agent is passed to another (via conventional programming methods).
Let’s look at programmatic handoffs first.
Programmatic handoffs
Sometimes the new agent doesn’t need to know the entire history of a transaction; perhaps only the final result is required. In this case, instead of a full handoff, you can arrange a programmatic handoff where only the relevant data is passed to the second agent.

The diagram shows a generic programmatic handoff between two agents.
Below is an example of this functionality, where one agent finds information about a topic and another takes that information and writes an article that’s suitable for kids.
To keep things simple, we won’t use our Wikipedia tool in this example; instead, we rely on the LLM’s knowledge.
import streamlit as st
import asyncio
from agents import Agent, Runner

writer_agent = Agent(
    name="Writer agent",
    instructions="""Re-write the article so that it is suitable for kids
                    aged around 8. Be enthusiastic about the topic -
                    everything is an adventure!""",
    model="o4-mini",
)

researcher_agent = Agent(
    name="Research agent",
    instructions="""You research topics and report on the results.""",
    model="o4-mini",
)

async def run_agent(input_string):
    result = await Runner.run(researcher_agent, input_string)
    result2 = await Runner.run(writer_agent, result.final_output)
    return result2

# Streamlit UI
st.title("Writer Agent")
st.write("Write stuff for kids.")

user_input = st.text_input("Enter a query and press 'Send':")

st.write("Response:")
response_container = st.container(height=300, border=True)

if st.button("Send"):
    response = asyncio.run(run_agent(user_input))
    with response_container:
        st.markdown(response.final_output)
    st.write(response)
    st.json(response.raw_responses)
In the code above, we define two agents: one researches a topic and the other produces text suitable for kids.
This technique doesn't rely on any special SDK functions; it simply runs one agent, gets the output in `result`, and uses it as the input for the next agent (output in `result2`). It's just like using the output of one function as the input for the next in conventional programming. Indeed, that is precisely what it is.
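Stripped of the SDK, the pattern looks like this (the two functions are stand-ins for the two `Runner.run` calls):

```python
# Stand-ins for the two Runner.run calls -- each "agent" is just a function
def research(topic: str) -> str:
    return f"Facts about {topic}."

def rewrite_for_kids(text: str) -> str:
    return f"Wow! {text} What an adventure!"

# The programmatic handoff: one call's output is the next call's input
article = rewrite_for_kids(research("Paris"))
print(article)
# -> Wow! Facts about Paris. What an adventure!
```

Note that `rewrite_for_kids` sees only the research output, not the original query; that's the defining limitation of the programmatic handoff.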
Agentic handoffs
However, sometimes an agent needs to know the history of what occurred previously. That’s where the OpenAI Agents Handoffs come in.
Below is the data flow diagram that represents the agentic handoff. You will see that it is very similar to the programmatic handoff; the differences are the data being transferred to the second agent and a possible output directly from the first agent when no handoff is required.

The code is also similar to the previous example. I've tweaked the instructions slightly, but the main difference is the `handoffs` list in `researcher_agent`. This is not dissimilar to the way we declare tools.
The Research Agent has been allowed to hand off to the Kids Writer Agent when it has completed its work. The effect of this is that the Kids Writer Agent not only takes over control of the processing but also has knowledge of what the Research Agent did, as well as the original prompt.
However, there is another major difference: it is up to the agent to determine whether the handoff takes place or not. In the example run below, I have instructed the agent to write something suitable for kids, and so it hands off to the Kids Writer Agent. If I had not told it to do that, it would have simply returned the original text.
import streamlit as st
import asyncio
from agents import Agent, Runner

kids_writer_agent = Agent(
    name="Kids Writer Agent",
    instructions="""Re-write the article so that it is suitable for kids aged around 8.
                    Be enthusiastic about the topic - everything is an adventure!""",
    model="o4-mini",
)

researcher_agent = Agent(
    name="Research agent",
    instructions="""Answer the query and report the results.""",
    model="o4-mini",
    handoffs=[kids_writer_agent],
)

async def run_agent(input_string):
    result = await Runner.run(researcher_agent, input_string)
    return result

# Streamlit UI
st.title("Writer Agent2")
st.write("Write stuff for kids.")

user_input = st.text_input("Enter a query and press 'Send':")

st.write("Response:")
response_container = st.container(height=300, border=True)

if st.button("Send"):
    response = asyncio.run(run_agent(user_input))
    with response_container:
        st.markdown(response.final_output)
    st.write(response)
    st.json(response.raw_responses)
It is not in the screenshot, but I've added code to output the `response` and the `raw_responses` so that you can see the handoff in operation if you run the code yourself.
Below is a screenshot of this agent.

An agent can have a list of handoffs at its disposal, and it will intelligently choose the correct agent (or none) to hand off to. You can see how this would be useful in a customer service situation where a difficult customer query might be escalated through a series of more expert agents, each of whom needs to be aware of the query history.
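The essential behaviour can be mimicked in plain Python (this is a toy illustration, not SDK code): the chosen agent receives the whole conversation history, and the decision of whether, and to whom, to hand off is made at run time.

```python
# Toy illustration of an agentic handoff -- not SDK code.
# The chosen agent receives the WHOLE history, not just the last output,
# and the first agent decides at run time whether to hand off at all.

def kids_writer(history):
    return f"Kids version, based on {len(history)} earlier messages."

def adult_writer(history):
    return f"Adult version, based on {len(history)} earlier messages."

def research_agent(query):
    history = [
        {"role": "user", "content": query},
        {"role": "assistant", "content": "research notes"},
    ]
    if "kids" in query:        # the agent chooses the handoff target itself
        return kids_writer(history)
    if "adults" in query:
        return adult_writer(history)
    return history[-1]["content"]   # no handoff: answer directly

print(research_agent("Write about Paris for kids"))
# -> Kids version, based on 2 earlier messages.
```

In the SDK, of course, the routing decision is made by the LLM rather than by an `if` statement, but the data flow is the same.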
We’ll now look at how we can use handoffs that involve multiple agents.
Handoffs to multiple agents
We will now see a new version of the previous program where the Research Agent chooses to hand off to different agents depending on the reader’s age.
The agent’s job is to produce text for three audiences: adults, teenagers and kids. The Research Agent will gather information and then hand it off to one of three other agents. Here is the data flow (note that I have excluded the links to an LLM for clarity – each agent communicates with an LLM, but we can consider that as an internal function of the agent).

And here is the code.
import streamlit as st
import asyncio
from agents import Agent, Runner

adult_writer_agent = Agent(
    name="Adult Writer Agent",
    instructions="""Write the article based on the information given so that it is
                    suitable for adults interested in culture.""",
    model="o4-mini",
)

teen_writer_agent = Agent(
    name="Teen Writer Agent",
    instructions="""Write the article based on the information given so that it is
                    suitable for teenagers who want to have a cool time.""",
    model="o4-mini",
)

kid_writer_agent = Agent(
    name="Kid Writer Agent",
    instructions="""Write the article based on the information given so that it is
                    suitable for kids of around 8 years old. Be enthusiastic!""",
    model="o4-mini",
)

researcher_agent = Agent(
    name="Research agent",
    instructions="""Find information on the topic(s) given.""",
    model="o4-mini",
    handoffs=[kid_writer_agent, teen_writer_agent, adult_writer_agent],
)

async def run_agent(input_string):
    result = await Runner.run(researcher_agent, input_string)
    return result

# Streamlit UI
st.title("Writer Agent3")
st.write("Write stuff for adults, teenagers or kids.")

user_input = st.text_input("Enter a query and press 'Send':")

st.write("Response:")
response_container = st.container(height=300, border=True)

if st.button("Send"):
    response = asyncio.run(run_agent(user_input))
    with response_container:
        st.markdown(response.final_output)
    st.write(response)
    st.json(response.raw_responses)
The program’s structure is similar, but now we have a set of agents to hand off to and a list of them in the Research Agent. The instructions in the various agents are self-explanatory, and the program will correctly respond to a prompt such as “Write an essay about Paris, France for kids” or “…for teenagers” or “…for adults”. The Research Agent will correctly choose the appropriate Writer Agent for the task.
The screenshot below shows an example of writing for teenagers.

The prompts provided in this example are simple. More sophisticated prompts would likely yield a better and more consistent result, but the aim here is to show the techniques rather than to build a clever app.
That is one type of collaboration; another is to use other agents as tools. This is not too dissimilar to the programmatic handoff we saw earlier.
Agents as tools
Running an agent is just calling a function, in the same way as calling a tool. So why not use agents as intelligent tools?
Instead of giving control over to a new agent, we use it as a function that we pass information to and get information back from.
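The contrast with a handoff can be seen in a plain-Python sketch (the functions below are illustrative stand-ins, not SDK code): the called agent behaves like any other function, and the calling agent keeps control and produces the final output.

```python
# Toy illustration of "agents as tools" -- not SDK code.
# The sub-agents are called like functions; control always returns
# to the main agent, which produces the final output itself.

def writer_tool(notes: str) -> str:
    return f"Article: {notes}"

def format_tool(text: str) -> str:
    return text.upper()   # stand-in for "format as Markdown"

def main_agent(query: str) -> str:
    notes = f"notes on {query}"
    draft = writer_tool(notes)   # call one "agent" as a tool...
    final = format_tool(draft)   # ...then another, keeping control throughout
    return final

print(main_agent("Paris"))
# -> ARTICLE: NOTES ON PARIS
```

Compare this with a handoff, where `main_agent` would return whatever the writer returned and never see the result.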
Below is a data flow diagram that illustrates the idea. Unlike a handoff, the main agent doesn’t pass overall control to another agent; instead, it intelligently chooses to call an agent as if it were a tool. The called agent does its job and then passes control back to the calling agent. Again, the data flows to an LLM have been omitted for clarity.

Below is a screenshot of a modified version of the previous program. We've changed the nature of the app a little: the main agent is now a travel agent, and it expects the user to give it a destination and the age group for which it should write. The UI is changed so that the age group is selected via a radio button, and the text input field now takes a destination.

A number of modifications have been made to the logic of the app. The UI changes the way the information is input, and this is reflected in the way that the prompt is constructed – we use an f-string to incorporate the two pieces of data into the prompt.
Additionally, we now have an extra agent that formats the text. The other agents are similar (but note that the prompts have been refined), and we also use a structured output to ensure that the text that we output is precisely what we expect.
Fundamentally, though, we see that the writer agents and the formatting agent are specified as tools in the researcher agent.
import streamlit as st
import asyncio
from agents import Agent, Runner
from pydantic import BaseModel

class PRArticle(BaseModel):
    article_text: str
    commentary: str

adult_writer_agent = Agent(
    name="Adult Writer Agent",
    instructions="""Write the article based on the information given so that it is
                    suitable for adults interested in culture. Be mature.""",
    model="gpt-4o",
)

teen_writer_agent = Agent(
    name="Teen Writer Agent",
    instructions="""Write the article based on the information given so that it is
                    suitable for teenagers who want to have a good time. Be cool!""",
    model="gpt-4o",
)

kid_writer_agent = Agent(
    name="Kid Writer Agent",
    instructions="""Write the article based on the information given so that it is
                    suitable for kids of around 8 years old. Be enthusiastic!""",
    model="gpt-4o",
)

format_agent = Agent(
    name="Format Agent",
    instructions="""Edit the article to add a title and subtitles and ensure the text
                    is formatted as Markdown. Return only the text of the article.""",
    model="gpt-4o",
)

researcher_agent = Agent(
    name="Research agent",
    instructions="""You are a Travel Agent who will find useful information for your
                    customers of all ages.
                    Find information on the destination(s) given.
                    When you have a result, send it to the appropriate writer agent to
                    produce a short PR text.
                    When you have the result, send it to the Format agent for final
                    processing.""",
    model="gpt-4o",
    tools=[
        kid_writer_agent.as_tool(
            tool_name="kids_article_writer",
            tool_description="Write an essay for kids",
        ),
        teen_writer_agent.as_tool(
            tool_name="teen_article_writer",
            tool_description="Write an essay for teens",
        ),
        adult_writer_agent.as_tool(
            tool_name="adult_article_writer",
            tool_description="Write an essay for adults",
        ),
        format_agent.as_tool(
            tool_name="format_article",
            tool_description="Add titles and subtitles and format as Markdown",
        ),
    ],
    output_type=PRArticle,
)

async def run_agent(input_string):
    result = await Runner.run(researcher_agent, input_string)
    return result

# Streamlit UI
st.title("Travel Agent")
st.write("The travel agent will write about destinations for different audiences.")

destination = st.text_input("Enter a destination, select the age group and press 'Send':")
age_group = st.radio(
    "What age group is the reader?",
    ["Adult", "Teenager", "Child"],
    horizontal=True,
)

st.write("Response:")
response_container = st.container(height=500, border=True)

if st.button("Send"):
    response = asyncio.run(
        run_agent(f"The destination is {destination} and the reader's age group is {age_group}")
    )
    with response_container:
        st.markdown(response.final_output.article_text)
    st.write(response)
    st.json(response.raw_responses)
The tools list is a little different to the one we saw earlier:
- Each tool is an agent converted with the `.as_tool()` method, which makes the agent compatible with other tools.
- The tool needs a couple of parameters — a name and a description.
One other addition, which is very useful, is the use of structured outputs, as mentioned above. This separates the text that we want from any other commentary that the LLM might want to insert. If you run the code, you can see in the `raw_responses` the additional information that the LLM generates.
Using structured outputs helps to produce consistent results and solves a problem that is a particular bugbear of mine.
I’ve asked the output to be run through a formatter agent that will structure the result as Markdown. It depends on the LLM, it depends on the prompt, and who knows, maybe it depends on the time of day or the weather, but whenever I think I’ve got it right, an LLM will suddenly insert Markdown fencing. So instead of a clean:
## This is a header
This is some text
I instead get:
Here is your text formatted as Markdown:
```markdown
# This is a header
This is some text
```
Infuriating!
Anyway, the answer seems to be to use structured outputs. If you asked it to format the response as the text of what you want, plus a second field called ‘commentary’ or some such thing, it appears to do the right thing. Any extraneous stuff the LLM decides to spout goes in the second field, and the unadulterated Markdown goes in the text field.
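The mechanics are easy to demonstrate with a plain JSON payload standing in for a structured response (the payload is invented, but the field names match the `PRArticle` model above):

```python
import json

# A stand-in for a structured model response. With an output_type set,
# chatter and content arrive in separate, predictable fields; the payload
# below is invented, but the field names match the PRArticle model.
raw = '{"article_text": "# Paris\\nThe Eiffel Tower is 330 m tall.", "commentary": "Here is your text formatted as Markdown!"}'

parsed = json.loads(raw)
clean_markdown = parsed["article_text"]   # what we actually display
chatter = parsed["commentary"]            # extraneous fluff, safely ignored

print(clean_markdown)
# -> # Paris
#    The Eiffel Tower is 330 m tall.
```

In the app, pydantic does this parsing for us: `response.final_output` is already a `PRArticle`, so `article_text` arrives fence-free.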
OK, Shakespeare it isn't: adjusting the instructions so that they are more detailed might give better results (the current prompts are very simple), but it works well enough to illustrate the method.
Conclusion
That scratches the surface of OpenAI's Agents SDK. Thanks for reading, and I hope you found it useful. We have seen how to create agents and how to combine them in different ways, and we took a very quick look at structured outputs.
The examples are, of course, simple, but I hope they illustrate how easily agents can be orchestrated without resorting to complex abstractions and unwieldy frameworks.
The code here uses the Responses API because that is the default. However, it should run the same way with the Chat Completions API as well, which means that you are not limited to OpenAI models and, with a bit of jiggery-pokery, this SDK can be used with any LLM that supports the OpenAI Chat Completions API.
There’s plenty more to find out in OpenAI’s documentation.
- Images are by the author unless otherwise stated.