As we continue exploring features in the OpenAI Agents SDK framework, there's one capability that deserves a closer look: input and output guardrails.
In previous articles, we built our first agent with an API-calling tool and then expanded into a multi-agent system. In real-world scenarios, though, building these systems is complex—and without the right safeguards, things can quickly go off track. That’s where guardrails come in: they help ensure safety, focus, and efficiency.
If you haven’t read the earlier parts yet, no worries — you’ll find links to the previous articles at the end of this post.
Here’s why guardrails matter:
- Prevent misuse
- Save resources
- Ensure safety and compliance
- Maintain focus and quality
Without proper guardrails, unexpected use cases can pop up. For example, you might have heard of people using AI-powered customer service bots (designed for product support) to write code instead. It sounds funny, but for the company, it can become a costly and irrelevant distraction.
To see why guardrails are important, let's revisit our last project. I ran the `agents_as_tools` script and asked it to generate code for calling a weather API. Since no guardrails were in place, the app returned the answer without hesitation, proving that, by default, it will try to do almost anything asked of it.
We definitely don’t want this happening in a production app. Imagine the costs of unintended usage—not to mention the bigger risks it can bring, such as information leaks, system prompt exposure, and other serious vulnerabilities.
Hopefully, this makes the case clear for why guardrails are worth exploring. Next, let’s dive into how to start using the guardrail feature in the OpenAI Agents SDK.
A Quick Intro to Guardrails
In the OpenAI Agents SDK, there are two types of guardrails: input guardrails and output guardrails [1]. Input guardrails run on the user’s initial input, while output guardrails run on the agent’s final response.
A guardrail can be an LLM-powered agent—useful for tasks that require reasoning—or a rule-based/programmatic function, such as a regex to detect specific keywords. If the guardrail finds a violation, it triggers a tripwire and raises an exception. This mechanism prevents the main agent from processing unsafe or irrelevant queries, ensuring both safety and efficiency.
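To make the mechanism concrete before we build the full app, here's a minimal sketch of this tripwire pattern. The `ask` helper is an illustrative name, not part of the SDK; the complete implementation follows later in this article.

```python
import asyncio

from agents import Agent, Runner, InputGuardrailTripwireTriggered

async def ask(agent: Agent, user_query: str) -> str:
    """Run an agent; any input guardrails attached to it execute first."""
    try:
        result = await Runner.run(agent, user_query)
        return result.final_output
    except InputGuardrailTripwireTriggered:
        # A guardrail flagged the input, so the main agent never ran.
        return "Sorry, I can only help with on-topic requests."

# Usage (assuming `some_agent` is an Agent with input_guardrails set):
# print(asyncio.run(ask(some_agent, "What's the weather in Jakarta?")))
```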
Some practical uses for input guardrails include:
- Identifying when a user asks an off-topic question [2]
- Detecting unsafe input attempts, including jailbreaks and prompt injections [3]
- Moderating to flag inappropriate input, such as harassment, violence, or hate speech [3]
- Handling specific-case validation. For example, in our weather app, we could enforce that questions only reference cities in Indonesia.
On the other hand, output guardrails can be used to:
- Prevent unsafe or inappropriate responses
- Stop the agent from leaking personally identifiable information (PII) [3]
- Ensure compliance and brand safety, such as blocking outputs that could harm brand integrity
In this article, we’ll explore different types of guardrails, including both LLM-based and rule-based approaches, and how they can be applied for various kinds of validation.
Prerequisites
- Create a `requirements.txt` file:

```
openai-agents
streamlit
```

- Create a virtual environment named `venv` and install the dependencies. Run the following commands in your terminal:

```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
```

- Create a `.env` file to store your OpenAI API key:

```
OPENAI_API_KEY=your_openai_key_here
```
For the guardrail implementation, we’ll use the script from the previous article where we built the agents-as-tools multi-agent system. For a detailed walkthrough, please refer back to that article. The full implementation script can be found here: app06_agents_as_tools.py.
Now let's create a new file named `app08_guardrails.py`.
Input Guardrail
We’ll start by adding input guardrails to our weather app. In this section, we’ll build two types:
- Off-topic guardrail, which uses an LLM to determine if the user input is unrelated to the app’s purpose.
- Injection detection guardrail, which uses a simple rule to catch jailbreak and prompt injection attempts.
Import Libraries
First, let's import the necessary packages from the Agents SDK and other libraries. We'll also set up the environment to load the OpenAI API key from the `.env` file. From the Agents SDK, besides the basic functions (`Agent`, `Runner`, and `function_tool`), we'll also import functions specifically used for implementing input and output guardrails.
```python
from agents import (
    Agent,
    Runner,
    function_tool,
    GuardrailFunctionOutput,
    input_guardrail,
    InputGuardrailTripwireTriggered,
    output_guardrail,
    OutputGuardrailTripwireTriggered
)
import asyncio
import requests
import streamlit as st
from pydantic import BaseModel, Field
from dotenv import load_dotenv

load_dotenv()
```
Define Output Model
For any LLM-based guardrail, we need to define an output model. Typically, we use a Pydantic model class to specify the structure of the data. At the simplest level, we need a boolean field (True/False) to indicate whether the guardrail should trigger, along with a text field that explains the reasoning.
In our case, we want the guardrail to determine whether the query is still within the scope of the app's purpose (weather and air quality). To do that, we'll define a model named `TopicClassificationOutput` as shown below:
```python
# Define output model for the guardrail agent to classify if input is off-topic
class TopicClassificationOutput(BaseModel):
    is_off_topic: bool = Field(
        description="True if the input is off-topic (not related to weather/air quality and not a greeting), False otherwise"
    )
    reasoning: str = Field(
        description="Brief explanation of why the input was classified as on-topic or off-topic"
    )
```
The boolean field `is_off_topic` will be set to `True` if the input is outside the app's scope. The `reasoning` field stores a short explanation of why the model made its classification.
Create Guardrail Agent
We need to define an agent with clear and complete instructions to determine whether a user’s question is on-topic or off-topic. This can be adjusted depending on your app’s purpose—the instructions don’t have to be the same for every use case.
For our Weather and Air Quality assistant, here’s the guardrail agent with instructions for classifying a user’s query.
```python
# Create the guardrail agent to determine if input is off-topic
topic_classification_agent = Agent(
    name="Topic Classification Agent",
    instructions=(
        "You are a topic classifier for a weather and air quality application. "
        "Your task is to determine if a user's question is on-topic. "
        "Allowed topics include: "
        "1. Weather-related: current weather, weather forecast, temperature, precipitation, wind, humidity, etc. "
        "2. Air quality-related: air pollution, AQI, PM2.5, ozone, air conditions, etc. "
        "3. Location-based inquiries about weather or air conditions "
        "4. Polite greetings and conversational starters (e.g., 'hello', 'hi', 'good morning') "
        "5. Questions that combine greetings with weather/air quality topics "
        "Mark as OFF-TOPIC only if the query is clearly unrelated to weather/air quality AND not a simple greeting. "
        "Examples of off-topic: math problems, cooking recipes, sports scores, technical support, jokes (unless weather-related). "
        "Examples of on-topic: 'Hello, what's the weather?', 'Hi there', 'Good morning, how's the air quality?', 'What's the temperature?' "
        "The final output MUST be a JSON object conforming to the TopicClassificationOutput model."
    ),
    output_type=TopicClassificationOutput,
    model="gpt-4o-mini"  # Use a fast and cost-effective model
)
```
In the instructions, besides listing the obvious topics, we also allow some flexibility for simple conversational starters like “hello,” “hi,” or other greetings. To make the classification clearer, we included examples of both on-topic and off-topic queries.
Another benefit of input guardrails is cost optimization. To take advantage of this, we should use a faster and more cost-effective model than the main agent. This way, the main (and more expensive) agent is only used when absolutely necessary.
In this example, the guardrail agent uses `gpt-4o-mini` while the main agent runs on `gpt-4o`.
Create an Input Guardrail Function
Next, let's wrap the agent in an async function decorated with `@input_guardrail`. The output of this function will include the two fields defined earlier: `is_off_topic` and `reasoning`.

The function returns a structured `GuardrailFunctionOutput` object containing `output_info` (set from the `reasoning` field) and `tripwire_triggered`.

The `tripwire_triggered` value determines whether the input should be blocked. If `is_off_topic` is `True`, the tripwire triggers, blocking the input. Otherwise, the value is `False` and the main agent continues processing.
```python
# Create the input guardrail function
@input_guardrail
async def off_topic_guardrail(ctx, agent, input) -> GuardrailFunctionOutput:
    """
    Classifies user input to ensure it is on-topic for a weather and air quality app.
    """
    result = await Runner.run(topic_classification_agent, input, context=ctx.context)
    return GuardrailFunctionOutput(
        output_info=result.final_output.reasoning,
        tripwire_triggered=result.final_output.is_off_topic
    )
```
Create a Rule-based Input Guardrail Function
Alongside the LLM-based off-topic guardrail, we’ll create a simple rule-based guardrail. This one doesn’t require an LLM and instead relies on programmatic pattern matching.
Depending on your app’s purpose, rule-based guardrails can be very effective at blocking harmful inputs—especially when risky patterns are predictable.
In this example, we define a list of keywords often used in jailbreak or prompt injection attempts. The list includes: `"ignore previous instructions"`, `"you are now a"`, `"forget everything above"`, `"developer mode"`, `"override safety"`, and `"disregard guidelines"`.

If the user input contains any of these keywords, the guardrail will trigger automatically. Since no LLM is involved, we can handle the validation directly inside the input guardrail function `injection_detection_guardrail`:
```python
# Rule-based input guardrail to detect jailbreaking and prompt injection queries
@input_guardrail
async def injection_detection_guardrail(ctx, agent, input) -> GuardrailFunctionOutput:
    """
    Detects potential jailbreaking or prompt injection attempts in user input.
    """
    # Simple keyword-based detection
    injection_patterns = [
        "ignore previous instructions",
        "you are now a",
        "forget everything above",
        "developer mode",
        "override safety",
        "disregard guidelines"
    ]
    if any(keyword in input.lower() for keyword in injection_patterns):
        return GuardrailFunctionOutput(
            output_info="Potential jailbreaking or prompt injection detected.",
            tripwire_triggered=True
        )
    return GuardrailFunctionOutput(
        output_info="No jailbreaking or prompt injection detected.",
        tripwire_triggered=False
    )
```
This guardrail simply checks the input against the keyword list. If a match is found, `tripwire_triggered` is set to `True`. Otherwise, it remains `False`.
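The same rule-based pattern covers the specific-case validation mentioned earlier. Below is a quick hypothetical sketch that restricts queries to cities in Indonesia via a simple allowlist; the guardrail name, city list, and matching rule are illustrative assumptions, not part of the final app.

```python
# Hypothetical rule-based guardrail: only allow queries that mention an
# Indonesian city from a fixed allowlist (illustrative, not exhaustive).
INDONESIAN_CITIES = {"jakarta", "surabaya", "bandung", "medan", "makassar"}

@input_guardrail
async def indonesia_only_guardrail(ctx, agent, input) -> GuardrailFunctionOutput:
    # Input may arrive as a string or a list of items; normalize to text first.
    text = input.lower() if isinstance(input, str) else str(input).lower()
    mentions_allowed_city = any(city in text for city in INDONESIAN_CITIES)
    return GuardrailFunctionOutput(
        output_info="Checked the query against the Indonesian city allowlist.",
        tripwire_triggered=not mentions_allowed_city,
    )
```

Note that this naive version would also block greetings, since they mention no city at all; a production check would be more forgiving.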
Define Specialized Agents for Weather and Air Quality
Now let's continue by defining the weather and air quality specialist agents with their function tools. This part is explained in detail in my previous article, so I'll skip the walkthrough here.
```python
# Define function tools and specialized agents for weather and air quality
@function_tool
def get_current_weather(latitude: float, longitude: float) -> dict:
    """Fetch current weather data for the given latitude and longitude."""
    url = "https://api.open-meteo.com/v1/forecast"
    params = {
        "latitude": latitude,
        "longitude": longitude,
        "current": "temperature_2m,relative_humidity_2m,dew_point_2m,apparent_temperature,precipitation,weathercode,windspeed_10m,winddirection_10m",
        "timezone": "auto"
    }
    response = requests.get(url, params=params)
    return response.json()

weather_specialist_agent = Agent(
    name="Weather Specialist Agent",
    instructions="""
    You are a weather specialist agent.
    Your task is to analyze current weather data, including temperature, humidity, wind speed and direction, precipitation, and weather codes.
    For each query, provide:
    1. A clear, concise summary of the current weather conditions in plain language.
    2. Practical, actionable suggestions or precautions for outdoor activities, travel, health, or clothing, tailored to the weather data.
    3. If severe weather is detected (e.g., heavy rain, thunderstorms, extreme heat), clearly highlight recommended safety measures.
    Structure your response in two sections:
    Weather Summary:
    - Summarize the weather conditions in simple terms.
    Suggestions:
    - List relevant advice or precautions based on the weather.
    """,
    tools=[get_current_weather],
    tool_use_behavior="run_llm_again"
)

@function_tool
def get_current_air_quality(latitude: float, longitude: float) -> dict:
    """Fetch current air quality data for the given latitude and longitude."""
    url = "https://air-quality-api.open-meteo.com/v1/air-quality"
    params = {
        "latitude": latitude,
        "longitude": longitude,
        "current": "european_aqi,us_aqi,pm10,pm2_5,carbon_monoxide,nitrogen_dioxide,sulphur_dioxide,ozone",
        "timezone": "auto"
    }
    response = requests.get(url, params=params)
    return response.json()

air_quality_specialist_agent = Agent(
    name="Air Quality Specialist Agent",
    instructions="""
    You are an air quality specialist agent.
    Your role is to interpret current air quality data and communicate it clearly to users.
    For each query, provide:
    1. A concise summary of the air quality conditions in plain language, including key pollutants and their levels.
    2. Practical, actionable advice or precautions for outdoor activities, travel, and health, tailored to the air quality data.
    3. If poor or hazardous air quality is detected (e.g., high pollution, allergens), clearly highlight recommended safety measures.
    Structure your response in two sections:
    Air Quality Summary:
    - Summarize the air quality conditions in simple terms.
    Suggestions:
    - List relevant advice or precautions based on the air quality.
    """,
    tools=[get_current_air_quality],
    tool_use_behavior="run_llm_again"
)
```
Define the Orchestrator Agent with Input Guardrails
Much like before, the orchestrator agent has the same properties as the one we discussed in my previous article: in the agents-as-tools pattern, the orchestrator agent manages the task of each specialized agent instead of handing the task off to a single agent as in the handoff pattern.

The only difference here is that we add a new property to the agent: `input_guardrails`. In this property, we pass the list of input guardrail functions we defined earlier: `off_topic_guardrail` and `injection_detection_guardrail`.
```python
# Define the main orchestrator agent with guardrails
orchestrator_agent = Agent(
    name="Orchestrator Agent",
    instructions="""
    You are an orchestrator agent.
    Your task is to manage the interaction between the Weather Specialist Agent and the Air Quality Specialist Agent.
    You will receive a query from the user and will decide which agent to invoke based on the content of the query.
    If both weather and air quality information is requested, you will invoke both agents and combine their responses into one clear answer.
    """,
    tools=[
        weather_specialist_agent.as_tool(
            tool_name="get_weather_update",
            tool_description="Get current weather information and suggestion including temperature, humidity, wind speed and direction, precipitation, and weather codes."
        ),
        air_quality_specialist_agent.as_tool(
            tool_name="get_air_quality_update",
            tool_description="Get current air quality information and suggestion including pollutants and their levels."
        )
    ],
    tool_use_behavior="run_llm_again",
    input_guardrails=[injection_detection_guardrail, off_topic_guardrail],
)

# Define the run_agent function
async def run_agent(user_input: str):
    result = await Runner.run(orchestrator_agent, user_input)
    return result.final_output
```
One thing I observed while experimenting with guardrails is that the order of the functions in the `input_guardrails` list is the order in which they execute. This means we can configure the evaluation order with cost and impact in mind.

In our case, I want to cut the process immediately if the query violates the prompt injection guardrail, both because of its impact and because this validation requires no LLM. If a query has already been identified as one that can't proceed, there's no need to evaluate it with the (paid) LLM call in the off-topic guardrail.
Create Main Function with Exception Handler
This is where the input guardrails take real action. In this part, where we define the main function of the Streamlit user interface, we'll add exception handling for the case when an input guardrail tripwire is triggered.
```python
# Define the main function of the Streamlit app
def main():
    st.title("Weather and Air Quality Assistant")
    user_input = st.text_input("Enter your query about weather or air quality:")
    if st.button("Get Update"):
        with st.spinner("Thinking..."):
            if user_input:
                try:
                    agent_response = asyncio.run(run_agent(user_input))
                    st.write(agent_response)
                except InputGuardrailTripwireTriggered as e:
                    st.write("I can only help with weather and air quality related questions. Please try asking something else!")
                    st.error("Info: {}".format(e.guardrail_result.output.output_info))
                except Exception as e:
                    st.error(e)
            else:
                st.write("Please enter a question about the weather or air quality.")

if __name__ == "__main__":
    main()
```
As we can see in the code above, when `InputGuardrailTripwireTriggered` is raised, the app shows a user-friendly message telling the user it can only help with weather and air quality related questions.

To make the message more helpful, we also show which input guardrail blocked the user's query. If the exception is raised by `off_topic_guardrail`, the app displays the reasoning from the agent that handles that check. If it comes from `injection_detection_guardrail`, the app displays the hard-coded message "Potential jailbreaking or prompt injection detected."
Run and Check
To test how the input guardrail works, let’s start by running the Streamlit app.
```bash
streamlit run app08_guardrails.py
```
First, let’s try asking a question that aligns with the app’s intended purpose.

As expected, the app returns an answer since the question is related to weather or air quality.
Using Traces, we can see what’s happening under the hood.

As discussed earlier, the input guardrails run before the main agent. Since we set the guardrail list in order, the `injection_detection_guardrail` runs first, followed by the `off_topic_guardrail`. Once the input passes these two guardrails, the main agent can execute the process.
However, if we change the question to something completely unrelated to weather or air quality—like the history of Jakarta—the response looks like this:

Here, the `off_topic_guardrail` triggers the tripwire, cuts the process midway, and returns a message along with some extra details about why it happened.

From the Traces dashboard for that history question, we can see the orchestrator agent throws an error because the guardrail tripwire was triggered.
Since the process was cut before the input reached the main agent, we never even called the main agent model—saving some bucks on a query the app isn’t supposed to handle anyway.
Output Guardrail
If the input guardrail ensures that the user’s query is safe and relevant, the output guardrail ensures that the agent’s response itself meets our desired standards. This is equally important because even with strong input filtering, the agent can still produce outputs that are unintended, harmful, or simply not aligned with our requirements.
For example, in our app we want to ensure that the agent always responds professionally. Since LLMs often mirror the tone of the user’s query, they might reply in a casual, sarcastic, or unprofessional tone—which is outside the scope of the input guardrails we already implemented.
To handle this, we add an output guardrail that checks whether a response is professional. If it’s not, the guardrail will trigger and prevent the unprofessional response from reaching the user.
Prepare the Output Guardrail Function
Just like the `off_topic_guardrail`, we create a new `professionalism_guardrail`. It uses a Pydantic model for the output, a dedicated agent to classify professionalism, and an async function decorated with `@output_guardrail` to enforce the check.
```python
# Define output model for Output Guardrail Agent
class ResponseCheckerOutput(BaseModel):
    is_not_professional: bool = Field(
        description="True if the output is not professional, False otherwise"
    )
    reasoning: str = Field(
        description="Brief explanation of why the output was classified as professional or unprofessional"
    )

# Create Output Guardrail Agent
response_checker_agent = Agent(
    name="Response Checker Agent",
    instructions="""
    You are a response checker agent.
    Your task is to evaluate the professionalism of the output generated by other agents.
    For each response, provide:
    1. A classification of the response as professional or unprofessional.
    2. A brief explanation of the reasoning behind the classification.
    Structure your response in two sections:
    Professionalism Classification:
    - State whether the response is professional or unprofessional.
    Reasoning:
    - Provide a brief explanation of the classification.
    """,
    output_type=ResponseCheckerOutput,
    model="gpt-4o-mini"
)

# Define output guardrail function
@output_guardrail
async def professionalism_guardrail(ctx, agent, output) -> GuardrailFunctionOutput:
    result = await Runner.run(response_checker_agent, output, context=ctx.context)
    return GuardrailFunctionOutput(
        output_info=result.final_output.reasoning,
        tripwire_triggered=result.final_output.is_not_professional
    )
```
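Output guardrails can also be rule-based. As a hypothetical sketch of the PII use case mentioned earlier, here's a regex check that flags email-like strings in a response; the pattern and guardrail name are illustrative assumptions, not part of this app.

```python
import re

# Hypothetical rule-based output guardrail: flag email-like strings as PII.
# The regex is deliberately simple and illustrative, not production-grade.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

@output_guardrail
async def pii_detection_guardrail(ctx, agent, output) -> GuardrailFunctionOutput:
    # Normalize the agent output to text before scanning it.
    text = output if isinstance(output, str) else str(output)
    found_pii = EMAIL_PATTERN.search(text) is not None
    return GuardrailFunctionOutput(
        output_info=(
            "Potential PII (email address) detected in the response."
            if found_pii
            else "No PII detected."
        ),
        tripwire_triggered=found_pii,
    )
```

Because no LLM is involved, a check like this adds virtually no latency or cost on top of the professionalism guardrail.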
Output Guardrail Implementation
Now we add this new guardrail to the orchestrator agent by listing it under `output_guardrails`. This ensures every response is checked before being shown to the user.
```python
# Add professionalism guardrail to the orchestrator agent
orchestrator_agent = Agent(
    name="Orchestrator Agent",
    instructions="...same as before...",
    tools=[...],
    input_guardrails=[injection_detection_guardrail, off_topic_guardrail],
    output_guardrails=[professionalism_guardrail],
)
```
Finally, we extend the main function to handle `OutputGuardrailTripwireTriggered` exceptions. If triggered, the app will block the unprofessional response and return a friendly fallback message instead.
```python
# Handle output guardrail in the main function (add to the try block in main())
                except OutputGuardrailTripwireTriggered as e:
                    st.write("The response didn't meet our quality standards. Please try again.")
                    st.error("Info: {}".format(e.guardrail_result.output.output_info))
```
Run and Check
Now, let’s take a look at how the output guardrail works. Start by running the app as before:
```bash
streamlit run app08_guardrails.py
```
To test this, we can try to force the agent to answer in an unprofessional way related to weather or air quality. For example, by asking: “Answer this question with hyperbole. What is the air quality in Jakarta?”

This query passes the input guardrails because it is still on-topic and not an attempt at prompt injection. As a result, the main agent processes the input and calls the correct function.
However, the final output generated by the main agent—since it followed the user’s hyperbole request—does not align with the brand’s communication standard. Here’s the result we got from the app:
Conclusion
Throughout this article, we explored how guardrails in the OpenAI Agents SDK help us maintain control over both input and output. The input guardrail we built here protects the app from harmful or unintended user input that could cost us as developers, while the output guardrail ensures responses remain consistent with the brand standard.
By combining these mechanisms, we can significantly reduce the risks of unintended usage, information leaks, or outputs that fail to align with the intended communication style. This is especially crucial when deploying agentic applications into production environments, where safety, reliability, and trust matter most.
Guardrails are not a silver bullet, but they are an essential layer of defense. As we continue building more advanced multi-agent systems, adopting guardrails early on will help ensure we create applications that are not only powerful but also safe, responsible, and cost-conscious.
Previous Articles in This Series
References
[1] OpenAI. (2025). OpenAI Agents SDK documentation. Retrieved August 30, 2025, from https://openai.github.io/openai-agents-python/guardrails/
[2] OpenAI. (2025). How to use guardrails. OpenAI Cookbook. Retrieved August 30, 2025, from https://cookbook.openai.com/examples/how_to_use_guardrails
[3] OpenAI. (2025). A practical guide to building agents. Retrieved August 30, 2025, from https://cdn.openai.com/business-guides-and-resources/a-practical-guide-to-building-agents.pdf
You can find the complete source code used in this article in the following repository: agentic-ai-weather | GitHub Repository. Feel free to explore, clone, or fork the project to follow along or build your own version.
If you’d like to see the app in action, I’ve also deployed it here: Weather Assistant Streamlit
Lastly, let’s connect on LinkedIn!