
LangGraph 101: Let’s Build A Deep Research Agent

Building LLM agents that actually work in practice is not an easy task.

You need to consider how to orchestrate the multi-step workflow, keep track of the agents’ states, implement necessary guardrails, and monitor decision processes as they happen.

Fortunately, LangGraph addresses exactly those pain points for you.

Google recently demonstrated this perfectly by open-sourcing a full-stack implementation of a Deep Research Agent built with LangGraph and Gemini (under the Apache-2.0 license).

This isn’t a toy implementation: the agent can not only search, but also dynamically evaluate the results, decide whether more information is needed, and circle back for further searches if so. This iterative workflow is exactly the kind of thing where LangGraph really shines.

So, if you want to learn how LangGraph works in practice, what better place to start than a real, working agent like this?

Here’s our game plan for this tutorial post: We’ll adopt a “problem-driven” learning approach. Instead of starting with lengthy, abstract concepts, we’ll jump right into the code and examine Google’s implementation. After that, we’ll connect each piece back to the core concepts of LangGraph.

By the end, you’ll not only have a working research agent but also enough LangGraph knowledge to build whatever comes next.

All the code we’ll be discussing in this post comes from the official Google Gemini repository, which you can find here. Our focus will be on the backend logic (backend/src/agent/ directory) where the research agent is defined.

Here is the visual roadmap for this post:

Figure 1. Table of Contents for this post. (Image by author)

1. The Big Picture — Modeling the Workflow with Graphs, Nodes, and Edges

🎯 The problem

In this case study, we’ll build something exciting: an LLM-based, research-augmented agent, a minimal replication of the Deep Research features you’ve already seen in ChatGPT, Gemini, Claude, or Perplexity. That’s what we’re aiming for here.

Specifically, our agent will work like this:

It takes in a user query, autonomously searches the web, examines the search results it obtains, and then decides whether enough information has been found. If so, it proceeds to create a well-crafted mini-report with proper citations; otherwise, it circles back to dig deeper with more searches.

First things first, let’s sketch out a high-level flowchart so that we are clear what we’re building here:

Figure 2. High-level flowchart (Image by author)

💡LangGraph’s solution

Now, how should we model this workflow in LangGraph? Well, as the name suggests, LangGraph uses graph representations. Ok, but why use graphs?

The short answer is this: graphs are great for modeling complex, stateful flows, just like the application we aim to build here. When you have branching decisions, loops that need to circle back, and all the other messy realities that real-world agentic workflows throw at you, graphs give you one of the most natural ways to represent them all.

Technically, a graph is composed of nodes and edges. In LangGraph’s world, nodes are individual processing steps in the workflow, and edges define transitions between steps, that is, defining how control and state flow through the system.

> Let’s see some code!

In LangGraph, the translation from flowchart to code is straightforward. Let’s look at agent/graph.py from the Google repository to see how this is done.

The first step is to create the graph itself:

from langgraph.graph import StateGraph
from agent.state import (
    OverallState,
    QueryGenerationState,
    ReflectionState,
    WebSearchState,
)
from agent.configuration import Configuration

# Create our Agent Graph
builder = StateGraph(OverallState, config_schema=Configuration)

Here, StateGraph is LangGraph’s builder class for a state-aware graph. It accepts an OverallState class that defines what information can move between nodes (this is the agent-memory part we will discuss in the next section), and a Configuration class that defines runtime-tunable parameters, such as which LLM to call at individual steps, the number of initial queries to generate, etc. More details on this will follow in the next sections.

Once we have the graph container, we can add nodes to it:

# Define the nodes we will cycle between
builder.add_node("generate_query", generate_query)
builder.add_node("web_research", web_research)
builder.add_node("reflection", reflection)
builder.add_node("finalize_answer", finalize_answer)

The add_node() method takes the first argument as the node’s name and the second argument as the callable that is executed when the node runs.

Generally, this callable can be a plain function, an async function, a LangChain Runnable, or even another compiled StateGraph.

In our specific case:

  • generate_query generates search queries based on the user’s question.
  • web_research performs web research using the native Google Search API tool.
  • reflection identifies knowledge gaps and generates potential follow-up queries.
  • finalize_answer finalizes the research summary.

We will examine the detailed implementation of those functions later.

Ok, now that we have the nodes defined, the next step is to add edges to connect them and define execution order:

from langgraph.graph import START, END

# Set the entrypoint as `generate_query`
# This means that this node is the first one called
builder.add_edge(START, "generate_query")

# Add conditional edge to continue with search queries in a parallel branch
builder.add_conditional_edges(
    "generate_query", continue_to_web_research, ["web_research"]
)

# Reflect on the web research
builder.add_edge("web_research", "reflection")

# Evaluate the research
builder.add_conditional_edges(
    "reflection", evaluate_research, ["web_research", "finalize_answer"]
)

# Finalize the answer
builder.add_edge("finalize_answer", END)

A couple of things are worth pointing out here:

  • Notice how those node names we defined earlier (e.g., “generate_query”, “web_research”, etc.) now come in handy—we can reference them directly in our edge definitions.
  • We see that two types of edges are used, i.e., the static edge and the conditional edge.
  • When builder.add_edge() is used, a direct, unconditional connection between two nodes is created. In our case, builder.add_edge("web_research", "reflection") basically means that after web research is completed, the flow will always move to the reflection step.
  • On the other hand, when builder.add_conditional_edges() is used, the flow may jump to different branches at runtime. We need three key arguments when creating a conditional edge: the source node, a routing function, and a list of possible destination nodes. The routing function examines the current state and returns the name of the next node to visit. For example, the evaluate_research() function determines whether the agent needs more research (in that case, go to the "web_research" node) or whether the information is already sufficient, in which case the agent can finalize the answer (go to the "finalize_answer" node).

But why do we need a conditional edge between “generate_query” and “web_research”? Shouldn’t it be a static edge since we always want to search after generating queries? Good catch! That actually has something to do with how LangGraph enables parallelization. We will discuss that later in-depth.

  • We also notice two special nodes: START and END. These are LangGraph’s built-in entry and exit points. Every graph needs exactly one starting point (where execution begins), but can have multiple ending points (where execution terminates).

Finally, it’s time to put everything together and compile the graph into an executable agent:

graph = builder.compile(name="pro-search-agent")

And that’s it! We’ve successfully translated our flowchart into a LangGraph implementation.
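For reference, here is a minimal sketch (not part of the repo snippet above) of how you might run the compiled agent once the node functions are in place; the input dict just has to match the OverallState schema:

from langchain_core.messages import HumanMessage

# A minimal sketch: invoke the compiled agent once with a user question.
# Assumes the node functions above are implemented and GEMINI_API_KEY is set.
final_state = graph.invoke(
    {"messages": [HumanMessage(content="What is LangGraph used for?")]}
)
print(final_state["messages"][-1].content)  # the finalized research answer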

🎁 Bonus Read: Why Do Graphs Truly Shine?

Beyond being a natural fit for nonlinear workflows, LangGraph’s node/edge/graph representation brings several additional practical benefits that make building and managing agents easy in the real world:

  • Fine-grained control & observability. Because every node/edge has its own identity, you can easily checkpoint your progress and examine what’s happening under the hood when something unexpected occurs. This makes debugging and evaluation simple.
  • Modularity & reuse. You can bundle individual steps into reusable subgraphs, just like Lego bricks. Talk about software best practices in action.
  • Parallel paths. When parts of your workflow are independent, graphs easily let them run concurrently. This helps address latency issues and makes your system more robust to faults, which is especially critical when your pipelines are complex.
  • Easily visualizable. Whether it’s debugging or presenting the approach, it’s always nice to be able to see the workflow logic, and graphs are natural to visualize (see the snippet right after this list).
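For example, once a graph is compiled, you can render its structure with a couple of lines. A minimal sketch, assuming the compiled graph object from above:

# Render the compiled graph's structure as a Mermaid diagram string.
print(graph.get_graph().draw_mermaid())

# Or write it out as a PNG (needs the optional drawing dependencies):
# graph.get_graph().draw_mermaid_png(output_file_path="agent_graph.png")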

📌Key takeaways

Let’s recap what we’ve covered in this foundational section:

  • LangGraph uses graphs to describe the agentic workflow, as graphs elegantly handle branching, looping, and other nonlinear procedures.
  • In LangGraph, nodes represent processing steps and edges define transitions between steps.
  • LangGraph implements two types of edges: static edges and conditional edges. When you have fixed transitions between nodes, use static edges. If the transition may change at runtime based on a dynamic decision, use conditional edges.
  • Building a graph in LangGraph is simple. You first create a StateGraph, then add nodes (with their functions), connect them with edges. Finally, you compile the graph. Done!
Figure 3. Building agentic graph in LangGraph. (Image by author)

Now that we understand the basic structure, you’re probably wondering: how does information flow between these nodes? This brings us to one of LangGraph’s most important concepts: state management.

Let’s check that out.


2. The Agent’s Memory — How Nodes Share Information with State

Figure 4. The current progress. (Image by Author)

🎯 The problem

As our agent walks through the graph we defined earlier, it needs to keep track of things it has generated/learned. For example:

  • The original question from the user.
  • The list of search queries it has generated.
  • The content it has retrieved from the web.
  • Its own internal reflections about whether the gathered information is sufficient.
  • The final, polished answer.

So, how should we maintain that information so that our nodes do not work in isolation but instead collaborate and build upon each other’s work?

💡 LangGraph’s solution

The LangGraph way of solving this problem is by introducing a central state object, a shared whiteboard that every node in the graph can look at and write on.

Here’s how it works:

  • When a node is executed, it receives the current state of the graph.
  • The node performs its task (e.g., calls an LLM, runs a tool) using information from the state.
  • The node then returns a dictionary containing only the parts of the state it wants to update or add.
  • LangGraph then takes this output and automatically merges it into the main state object, before passing it to the next node.

Since the state passing and merging are handled at the framework level by LangGraph, individual nodes don’t need to worry about how to access or update shared data.  They just need to focus on their specific task logic.

Also, this pattern makes your agent workflows highly modular. You can easily add, remove, or reorder nodes without breaking the state flow.

> Let’s see some code!

Remember this line from the last section?

# Create our Agent Graph
builder = StateGraph(OverallState, config_schema=Configuration)

We mentioned that OverallState defines the agent’s memory, but we haven’t yet shown how exactly it is implemented. Now is a good time to open the black box.

In the repo, OverallState is defined in agent/state.py:

from typing import TypedDict, Annotated, List
from langgraph.graph.message import add_messages
import operator

class OverallState(TypedDict):
    messages: Annotated[list, add_messages]
    search_query: Annotated[list, operator.add]
    web_research_result: Annotated[list, operator.add]
    sources_gathered: Annotated[list, operator.add]
    initial_search_query_count: int
    max_research_loops: int
    research_loop_count: int
    reasoning_model: str

Essentially, we can see that the so-called state is a TypedDict that serves as a contract. It defines every field your workflow cares about and how those fields should be merged when multiple nodes write to them. Let’s break that down:

  • Field purposes: messages stores conversation history; search_query, web_research_result, and sources_gathered track the agent’s research process. The other fields control agent behavior by setting limits and tracking progress.
  • The Annotated pattern: We see some fields use Annotated[list, add_messages] or Annotated[list, operator.add]. This tells LangGraph how to merge updates when multiple nodes modify the same field. Specifically, add_messages is LangGraph’s built-in function for intelligently merging conversation messages, while operator.add concatenates lists when nodes add new items.
  • Merge behavior: Fields like research_loop_count: int simply replace the old value when updated. Annotated fields, on the other hand, are cumulative. They build up over time as different nodes write information into them (see the toy example below).
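To make the merge behavior concrete, here is a small, self-contained toy graph (not from the repo) in which a reducer field accumulates across nodes while a plain field simply gets overwritten:

import operator
from typing import Annotated, TypedDict

from langgraph.graph import StateGraph, START, END

class ToyState(TypedDict):
    notes: Annotated[list, operator.add]  # reducer field: lists are concatenated
    loop_count: int                       # plain field: the latest write wins

def node_a(state: ToyState):
    return {"notes": ["from A"], "loop_count": 1}

def node_b(state: ToyState):
    return {"notes": ["from B"], "loop_count": 2}

toy_builder = StateGraph(ToyState)
toy_builder.add_node("a", node_a)
toy_builder.add_node("b", node_b)
toy_builder.add_edge(START, "a")
toy_builder.add_edge("a", "b")
toy_builder.add_edge("b", END)

print(toy_builder.compile().invoke({"notes": [], "loop_count": 0}))
# -> {'notes': ['from A', 'from B'], 'loop_count': 2}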

While OverallState serves as the global memory, it is often better to also define smaller, node-specific states that act as a clear “API contract” for what a node needs and produces. After all, a specific node will rarely require all the information in OverallState, nor modify all of its content.

This is exactly what Google’s implementation does.

In agent/state.py, besides OverallState, three other states are also defined:

class ReflectionState(TypedDict):
    is_sufficient: bool
    knowledge_gap: str
    follow_up_queries: Annotated[list, operator.add]
    research_loop_count: int
    number_of_ran_queries: int

class QueryGenerationState(TypedDict):
    query_list: list[Query]

class WebSearchState(TypedDict):
    search_query: str
    id: str

Those states are used by the nodes in the following way (agent/graph.py):

from agent.state import (
    OverallState,
    QueryGenerationState,
    ReflectionState,
    WebSearchState,
)

def generate_query(
    state: OverallState, 
    config: RunnableConfig
) -> QueryGenerationState:
    # ...Some logic to generate search queries...
    return {"query_list": result.query}

def continue_to_web_research(
    state: QueryGenerationState
):
    # ...Some logic to send out multiple search queries...

def web_research(
    state: WebSearchState, 
    config: RunnableConfig
) -> OverallState:
    # ...Some logic to perform web research...
    return {
        "sources_gathered": sources_gathered,
        "search_query": [state["search_query"]],
        "web_research_result": [modified_text],
    }

def reflection(
    state: OverallState, 
    config: RunnableConfig
) -> ReflectionState:
    # ...Some logic to reflect on the results...
    return {
        "is_sufficient": result.is_sufficient,
        "knowledge_gap": result.knowledge_gap,
        "follow_up_queries": result.follow_up_queries,
        "research_loop_count": state["research_loop_count"],
        "number_of_ran_queries": len(state["search_query"]),
    }

def evaluate_research(
    state: ReflectionState,
    config: RunnableConfig,
) -> OverallState:
    # ...Some logic to determine the next step in the research flow...

def finalize_answer(
    state: OverallState, 
    config: RunnableConfig) -> OverallState:
    # ...Some logic to finalize the research summary...

    return {
        "messages": [AIMessage(content=result.content)],
        "sources_gathered": unique_sources,
    }

Take the reflection node as an example: it reads from the OverallState but returns a dictionary that matches the ReflectionState contract. Afterward, LangGraph handles the job of merging that output back into the main OverallState, making it available to the next nodes in the graph.

🎁 Bonus Read: Where Did My State Go?

A common confusion when working with LangGraph is how OverallState and these smaller, node-specific states interact. Let’s clear that confusion here.

The crucial mental model we need to have is this: there is only one state dictionary at runtime, the OverallState.

Node-specific TypedDicts are not extra runtime data stores. Instead, they are just typed “views” onto the one underlying dictionary (OverallState) that temporarily zoom in on the parts a node should see or produce. They exist so that the type checker and the LangGraph runtime can enforce clear contracts.

Figure 5. A quick comparison of the two state types. (Image by Author)

Before a node runs, LangGraph can use its type hints to create a “slice” of the OverallState containing only the inputs that the node needs.

The node runs its logic and returns its small, specific output dictionary (e.g., a ReflectionState dict).

LangGraph takes the returned dictionary and merges it into OverallState (think OverallState.update(return_dict)). If any keys were defined with an aggregator (like operator.add), that reducer logic is applied instead of a plain overwrite. The updated OverallState is then passed to the next node.

So why has LangGraph embraced this two-level state definition? Besides enforcing a clear contract for the node and making node operations self-documenting, there are two other benefits also worth mentioning:

  • Drop-in reusability: Because a node only advertises the small slice of state it needs and produces, it becomes a modular, plug-and-play component. For example, a generate_query node that only needs {user_query} from the state and outputs {queries} can be dropped into another, completely different graph, so long as that graph’s OverallState can provide a user_query. If the node were coded against the entire global state (i.e., typed with OverallState for both its input and output), renaming any unrelated key could easily break the workflow. This modularity is essential for building complex systems.
  • Efficiency in parallel flows: Imagine our agent needs to run 10 web searches simultaneously. If we are using a node-specific state as a small payload, we then just need to send the search query to each parallel branch. This is way more efficient than sending a copy of the entire agent memory (remember the full chat history is also stored in OverallState!) to all ten branches. This way, we can dramatically cut down on memory and serialization overhead.

So what does this mean for us in practice?

  •  Declare in OverallState every key that needs to persist or to be visible to multiple different nodes.
  •  Make the node-specific states as small as possible. They should contain only the fields that the node is responsible for producing.
  •  Every key you write must be declared in some state schema; otherwise, LangGraph raises InvalidUpdateError when the node tries to write it.

📌Key takeaways

Let’s recap what we’ve covered in this section:

  • LangGraph maintains states at two levels: At the global level, there is the OverallState object that serves as the central memory. At the individual node level, small, TypedDict-based objects store node-specific inputs/outputs. This keeps the state management clean and organized.
  • After each step, nodes return minimal output dicts, which are then merged back into the central memory (OverallState). This merging is done according to your custom rules (e.g., operator.add for lists).
  • Nodes are self-contained and modular. You can easily reuse them like building blocks to create new workflows.
Figure 6. Key points to remember in LangGraph state management. (Image by author)

Now that we understand the graph’s structure and how state flows through it, what happens inside each node? Let’s turn to the node operations.


3. Node Operations — Where The Real Work Happens

Figure 7. The current progress. (Image by Author)

Our graph can route messages and hold state, but inside each node, we still need to:

  • Make sure the LLM outputs the right format.
  • Call external APIs.
  • Run multiple searches in parallel.
  • Decide when to stop the loop.

Luckily, LangGraph has your back with several solid approaches for tackling these challenges. Let’s meet them one by one, each through a slice of our working codebase.

3.1 Structured output

🎯 The problem

Getting an LLM to return a JSON object is easy, but parsing free-text JSON is just unreliable in practice. As soon as LLMs use a different phrase, add unexpected formatting, or change the key order, our workflow can easily go off the rails. In short, we need guaranteed, validatable output structures at each processing step.

💡 LangGraph’s solution

We constrain the LLM to generate output that conforms to a predefined schema. This can be done by attaching a Pydantic schema to the LLM call using llm.with_structured_output(), which is a helper method that is provided by every LangChain chat-model wrapper (e.g., ChatGoogleGenerativeAI, ChatOpenAI, etc.).

> Let’s see some code!

Let’s look at the generate_query node, whose job is to create a list of search queries. Since the next node needs this list as a clean Python object, not a messy string, it is a good idea to enforce the output schema with SearchQueryList (defined in agent/tools_and_schemas.py):

from typing import List
from pydantic import BaseModel, Field

class SearchQueryList(BaseModel):
    query: List[str] = Field(
        description="A list of search queries to be used for web research."
    )
    rationale: str = Field(
        description="A brief explanation of why these queries are relevant to the research topic."
    )

And here is how this schema is used in the generate_query node:

from langchain_google_genai import ChatGoogleGenerativeAI
from agent.prompts import (
    get_current_date,
    query_writer_instructions,
)

def generate_query(
    state: OverallState, 
    config: RunnableConfig
) -> QueryGenerationState:
    """LangGraph node that generates a search queries 
       based on the User's question.

    Uses Gemini 2.0 Flash to create an optimized search 
    query for web research based on the User's question.

    Args:
        state: Current graph state containing the User's question
        config: Configuration for the runnable, including LLM 
                provider settings

    Returns:
        Dictionary with state update, including search_query key 
        containing the generated query
    """
    configurable = Configuration.from_runnable_config(config)

    # check for custom initial search query count
    if state.get("initial_search_query_count") is None:
        state["initial_search_query_count"] = configurable.number_of_initial_queries

    # init Gemini 2.0 Flash
    llm = ChatGoogleGenerativeAI(
        model=configurable.query_generator_model,
        temperature=1.0,
        max_retries=2,
        api_key=os.getenv("GEMINI_API_KEY"),
    )
    structured_llm = llm.with_structured_output(SearchQueryList)

    # Format the prompt
    current_date = get_current_date()
    formatted_prompt = query_writer_instructions.format(
        current_date=current_date,
        research_topic=get_research_topic(state["messages"]),
        number_queries=state["initial_search_query_count"],
    )
    # Generate the search queries
    result = structured_llm.invoke(formatted_prompt)
    return {"query_list": result.query}

Here, llm.with_structured_output(SearchQueryList) wraps the Gemini model with LangChain’s structured-output helper. Under the hood, it uses the model’s preferred structured-output feature (JSON mode for Gemini 2.0 Flash) and automatically parses the reply into a SearchQueryList Pydantic instance, so result is already validated Python data.

It’s also interesting to check out the system prompt Google used for this node:

query_writer_instructions = """Your goal is to generate sophisticated and 
diverse web search queries. These queries are intended for an advanced 
automated web research tool capable of analyzing complex results, following 
links, and synthesizing information.

Instructions:
- Always prefer a single search query, only add another query if the original 
  question requests multiple aspects or elements and one query is not enough.
- Each query should focus on one specific aspect of the original question.
- Don't produce more than {number_queries} queries.
- Queries should be diverse, if the topic is broad, generate more than 1 query.
- Don't generate multiple similar queries, 1 is enough.
- Query should ensure that the most current information is gathered. 
  The current date is {current_date}.

Format: 
- Format your response as a JSON object with ALL three of these exact keys:
   - "rationale": Brief explanation of why these queries are relevant
   - "query": A list of search queries

Example:

Topic: What revenue grew more last year apple stock or the number of people 
buying an iphone
```json
{{
    "rationale": "To answer this comparative growth question accurately, 
we need specific data points on Apple's stock performance and iPhone sales 
metrics. These queries target the precise financial information needed: 
company revenue trends, product-specific unit sales figures, and stock price 
movement over the same fiscal period for direct comparison.",
    "query": ["Apple total revenue growth fiscal year 2024", "iPhone unit 
sales growth fiscal year 2024", "Apple stock price growth fiscal year 2024"],
}}
```

Context: {research_topic}"""

We see some prompt engineering best practices in action, like defining the model’s role, specifying constraints, providing an example for illustration, etc.

3.2 Tool calling

🎯 The problem

For our research agent to succeed, it needs up-to-date information from the web. To realize that, it needs a “tool” to search the web.

💡 LangGraph’s solution

Nodes can execute tools. These can be native LLM tool-calling features (like in Gemini) or integrated through LangChain’s tool abstractions. Once the tool-calling results are gathered, they can be placed back into the agent’s state.

> Let’s see some code!

For the tool-calling usage pattern, let’s look at the web_research node. This node uses Gemini’s native tool-calling feature to perform Google searches. Notice how the tool is specified directly in the model’s configuration.

from langchain_google_genai import ChatGoogleGenerativeAI
from agent.prompts import (
    web_searcher_instructions,
)
from agent.utils import (
    get_citations,
    insert_citation_markers,
    resolve_urls,
)

def web_research(
    state: WebSearchState, 
    config: RunnableConfig
) -> OverallState:
    """LangGraph node that performs web research using the native Google 
       Search API tool.

    Executes a web search using the native Google Search API tool in 
    combination with Gemini 2.0 Flash.

    Args:
        state: Current graph state containing the search query and 
               research loop count
        config: Configuration for the runnable, including search API settings

    Returns:
        Dictionary with state update, including sources_gathered, 
        research_loop_count, and web_research_results
    """
    # Configure
    configurable = Configuration.from_runnable_config(config)
    formatted_prompt = web_searcher_instructions.format(
        current_date=get_current_date(),
        research_topic=state["search_query"],
    )

    # Uses the google genai client as the langchain client doesn't 
    # return grounding metadata
    response = genai_client.models.generate_content(
        model=configurable.query_generator_model,
        contents=formatted_prompt,
        config={
            "tools": [{"google_search": {}}],
            "temperature": 0,
        },
    )
    # resolve the urls to short urls for saving tokens and time
    resolved_urls = resolve_urls(
        response.candidates[0].grounding_metadata.grounding_chunks, state["id"]
    )
    # Gets the citations and adds them to the generated text
    citations = get_citations(response, resolved_urls)
    modified_text = insert_citation_markers(response.text, citations)
    sources_gathered = [item for citation in citations for item in citation["segments"]]

    return {
        "sources_gathered": sources_gathered,
        "search_query": [state["search_query"]],
        "web_research_result": [modified_text],
    }

The LLM sees the Google Search tool and understands that it can use the tool to fulfill the prompt. A key benefit of this native integration is the grounding_metadata returned with the response. That metadata contains grounding chunks — essentially, snippets of the answer paired with the URL that justified them. This basically gives us citations for free.

3.3 Conditional routing

🎯 The problem

After the initial research, how does the agent know whether to stop or continue? We need a control mechanism to create a research loop that can terminate itself.

💡 LangGraph’s solution

Conditional routing is handled by a routing function attached to a conditional edge: instead of returning a state update, it returns the name of the next node to visit. Effectively, this function inspects the current state and decides how to direct the traffic within the graph.

> Let’s see some code!

The evaluate_research node is our agent’s decision-maker. It checks the is_sufficient flag set by the reflection node and compares the current research_loop_count value against a pre-configured maximum threshold value.

from langgraph.types import Send

def evaluate_research(
    state: ReflectionState,
    config: RunnableConfig,
) -> OverallState:
    """LangGraph routing function that determines the next step in the 
       research flow.

    Controls the research loop by deciding whether to continue gathering 
    information or to finalize the summary based on the configured maximum 
    number of research loops.

    Args:
        state: Current graph state containing the research loop count
        config: Configuration for the runnable, including max_research_loops 
                setting

    Returns:
        String literal indicating the next node to visit 
        ("web_research" or "finalize_summary")
    """
    configurable = Configuration.from_runnable_config(config)
    max_research_loops = (
        state.get("max_research_loops")
        if state.get("max_research_loops") is not None
        else configurable.max_research_loops
    )
    if state["is_sufficient"] or state["research_loop_count"] >= max_research_loops:
        return "finalize_answer"
    else:
        return [
            Send(
                "web_research",
                {
                    "search_query": follow_up_query,
                    "id": state["number_of_ran_queries"] + int(idx),
                },
            )
            for idx, follow_up_query in enumerate(state["follow_up_queries"])
        ]

If the condition to stop is met, it returns the string "finalize_answer", and LangGraph proceeds to that node. If not, it returns a new list of Send objects containing the follow_up_queries, which spins up another parallel wave of web research, continuing the loop.

A Send object… what is it, then?

Well, it is LangGraph’s way of triggering parallel execution. Let’s turn to that now.

3.4 Parallel processing

🎯 The problem

To answer the user’s query as comprehensively as possible, we would need our generate_query node to produce multiple search queries. However, we don’t want to run those search queries one by one, as it would be very slow and inefficient. What we want is to execute the web searches for all queries concurrently.

💡 LangGraph’s solution

To trigger parallel execution, a node can return a list of Send objects. Send is a special directive that tells the LangGraph scheduler to dispatch these tasks to the specified node (e.g.,"web_research") concurrently, each with its own piece of state.

> Let’s see some code!

To enable the parallel search, Google’s implementation introduces the continue_to_web_research node to act as a dispatcher. It takes the query_list from the state and creates a separate Send task for each query.

from langgraph.types import Send

def continue_to_web_research(
    state: QueryGenerationState
):
    """LangGraph node that sends the search queries to the web research node.
    This is used to spawn n number of web research nodes, one for each 
    search query.
    """
    return [
        Send("web_research", {"search_query": search_query, "id": int(idx)})
        for idx, search_query in enumerate(state["query_list"])
    ]

And that’s all the code you need. The magic lives in what happens after this node returns.

When LangGraph receives this list, it’s smart enough not to simply loop through it. In fact, it triggers a sophisticated fan-out/fan-in process under the hood to handle things concurrently:

To begin with, each Send object carries only the tiny payload you gave it ({"search_query": ..., "id": ...}), not the entire OverallState. This keeps serialization fast.

Then, the graph scheduler spins off an asyncio task for every item in the list. This concurrency happens automatically; as the workflow builder, you don’t need to worry about writing async def or managing a thread pool.

Finally, after all the parallel web_research branches are completed, their individually returned dictionaries are automatically merged back into the main OverallState. Remember the Annotated[list, operator.add] we discussed in the beginning? Now it becomes crucial: fields defined with this type of reducer, like sources_gathered, will have their results concatenated into a single list.
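If you’d like to see this fan-out/fan-in behavior in isolation, here is a small, self-contained sketch (not from the repo; all names are made up) that dispatches one parallel worker per topic and merges the results with operator.add:

import operator
from typing import Annotated, TypedDict

from langgraph.graph import StateGraph, START, END
from langgraph.types import Send

class FanOutState(TypedDict):
    topics: list
    results: Annotated[list, operator.add]  # parallel writes get concatenated

class WorkerState(TypedDict):
    topic: str

def plan(state: FanOutState):
    # In the real agent, this is where the search queries would be generated.
    return {"topics": state["topics"]}

def dispatch(state: FanOutState):
    # One Send per topic -> one parallel run of "worker" per topic.
    return [Send("worker", {"topic": t}) for t in state["topics"]]

def worker(state: WorkerState):
    return {"results": [f"note about {state['topic']}"]}

fan_builder = StateGraph(FanOutState)
fan_builder.add_node("plan", plan)
fan_builder.add_node("worker", worker)
fan_builder.add_edge(START, "plan")
fan_builder.add_conditional_edges("plan", dispatch, ["worker"])
fan_builder.add_edge("worker", END)

out = fan_builder.compile().invoke({"topics": ["a", "b", "c"], "results": []})
print(out["results"])  # three notes, one per parallel branch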

You may want to ask: what happens if one of the parallel searches fails or times out? This is exactly why we added a custom id to each Send payload. This ID flows directly into the trace logs, allowing you to pinpoint and debug the exact branch that failed.

If you remember from earlier, we have the following line in our graph definition:

# Add conditional edge to continue with search queries in a parallel branch
builder.add_conditional_edges(
    "generate_query", continue_to_web_research, ["web_research"]
)

You might be wondering: why do we need to declare continue_to_web_research node as part of a conditional edge?

The crucial thing to realize is that: continue_to_web_research isn’t just another step in the pipeline—it’s a routing function.

The generate_query node can return zero queries (when the user asks something trivial) or twenty. A static edge would force the workflow to invoke web_research exactly once, even if there’s nothing to do. By implementing this as a conditional edge, continue_to_web_research decides at runtime whether to dispatch at all and, thanks to Send, how many parallel branches to spawn. If continue_to_web_research returns an empty list, LangGraph simply doesn’t follow the edge, which saves the round-trip to the search API.

Finally, this is again software engineering best practice in action: generate_query focuses on what to search, continue_to_web_research on whether and how to search, and web_research on doing the search. A clean separation of concerns.

3.5 Configuration management

🎯 The problem

For nodes to properly do their jobs, they need to know, for example:

  • Which LLM to use with what parameter settings (e.g., temperature)?
  • How many initial search queries should be generated?
  • What’s the cap on total research loops and on per-run concurrency?
  • And many others…

In short, we need a clean, centralized way to manage these settings without cluttering our core logic.

💡 LangGraph’s Solution

LangGraph solves this by passing a single, standardized config into every node that needs it. This object acts as a universal container for run-specific settings.

Inside the node, a custom, typed helper class then parses this config object. This helper class implements a clear hierarchy for fetching values (you can verify it in the from_runnable_config code below):

  • It first checks for an environment variable named after the parameter (uppercased).
  • If none is set, it falls back to the overrides passed in the config object for the current run.
  • If still nothing is found, it uses the defaults defined directly in this helper class.

> Let’s see some code!

Let’s look at the implementation of the reflection node to see it in action.

def reflection(
    state: OverallState, 
    config: RunnableConfig
) -> ReflectionState:
    """LangGraph node that identifies knowledge gaps and generates 
      potential follow-up queries.

    Analyzes the current summary to identify areas for further research 
    and generates potential follow-up queries. Uses structured output to 
    extract the follow-up query in JSON format.

    Args:
        state: Current graph state containing the running summary and 
               research topic
        config: Configuration for the runnable, including LLM provider 
                settings

    Returns:
        Dictionary with state update, including search_query key containing 
        the generated follow-up query
    """
    configurable = Configuration.from_runnable_config(config)
    # Increment the research loop count and get the reasoning model
    state["research_loop_count"] = state.get("research_loop_count", 0) + 1
    reasoning_model = state.get("reasoning_model") or configurable.reasoning_model

    # Format the prompt
    current_date = get_current_date()
    formatted_prompt = reflection_instructions.format(
        current_date=current_date,
        research_topic=get_research_topic(state["messages"]),
        summaries="nn---nn".join(state["web_research_result"]),
    )
    # init Reasoning Model
    llm = ChatGoogleGenerativeAI(
        model=reasoning_model,
        temperature=1.0,
        max_retries=2,
        api_key=os.getenv("GEMINI_API_KEY"),
    )
    result = llm.with_structured_output(Reflection).invoke(formatted_prompt)

    return {
        "is_sufficient": result.is_sufficient,
        "knowledge_gap": result.knowledge_gap,
        "follow_up_queries": result.follow_up_queries,
        "research_loop_count": state["research_loop_count"],
        "number_of_ran_queries": len(state["search_query"]),
    }

Just one line of boilerplate is required in the node:

configurable = Configuration.from_runnable_config(config)

There are quite a few “config-ish” terms floating around. Let’s unpack them one by one, starting with Configuration:

import os
from pydantic import BaseModel, Field
from typing import Any, Optional

from langchain_core.runnables import RunnableConfig

class Configuration(BaseModel):
    """The configuration for the agent."""

    query_generator_model: str = Field(
        default="gemini-2.0-flash",
        metadata={
            "description": "The name of the language model to use for the agent's query generation."
        },
    )

    reflection_model: str = Field(
        default="gemini-2.5-flash-preview-04-17",
        metadata={
            "description": "The name of the language model to use for the agent's reflection."
        },
    )

    answer_model: str = Field(
        default="gemini-2.5-pro-preview-05-06",
        metadata={
            "description": "The name of the language model to use for the agent's answer."
        },
    )

    number_of_initial_queries: int = Field(
        default=3,
        metadata={"description": "The number of initial search queries to generate."},
    )

    max_research_loops: int = Field(
        default=2,
        metadata={"description": "The maximum number of research loops to perform."},
    )

    @classmethod
    def from_runnable_config(
        cls, config: Optional[RunnableConfig] = None
    ) -> "Configuration":
        """Create a Configuration instance from a RunnableConfig."""
        configurable = (
            config["configurable"] if config and "configurable" in config else {}
        )

        # Get raw values from environment or config
        raw_values: dict[str, Any] = {
            name: os.environ.get(name.upper(), configurable.get(name))
            for name in cls.model_fields.keys()
        }

        # Filter out None values
        values = {k: v for k, v in raw_values.items() if v is not None}

        return cls(**values)

This is the custom helper class we mentioned earlier. You can see Pydantic is heavily used to define all the parameters for the agent. One thing to notice is that this class also defines an alternative constructor method from_runnable_config(). This constructor method creates a Configuration instance by pulling values from different sources while enforcing the overriding hierarchy we discussed in “💡 LangGraph’s Solution” above.

config is the input to the from_runnable_config() method. Technically, it’s a RunnableConfig type, but it’s really just a dictionary with optional metadata. In LangGraph, it’s mainly used as a structured way to carry contextual information across the graph. For example, it can carry things like tags, tracing options, and, most importantly, a nested dictionary of overrides under the "configurable" key.

Finally, by calling in every node:

configurable = Configuration.from_runnable_config(config)

we create an instance of the Configuration class by combining data from three sources: environment variables first, then config["configurable"], and finally the class defaults (this is the order the os.environ.get call above enforces). So configurable is a fully initialized, ready-to-use object that gives the node access to all relevant settings, such as configurable.reflection_model.
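For example, here is a minimal sketch (not from the repo) of how a caller could override a couple of these defaults for a single run; the keys under "configurable" are the Configuration field names shown above:

from langchain_core.messages import HumanMessage

# Per-run overrides travel in RunnableConfig's "configurable" dictionary.
result = graph.invoke(
    {"messages": [HumanMessage(content="How do transformers work?")]},
    config={
        "configurable": {
            "number_of_initial_queries": 5,  # instead of the default 3
            "max_research_loops": 3,         # instead of the default 2
        }
    },
)

Keep in mind that, given the os.environ.get call above, an environment variable with the same uppercased name (e.g., MAX_RESEARCH_LOOPS) would still take precedence over these per-run overrides.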

There is a bug in Google’s original code (in both the reflection and finalize_answer nodes):

reasoning_model = state.get("reasoning_model") or configurable.reasoning_model

However, reasoning_model is never defined in configuration.py. Instead, reflection_model and answer_model should be used, per the configuration.py definitions. See PR #46 for details.

To recap: Configuration is the definition, config is the runtime input, and configurable is the result, i.e., the parsed configuration object your node uses.

🎁 Bonus Read: What Didn’t We Cover?

LangGraph has a lot more to offer than what we can cover in this tutorial. As you build more complex agents, you’ll probably find yourself asking questions like these:

1. Can I make my application more responsive?

LangGraph supports streaming, so you can output results token by token for a real-time user experience.
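A minimal sketch of what that can look like, assuming the compiled graph from earlier:

from langchain_core.messages import HumanMessage

# Stream incremental results instead of waiting for the final state.
for chunk in graph.stream(
    {"messages": [HumanMessage(content="What is LangGraph used for?")]},
    stream_mode="updates",  # or "values" / "messages", depending on what you need
):
    print(chunk)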

2. What happens when an API call fails?

LangGraph implements retry and fallback mechanisms to handle errors.

3. How to avoid re-running expensive computations?

If some of your nodes perform expensive processing, you can use LangGraph’s caching mechanism to cache node outputs. LangGraph also supports checkpoints, which let you save your graph’s state and pick up where you left off. This is especially useful for long-running processes that you want to pause and resume later.
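For checkpoints specifically, a minimal sketch (not part of the Google repo) looks like this: compile the graph with a checkpointer and give each run a thread_id, so that a later invocation with the same thread_id resumes from the saved state.

from langgraph.checkpoint.memory import MemorySaver
from langchain_core.messages import HumanMessage

# Compile with an in-memory checkpointer (use a persistent one in production).
checkpointed_graph = builder.compile(checkpointer=MemorySaver())

thread_config = {"configurable": {"thread_id": "research-session-1"}}
checkpointed_graph.invoke(
    {"messages": [HumanMessage(content="Start researching LangGraph")]},
    config=thread_config,
)
# A later call with the same thread_id picks up from the saved state.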

4. Can I implement human-in-the-loop workflows?

Yes. LangGraph has built-in support for human-in-the-loop workflows. This enables you to pause the graph and wait for user input or approval before proceeding.

5. How can I trace my agent’s behavior?

LangGraph integrates natively with LangSmith, which provides detailed traces and observability into your agent’s behaviors with minimal setup.

6. How can my agent automatically discover and use new tools?

LangGraph supports MCP (Model Context Protocol) integrations. This allows it to auto-discover and use tools that follow this open standard.

Check out the LangGraph official docs for more details.

📌Key takeaways

Let’s recap what we’ve covered in this section:

  • Structured output: Use .with_structured_output to force the AI’s response to fit a specific structure you define. This makes sure you always get clean, reliable data that your downstream steps can easily parse.
  • Tool calling: You can embed tools in the model calls so that the agent can interact with the outside world.
  • Conditional routing: This is how you build “choose your own adventure” logic. A node can decide where to go next simply by returning the name of the next node. This way, you can dynamically create loops and decision points, making your agent’s workflow much more intelligent.
  • Parallel processing: LangGraph allows you to trigger multiple steps to run at the same time. All the heavy lifting of fanning out the jobs and fanning back in to collect the results is handled automatically by LangGraph.
  • Configuration management: Instead of scattering settings throughout your code, you can use a dedicated Configuration class to manage runtime settings, environment variables, defaults, etc., in one clean, central place.
Figure 8. Various aspects of enhancing LLM agent capabilities. (Image by author)

4. Conclusions

We have covered a lot of ground in this post! Now that we’ve seen how LangGraph’s core concepts come together to build a real-world research agent, let’s conclude our journey with a few key takeaways:

  • Graphs naturally describe agentic workflows. Real-world workflows involve loops, branches, and dynamic decisions. LangGraph’s graph-based architecture (nodes, edges, and state) provides a clean and intuitive way to represent and manage this complexity.
  • State is the agent’s memory. The central OverallState object is a shared whiteboard that every node in the graph can look at and write on. Together with node-specific state schemas, they create the agent’s memory system.
  • Nodes are modular components that are reusable. In LangGraph, you should build nodes with clear responsibilities, e.g., generating queries, calling tools, or routing logic. This makes the agentic system easier to test, maintain, and extend.
  • Control is in your hands. In LangGraph, you can direct the logical flow with conditional edges, enforce data reliability with structured outputs, use centralized configuration to tune parameters globally, or use Send to achieve parallel execution of tasks. Their combination gives you the power to build smart, efficient, and reliable agents.

Now with all the knowledge you have about LangGraph, what do you want to build next?
