At the heart of every data-driven application, product, or dashboard lies one critical component: the database. These systems have long been the foundation for storing, managing, and querying structured data — whether relational, time-series, or distributed across cloud platforms.
To interact with these systems, we’ve relied on SQL (Structured Query Language), a standardized and incredibly powerful way to retrieve, manipulate, and analyze data. SQL is expressive, precise, and optimized for performance. Yet for many users — especially those new to data — SQL can be intimidating. Remembering syntax, understanding joins, and navigating complex schemas can be a barrier to productivity.
But the idea of querying databases in natural language isn’t new! In fact, research into Natural Language Interfaces to Databases (NLIDBs) dates back to the 1970s. Projects like LUNAR and PRECISE explored how users could ask questions in plain English and receive structured answers powered by SQL. Despite great academic interest, these early systems struggled with generalization, ambiguity, and scalability. Power BI also gave us an early glimpse of natural language data querying back in 2019. While the Q&A feature was promising, it struggled with complex queries, required precise phrasing, and depended heavily on how clean the data model was. In the end, it lacked the kind of reasoning and flexibility users expect from a true assistant!
But what about 2025? Do we now have the technology to make it happen?
Can LLMs do now what we were not able to do before?
Based on what we know about LLMs and their capabilities, we also understand that they, along with the concept of AI Agents, are uniquely equipped to bridge the gap between technical SQL and natural human queries. They’re excellent at interpreting vague questions, generating syntactically correct SQL, and adapting to different user intents. This makes them ideal for conversational interfaces to data. However, LLMs are not deterministic; they rely heavily on probabilistic inference, which can lead to hallucinations, incorrect assumptions, or outright invalid queries.
This is where AI Agents become relevant. By wrapping an LLM inside a structured system — one that includes memory, tools, validation layers, and a defined purpose — we can reduce the downsides of probabilistic outputs. The agent becomes more than just a text generator: it becomes a collaborator that understands the environment it’s operating in. Combined with proper strategies for grounding, schema inspection, and user intent detection, agents allow us to build systems that are far more reliable than prompt-only setups.
And that’s the foundation of this short tutorial: How to build your first AI Agent assistant to query your data catalog!
Step-by-Step Guide to Creating a Databricks Catalog Assistant
First and foremost, we need to pick our tech stack. We’ll need a model provider, a tool to help us enforce structure in our agent’s flow, connectors to our databases, and a simple UI to power the chat experience!
- OpenAI (GPT-4o): Best-in-class for natural language understanding, reasoning, and SQL generation.
- Pydantic AI: Adds structure to LLM responses, so instead of vague, free-form text you get clean, schema-validated outputs.
- Streamlit: Quickly build a responsive chat interface with built-in LLM and feedback components.
- Databricks SQL Connector: Access your Databricks workspace’s catalog, schema, and query results in real time.
And well, let’s not forget — this is just a small, simple project. If you were planning to deploy it in production, across multiple users and spanning several databases, you’d definitely need to think about other concerns: scalability, access control, identity management, use-case design, user experience, data privacy… and the list goes on.
1. Environment setup
Before we dive into coding, let’s get our development environment ready. This step ensures that all the required packages are installed and isolated in a clean virtual environment. This avoids version conflicts and keeps our project organized.
conda create -n sql-agent python=3.12
conda activate sql-agent
pip install pydantic-ai openai streamlit databricks-sql-connector
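You’ll also need credentials for both Databricks and OpenAI, and it helps to keep them out of the code. Below is a minimal sketch of how you might load them from environment variables; the variable names are just the convention I use in this tutorial, not anything imposed by the libraries:

import os

# Hypothetical variable names for this tutorial -- adjust to however you store credentials
DATABRICKS_SERVER_HOSTNAME = os.getenv("DATABRICKS_SERVER_HOSTNAME")
DATABRICKS_HTTP_PATH = os.getenv("DATABRICKS_HTTP_PATH")
DATABRICKS_TOKEN = os.getenv("DATABRICKS_TOKEN")

# The OpenAI client (and pydantic-ai's OpenAI integration) picks this up automatically
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")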
2. Create the tools and logic to access Databricks Data Catalog information
While building a conversational SQL agent might seem like an LLM problem, it’s actually a data problem first. You need metadata, column-level context, constraints, and ideally a profiling layer to know what’s safe to query and how to interpret the results. This is part of what we call the data-centric AI stack (might sound too 2021, but I promise you it is still super relevant!!) – one where profiling, quality, and schema validation come before prompt engineering.
In this context, and because the agent needs context to reason about your data, this step includes setting up a connection to your Databricks workspace and programmatically extracting the structure of your Data Catalog. This metadata will serve as the foundation for generating accurate SQL queries.
from databricks import sql

def set_connection(server_hostname: str, http_path: str, access_token: str):
    """Open a connection to a Databricks SQL warehouse."""
    connection = sql.connect(
        server_hostname=server_hostname,
        http_path=http_path,
        access_token=access_token,
    )
    return connection
The full code for the metadata connector can be found here.
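To give a flavour of what that connector does, here is a minimal sketch of how the catalog structure could be pulled using the connection from above. It assumes your workspace exposes Unity Catalog’s information_schema, and the helper name fetch_catalog_metadata is mine, not part of any library:

def fetch_catalog_metadata(connection, catalog: str, schema: str) -> str:
    """Return a plain-text summary of tables and columns to ground the agent."""
    query = f"""
        SELECT table_name, column_name, data_type
        FROM {catalog}.information_schema.columns
        WHERE table_schema = '{schema}'
        ORDER BY table_name, ordinal_position
    """
    with connection.cursor() as cursor:
        cursor.execute(query)
        rows = cursor.fetchall()

    # Group columns by table so the resulting prompt context stays readable
    tables: dict[str, list[str]] = {}
    for table_name, column_name, data_type in rows:
        tables.setdefault(table_name, []).append(f"{column_name} ({data_type})")

    return "\n".join(
        f"Table {catalog}.{schema}.{name}: " + ", ".join(columns)
        for name, columns in tables.items()
    )

The exact shape of this summary is up to you; what matters is that the agent receives an accurate, compact description of the tables and columns it is allowed to query.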
3. Build the SQL Agent with Pydantic AI
Here is where we define our AI agent. We’re using pydantic-ai to enforce structured outputs; in this case, we want to ensure that we always receive a clean SQL query from the LLM. This makes the agent safe to use in applications and reduces the chance of vague or, more importantly, unparseable output.
To define the agent, we start by specifying an output schema with Pydantic, in this case a single field, code, representing the SQL query. Then, we use the Agent class to wire together the system prompt, model name, and output type.
from pydantic import BaseModel
from pydantic_ai.agent import Agent
from pydantic_ai.messages import ModelResponse, TextPart

# ==== Output schema ====
class CatalogQuery(BaseModel):
    code: str

# ==== Agent Factory ====
def catalog_metadata_agent(system_prompt: str, model: str = "openai:gpt-4o") -> Agent:
    return Agent(
        model=model,
        system_prompt=system_prompt,
        output_type=CatalogQuery,
        instrument=True,
    )

# ==== Response Adapter ====
def to_model_response(output: CatalogQuery, timestamp: str) -> ModelResponse:
    return ModelResponse(
        parts=[TextPart(f"```sql\n{output.code}\n```")],
        timestamp=timestamp,
    )
The system prompt provides instructions and examples to guide the LLM’s behavior, while instrument=True enables tracing and observability for debugging or evaluation.
The system prompt itself was designed to guide the agent’s behavior. It clearly states the assistant’s objective (writing SQL queries for Unity Catalog), includes the metadata context to ground its reasoning, and provides concrete examples to illustrate the expected output format. This structure helps the LLM model to stay focused, reduce ambiguity, and return predictable, valid responses.
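To make this concrete, here is a minimal sketch of how the pieces might be wired together. The prompt wording, the catalog and schema names, and the fetch_catalog_metadata helper are my own illustrative choices from the previous step, not part of pydantic-ai:

import os

# Ground the agent with real catalog metadata (env var names from the setup step)
connection = set_connection(
    server_hostname=os.getenv("DATABRICKS_SERVER_HOSTNAME"),
    http_path=os.getenv("DATABRICKS_HTTP_PATH"),
    access_token=os.getenv("DATABRICKS_TOKEN"),
)
metadata = fetch_catalog_metadata(connection, catalog="main", schema="sales")

system_prompt = f"""
You are an assistant that writes SQL queries for Databricks Unity Catalog.
Only use the tables and columns listed below, and return a single SQL query.

{metadata}

Example:
Question: How many customers do we have?
SQL: SELECT COUNT(*) FROM main.sales.customers
"""

agent = catalog_metadata_agent(system_prompt)
result = agent.run_sync("Which tables contain a customer_id column?")

# The validated CatalogQuery; older pydantic-ai versions expose this as result.data
print(result.output.code)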
4. Build the Streamlit Chat Interface
Now that we have the foundations for our SQL Agent, it is time to make it interactive. Leveraging Streamlit, we will now create a simple front-end where we can ask natural language questions and receive generated SQL queries in real time.
Luckily, Streamlit already gives us powerful building blocks to create LLM-powered chat experiences. If you’re curious, here’s a great tutorial that walks through the whole process in detail.
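For reference, a stripped-down version of the chat loop might look like the sketch below. It assumes the agent was built with the catalog_metadata_agent factory and the grounded system prompt from step 3; error handling is left out for brevity:

import streamlit as st

st.title("Databricks Catalog Assistant")

# agent built once with catalog_metadata_agent(system_prompt) from step 3
if "messages" not in st.session_state:
    st.session_state.messages = []

# Replay the conversation so far
for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])

if question := st.chat_input("Ask a question about your data catalog"):
    st.session_state.messages.append({"role": "user", "content": question})
    with st.chat_message("user"):
        st.markdown(question)

    result = agent.run_sync(question)
    answer = f"```sql\n{result.output.code}\n```"

    st.session_state.messages.append({"role": "assistant", "content": answer})
    with st.chat_message("assistant"):
        st.markdown(answer)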
You can find the full code for this tutorial here and you can try the application on Streamlit Community Cloud.
Final Thoughts
In this tutorial, you’ve walked through the initial mechanics of building a simple AI agent. The focus was on creating a lightweight prototype to help you understand how to structure agent flows and experiment with modern AI tooling.
But, if you were to take this further into production, here are a few things to consider:
- Hallucinations are real, and you can’t be sure whether the returned SQL is correct. Leverage SQL static analysis to validate the output and implement a retry mechanism, ideally a more deterministic one (see the sketch after this list);
- Leverage schema-aware tools to sanity-check the table names and columns.
- Add fallback flows when a query fails — e.g., “Did you mean this table instead?”
- Make it stateful: keep conversation history and query context across turns.
- All things infrastructure, identity management, and operations of the system.
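As an example of the first two points, a lightweight validation layer could be built with a SQL parser such as sqlglot (my choice here; any static analysis tool would do). A minimal sketch, where known_tables would come from the same catalog metadata used to ground the agent:

import sqlglot
from sqlglot import exp

def validate_sql(query: str, known_tables: set[str]) -> list[str]:
    """Return a list of problems found in the generated query (empty means it looks fine)."""
    try:
        parsed = sqlglot.parse_one(query, read="databricks")
    except sqlglot.errors.ParseError as error:
        return [f"Query does not parse: {error}"]

    # Schema-aware sanity check: every referenced table must exist in the catalog metadata
    problems = []
    for table in parsed.find_all(exp.Table):
        if table.name not in known_tables:
            problems.append(f"Unknown table referenced: {table.name}")
    return problems

If validate_sql returns problems, you can feed them back to the agent as a follow-up message and ask it to retry, which gives you a much more deterministic loop than hoping the first generation is correct.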
At the end of the day, what makes these systems effective isn’t just the model, it’s the data that grounds it. Clean metadata, well-scoped prompts, and contextual validation are all part of the data quality stack that turns generative interfaces into trustworthy agents.