For many organizations, the biggest challenge with AI agents built over unstructured data isn’t the model; it’s the context. If the agent can’t retrieve the right information, even the most advanced model will miss key details and give incomplete or incorrect answers.
We’re introducing reranking in Mosaic AI Vector Search, now in Public Preview. With a single parameter, you can boost retrieval accuracy by an average of 15 percentage points on our enterprise benchmarks. This means higher-quality answers, better reasoning, and more consistent agent performance—without extra infrastructure or complex setup.
What Is Reranking?
Reranking is a technique that improves agent quality by ensuring the agent gets the most relevant data to perform its task. While vector databases excel at quickly finding relevant documents from millions of candidates, reranking applies deeper contextual understanding to ensure the most semantically relevant results appear at the top. This two-stage approach—fast retrieval followed by intelligent reordering—has become essential for RAG agent systems where quality matters.
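To make the two-stage idea concrete, here is a minimal, self-contained sketch of retrieve-then-rerank. The scoring functions are simple stand-ins for illustration only; they are not the embedding model or reranker that Vector Search uses internally.

```python
# Illustrative two-stage retrieval: a fast first pass narrows a large corpus
# to a short candidate list, then a reranker rescores that short list with a
# finer-grained relevance signal. Both scorers below are toy stand-ins.

def first_pass_retrieve(query: str, corpus: list[str], k: int = 50) -> list[str]:
    # Stand-in for approximate nearest-neighbor search over embeddings:
    # rank documents by count of shared terms and keep the top k.
    q_terms = set(query.lower().split())
    scored = [(len(q_terms & set(doc.lower().split())), doc) for doc in corpus]
    return [doc for _, doc in sorted(scored, key=lambda x: -x[0])[:k]]

def rerank(query: str, candidates: list[str], top_n: int = 10) -> list[str]:
    # Stand-in for a reranker that reads the query and each candidate
    # together and produces a deeper relevance score, then reorders.
    q_terms = set(query.lower().split())
    def score(doc: str) -> float:
        terms = set(doc.lower().split())
        return len(q_terms & terms) / (len(terms) ** 0.5 + 1e-9)
    return sorted(candidates, key=score, reverse=True)[:top_n]

corpus = [
    "contract renewal terms for 2024",
    "employee onboarding checklist",
    "renewal pricing schedule",
    "legal review of renewal clauses",
]
candidates = first_pass_retrieve("contract renewal terms", corpus, k=3)
print(rerank("contract renewal terms", candidates, top_n=2))
```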
Why We Added Reranking
You might be building internal-facing chat agents to answer questions about your documents. Or you might be building agents that generate reports for your customers. Either way, if you want agents that can accurately use your unstructured data, agent quality is tied to retrieval quality. Reranking is how Vector Search customers boost the quality of their retrieval and, in turn, the quality of their RAG agents.
From customer feedback, we’ve seen two common issues:
- Agents can miss critical context buried in large sets of unstructured documents. The “right” passage rarely sits at the very top of the retrieved results from a vector database.
- Homegrown reranking systems can substantially increase agent quality, but they take weeks to build and require ongoing maintenance.
With reranking as a native Vector Search feature, you can surface the most relevant information from your governed enterprise data without extra engineering.
“The reranker feature helped elevate our Lexi chatbot from functioning like a high school student to performing like a law school graduate. We have seen transformative gains in how our systems understand, reason over, and generate content from legal documents, unlocking insights that were previously buried in unstructured data.” — David Brady, Senior Director, G3 Enterprises
A Substantial Quality Improvement Over Baselines
Our research team achieved a breakthrough by building a novel compound AI system for agent workloads. On our enterprise benchmarks, the system retrieves the correct answer within its top 10 results 89% of the time (recall@10), a 15-point improvement over our baseline (74%) and 10 points higher than leading cloud alternatives (79%). Crucially, our reranker delivers this quality with latencies as low as 1.5 seconds, whereas contemporary systems often take several seconds—or even minutes—to return high-quality answers.
Easy, High-Quality Retrieval
Enable enterprise-grade reranking in minutes, not weeks. Teams typically spend weeks researching models, deploying infrastructure, and writing custom logic. In contrast, enabling reranking for Vector Search requires just one additional parameter in your Vector Search query, instantly giving your agents higher-quality retrieval. No model serving endpoints to manage, no custom wrappers to maintain, no complex configurations to tune.
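Here is a minimal sketch of what a reranked query can look like with the databricks-vectorsearch Python client. The endpoint, index, and column names (contract_text, contract_summary, category) are hypothetical, and the exact name and placement of the reranking argument shown here as columns_to_rerank is an assumption based on this post; check the Vector Search documentation for the current signature.

```python
from databricks.vector_search.client import VectorSearchClient

client = VectorSearchClient()

# Hypothetical endpoint and index names.
index = client.get_index(
    endpoint_name="contracts_endpoint",
    index_name="main.contracts.contracts_index",
)

results = index.similarity_search(
    query_text="What are the termination clauses for vendor contracts?",
    columns=["contract_id", "contract_text", "contract_summary", "category"],
    num_results=10,
    # Assumed shape of the new reranking parameter: list the text and
    # metadata columns the reranker should read when rescoring results.
    columns_to_rerank=["contract_text", "contract_summary", "category"],
)
```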
By specifying multiple columns in columns_to_rerank, you give the reranker access to metadata beyond the main text column. In this example, the reranker uses contract summaries and category information to better understand context and improve the relevance of search results.
Optimized for Agent Performance
Speed meets quality for real-time, agentic AI applications. Our research team optimized this compound AI system to rerank 50 results in as little as 1.5 seconds, making it highly effective for agent systems that demand both accuracy and responsiveness. This breakthrough performance enables sophisticated retrieval strategies without compromising user experience.
When to Use Reranking?
We recommend testing reranking for any RAG agent use case. Typically, customers see the largest quality gains when their current system does find the right answer somewhere in the top 50 retrieval results but struggles to surface it within the top 10. In technical terms, this means low recall@10 but high recall@50.
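To make that diagnostic concrete, here is a small sketch of how recall@k can be computed over a labeled evaluation set. The query IDs, document IDs, and relevance judgments are hypothetical.

```python
def recall_at_k(results_by_query: dict[str, list[str]],
                relevant_by_query: dict[str, set[str]],
                k: int) -> float:
    # Fraction of queries whose top-k retrieved document IDs contain
    # at least one document judged relevant for that query.
    hits = sum(
        1 for q, retrieved in results_by_query.items()
        if relevant_by_query[q] & set(retrieved[:k])
    )
    return hits / len(results_by_query)

# Hypothetical evaluation set: retrieved doc IDs per query (in rank order)
# and the doc IDs judged relevant for each query.
retrieved = {"q1": ["d7", "d2", "d9", "d1"], "q2": ["d3", "d4", "d8", "d5"]}
relevant = {"q1": {"d1"}, "q2": {"d4"}}

print(recall_at_k(retrieved, relevant, k=2))  # 0.5: only q2 hits in the top 2
print(recall_at_k(retrieved, relevant, k=4))  # 1.0: both hit within the top 4
```

A system with high recall@50 but low recall@10 on such an evaluation set is the profile most likely to benefit from reranking.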
Enhanced Developer Experience
Beyond core reranking capabilities, we’re making it easier than ever to build and deploy high-quality retrieval systems.
LangChain Integration: The reranker works seamlessly with VectorSearchRetrieverTool, our official LangChain integration for Vector Search. Teams building RAG agents with VectorSearchRetrieverTool benefit from higher-quality retrieval with no code changes required (a minimal usage sketch appears below).
Transparent Performance Metrics: Reranker latency is now included in query debug info, giving you a complete end-to-end breakdown of your query performance.
Figure: response latency breakdown in milliseconds
Flexible Column Selection: Rerank based on any combination of text and metadata columns, allowing you to leverage all available domain context—from document summaries to categories to custom metadata—for high relevance.
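For the LangChain integration mentioned above, here is a minimal usage sketch. It assumes the databricks-langchain package and a hypothetical Unity Catalog index name; the constructor arguments shown are illustrative, so check the package documentation for the exact options available in your version.

```python
from databricks_langchain import VectorSearchRetrieverTool

retriever_tool = VectorSearchRetrieverTool(
    index_name="main.contracts.contracts_index",  # hypothetical index name
    num_results=10,
    tool_name="contract_search",
    tool_description="Searches contract documents for relevant passages.",
)

# The tool can be handed to an agent as-is, or invoked directly:
results = retriever_tool.invoke("termination clauses for vendor contracts")
print(results)
```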
Start Building Today
The reranker in Vector Search transforms how you build AI applications. With zero infrastructure overhead and seamless integration, you can finally deliver the retrieval quality your users deserve.
Ready to get started?