At a leading manufacturer of diagnostic healthcare products, contract management across the EMEA region presented a significant challenge. With contracts distributed across multiple regional platforms and managed individually by contract managers, extracting critical data was a manual, labour-intensive process that could take up to 2 days per contract. This fragmented approach hindered sales performance, increased operational costs, and slowed strategic decision-making.
Working with Advancing Analytics and Databricks, the company implemented an innovative Generative AI solution that has transformed their contract analysis process, delivering remarkable efficiency gains and business insights. Here’s how they did it.
The Challenge: Contract Complexity Across EMEA
The company’s extensive product portfolio spans diagnostic products used globally. However, their contract management process was holding them back:
- Contract data was distributed across multiple regional platforms
- No centralised or standardised approach to contract management
- Manual review process required contract managers to examine entire documents
- Contracts were a mix of digital, handwritten, and scanned documents
- Extraction of key attributes took up to 2 days per contract
- Multilingual contracts (English, French, and German) added complexity
“Our contract managers were spending nearly 2 days on each contract just to extract basic information,” explains a company executive. “With hundreds of contracts across EMEA, this manual approach was unsustainable and prevented us from gaining the insights we needed to make strategic decisions.”
The Solution: A Gen AI-Powered Pipeline Built by Advancing Analytics on Databricks
Partnering with Advancing Analytics, the company stood up a Retrieval-Augmented Generation (RAG) pipeline that runs end-to-end in Azure Databricks:
- Automated ingestion from SharePoint lands PDFs in Delta tables governed by Unity Catalog with full audit trails.
- Azure AI Document Intelligence performs OCR across scans, handwriting and mixed languages.
- French, German, and any other non-English text is routed through translation models for consistent downstream processing.
- Chunks are embedded and indexed with Mosaic AI Vector Search, giving millisecond similarity look-ups.
- An ensemble of LLM endpoints (Databricks-hosted and Azure OpenAI) pulls the right chunks and extracts ~100 attributes, hardening output with a JSON-correction chain.
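The JSON-correction step in the last bullet can be pictured as a short chain of repairs, each tried in turn until the model output parses. The sketch below is illustrative only: the source does not describe the actual repair steps, so the three functions here (`strip_code_fence`, `keep_outer_object`, `drop_trailing_commas`) and the `parse_with_repairs` entry point are hypothetical names covering common failure modes of LLM-generated JSON.

```python
import json
import re

# Built indirectly so this example can itself sit inside a Markdown fence.
FENCE = "`" * 3


def strip_code_fence(text: str) -> str:
    """Keep only the payload if the model wrapped its answer in a code fence."""
    pattern = re.escape(FENCE) + r"(?:json)?\s*(.*?)" + re.escape(FENCE)
    match = re.search(pattern, text, flags=re.DOTALL)
    return match.group(1) if match else text


def keep_outer_object(text: str) -> str:
    """Discard any chatter before the first '{' and after the last '}'."""
    start, end = text.find("{"), text.rfind("}")
    return text[start:end + 1] if 0 <= start < end else text


def drop_trailing_commas(text: str) -> str:
    """Remove commas directly before a closing brace or bracket.

    Naive by design: it would also touch such a sequence inside a string value.
    """
    return re.sub(r",\s*([}\]])", r"\1", text)


# Hypothetical repair order: cheapest and least invasive first.
REPAIRS = (strip_code_fence, keep_outer_object, drop_trailing_commas)


def parse_with_repairs(raw: str) -> dict:
    """Try the raw text, then apply each repair cumulatively until it parses."""
    candidate = raw
    for repair in (None, *REPAIRS):
        if repair is not None:
            candidate = repair(candidate)
        try:
            parsed = json.loads(candidate)
        except json.JSONDecodeError:
            continue
        if isinstance(parsed, dict):
            return parsed
    raise ValueError("model output could not be repaired into a JSON object")
```

A production chain would plausibly end with a fallback such as re-prompting a model to fix its own output; that step is left out here because it depends on endpoint details the source does not give.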
New files are handled by a custom queue system built on Unity Catalog, with full traceability of queue properties, items, run times, and failures. This lets the system balance resources effectively while providing a scalable queue of near-unlimited size. It also ensures that the processing rates and outcomes of all input files remain fully visible and traceable.
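To make the queue behaviour concrete, here is a minimal in-memory sketch of the bookkeeping such a system needs: one record per file, a status, attempt counts, timestamps, and the last error. It is an assumption-laden illustration, not the delivered design. In the real solution these records would live in a governed table rather than a Python list, and the names `QueueItem`, `ContractQueue`, and `max_attempts` are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


def utc_now() -> datetime:
    return datetime.now(timezone.utc)


@dataclass
class QueueItem:
    """One tracked file. Status moves pending -> running -> succeeded | failed."""
    file_path: str
    status: str = "pending"
    attempts: int = 0
    enqueued_at: datetime = field(default_factory=utc_now)
    started_at: Optional[datetime] = None
    finished_at: Optional[datetime] = None
    error: Optional[str] = None


class ContractQueue:
    """In-memory stand-in for a table-backed queue (hypothetical design)."""

    def __init__(self, max_attempts: int = 3) -> None:
        self.items: list[QueueItem] = []
        self.max_attempts = max_attempts

    def enqueue(self, file_path: str) -> None:
        # Ignore files that are already tracked so re-scans stay idempotent.
        if any(item.file_path == file_path for item in self.items):
            return
        self.items.append(QueueItem(file_path))

    def claim(self, batch_size: int) -> list[QueueItem]:
        """Hand out up to batch_size pending items, oldest first."""
        batch = [i for i in self.items if i.status == "pending"][:batch_size]
        for item in batch:
            item.status = "running"
            item.attempts += 1
            item.started_at = utc_now()
        return batch

    def finish(self, item: QueueItem, error: Optional[str] = None) -> None:
        """Record the outcome; failed items are retried until max_attempts."""
        item.finished_at = utc_now()
        item.error = error
        if error is None:
            item.status = "succeeded"
        elif item.attempts < self.max_attempts:
            item.status = "pending"
        else:
            item.status = "failed"

    def summary(self) -> dict[str, int]:
        """Count items per status, for example to feed a monitoring dashboard."""
        return dict(Counter(item.status for item in self.items))
```

Capping `batch_size` per run is what lets a scheduled job keep resource use steady regardless of how many files are waiting, while the timestamps and error field give the per-file traceability the text describes.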
A novel ensemble approach: Accuracy you can trust
Most extraction pipelines trust a single model. We don’t. Inspired by the 2024 research paper Probabilistic Consensus through Ensemble Validation (arXiv:2411.06535), we run three LLMs in parallel and accept a value only when at least two agree. The payoff is dramatic:
- Trusted: catches hallucinations without slowing throughput
- Model-agnostic: swap in cheaper or domain-specific models and still keep quality high
- Audit-grade traceability: every disagreement is logged for SME review
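The two-of-three rule described above can be sketched in a few lines. This is a simplified illustration rather than the production logic: it assumes each model returns a flat dictionary of attribute strings, it compares values after basic case and whitespace normalisation, and the function names (`normalise`, `consensus`, `validate_contract`) are hypothetical.

```python
from collections import Counter
from typing import Optional


def normalise(value: Optional[str]) -> Optional[str]:
    """Collapse case and whitespace so trivially different answers compare equal."""
    if value is None:
        return None
    cleaned = " ".join(value.split()).lower()
    return cleaned or None


def consensus(values: list[Optional[str]], min_votes: int = 2) -> tuple[Optional[str], bool]:
    """Return (accepted_value, needs_review) for one attribute.

    A value is accepted only when at least min_votes models agree on it.
    Missing answers never count as votes, so an attribute that no model
    could extract is also flagged for review.
    """
    votes = Counter(v for v in map(normalise, values) if v is not None)
    if votes:
        winner, count = votes.most_common(1)[0]
        if count >= min_votes:
            return winner, False
    return None, True


def validate_contract(outputs: list[dict[str, Optional[str]]]) -> tuple[dict[str, str], list[str]]:
    """Vote attribute by attribute across the per-model extraction results."""
    accepted: dict[str, str] = {}
    disagreements: list[str] = []
    attributes = sorted({key for output in outputs for key in output})
    for attribute in attributes:
        value, needs_review = consensus([o.get(attribute) for o in outputs])
        if needs_review or value is None:
            disagreements.append(attribute)  # logged for SME review
        else:
            accepted[attribute] = value
    return accepted, disagreements


# Example with three hypothetical model outputs for the same contract.
model_outputs = [
    {"payment_terms": "Net 30", "currency": "EUR"},
    {"payment_terms": "net  30", "currency": "GBP"},
    {"payment_terms": "Net 45", "currency": "USD"},
]
accepted, disagreements = validate_contract(model_outputs)
```

In this example `payment_terms` is accepted because two models agree once formatting differences are ignored, while `currency` lands in the disagreement list because no two models match. Because the vote only looks at outputs, any of the three models can be swapped without touching this logic.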
We believe this is one of the first ensemble-validated GenAI solutions running in production on the Databricks lakehouse for multilingual, regulated contracts.
Reliable workflows and non-disruptive updates
The solution’s workflow is fully automated, from document ingestion through SharePoint to final output delivery via Excel files and custom dashboards. Databricks Workflows run this process at a regular cadence, which produces predictable traffic and makes resource provisioning and cost forecasting easier.
Updates and improvements to this process propagate from development to production environments via robust CI/CD pipelines, centred around Databricks Asset Bundles. This ensures notebooks, workflows, and resources remain in sync and seamlessly update without risking interruptions to ongoing production jobs.
Real Business Impact
The implementation of this Databricks-powered solution by Advancing Analytics has delivered significant business value:
- 95% reduction in processing time: Contract analysis that previously took up to 2 days now completes in hours
- Improved accuracy: The solution achieves approximately 90% accuracy, validated by SMEs
- Enhanced visibility: Centralised database of key customer attributes improves collaboration across regional teams
- Scalability: The solution efficiently handles both extensive document backlogs and ad-hoc processing requirements
- Multilingual capability: Seamless processing of contracts in English, French, German, and up to 15 other languages
For this company, this solution translates to millions in annual savings, accelerated deal cycles, and a powerful new capability: querying every EMEA contract instantly, using natural language.
Subject matter experts can now ask the chatbot for insights and attributes that were previously buried in documents or simply not captured in standard tables.
What’s more, the process is 92% faster and, because it’s fully automated, SMEs spend virtually no time managing it. Instead, they can focus on higher-value work while the system handles the heavy lifting.
Why it worked
- One platform, zero silos: Databricks unified ETL, vector search, LLM serving and governance
- Hybrid model strategy: Swap models behind Mosaic AI Model Serving endpoints as cost or accuracy dictates, without rewiring code
- Human-in-the-loop: SMEs validated early runs and fed edge cases back into prompt templates, lifting precision significantly
- Deployment discipline: Asset Bundles and Workflows deliver CI/CD, so changes propagate between environments without interrupting live processes
Looking Forward: Expanding the Impact
With the success of the Contract Analysis solution, the company is now exploring additional applications of Generative AI across their operations. The scalable architecture built by Advancing Analytics on Databricks provides a foundation for future innovations, with potential applications in product development, regulatory compliance, and customer service.
This implementation demonstrates how organisations can leverage Advancing Analytics’ expertise with Databricks and Azure to transform complex, manual processes into efficient, automated workflows that deliver real business value. By combining the power of Generative AI with robust data management and governance, companies can unlock insights previously hidden in unstructured data, driving better decision-making and operational excellence.
This project is the blueprint for how data, AI and domain expertise come together. We didn’t just speed up a process, we unlocked a strategic asset. — Dr. Gavita Regunath, Chief AI Officer, Advancing Analytics
As businesses continue to grapple with increasing volumes of complex documents, this case study offers a compelling blueprint for how Advancing Analytics and Databricks can help turn document challenges into strategic advantages.
Three take-aways for data & AI leaders
- Start with the business pain: Cycle-time, cost and risk guided every design choice
- Build governance in, not on: Unity Catalog and Delta Lake kept security teams happy from day one
- Treat GenAI as a platform capability: With Vector Search, AI Functions and Mosaic AI in place, new document-heavy use cases are weeks, not months, away