Tracing the Future: How We Harness GenAI for Enhanced Security Solutions at Barracuda Networks

At Barracuda, we’re constantly innovating to stay ahead of emerging security threats in an increasingly complex digital landscape. As a company trusted by hundreds of thousands of businesses worldwide to protect their email, networks, applications, and data, we understand the critical importance of comprehensive security solutions. Barracuda exists to protect and support customers for life – how can we leverage cutting-edge AI technology to further our mission?

As Principal Engineer leading the Barracuda GenAI platform initiative, I know how important it is to provide product teams with a consolidated regional, scalable, and compliant platform with minimal overhead while enabling them to confidently build, iterate, and deploy AI solutions. Barracuda AI provides easy access to over 20 AI models, with support for the latest models added within days through stable APIs. We rely on Databricks’ advanced tracing capabilities to monitor, troubleshoot, and improve our AI platform and are actively working on integrating Databricks’ LLMOps solutions, such as LLM Judge Metrics and Monitoring, to simplify LLMOps for product teams using Barracuda AI.

Power of Tracing for Barracuda AI

In cybersecurity, understanding exactly how AI models make decisions is crucial for both effectiveness and trust. Tracing provides unprecedented visibility into our AI applications, allowing us to track every step of the decision-making process from initial request to final response.

When we saw MLflow LangChain autologging at Databricks Data + AI Summit, we integrated easily and have been reaping benefits ever since.

Tracing enables us to:

Follow the complete journey of a request through our system
Identify bottlenecks and performance issues in real-time
Debug complex interactions between multiple AI components
Ensure consistent behavior across different environments
Provide audit trails for security and compliance purposes

By implementing comprehensive tracing across our platform, we can quickly identify and resolve issues, optimize performance, and ensure our security solutions are functioning at their best even as attack patterns evolve.

Our Technical Implementation

Barracuda AI is built on a foundation of flexible, interoperable technologies designed to maximize performance while minimizing overhead.

Barracuda AI API Infrastructure

Our API offers OpenAI-compatible and LangChain AIMessage/AIMessageChunk endpoints (with more coming soon) that enable seamless integration with existing tools and workflows. This compatibility layer allows product teams to iterate and experiment without worrying about deployments or code changes across model or agentic frameworks. Behind the scenes, we carefully wrap interfaces and handle translations through a regional, scalable API gateway deployed via Kubernetes clusters and built using FastAPI served by Uvicorn, ensuring consistent behavior and performance while maintaining detailed tracing.

Barracuda AI Frontend

Barracuda AI also has a secure, SSO-authenticated Next.js front-end application for wider AI usage across the company.

Monitoring and Logging

MLflow autologging capabilities automatically track all model interactions without requiring extensive code changes. This “set it and forget it” approach to tracing ensures we capture comprehensive data even as our platform evolves.

Data Processing and Analysis

Databricks integration offers powerful analytics and monitoring capabilities that allow us to process massive amounts of trace data efficiently. For recent traces (within the last hour), we use the MLflow UI for immediate analysis. For older exported traces, we’ve built views with DBT for our Databricks Genie space, allowing us to extract meaningful insights and analytics using natural language.

Day-to-Day Usage Scenarios

Our tracing infrastructure supports a variety of critical use cases that help us maintain security excellence:

Troubleshooting Complex Issues

When users report unusual behavior, our developers can immediately look up the associated request_id and retrieve the corresponding trace. This allows them to trace the entire journey of that request through our system, identifying exactly where things went wrong.

Comprehensive Performance Monitoring

We’ve built sophisticated dashboards and daily reports that give us visibility into:

Usage patterns by team and model
Cost analysis and optimization opportunities
Token usage tracking for efficiency
Model performance metrics and latency statistics

These dashboards allow us to make data-driven decisions about resource allocation and identify opportunities for optimization.

Abuse Detection and Prevention

Security is about protecting against both external threats and potential internal vulnerabilities. Our tracing system helps identify misuse scenarios, such as when development keys are accidentally deployed in production environments.

Managing Large-Scale Data

Handling trace data at scale presents unique challenges. For very large traces containing massive context loads (such as extensive code bases or large copies of logs), we’ve implemented intelligent truncation strategies to stay within the 16MB JSON limit of Databricks’ VARIANT type while preserving the most critical information.

We also prioritize data privacy. For traces at rest in Delta Lake Tables, we remove personally identifiable information (PII) for data protection purposes while preserving the analytical value of our trace data.

Future Directions

We’re actively exploring several exciting enhancements to our Barracuda AI platform:

Advanced Evaluation Capabilities

Using evaluation and monitoring APIs is high on our priority list and on our hackathon roadmap. We plan to expose these evaluation capabilities through our platform APIs, allowing teams to measure and improve the quality of their AI-powered security solutions.

Democratized Data Access

Use Databricks Delta Sharing to allow teams to run their own analyses on trace data. This capability will empower them to derive insights and drive changes specific to their applications.

Enhanced Offline Evaluation

We’re developing capabilities for offline evaluation of trace data, enabling teams to test hypotheses and improvements without impacting production systems. This approach accelerates innovation while maintaining the stability of our security infrastructure.

Expanded Monitoring

As we incorporate new features and enhancements in our GenAI platform, we’re exploring ways to enhance our monitoring capabilities. We want to accelerate product innovation, like deploying AI agents on Databricks that integrate with our GenAI platform, and expand the visibility of our tracing infrastructure.

Conclusion

Barracuda AI is a foundation for future innovation at Barracuda, giving product teams the flexibility, power, and visibility they need to build the next generation of security solutions. By centralizing AI capabilities, streamlining observability through tracing, and harnessing the scalable infrastructure provided by Databricks, Barracuda AI has become a cornerstone that empowers many of our product initiatives. As the threat landscape evolves, we remain committed to protecting customers for life by continually refining and expanding this AI foundation, ensuring every Barracuda solution benefits from robust, agile, and future-ready innovation.