Recent developments in LLM agents have largely focused on improving their ability to execute complex tasks. However, a critical dimension remains underexplored: memory, the capacity of agents to persist, recall, and reason over user-specific information across time. Without persistent memory, most LLM-based agents remain stateless, unable to build context beyond a single prompt, which limits their usefulness in real-world settings where consistency and personalization are essential.
To address this, MIRIX AI introduces MIRIX, a modular multi-agent memory system explicitly designed to enable robust long-term memory for LLM-based agents. Unlike flat, purely text-centric systems, MIRIX integrates structured memory types across modalities—including visual input—and is built upon a coordinated multi-agent architecture for memory management.
Core Architecture and Memory Composition
MIRIX features six specialized, compositional memory components, each governed by a corresponding Memory Manager (a minimal schema sketch follows this list):
- Core Memory: Stores persistent agent and user information, segmented into ‘persona’ (agent profile, tone, and behavior) and ‘human’ (user facts such as name, preferences, and relationships).
- Episodic Memory: Captures time-stamped events and user interactions with structured attributes like event_type, summary, details, actors, and timestamp.
- Semantic Memory: Encodes abstract concepts, knowledge graphs, and named entities, with entries organized by type, summary, details, and source.
- Procedural Memory: Contains structured workflows and task sequences using clearly defined steps and descriptions, often formatted as JSON for easy manipulation.
- Resource Memory: Maintains references to external documents, images, and audio, recorded by title, summary, resource type, and content or link for contextual continuity.
- Knowledge Vault: Secures verbatim facts and sensitive information such as credentials, contacts, and API keys with strict access controls and sensitivity labels.
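The snippet below sketches how a few of these entries might be represented in Python, reusing the attribute names listed above; the dataclass layout itself is illustrative and not taken from the MIRIX codebase.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List

# Illustrative entry schemas built from the attributes described above;
# the actual MIRIX storage layer may organize these differently.

@dataclass
class EpisodicEntry:
    event_type: str        # e.g. "user_message", "meeting"
    summary: str           # one-line description of the event
    details: str           # fuller account of what happened
    actors: List[str]      # who was involved
    timestamp: datetime    # when it occurred

@dataclass
class SemanticEntry:
    type: str              # concept, entity, or relation
    summary: str
    details: str
    source: str            # where the knowledge came from

@dataclass
class KnowledgeVaultEntry:
    entry_type: str        # e.g. "credential", "contact", "api_key"
    content: str           # verbatim fact stored as-is
    sensitivity: str       # sensitivity label driving access control
```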
A Meta Memory Manager orchestrates the activities of these six specialized managers, enabling intelligent message routing, hierarchical storage, and memory-specific retrieval operations. Additional agents—with roles like chat and interface—collaborate within this architecture.
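As a rough illustration of that routing step, the sketch below assumes a hypothetical `classify` callable (e.g., an LLM call) that decides which memory types an incoming message should touch; the real Meta Memory Manager's routing logic is more involved.

```python
from enum import Enum
from typing import Callable, List

class MemoryType(str, Enum):
    CORE = "core"
    EPISODIC = "episodic"
    SEMANTIC = "semantic"
    PROCEDURAL = "procedural"
    RESOURCE = "resource"
    KNOWLEDGE_VAULT = "knowledge_vault"

def route_message(message: str, classify: Callable[[str], List[str]]) -> List[MemoryType]:
    """Decide which Memory Managers should handle an incoming message.
    `classify` is a stand-in for an LLM call that returns memory-type labels."""
    labels = classify(message)          # e.g. ["episodic", "semantic"]
    return [MemoryType(label) for label in labels]
```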
Active Retrieval and Interaction Pipeline
A core innovation of MIRIX is its Active Retrieval mechanism. On each user input, the system first autonomously infers a topic, then retrieves relevant memory entries from all six components, and finally tags the retrieved data before injecting it into the system prompt. This reduces reliance on the model's potentially outdated parametric knowledge and grounds answers much more strongly.
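A minimal sketch of that flow, assuming hypothetical `infer_topic` and per-component `search` helpers rather than MIRIX's actual interfaces, might look like this:

```python
from typing import Callable, Dict, List

def active_retrieve(
    user_input: str,
    managers: Dict[str, object],        # name -> Memory Manager exposing a .search() method
    infer_topic: Callable[[str], str],  # stand-in for the topic-inference LLM call
    top_k: int = 5,
) -> str:
    """Assemble a memory-grounded system prompt for one user turn."""
    topic = infer_topic(user_input)                    # 1. infer a topic from the input
    sections: List[str] = []
    for name, manager in managers.items():             # 2. query all six memory components
        entries = manager.search(topic, top_k=top_k)
        if entries:
            body = "\n".join(f"- {e}" for e in entries)
            sections.append(f"<{name}>\n{body}\n</{name}>")  # 3. tag retrieved data by source
    # 4. inject the tagged memory into the system prompt
    return "Relevant memories for this turn:\n" + "\n".join(sections)
```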
Multiple retrieval strategies—including embedding_match, bm25_match, and string_match—are available, ensuring accurate and context-aware access to memory. The architecture allows for further expansion of retrieval tools as needed.
System Implementation and Application
MIRIX is deployed as a cross-platform assistant application built with React-Electron (for the UI) and Uvicorn (for the backend API). The assistant monitors screen activity by capturing a screenshot every 1.5 seconds; only non-redundant screens are kept, and memory updates are triggered in batches once 20 unique screenshots have accumulated (approximately once per minute). Screenshots are streamed to the Gemini API as they are captured, which keeps visual-memory updates under five seconds of latency.
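A simplified version of that capture loop is sketched below, with hypothetical `capture_screen`, `stream_to_gemini`, and `trigger_memory_update` hooks standing in for the real screenshot, upload, and memory-update code.

```python
import hashlib
import time
from typing import Callable, List

def monitor_screen(
    capture_screen: Callable[[], bytes],          # returns raw screenshot bytes
    stream_to_gemini: Callable[[bytes], object],  # starts a streaming upload, returns a handle
    trigger_memory_update: Callable[[List[object]], None],
    interval: float = 1.5,
    batch_size: int = 20,
) -> None:
    """Simplified capture loop; MIRIX's real redundancy check is likely
    similarity-based rather than an exact byte hash."""
    seen = set()
    batch: List[object] = []
    while True:
        image = capture_screen()
        digest = hashlib.sha256(image).hexdigest()
        if digest not in seen:                       # keep only non-redundant screens
            seen.add(digest)
            batch.append(stream_to_gemini(image))    # upload starts while capture continues
        if len(batch) >= batch_size:                 # 20 unique screenshots, roughly once a minute
            trigger_memory_update(batch)             # batched memory update
            batch.clear()
        time.sleep(interval)
```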
Users interact through a chat interface, which dynamically draws on the agent’s memory components to generate context-aware, personalized responses. Semantic and procedural memories are rendered as expandable trees or lists, providing transparency and allowing users to audit and inspect what the agent “remembers” about them.
Evaluation on Multimodal and Conversational Benchmarks
MIRIX is evaluated on two demanding benchmarks:
- ScreenshotVQA: A visual question-answering benchmark requiring persistent, long-term memory over high-resolution screenshots. MIRIX outperforms retrieval-augmented generation (RAG) baselines—specifically SigLIP and Gemini—by 35% in LLM-as-a-Judge accuracy, while reducing retrieval storage needs by 99.9% compared to text-heavy methods.
- LOCOMO: A textual benchmark assessing long-form conversational memory. MIRIX achieves 85.38% average accuracy, outperforming strong open-source systems such as LangMem and Mem0 by over 8 points and approaching the full-context upper bound.
The modular design enables high performance across both multimodal and text-only inference domains.
Use Cases: Wearables and the Memory Marketplace
MIRIX is designed for extensibility, with support for lightweight AI wearables—including smart glasses and pins—via its efficient, modular architecture. Hybrid deployment allows both on-device and cloud-based memory handling, while practical applications include real-time meeting summarization, granular location and context recall, and dynamic modeling of user habits.
A visionary feature of MIRIX is the Memory Marketplace: a decentralized ecosystem enabling secure memory sharing, monetization, and collaborative AI personalization between users. The Marketplace is designed with fine-grained privacy controls, end-to-end encryption, and decentralized storage to ensure data sovereignty and user self-ownership.
Conclusion
MIRIX represents a significant step toward endowing LLM-based agents with human-like memory. Its structured, multi-agent compositional architecture enables robust memory abstraction, multimodal support, and real-time, contextually grounded reasoning. With empirical gains across challenging benchmarks and an accessible, cross-platform application interface, MIRIX sets a new standard for memory-augmented AI systems.
FAQs
1. What makes MIRIX different from existing memory systems like Mem0 or Zep?
MIRIX introduces multi-component, compositional memory (beyond text passage storage), multimodal support (including vision), and a multi-agent retrieval architecture for more scalable, accurate, and context-rich long-term memory management.
2. How does MIRIX ensure low-latency memory updates from visual inputs?
By combining streaming uploads with the Gemini API, MIRIX updates screenshot-based visual memory with under five seconds of latency, even during active user sessions.
3. Is MIRIX compatible with closed-source LLMs like GPT-4?
Yes. Since MIRIX operates as an external system (and not as a model plugin or retrainer), it can augment any LLM, regardless of its base architecture or licensing, including GPT-4, Gemini, and other proprietary models.
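In practice, this means the retrieved memory is simply injected into the prompt of whatever chat model is in use. The sketch below assumes a hypothetical `call_llm` wrapper around any chat-completions API; it is a pattern illustration, not MIRIX's integration code.

```python
from typing import Callable, Dict, List

def answer_with_memory(
    user_input: str,
    memory_context: str,                               # e.g. output of an active-retrieval step
    call_llm: Callable[[List[Dict[str, str]]], str],   # wrapper around any chat LLM (GPT-4, Gemini, ...)
) -> str:
    """Inject retrieved memory into the prompt of an arbitrary chat LLM."""
    messages = [
        {"role": "system", "content": memory_context},
        {"role": "user", "content": user_input},
    ]
    return call_llm(messages)
```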
Check out the Paper, GitHub and Project. All credit for this research goes to the researchers of this project.