Cognis: Context-Aware Memory for Conversational AI Agents

arXiv cs.CL, April 23, 2026


Key Points

  • The paper introduces Lyzr Cognis, a unified, context-aware memory architecture designed to give conversational LLM agents persistent, cross-session memory for better long-term personalization.
  • Cognis uses a multi-stage retrieval pipeline that combines OpenSearch BM25 keyword matching and Matryoshka-based vector similarity search, fused with Reciprocal Rank Fusion for more robust memory retrieval.
  • A context-aware ingestion pipeline first retrieves existing memories before extraction, enabling intelligent version tracking to preserve full memory history while keeping the backend store consistent.
  • The approach adds temporal boosting for time-sensitive queries and uses a BGE-2 cross-encoder reranker to improve the quality of the final retrieved results.
  • Experiments on the LoCoMo and LongMemEval benchmarks across eight answer-generation models show state-of-the-art performance; the system is open-source and deployed in production for conversational AI applications.
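The fusion step in the second point can be illustrated with a minimal sketch of Reciprocal Rank Fusion: each retriever (BM25, vector search) returns a ranked list, and a document's fused score is the sum of 1/(k + rank) across lists. The memory IDs, the constant k = 60, and the toy result lists below are illustrative assumptions, not details from the paper.

```python
def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several best-first ranked lists into one ranking.

    A document's fused score is sum(1 / (k + rank)) over every list
    it appears in; k = 60 is the value commonly used in practice.
    """
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical memory IDs returned by the two retrievers:
bm25_hits = ["m3", "m1", "m7"]    # keyword (BM25) ranking
vector_hits = ["m3", "m1", "m9"]  # vector-similarity ranking
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
```

Because RRF only uses ranks, not raw scores, it sidesteps the problem of BM25 scores and cosine similarities living on incomparable scales, which is why it is a common choice for hybrid keyword-plus-vector retrieval.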

Abstract

LLM agents lack persistent memory, causing conversations to reset each session and preventing personalization over time. We present Lyzr Cognis, a unified memory architecture for conversational AI agents that addresses this limitation through a multi-stage retrieval pipeline. Cognis combines a dual-store backend pairing OpenSearch BM25 keyword matching with Matryoshka vector similarity search, fused via Reciprocal Rank Fusion. Its context-aware ingestion pipeline retrieves existing memories before extraction, enabling intelligent version tracking that preserves full memory history while keeping the store consistent. Temporal boosting enhances time-sensitive queries, and a BGE-2 cross-encoder reranker refines final result quality. We evaluate Cognis on two independent benchmarks -- LoCoMo and LongMemEval -- across eight answer generation models, demonstrating state-of-the-art performance on both. The system is open-source and deployed in production serving conversational AI applications.
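The abstract's temporal boosting step could take many forms; one common pattern is a multiplicative recency bonus on the retrieval score. The sketch below assumes an exponential half-life decay and a 30-day half-life purely for illustration; the paper does not specify this formula or these parameters.

```python
import math
from datetime import datetime, timezone

def temporal_boost(base_score, memory_timestamp, now=None, half_life_days=30.0):
    """Boost a memory's retrieval score by an exponential recency decay.

    Illustrative assumption: a fresh memory's score is doubled, and the
    bonus halves every `half_life_days`, so old memories keep roughly
    their base score.
    """
    now = now or datetime.now(timezone.utc)
    age_days = (now - memory_timestamp).total_seconds() / 86400.0
    decay = math.exp(-math.log(2) * age_days / half_life_days)
    return base_score * (1.0 + decay)

# A memory created today scores ~2x; one from 30 days ago scores ~1.5x.
now = datetime(2026, 4, 23, tzinfo=timezone.utc)
fresh = temporal_boost(1.0, now, now=now)
month_old = temporal_boost(1.0, datetime(2026, 3, 24, tzinfo=timezone.utc), now=now)
```

In a pipeline like the one the abstract describes, such a boost would be applied only when the query is detected as time-sensitive, before the cross-encoder reranker produces the final ordering.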