Cognis: Context-Aware Memory for Conversational AI Agents

arXiv cs.CL, April 23, 2026


Key Points

  • The paper introduces Lyzr Cognis, a unified, context-aware memory architecture designed to give conversational LLM agents persistent, cross-session memory for better long-term personalization.
  • Cognis uses a multi-stage retrieval pipeline that combines OpenSearch BM25 keyword matching and Matryoshka-based vector similarity search, fused with Reciprocal Rank Fusion for more robust memory retrieval.
  • A context-aware ingestion pipeline first retrieves existing memories before extraction, enabling intelligent version tracking to preserve full memory history while keeping the backend store consistent.
  • The approach adds temporal boosting for time-sensitive queries and uses a BGE-2 cross-encoder reranker to improve the quality of the final retrieved results.
  • Experiments on the LoCoMo and LongMemEval benchmarks across eight answer-generation models show state-of-the-art performance; the system is open-source and deployed in production for conversational AI applications.
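The fusion step in the second point can be illustrated with a minimal sketch of Reciprocal Rank Fusion: each retriever (BM25, vector search) returns a ranked list, and a document's fused score is the sum of 1/(k + rank) across lists. The memory IDs, the constant k = 60, and the toy result lists below are illustrative assumptions, not details from the paper.

```python
def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several best-first ranked lists into one ranking.

    A document's fused score is sum(1 / (k + rank)) over every list
    it appears in; k = 60 is the value commonly used in practice.
    """
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical memory IDs returned by the two retrievers:
bm25_hits = ["m3", "m1", "m7"]    # keyword (BM25) ranking
vector_hits = ["m3", "m1", "m9"]  # vector-similarity ranking
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
```

Because RRF only uses ranks, not raw scores, it sidesteps the problem of BM25 scores and cosine similarities living on incomparable scales, which is why it is a common choice for hybrid keyword-plus-vector retrieval.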

Abstract

LLM agents lack persistent memory, causing conversations to reset each session and preventing personalization over time. We present Lyzr Cognis, a unified memory architecture for conversational AI agents that addresses this limitation through a multi-stage retrieval pipeline. Cognis combines a dual-store backend pairing OpenSearch BM25 keyword matching with Matryoshka vector similarity search, fused via Reciprocal Rank Fusion. Its context-aware ingestion pipeline retrieves existing memories before extraction, enabling intelligent version tracking that preserves full memory history while keeping the store consistent. Temporal boosting enhances time-sensitive queries, and a BGE-2 cross-encoder reranker refines final result quality. We evaluate Cognis on two independent benchmarks -- LoCoMo and LongMemEval -- across eight answer generation models, demonstrating state-of-the-art performance on both. The system is open-source and deployed in production serving conversational AI applications.
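The abstract's temporal boosting step could take many forms; one common pattern is a multiplicative recency bonus on the retrieval score. The sketch below assumes an exponential half-life decay and a 30-day half-life purely for illustration; the paper does not specify this formula or these parameters.

```python
import math
from datetime import datetime, timezone

def temporal_boost(base_score, memory_timestamp, now=None, half_life_days=30.0):
    """Boost a memory's retrieval score by an exponential recency decay.

    Illustrative assumption: a fresh memory's score is doubled, and the
    bonus halves every `half_life_days`, so old memories keep roughly
    their base score.
    """
    now = now or datetime.now(timezone.utc)
    age_days = (now - memory_timestamp).total_seconds() / 86400.0
    decay = math.exp(-math.log(2) * age_days / half_life_days)
    return base_score * (1.0 + decay)

# A memory created today scores ~2x; one from 30 days ago scores ~1.5x.
now = datetime(2026, 4, 23, tzinfo=timezone.utc)
fresh = temporal_boost(1.0, now, now=now)
month_old = temporal_boost(1.0, datetime(2026, 3, 24, tzinfo=timezone.utc), now=now)
```

In a pipeline like the one the abstract describes, such a boost would be applied only when the query is detected as time-sensitive, before the cross-encoder reranker produces the final ordering.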