ATANT: An Evaluation Framework for AI Continuity

arXiv cs.AI / 4/10/2026

Key Points

The paper introduces ATANT, an open, system-agnostic evaluation framework that measures “AI continuity” (the ability to persist, update, disambiguate, and reconstruct meaningful context over time) rather than merely checking that memory components such as RAG pipelines or long context windows are present.
Continuity is defined via seven required properties, alongside a 10-checkpoint evaluation methodology that can run without an LLM inside the evaluation loop to avoid evaluation-time bias.
ATANT provides a narrative test corpus of 250 stories across 6 life domains, with 1,835 verification questions, enabling repeatable benchmarking across scenarios.
A reference implementation is evaluated over 5 test suite iterations, improving from 58% with a legacy architecture to 100% in isolated mode (250 stories) and 100% in 50-story cumulative mode, and reaching 96% at the 250-story cumulative scale, where cross-contamination between stories is the key failure mode.
The framework, example stories, and protocol are published on GitHub, with the full 250-story corpus planned for incremental release.
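
To make the idea of evaluation without an LLM in the loop concrete, here is a minimal sketch of a deterministic answer check. It is not ATANT's actual checkpoint logic, which this summary does not describe; the `VerificationQuestion` type, its `required` and `forbidden` fields, and the example facts are all hypothetical names chosen for illustration.

```python
import re
import unicodedata
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class VerificationQuestion:
    """Hypothetical question record; not the ATANT schema."""
    story_id: str
    question: str
    required: Tuple[str, ...]        # phrases a correct answer must contain
    forbidden: Tuple[str, ...] = ()  # phrases that signal a stale or foreign fact


def normalize(text: str) -> str:
    """Case-fold, strip punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKC", text).casefold()
    text = re.sub(r"[^\w\s]", " ", text)
    return " ".join(text.split())


def passes(question: VerificationQuestion, answer: str) -> bool:
    """Deterministic pass/fail: no model call, same verdict on every run."""
    padded = f" {normalize(answer)} "

    def contains(phrase: str) -> bool:
        # Pad with spaces so matches fall on whole-word boundaries.
        return f" {normalize(phrase)} " in padded

    return (all(contains(p) for p in question.required)
            and not any(contains(p) for p in question.forbidden))


# Assumed example data, invented for this sketch.
q = VerificationQuestion(
    story_id="story-001",
    question="Where does Maya work now?",
    required=("harbor clinic",),
    forbidden=("riverside hospital",),
)
print(passes(q, "Maya works at Harbor Clinic."))        # True
print(passes(q, "Maya works at Riverside Hospital."))   # False
```

A check of this kind trades flexibility for repeatability: it cannot credit a correct paraphrase, but it never changes its verdict between runs, which is the property an LLM-free evaluation loop is after.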

Abstract

We present ATANT (Automated Test for Acceptance of Narrative Truth), an open evaluation framework for measuring continuity in AI systems: the ability to persist, update, disambiguate, and reconstruct meaningful context across time. While the AI industry has produced memory components (RAG pipelines, vector databases, long context windows, profile layers), no published framework formally defines or measures whether these components produce genuine continuity. We define continuity as a system property with 7 required properties, introduce a 10-checkpoint evaluation methodology that operates without an LLM in the evaluation loop, and present a narrative test corpus of 250 stories comprising 1,835 verification questions across 6 life domains. We evaluate a reference implementation across 5 test suite iterations, progressing from 58% (legacy architecture) to 100% in isolated mode (250 stories) and 100% in 50-story cumulative mode, with 96% at 250-story cumulative scale. The cumulative result is the primary measure: when 250 distinct life narratives coexist in the same database, the system must retrieve the correct fact for the correct context without cross-contamination. ATANT is system-agnostic, model-independent, and designed as a sequenced methodology for building and validating continuity systems. The framework specification, example stories, and evaluation protocol are available at https://github.com/Kenotic-Labs/ATANT. The full 250-story corpus will be released incrementally.
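
The difference between isolated and cumulative mode can be illustrated with a toy harness. This is a sketch under stated assumptions, not the ATANT protocol: `NaiveMemory`, `ScopedMemory`, `run_isolated`, `run_cumulative`, and the two sample stories are hypothetical, and each story's facts double as both the ingested content and the expected answers to keep the example short.

```python
from typing import Callable, Dict, Optional

Stories = Dict[str, Dict[str, str]]  # story_id -> {attribute: value}


class NaiveMemory:
    """Hypothetical store keyed by attribute only, so stories overwrite each other."""

    def __init__(self) -> None:
        self._facts: Dict[str, str] = {}

    def ingest(self, story_id: str, facts: Dict[str, str]) -> None:
        self._facts.update(facts)

    def recall(self, story_id: str, key: str) -> Optional[str]:
        return self._facts.get(key)


class ScopedMemory:
    """Hypothetical store keyed by (story, attribute), so stories stay separate."""

    def __init__(self) -> None:
        self._facts: Dict[tuple, str] = {}

    def ingest(self, story_id: str, facts: Dict[str, str]) -> None:
        for key, value in facts.items():
            self._facts[(story_id, key)] = value

    def recall(self, story_id: str, key: str) -> Optional[str]:
        return self._facts.get((story_id, key))


def score(memory, stories: Stories) -> float:
    """Fraction of facts recalled correctly for their own story."""
    results = [memory.recall(sid, key) == expected
               for sid, facts in stories.items()
               for key, expected in facts.items()]
    return sum(results) / len(results)


def run_isolated(make_memory: Callable, stories: Stories) -> float:
    """Each story gets a fresh store, so no other narrative can interfere."""
    passed = total = 0
    for sid, facts in stories.items():
        memory = make_memory()
        memory.ingest(sid, facts)
        passed += sum(memory.recall(sid, k) == v for k, v in facts.items())
        total += len(facts)
    return passed / total


def run_cumulative(make_memory: Callable, stories: Stories) -> float:
    """All stories share one store before any question is asked."""
    memory = make_memory()
    for sid, facts in stories.items():
        memory.ingest(sid, facts)
    return score(memory, stories)


# Invented sample data: both stories define an "employer" attribute.
STORIES: Stories = {
    "story-001": {"employer": "Harbor Clinic", "pet": "Biscuit"},
    "story-002": {"employer": "Northwind Bakery", "city": "Leeds"},
}

print(run_isolated(NaiveMemory, STORIES))     # 1.0
print(run_cumulative(NaiveMemory, STORIES))   # 0.75
print(run_cumulative(ScopedMemory, STORIES))  # 1.0
```

The naive store scores perfectly in isolation yet loses a fact once both stories coexist, because the second story's employer overwrites the first. That is the kind of cross-contamination the abstract says the cumulative result is meant to expose, and it is why an isolated score alone says little about continuity.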
