HTM-EAR: Importance-Preserving Tiered Memory with Hybrid Routing under Saturation

arXiv cs.AI / 3/12/2026

Key Points

HTM-EAR is a hierarchical memory substrate combining HNSW-based working memory (L1) with archival storage (L2), using importance-aware eviction and hybrid routing.
When L1 reaches capacity, items are evicted based on a weighted score of their importance and usage to preserve essential information.
Queries are resolved in L1 first; if similarity or entity coverage is insufficient, retrieval falls back to L2, and candidates are re-ranked with a cross-encoder.
In saturation experiments, the full HTM-EAR system preserves active-query precision (MRR = 1.000) and approaches oracle performance while enabling controlled forgetting of stale history.
On real-world BGL logs, HTM-EAR achieves an MRR of 0.336, close to the oracle's 0.370 and well above LRU's 0.069. The code is publicly available on GitHub.
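
The importance-aware eviction described in the key points can be illustrated with a minimal sketch. It is not the authors' implementation: the `MemoryItem` fields, the 0.7/0.3 weights, the hit-count normalization, and the choice to demote evicted items into L2 are all assumptions made for illustration, since the summary only states that eviction uses a weighted score of importance and usage.

```python
from dataclasses import dataclass


@dataclass
class MemoryItem:
    key: str
    importance: float  # assumed to lie in [0, 1]
    hits: int = 0      # assumed usage signal: number of retrievals


def retention_score(item, max_hits, w_importance=0.7, w_usage=0.3):
    """Weighted score of importance and usage; the weights are hypothetical."""
    usage = item.hits / max_hits if max_hits else 0.0
    return w_importance * item.importance + w_usage * usage


def evict_to_archive(l1, l2, capacity):
    """Move the lowest-scoring items from l1 to l2 until l1 fits its capacity."""
    evicted = []
    while len(l1) > capacity:
        max_hits = max(item.hits for item in l1.values())
        victim = min(l1.values(), key=lambda it: retention_score(it, max_hits))
        l2[victim.key] = l1.pop(victim.key)  # assumption: evicted items are archived
        evicted.append(victim.key)
    return evicted
```

With these weights, a rarely used but important item outranks a frequently used but unimportant one, which is the behavior a pure LRU policy cannot provide.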

Abstract

Memory constraints in long-running agents require structured management of accumulated facts while preserving essential information under bounded context limits. We introduce HTM-EAR, a hierarchical tiered memory substrate that integrates HNSW-based working memory (L1) with archival storage (L2), combining importance-aware eviction and hybrid routing. When L1 reaches capacity, items are evicted using a weighted score of importance and usage. Queries are first resolved in L1; if similarity or entity coverage is insufficient, retrieval falls back to L2, and candidates are re-ranked using a cross-encoder. We evaluate the system under sustained saturation (15,000 facts; L1 capacity 500; L2 capacity 5,000) using synthetic streams across five random seeds and real BGL system logs. Ablation studies compare the full system against variants without cross-encoder re-ranking, without routing gates, with LRU eviction, and an oracle with unbounded memory. Under saturation, the full model preserves active-query precision (MRR = 1.000) while enabling controlled forgetting of stale history, approaching oracle active performance (0.997 ± 0.003). In contrast, LRU minimizes latency (21.1 ms) but permanently evicts 2,416 essential facts. On BGL logs, the full system achieves MRR 0.336, close to the oracle (0.370), while LRU drops to 0.069. Code is publicly available at: https://github.com/shubham-61291/HTM-EAR
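
The hybrid routing step in the abstract (resolve in L1, fall back to L2 when similarity or entity coverage is insufficient, then re-rank) can be sketched as follows. This is a hedged illustration, not the repository's code: the function names, the thresholds `min_sim` and `min_coverage`, and the substring-based coverage check are assumptions. `search_l1`, `search_l2`, and `rerank` are caller-supplied stand-ins for an HNSW index, an archival store, and a cross-encoder respectively.

```python
from typing import Callable, List, Set, Tuple

Hit = Tuple[float, str]  # (similarity, fact text)


def entity_coverage(query_entities: Set[str], hits: List[Hit]) -> float:
    """Fraction of query entities mentioned in the retrieved facts."""
    if not query_entities:
        return 1.0
    text = " ".join(fact.lower() for _, fact in hits)
    found = sum(1 for entity in query_entities if entity.lower() in text)
    return found / len(query_entities)


def route(
    query: str,
    query_entities: Set[str],
    search_l1: Callable[[str, int], List[Hit]],
    search_l2: Callable[[str, int], List[Hit]],
    rerank: Callable[[str, List[str]], List[str]],
    min_sim: float = 0.6,       # hypothetical similarity gate
    min_coverage: float = 1.0,  # hypothetical entity-coverage gate
    k: int = 5,
) -> Tuple[List[str], bool]:
    """Return the top-k re-ranked facts and whether L2 was consulted."""
    hits = search_l1(query, k)
    best_sim = max((sim for sim, _ in hits), default=0.0)
    used_l2 = (
        best_sim < min_sim
        or entity_coverage(query_entities, hits) < min_coverage
    )
    if used_l2:
        hits = hits + search_l2(query, k)  # widen the candidate pool
    ranked = rerank(query, [fact for _, fact in hits])[:k]
    return ranked, used_l2
```

Gating the fallback keeps the common case on the small L1 index, so the cost of searching L2 is paid only when L1 looks unreliable for the query.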