Storing Less, Finding More: How Novelty Filtering Improves Cross-Modal Retrieval on Edge Cameras
arXiv cs.CV / 4/1/2026
Key Points
- The paper argues that always-on edge cameras suffer cross-modal retrieval degradation because redundant frames crowd out correct matches in top-k results.
- It proposes a streaming retrieval architecture that uses an on-device epsilon-net novelty filter to keep only semantically novel frames, forming a denoised embedding index.
- To address alignment limitations from using a compact on-device encoder, the system adds a cross-modal adapter plus a cloud re-ranker.
- In single-pass experiments, the approach outperforms several offline frame selection baselines (k-means, farthest-point, uniform, random) across eight vision-language models on two egocentric datasets (AEA and EPIC-KITCHENS).
- The method reports strong retrieval quality (45.6% Hit@5 on held-out data) while running an 8M-parameter on-device encoder at a very low estimated power draw (2.7 mW).
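The epsilon-net novelty filter described above can be sketched roughly as follows: a streamed frame embedding is kept only if it lies farther than some radius `eps` (in cosine distance) from every embedding already retained, so near-duplicate frames never enter the index. This is a minimal illustration; the class and parameter names (`NoveltyFilter`, `eps`, `offer`) are hypothetical and not taken from the paper, which may use a different distance metric or data structure.

```python
import numpy as np

class NoveltyFilter:
    """Hypothetical sketch of an epsilon-net novelty filter over a
    stream of frame embeddings (not the paper's implementation)."""

    def __init__(self, eps: float):
        self.eps = eps                       # novelty radius (cosine distance)
        self.index: list[np.ndarray] = []    # kept (novel) unit embeddings

    def offer(self, emb: np.ndarray) -> bool:
        """Keep `emb` and return True if it is novel; otherwise drop it."""
        emb = emb / np.linalg.norm(emb)      # unit-normalize for cosine distance
        for kept in self.index:
            if 1.0 - float(kept @ emb) < self.eps:
                return False                 # within eps of an existing center
        self.index.append(emb)
        return True

# Toy stream: a frame, a near-duplicate of it, and a clearly novel frame.
rng = np.random.default_rng(0)
a = rng.normal(size=128)
filt = NoveltyFilter(eps=0.1)
kept = [filt.offer(v) for v in (a, a + 0.001 * rng.normal(size=128), -a)]
print(kept)  # → [True, False, True]: the near-duplicate is filtered out
```

Because each frame is compared only against the retained centers (a set that grows slowly when the stream is redundant), the filter suits always-on edge capture where most consecutive frames are near-identical.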