AI Navigate

Releasing an open-source RAG attack + defense lab for local stacks (ChromaDB + LM Studio) — runs fully local, no cloud, consumer hardware

Reddit r/LocalLLaMA / 3/18/2026

📰 NewsDeveloper Stack & InfrastructureTools & Practical UsageModels & Research

Key Points

  • The project is an open-source lab to measure RAG knowledge-base poisoning on a fully local stack (ChromaDB + LM Studio with Qwen2.5-7B) running on a MacBook Pro.
  • The lab reports about 95% poisoning success on undefended ChromaDB at the retrieval layer, with no jailbreaks, no model access, and no prompt manipulation.
  • A default chunking setup of 512-token chunks with 200-token overlap doubles retrieval probability for documents at chunk boundaries.
  • Embedding-based ingestion defenses, rather than output filtering, reduce poisoning from 95% to 20%, and using all five defenses lowers residual poisoning to about 10%.
  • The repository at github.com/aminrj-labs/mcp-attack-labs includes the attack, a hardened version, and measurements for each defense layer.
Releasing an open-source RAG attack + defense lab for local stacks (ChromaDB + LM Studio) — runs fully local, no cloud, consumer hardware

Built a lab to measure how bad RAG knowledge base poisoning actually is on a default local setup — and what defenses actually move the number.

Stack: ChromaDB + LM Studio (Qwen2.5-7B), standard LangChain-style chunking, no API keys, runs on a MacBook Pro.

What the lab measures:

Knowledge base poisoning against undefended ChromaDB: 95% success. The attack works at the retrieval layer — no jailbreak, no model access, no prompt manipulation. The model is doing exactly what it's supposed to, just from poisoned context.

One thing worth knowing about default chunking: with 512-token chunks and 200-token overlap, a document at a chunk boundary gets embedded twice as two independent chunks. Doubles retrieval probability with no extra sophistication. Side effect of settings most local setups inherit without thinking about it.

The defense most people reach for is output filtering. Wrong layer — the compromise already happened before generation. Embedding anomaly detection at ingestion is what actually works: score incoming documents against the existing collection before writing them. Drops poisoning from 95% to 20%.

Residual with all five defenses active: 10%. Those cases are semantically close enough to the baseline that no layer catches them cleanly — that's the honest ceiling.

Repo has the attack, the hardened version, and measurements for each defense layer: github.com/aminrj-labs/mcp-attack-labs

submitted by /u/AICyberPro
[link] [comments]