CLASP: Defending Hybrid Large Language Models Against Hidden State Poisoning Attacks
arXiv cs.CL / 3/13/2026
Key Points
- CLASP defends hybrid SSM-based LLMs against Hidden State Poisoning Attacks by casting detection as a token-level binary classification problem, training an XGBoost classifier on block output embeddings.
- It achieves high detection performance in a realistic resume-scanning scenario: 95.9% token-level F1 and 99.3% document-level F1 on malicious tokens, with strong generalization to unseen attack patterns (96.9% doc-level F1 in leave-one-out; 91.6% doc-level F1 under structurally novel triggers).
- CLASP operates with modest resources—about 1,032 tokens/second and under 4 GB VRAM—making it a lightweight front-line defense that is independent of downstream models.
- The paper provides code and detailed results at the linked URL, illustrating a practical defense technique for SSM-based and hybrid architectures.