CARE: Covariance-Aware and Rank-Enhanced Decomposition for Enabling Multi-Head Latent Attention
arXiv cs.LG / 3/19/2026
Key Points
- CARE introduces a covariance-aware, rank-enhanced decomposition for converting pretrained attention into Multi-Head Latent Attention (MLA), preserving KV-cache size while boosting expressivity.
- It combines activation-preserving factorization, adjusted-rank allocation, and KV-parity mapping, so the low-rank approximation is aligned with actual activations and capacity is allocated where it is most needed (a minimal sketch follows this list).
- Evaluation on Qwen3-4B and Llama-3.1 shows up to a 215x reduction in one-shot perplexity and up to a 1.70x improvement in mean accuracy at matched KV-cache budgets.
- A brief post-SVD healing fine-tune fully recovers the original model's accuracy.
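The summary names covariance-aware factorization and rank allocation without spelling them out. Below is a minimal Python sketch of one plausible reading, assuming the factorization whitens each projection weight by a Cholesky factor of its input-activation covariance before a truncated SVD, and that ranks are allocated in proportion to singular-value energy. The function names and the energy-based allocation rule are illustrative assumptions, not the paper's implementation.

```python
# Hedged sketch of a covariance-aware low-rank factorization, NOT the
# paper's code. Idea: rather than plain SVD on a projection weight W,
# whiten W by a Cholesky factor of the input-activation covariance so
# the truncation error is measured on activations W @ x, not on W itself.
import numpy as np

def whiten_and_factor(W: np.ndarray, cov: np.ndarray, rank: int):
    """Factor W (d_out x d_in) as A @ B, minimizing E||W x - A B x||^2
    for x with covariance `cov`, instead of plain ||W - A B||_F."""
    # Cholesky factor L with cov = L @ L.T (jitter for numerical safety).
    L = np.linalg.cholesky(cov + 1e-6 * np.eye(cov.shape[0]))
    # Truncated SVD of the whitened weight: optimal in activation space.
    U, S, Vt = np.linalg.svd(W @ L, full_matrices=False)
    A = U[:, :rank] * S[:rank]        # (d_out, rank)
    B = Vt[:rank] @ np.linalg.inv(L)  # (rank, d_in), un-whitened
    return A, B

def allocate_ranks(sv_per_matrix, total_rank: int):
    """Toy stand-in for adjusted-rank allocation: give each matrix rank
    in proportion to its singular-value energy (an assumption)."""
    energy = np.array([float(np.sum(s ** 2)) for s in sv_per_matrix])
    raw = energy / energy.sum() * total_rank
    return np.maximum(1, np.floor(raw).astype(int))

# Usage with synthetic calibration activations standing in for real ones.
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 128))
X = rng.standard_normal((2048, 128))   # proxy calibration inputs
cov = X.T @ X / len(X)                 # empirical activation covariance
A, B = whiten_and_factor(W, cov, rank=16)
```

Whitening makes the SVD truncation optimal in activation space (minimizing E||Wx - ABx||^2 for x with covariance LL^T) rather than in raw weight space, which is one way to read "aligning approximations with activations."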