Is Compression Really Linear with Code Intelligence?
arXiv cs.CL / 3/27/2026
Key Points
- The paper investigates how compression metrics measured on code relate to the code intelligence of Large Language Models (LLMs) across varied programming languages and tasks.
- It argues that the linear relationship assumed by prior studies was incomplete: modern Code LLMs were evaluated under limited, less fair conditions, and real-world code diversity was undersampled.
- To make evaluation fairer, the authors propose a lightweight "Format Annealing" method and run experiments on a broad set of open-source Code LLMs using multi-language, multi-task benchmarks.
- Using bits-per-character (BPC) measured on a newly created, large GitHub-derived validation set, the study finds an intrinsically logarithmic (not linear) relationship between measured code intelligence and compression.
- The authors interpret earlier linear-looking findings as possibly arising from sampling only the tail of the logarithmic curve under restricted experimental conditions, and they present a more robust framework for assessing code-domain models.
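The two quantitative ideas in the points above — bits-per-character as a compression metric, and distinguishing a logarithmic from a linear relationship — can be illustrated with a minimal sketch. This is not the paper's code or data: the BPC helper uses the standard definition (summed negative log-likelihood in nats converted to bits, divided by character count), and the (BPC, score) pairs are synthetic values generated from an assumed logarithmic law purely to show how the two fits differ.

```python
import math

def bits_per_character(total_nll_nats: float, num_chars: int) -> float:
    # Standard BPC definition: a model's summed negative log-likelihood
    # (in nats) over a corpus, converted to bits and averaged per
    # character. Lower BPC means the model compresses the code better.
    return total_nll_nats / (num_chars * math.log(2))

def fit_line(xs, ys):
    # Ordinary least squares for y = a + b*x; returns (a, b, sse),
    # where sse is the sum of squared residuals of the fit.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, sse

# Synthetic (BPC, benchmark-score) pairs, NOT the paper's measurements:
# scores are generated from a hypothetical logarithmic law.
bpcs = [0.45, 0.55, 0.70, 0.90, 1.20]
scores = [80.0 - 40.0 * math.log(b) for b in bpcs]

# Fit score against BPC directly (linear hypothesis) and against
# log(BPC) (logarithmic hypothesis), then compare residual error.
_, _, sse_linear = fit_line(bpcs, scores)
_, _, sse_log = fit_line([math.log(b) for b in bpcs], scores)
# Because the data follow a log law, the log-x fit is essentially
# exact, while the straight-line fit in BPC leaves residual error.
```

Under restricted conditions — a narrow BPC range, as the last key point suggests — both fits can look equally good, since any smooth curve is locally linear; the distinction only emerges once models spanning a wide compression range are compared.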