CoQuant: Joint Weight-Activation Subspace Projection for Mixed-Precision LLMs

arXiv cs.LG / April 30, 2026


Key Points

  • The paper introduces CoQuant, a PTQ method for mixed-precision LLMs that jointly considers both weight and activation quantization noise rather than relying only on activation statistics.
  • CoQuant uses a theoretical formulation of expected output error to derive a closed-form, weighted PCA solution for selecting an optimal high-precision subspace.
  • Experiments on Llama-3.2 and Qwen2.5 demonstrate consistent improvements over strong PTQ baselines, measured via WikiText perplexity and zero-shot common-sense reasoning accuracy.
  • The authors release their implementation, supporting adoption and further validation of joint subspace modeling for low-bit LLM quantization.

Abstract

Post-training quantization (PTQ) has become an important technique for reducing the inference cost of Large Language Models (LLMs). While recent mixed-precision methods improve ultra-low bit quantization by preserving critical subspaces in high precision, they typically construct these subspaces relying solely on activation statistics. This ignores the fundamental nature of linear operations, where the output perturbation is jointly driven by both activation and weight quantization noise. In this paper, we propose CoQuant, a joint weight-activation subspace projection method. By theoretically modeling the expected output error, CoQuant formulates a closed-form weighted PCA solution that balances activation and weight covariances to select the optimal high-precision subspace. Extensive experiments on Llama-3.2 and Qwen2.5 models show that CoQuant consistently outperforms strong PTQ baselines in both WikiText perplexity and zero-shot common-sense reasoning accuracy. These results demonstrate that joint weight-activation subspace modeling provides a principled and effective direction for low-bit LLM quantization. The source code is available at https://github.com/Zachary5895/CoQuant.
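The abstract describes selecting a high-precision subspace via a weighted PCA over activation and weight covariances. The sketch below illustrates that general idea only: the function name `select_subspace`, the scalar trade-off `alpha`, and the specific covariance estimates are illustrative assumptions, not CoQuant's actual closed-form weighting, which is derived in the paper.

```python
import numpy as np

def select_subspace(X, W, k, alpha=0.5):
    """Pick a rank-k high-precision subspace along the shared input dimension.

    X: (n_tokens, d) calibration activations
    W: (d_out, d) linear-layer weight matrix
    k: rank of the subspace kept in high precision
    alpha: hypothetical trade-off between activation and weight statistics
    """
    # Empirical covariance of activations over the calibration set.
    cov_act = (X.T @ X) / X.shape[0]
    # Gram matrix of the weights along the same input dimension.
    cov_w = (W.T @ W) / W.shape[0]
    # Weighted combination of the two noise sources (placeholder weighting).
    C = alpha * cov_act + (1.0 - alpha) * cov_w
    # Top-k eigenvectors of the symmetric matrix C span the kept subspace;
    # eigh returns eigenvalues in ascending order, so take the last k columns.
    _, eigvecs = np.linalg.eigh(C)
    U = eigvecs[:, -k:]  # (d, k), orthonormal columns
    return U

# Usage: project inputs onto U for the high-precision path, quantize the rest.
rng = np.random.default_rng(0)
X = rng.standard_normal((256, 64))
W = rng.standard_normal((128, 64))
U = select_subspace(X, W, k=8)
```

In a plain activation-only scheme, `cov_w` would be dropped; the paper's point is that output error in a linear layer depends on both terms, so the subspace should be chosen from their combination.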