PQuantML: A Tool for End-to-End Hardware-aware Model Compression
arXiv cs.LG · March 30, 2026
Key Points
- PQuantML is introduced as a new open-source, hardware-aware library for end-to-end neural network model compression focused on meeting strict latency constraints in deployment environments.
- The tool provides a unified workflow to apply pruning and fixed-point quantization either jointly or separately, including support for high-granularity quantization.
- It includes multiple pruning techniques with different granularities and is designed to simplify training compressed models without requiring separate toolchains.
- Experiments on tasks such as jet substructure classification and real-time LHC-oriented jet tagging show substantial reductions in parameter counts and bit-widths while preserving accuracy.
- The paper benchmarks PQuantML's compression results against those of existing approaches such as QKeras and HGQ.
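To make the joint pruning-plus-quantization workflow concrete, here is a minimal, self-contained sketch of the two underlying operations the bullets describe: magnitude-based pruning followed by signed fixed-point quantization. This is a generic illustration, not PQuantML's actual API; the function names, parameters, and the example weight values are all hypothetical.

```python
import numpy as np

def prune_by_magnitude(w, sparsity=0.5):
    """Zero out the smallest-magnitude fraction of weights (generic sketch,
    not PQuantML's API)."""
    k = int(sparsity * w.size)
    if k == 0:
        return w.copy()
    # Threshold = k-th smallest absolute value; everything at or below it is pruned.
    thresh = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    return np.where(np.abs(w) <= thresh, 0.0, w)

def quantize_fixed_point(w, total_bits=8, int_bits=1):
    """Simulate signed fixed-point quantization with `total_bits` total bits,
    of which `int_bits` cover the integer part (sign bit included)."""
    frac_bits = total_bits - int_bits
    scale = 2.0 ** frac_bits
    qmin = -(2 ** (total_bits - 1))          # most negative representable code
    qmax = 2 ** (total_bits - 1) - 1         # most positive representable code
    codes = np.clip(np.round(w * scale), qmin, qmax)
    return codes / scale                     # de-quantize back to float for simulation

# Hypothetical layer weights, compressed jointly: prune first, then quantize.
w = np.array([0.30, -0.02, 0.76, -0.51, 0.05, -0.12])
w_pruned = prune_by_magnitude(w, sparsity=0.5)
w_compressed = quantize_fixed_point(w_pruned, total_bits=8, int_bits=1)
```

Applying the two steps in one pass is what lets a tool like this trade off parameter count (sparsity) and bit-width together against an accuracy or latency target, rather than tuning each with a separate toolchain.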