CPUBone: Efficient Vision Backbone Design for Devices with Low Parallelization Capabilities
arXiv cs.AI / 3/30/2026
Key Points
- The paper argues that most vision backbone efficiency research targets highly parallel hardware, whereas CPU-based inference needs a different design approach: one that sustains a high rate of multiply–accumulate operations per second (MACpS) to keep latency low.
- It evaluates two modifications to standard convolutions—grouped convolutions and smaller kernel sizes—that substantially reduce total MACs while aiming to preserve hardware efficiency.
- Across experiments on multiple CPU devices, the authors show these convolution changes maintain high hardware efficiency despite lowering computational cost.
- They introduce CPUBone, a new CPU-optimized vision backbone family that achieves strong speed–accuracy trade-offs across a range of CPU hardware.
- CPUBone’s efficiency is reported to carry over to downstream tasks such as object detection and semantic segmentation, and the models/code are released on GitHub.
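The MAC arithmetic behind the convolution changes above can be sketched with a quick back-of-the-envelope calculation. This is an illustrative example, not code from the paper; the function name and the layer shapes are chosen for demonstration:

```python
def conv_macs(c_in, c_out, k, h, w, groups=1):
    # MACs for a k x k 2D convolution producing an h x w output:
    # each output element costs (c_in / groups) * k * k multiply-accumulates.
    return (c_in // groups) * k * k * c_out * h * w

# Standard 3x3 convolution, 64 -> 64 channels, 56x56 feature map
std = conv_macs(64, 64, 3, 56, 56)

# Grouped convolution with 4 groups: each filter sees only c_in/4
# input channels, so total MACs drop by the group count.
grp = conv_macs(64, 64, 3, 56, 56, groups=4)

# Shrinking the kernel from 3x3 to 1x1 cuts MACs by a factor of 9.
small = conv_macs(64, 64, 1, 56, 56)

print(std // grp)    # 4
print(std // small)  # 9
```

The paper's argument is that such MAC reductions only translate into lower latency if the hardware still executes the cheaper layers at a high MACpS, which is what the authors verify for grouped and small-kernel convolutions on CPUs.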