Once-for-All Channel Mixers (HYPER-TINYPW): Generative Compression for TinyML
arXiv cs.LG / 3/27/2026
Key Points
- Introduces HYPER-TINYPW, a "compression-as-generation" method that stores compact per-layer codes in flash and uses a shared tiny MLP to generate most 1x1 pointwise (PW) mixer weights at load time for MCU deployment (a minimal sketch of this idea follows after this list).
- By caching the generated PW kernels and running inference with standard integer operators, the approach preserves commodity microcontroller runtimes and keeps steady-state latency and energy comparable to INT8 separable-CNN baselines (see the second sketch below).
- Enforces a shared latent basis across layers to remove cross-layer redundancy, while keeping PW1 in INT8 to stabilize morphology-sensitive mixing in the network's early stages.
- Reports strong flash/memory tradeoffs on ECG benchmarks (Apnea-ECG, PTB-XL, MIT-BIH), achieving about a 6.31× reduction in stored bytes (~225 kB) while retaining at least ~95% of the large model's macro-F1, and performing better under tight 32–64 kB budgets.
- Demonstrates broader applicability beyond ECG by transferring to TinyML audio, reaching 96.2% test accuracy on Speech Commands, suggesting the technique fits other embedded sensing/speech settings where repeated linear mixers dominate storage.
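Below is a minimal, illustrative sketch (not the authors' code) of the compression-as-generation step: one tiny MLP, shared across layers, decodes a short per-layer code into that layer's 1x1 pointwise kernel at load time. The class and parameter names (`PWGenerator`, `code_dim`, `hidden_dim`) and all sizes are assumptions for illustration; the paper's actual generator architecture and code dimensionality may differ.

```python
# Minimal sketch of "compression-as-generation": flash stores only short
# per-layer codes plus one small shared MLP; at load time the MLP expands
# each code into that layer's 1x1 pointwise (PW) kernel.
import torch
import torch.nn as nn

class PWGenerator(nn.Module):
    """Shared tiny MLP that decodes a per-layer code into a 1x1 PW kernel."""
    def __init__(self, code_dim: int, hidden_dim: int, c_in: int, c_out: int):
        super().__init__()
        self.c_in, self.c_out = c_in, c_out
        self.net = nn.Sequential(
            nn.Linear(code_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, c_out * c_in),
        )

    def forward(self, code: torch.Tensor) -> torch.Tensor:
        # (code_dim,) -> (c_out, c_in, 1, 1), i.e. a standard 1x1 conv weight
        return self.net(code).view(self.c_out, self.c_in, 1, 1)

# One generator is shared by all layers; only the codes are per-layer.
gen = PWGenerator(code_dim=16, hidden_dim=64, c_in=64, c_out=64)
layer_codes = [torch.randn(16) for _ in range(8)]   # what would sit in flash
pw_kernels = [gen(z) for z in layer_codes]          # materialized at load time
print(pw_kernels[0].shape)                          # torch.Size([64, 64, 1, 1])
```

In this reading, only the per-layer codes and the shared generator weights occupy flash; the materialized kernels exist in RAM after loading.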
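And a sketch of the caching step, assuming a straightforward per-tensor symmetric INT8 quantization of each generated kernel so that steady-state inference can run on standard integer convolution operators; `quantize_kernel_int8` is a hypothetical helper, not an API from the paper or from any MCU runtime.

```python
# Sketch: quantize each generated float kernel once (per-tensor, symmetric
# INT8) and cache the result, so inference uses only integer operators.
import torch

def quantize_kernel_int8(w: torch.Tensor):
    """Per-tensor symmetric INT8 quantization: w ≈ scale * w_q."""
    scale = w.abs().max().clamp(min=1e-8) / 127.0
    w_q = torch.clamp(torch.round(w / scale), -127, 127).to(torch.int8)
    return w_q, scale.item()

# A kernel such as one produced by the shared generator at load time.
pw_kernel = torch.randn(64, 64, 1, 1)
w_q, scale = quantize_kernel_int8(pw_kernel)
print(w_q.dtype, w_q.shape, scale)  # torch.int8 torch.Size([64, 64, 1, 1]) ...
```

After this one-time step, neither the float kernels nor the generator MLP is needed again at inference time, which is consistent with the reported parity in steady-state latency and energy.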