Fed-FSTQ: Fisher-Guided Token Quantization for Communication-Efficient Federated Fine-Tuning of LLMs on Edge Devices
arXiv cs.LG · April 29, 2026
Key Points
- The paper introduces Fed-FSTQ, a Fisher-guided token quantization method aimed at making federated fine-tuning of LLMs practical on edge/mobile settings where uplink is limited and clients join intermittently.
- It estimates token sensitivity using a lightweight Fisher proxy, then applies importance-aware token selection together with non-uniform mixed-precision quantization to preserve task-critical signals while reducing redundant communication.
- Fed-FSTQ is model-agnostic and works as a drop-in module for standard federated PEFT pipelines such as LoRA without changing the server aggregation rule.
- Experiments on multilingual QA and medical QA with non-IID data partitions show substantial gains: up to a 46x reduction in cumulative uplink traffic to reach the same target quality, and 52% faster end-to-end time-to-accuracy.
- Applied at inference time, Fisher-guided token reduction also yields up to a 1.55x end-to-end speedup on NVIDIA Jetson-class edge devices, supporting deployment under constrained resources.
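The pipeline the bullets describe can be sketched roughly as follows. This is an illustrative reconstruction, not the paper's implementation: the Fisher proxy here is the common squared-gradient approximation, and all function names, bit widths, and the `keep_ratio` threshold are assumptions for the sketch.

```python
import numpy as np

def fisher_token_scores(token_grads: np.ndarray) -> np.ndarray:
    """Lightweight Fisher proxy: per-token sum of squared gradients.

    token_grads: (num_tokens, hidden_dim) gradient w.r.t. token states.
    Returns a (num_tokens,) sensitivity score per token.
    """
    return np.sum(token_grads ** 2, axis=-1)

def select_tokens(scores: np.ndarray, keep_ratio: float = 0.3) -> np.ndarray:
    """Importance-aware selection: keep the most sensitive tokens."""
    k = max(1, int(len(scores) * keep_ratio))
    return np.argsort(scores)[-k:]  # indices of the top-k scores

def quantize(x: np.ndarray, bits: int):
    """Uniform symmetric quantization of one token's update to `bits` bits."""
    levels = 2 ** (bits - 1) - 1
    m = float(np.max(np.abs(x)))
    scale = m / levels if m > 0 else 1.0
    q = np.round(x / scale).astype(np.int32)
    return q, scale

def mixed_precision_pack(token_updates, scores,
                         keep_ratio: float = 0.3,
                         hi_bits: int = 8, lo_bits: int = 2):
    """Non-uniform allocation: sensitive tokens get high precision,
    the rest get aggressively low precision before uplink."""
    keep = set(select_tokens(scores, keep_ratio).tolist())
    packed = []
    for i, upd in enumerate(token_updates):
        bits = hi_bits if i in keep else lo_bits
        q, scale = quantize(upd, bits)
        packed.append((bits, scale, q))
    return packed
```

In a federated PEFT round, a client would score its tokens with the Fisher proxy, pack the (LoRA) update with `mixed_precision_pack`, and send the result uplink; because only the payload changes, the server's aggregation rule is untouched, consistent with the drop-in claim above.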
