POaaS: Minimal-Edit Prompt Optimization as a Service to Lift Accuracy and Cut Hallucinations on On-Device sLLMs
arXiv cs.AI / 3/18/2026
Key Points
- POaaS introduces a minimal-edit prompt optimization layer for on-device small language models (sLLMs). It routes each query to lightweight specialists (Cleaner, Paraphraser, Fact-Adder) and merges their outputs under strict drift and length constraints.
- In experiments with Llama-3.2-3B-Instruct and Llama-3.1-8B-Instruct, POaaS improves both task accuracy and factuality, while representative automatic prompt optimization (APO) baselines degrade them.
- The approach applies a conservative skip policy to well-formed prompts, and it shows up to a +7.4% improvement when inputs are perturbed by token deletion and input mixup.
- The design aims to reduce context waste and avoid the cost of search-heavy APO within on-device constraints.
- The authors argue that per-query conservative optimization is a practical alternative to APO for on-device sLLMs.
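
To make the key points above concrete, here is a minimal Python sketch of a conservative, minimal-edit optimizer with a skip policy and drift and length guards. It is an illustration only, not the authors' implementation: the thresholds `MAX_DRIFT` and `MAX_LENGTH_RATIO`, the `is_well_formed` heuristic, the character-level drift metric, and the stub specialists are all hypothetical, and the sketch applies specialists one after another rather than reproducing how POaaS actually routes queries and merges outputs.

```python
from difflib import SequenceMatcher
from typing import Callable, List

# Hypothetical thresholds; the summary does not state the paper's actual values.
MAX_DRIFT = 0.35         # maximum allowed dissimilarity from the original prompt
MAX_LENGTH_RATIO = 1.5   # maximum allowed growth relative to the original length


def drift(original: str, candidate: str) -> float:
    """Character-level dissimilarity in [0, 1]; a stand-in for a real drift metric."""
    return 1.0 - SequenceMatcher(None, original, candidate).ratio()


def is_well_formed(prompt: str) -> bool:
    """Toy skip heuristic (assumption): no doubled spaces, ends with punctuation."""
    stripped = prompt.strip()
    return bool(stripped) and "  " not in prompt and stripped[-1] in ".?!"


def cleaner(prompt: str) -> str:
    """Stub specialist: normalizes whitespace only."""
    return " ".join(prompt.split())


def verbose_rewriter(prompt: str) -> str:
    """Stub that over-expands the prompt, to show an edit being rejected."""
    return prompt + " Please answer step by step in great detail." * 3


def within_limits(original: str, candidate: str) -> bool:
    """Accept a candidate only if it stays close to, and not much longer than, the original."""
    return (drift(original, candidate) <= MAX_DRIFT
            and len(candidate) <= MAX_LENGTH_RATIO * len(original))


def optimize(prompt: str, specialists: List[Callable[[str], str]]) -> str:
    """Skip well-formed prompts; otherwise keep only edits that pass both guards."""
    if is_well_formed(prompt):
        return prompt  # conservative skip: leave good prompts untouched
    current = prompt
    for specialist in specialists:
        candidate = specialist(current)
        # Guards are always measured against the ORIGINAL prompt, not the running edit.
        if within_limits(prompt, candidate):
            current = candidate
    return current


if __name__ == "__main__":
    print(optimize("What is the capital of France?", [cleaner, verbose_rewriter]))
    print(optimize("what  is   the capital of france", [cleaner, verbose_rewriter]))
```

In this sketch the first prompt is returned unchanged by the skip policy, while the second is whitespace-cleaned and the over-long rewrite is discarded by the length guard. Measuring drift against the original prompt, rather than against the previous specialist's output, keeps small edits from compounding into a large departure from what the user wrote.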




