JoyAI-LLM Flash: Advancing Mid-Scale LLMs with Token Efficiency
arXiv cs.CL / 2026-04-06
Key Points
- The paper introduces JoyAI-LLM Flash, an efficient Mixture-of-Experts (MoE) language model aimed at improving the performance–token-efficiency trade-off for sub-50B parameter settings.
- JoyAI-LLM Flash is pretrained on 20T tokens and then post-trained using SFT, DPO, and large-scale reinforcement learning across diverse environments.
- To boost token efficiency, the authors balance “thinking” and “non-thinking” cognitive modes and propose FiberPO, an RL algorithm that decomposes trust-region maintenance into global and local components for unified multi-scale stability control.
- Architecturally, it uses 48B total parameters while activating only 2.7B per forward pass, targeting a much higher sparsity ratio than similarly sized industry-leading models.
- For faster inference, it applies joint training–inference co-design with dense Multi-Token Prediction (MTP) and Quantization-Aware Training (QAT), and releases base and post-trained checkpoints on Hugging Face.
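The sparsity claim above can be made concrete with a quick calculation. The 48B-total / 2.7B-active figures come from the summary; the function below simply derives the implied per-token activation ratio (the comparison baseline is a generic illustration, not a number from the paper):

```python
# Sketch of the activation ratio implied by the summary's parameter counts.
# Only the 48B total / 2.7B active figures are from the source; everything
# else here is illustrative.

def activation_ratio(active_params_b: float, total_params_b: float) -> float:
    """Fraction of parameters used per forward pass in an MoE model."""
    return active_params_b / total_params_b

flash_ratio = activation_ratio(2.7, 48.0)
print(f"JoyAI-LLM Flash activates ~{flash_ratio:.1%} of parameters per token")
# → activates ~5.6% of parameters per token
```

At roughly 5.6% activation, the model runs far sparser than MoE designs that activate 15–25% of their parameters, which is what the summary means by a “much higher sparsity ratio.”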
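The digest only names FiberPO's core idea: splitting trust-region maintenance into global and local components. As a rough illustration of that multi-scale idea, and not the paper's actual algorithm, one could clip each per-token importance ratio (local) and also clip a sequence-level aggregate ratio (global). All function names and thresholds below are hypothetical:

```python
# Hypothetical sketch of a multi-scale trust region, assuming a PPO-style
# clipped-ratio setup. NOT the FiberPO algorithm from the paper, which is
# not described in this summary beyond its global/local decomposition.
import math

def dual_clip_weights(logp_new, logp_old, eps_local=0.2, eps_global=0.1):
    """Clip per-token ratios (local trust region), then scale them by a
    clipped sequence-level geometric-mean ratio (global trust region)."""
    # Local component: clip each token's importance ratio around 1.
    local = [max(1 - eps_local, min(1 + eps_local, math.exp(n - o)))
             for n, o in zip(logp_new, logp_old)]
    # Global component: clip the geometric mean of raw ratios more tightly.
    mean_log = sum(n - o for n, o in zip(logp_new, logp_old)) / len(local)
    g = max(1 - eps_global, min(1 + eps_global, math.exp(mean_log)))
    return [r * g for r in local]
```

An unchanged policy yields weights of exactly 1, while a large per-token drift is bounded by both clips at once (e.g., one token with log-ratio 1.0 is capped at 1.2 locally and 1.1 globally, giving 1.32 rather than e ≈ 2.72).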