JoyAI-LLM Flash: Advancing Mid-Scale LLMs with Token Efficiency

arXiv cs.CL / April 6, 2026


Key Points

  • The paper introduces JoyAI-LLM Flash, an efficient Mixture-of-Experts (MoE) language model aimed at improving the performance–token-efficiency trade-off for sub-50B parameter settings.
  • JoyAI-LLM Flash is pretrained on 20T tokens and then post-trained using SFT, DPO, and large-scale reinforcement learning across diverse environments.
  • To boost token efficiency, the model balances “thinking” and “non-thinking” cognitive modes and proposes FiberPO, an RL algorithm that decomposes trust-region maintenance into global and local components for unified multi-scale stability control.
  • Architecturally, it uses 48B total parameters while activating only 2.7B per forward pass, targeting a much higher sparsity ratio than similarly sized industry-leading models.
  • For faster inference, it applies joint training–inference co-design with dense Multi-Token Prediction (MTP) and Quantization-Aware Training (QAT), and releases base and post-trained checkpoints on Hugging Face.
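The sparsity claim above can be made concrete. In a Mixture-of-Experts layer, a router scores the experts per token and only the top-k are executed, so most expert parameters stay idle on each forward pass; that is how 48B total parameters can activate only about 2.7B (a ~5.6% activation ratio). The expert count and k below are illustrative, not the paper's actual configuration:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of router logits."""
    m = max(xs)
    exps = [math.exp(v - m) for v in xs]
    s = sum(exps)
    return [e / s for e in exps]

def top_k_route(router_logits, k=2):
    """Pick the k highest-scoring experts and renormalise their weights.

    Only these k experts run for this token; the rest are skipped,
    which is the source of the total-vs-activated parameter gap.
    """
    probs = softmax(router_logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in chosen)
    return {i: probs[i] / total for i in chosen}

# 8 hypothetical experts, k=2: only 2/8 of expert parameters run per token.
route = top_k_route([0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.5, 0.3], k=2)

# The paper's headline ratio: 2.7B activated / 48B total ≈ 5.6 %
activation_ratio = 2.7 / 48
```

The renormalisation step makes the chosen experts' weights sum to 1, a common convention in top-k MoE routers.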

Abstract

We introduce JoyAI-LLM Flash, an efficient Mixture-of-Experts (MoE) language model designed to redefine the trade-off between strong performance and token efficiency in the sub-50B parameter regime. JoyAI-LLM Flash is pretrained on a massive corpus of 20 trillion tokens and further optimized through a rigorous post-training pipeline, including supervised fine-tuning (SFT), Direct Preference Optimization (DPO), and large-scale reinforcement learning (RL) across diverse environments. To improve token efficiency, JoyAI-LLM Flash strategically balances *thinking* and *non-thinking* cognitive modes and introduces FiberPO, a novel RL algorithm inspired by fibration theory that decomposes trust-region maintenance into global and local components, providing unified multi-scale stability control for LLM policy optimization. To enhance architectural sparsity, the model comprises 48B total parameters while activating only 2.7B parameters per forward pass, achieving a substantially higher sparsity ratio than contemporary industry-leading models of comparable scale. To further improve inference throughput, we adopt a joint training–inference co-design that incorporates dense Multi-Token Prediction (MTP) and Quantization-Aware Training (QAT). We release the checkpoints for both JoyAI-LLM-48B-A3B Base and its post-trained variants on Hugging Face to support the open-source community.
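The abstract does not give FiberPO's update rule, so the following is only a toy illustration of what "decomposing trust-region maintenance into global and local components" could mean: clip the per-token importance ratio (local) and a sequence-level ratio (global) each against its own trust region, then combine them. The function name, the two-epsilon parameterisation, and the multiplicative combination are all our assumptions, not the paper's algorithm:

```python
def dual_clip_weight(token_ratio, seq_ratio, eps_local=0.2, eps_global=0.1):
    """Hypothetical multi-scale trust-region clip (illustrative only).

    token_ratio: pi_new / pi_old for a single token (local component)
    seq_ratio:   an aggregate ratio over the whole sequence, e.g. a
                 geometric mean of token ratios (global component)

    Each scale gets its own clipping interval, mirroring the idea of
    splitting stability control into global and local parts; the
    combined weight would then scale that token's policy-gradient term.
    """
    local = max(1.0 - eps_local, min(1.0 + eps_local, token_ratio))
    glob = max(1.0 - eps_global, min(1.0 + eps_global, seq_ratio))
    return local * glob

# A token whose ratio drifted to 1.5 is pulled back to 1.2 locally;
# an in-range sequence ratio of 1.0 passes through unchanged.
w = dual_clip_weight(1.5, 1.0)
```

In standard PPO-style clipping there is a single token-level interval; the sketch's only point is that adding a second, sequence-level interval bounds drift at both scales at once.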