Marco-Mini (17.3B, 0.86B active) and Marco-Nano (8B, 0.6B active) by Alibaba

Reddit r/LocalLLaMA / 4/10/2026

📰 News · Signals & Early Trends · Models & Research

Key Points

  • Alibaba International Digital Commerce has released two new instruction-tuned sparse Mixture-of-Experts (MoE) multilingual LLMs on Hugging Face: Marco-Mini-Instruct (17.3B parameters, ~0.86B active per token) and Marco-Nano-Instruct (8B parameters, ~0.6B active per token).
  • Marco-Mini-Instruct activates about 5% of its parameters per token (0.86B active) and is reported to achieve top average benchmark performance across English, multilingual general, and multilingual cultural tests versus comparable instruct models.
  • Marco-Nano-Instruct activates about 7.5% of its parameters per token (0.6B active) yet is reported to achieve the best average benchmark performance among comparable instruct models with up to ~3.84B activated parameters.
  • The models emphasize efficiency via extreme sparsity, with Marco-Mini-Instruct described as having 256 experts and using 8 active experts per token, and both variants described as using a post-training pipeline including SFT and online policy distillation (a rough sketch of the distillation step follows this list).
  • Both releases are offered under the Apache 2.0 license and reportedly support 29 languages.
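
For readers unfamiliar with the term, here is a minimal sketch of what an online policy distillation step typically looks like: the student samples its own continuations and is nudged toward the teacher's per-token distributions via a KL loss. This is a generic illustration assuming Hugging Face-style causal LMs; the actual Marco-MoE recipe (KL direction, teacher cascade, masking) isn't specified in the post, and `distillation_loss` and `prompt_len` are hypothetical names.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student, teacher, sequences, prompt_len):
    """Token-level KL toward the teacher on continuations the student sampled itself
    (that is the "online" part). Generic sketch, not the actual Marco-MoE recipe."""
    # Logits predicting tokens prompt_len .. end of sequence.
    student_logits = student(sequences).logits[:, prompt_len - 1 : -1]
    with torch.no_grad():
        teacher_logits = teacher(sequences).logits[:, prompt_len - 1 : -1]

    log_p_student = F.log_softmax(student_logits, dim=-1)
    p_teacher = F.softmax(teacher_logits, dim=-1)
    # KL(teacher || student); the real recipe's KL direction/weighting is not public.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean")
```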

Looks like these were released six days ago. Did a search and didn't see a post about them.

https://huggingface.co/AIDC-AI/Marco-Mini-Instruct

https://huggingface.co/AIDC-AI/Marco-Nano-Instruct

Pretty wild parameter/active ratio, should be lightning fast.
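
For anyone who wants to try them, here's a minimal transformers loading sketch using the model IDs from the links above. The dtype/device settings, the prompt, and whether `trust_remote_code` is actually required are assumptions, so check the model cards.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "AIDC-AI/Marco-Mini-Instruct"  # or "AIDC-AI/Marco-Nano-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",       # assumption: use whatever dtype the checkpoint ships in
    device_map="auto",        # assumption: let accelerate place the weights
    trust_remote_code=True,   # assumption: may or may not be needed for the MoE code
)

messages = [{"role": "user", "content": "Can you introduce yourself in Turkish?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```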

Marco-Mini-Instruct is the instruction-tuned variant of Marco-Mini-Base, a highly sparse Mixture-of-Experts (MoE) multilingual language model from the Marco-MoE family, developed by Alibaba International Digital Commerce. It activates only 0.86B out of 17.3B total parameters (5% activation ratio) per token. Marco-Mini-Instruct achieves the best average performance across English, multilingual general, and multilingual cultural benchmarks when compared against instruct models with up to 12B activated parameters, including Qwen3-4B-Instruct, Ministral3-8B-Instruct, Gemma3-12B-Instruct, LFM2-24B-A2B, and Granite4-Small-Instruct.


Marco-Nano-Instruct is the post-trained variant of Marco-Nano-Base, a highly sparse Mixture-of-Experts (MoE) multilingual language model from the Marco-MoE family, developed by Alibaba International Digital Commerce. It activates only 0.6B out of 8B total parameters (7.5% activation ratio) per token. Despite its extreme sparsity, Marco-Nano-Instruct achieves the best average performance across English, multilingual general, and multilingual cultural benchmarks among all comparable instruct models up to 3.84B activated parameters.
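
The activation ratios quoted above follow directly from the headline numbers; a trivial sanity check, nothing model-specific:

```python
# Sanity check on the headline activation ratios (active params / total params).
specs = {
    "Marco-Mini-Instruct": (0.86e9, 17.3e9),
    "Marco-Nano-Instruct": (0.60e9, 8.0e9),
}
for name, (active, total) in specs.items():
    print(f"{name}: {active / total:.1%} of parameters active per token")
# Marco-Mini-Instruct: 5.0% of parameters active per token
# Marco-Nano-Instruct: 7.5% of parameters active per token
```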

https://xcancel.com/ModelScope2022/status/2042084482661191942

https://pbs.twimg.com/media/HFbvyB-WsAAayv1.jpg?name=orig

Meet Marco-Mini-Instruct: a highly sparse MoE multilingual model from Alibaba International. 17.3B total params, only 0.86B active (5% activation ratio). 🚀

Beats Qwen3-4B, Gemma3-12B, Granite4-Small on English, multilingual general, and cultural benchmarks — with a fraction of their active params.

🌍 29 languages: Arabic, Turkish, Kazakh, Bengali, Nepali and more

🧠 256 experts, 8 active per token. Drop-Upcycling from Qwen3-0.6B-Base.

🎯 2-stage post-training: SFT + Online Policy Distillation (Qwen3-30B → Qwen3-Next-80B cascade)

✅ Apache 2.0
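
For context on the "256 experts, 8 active per token" line, here is a minimal sketch of generic top-k expert routing, which is what keeps only ~5% of the parameters running per token. It assumes a plain softmax router and standard FFN experts; the real Marco-MoE layer (shared experts, load-balancing losses, hidden sizes) may differ, and `TopKMoE`, `d_model`, and `d_ff` are placeholder names.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Generic top-k MoE feed-forward layer (256 experts, 8 active per token).
    Illustration only -- not the actual Marco-MoE implementation."""

    def __init__(self, d_model=1024, d_ff=2048, n_experts=256, top_k=8):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                       # x: (num_tokens, d_model)
        weights, idx = self.router(x).topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)    # renormalize over the 8 chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):          # only top_k of n_experts run per token
            for e in idx[:, slot].unique():
                mask = idx[:, slot] == e
                out[mask] += weights[mask, slot].unsqueeze(-1) * self.experts[int(e)](x[mask])
        return out
```

Per token, only the 8 selected expert FFNs are evaluated (plus the shared attention/embedding weights), which is roughly where the 0.86B-of-17.3B active-parameter figure comes from.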

submitted by /u/AnticitizenPrime