New open-weights models: GigaChat-3.1-Ultra-702B and GigaChat-3.1-Lightning-10B-A1.8B

Reddit r/LocalLLaMA / 2026/3/25

Key points

  • GigaChat-3.1-Ultra-702B and GigaChat-3.1-Lightning-10B-A1.8B weights have been released under the MIT license on Hugging Face, aiming to expand the open-weights ecosystem.
  • The Ultra model is a 702B MoE intended for high-resource environments, while Lightning is a smaller 10B MoE designed for efficient local inference.
  • Both models are pretrained from scratch (not DeepSeek fine-tunes), are optimized for English and Russian while being trained on 14 languages, and support inference-efficiency features such as native FP8 and multi-token prediction (MTP).
  • The release highlights tool-calling optimization, including a reported BFCLv3 score of 0.76 for Lightning, and claims efficient long-context handling (up to 256k tokens) for Lightning thanks to its DeepSeek-V3 architecture.
  • Benchmark tables compare Ultra and Lightning against models such as DeepSeek-V3, Qwen3, and Gemma 3 on the company's own evaluation suite.

Hey, folks!

We've released the weights of our GigaChat-3.1-Ultra and Lightning models under the MIT license on our Hugging Face page. These models were pretrained from scratch on our own hardware and target both high-resource environments (Ultra is a large 702B MoE) and local inference (Lightning is a tiny 10B A1.8B MoE). Why?

  1. Because we believe that more open-weights models are better for the ecosystem.
  2. Because we want to build a good, CIS-native language model.

More about the models:

- Both models are pretrained from scratch on our own data and compute, so neither is a DeepSeek fine-tune.
- GigaChat-3.1-Ultra is a 702B A36B MoE (702B total, 36B active parameters) built on the DeepSeek architecture, and it outperforms DeepSeek-V3-0324 and Qwen3-235B on average across our benchmarks. It was trained with native FP8 during the DPO stage, supports MTP, and can be run on 3 HGX instances.
- GigaChat-3.1-Lightning is a 10B A1.8B MoE (10B total, 1.8B active parameters) on the same DeepSeek architecture. It outperforms Qwen3-4B-Instruct-2507 and Gemma-3-4B-it on our benchmarks while running as fast as Qwen3-1.7B, thanks to native FP8 DPO and MTP support. The DeepSeek-V3 architecture also gives it a highly efficient 256k context.
- Both models are optimized for English and Russian, but they were trained on 14 languages and achieve good multilingual results.
- We've optimized both models for tool calling; GigaChat-3.1-Lightning scores a whopping 0.76 on the BFCLv3 benchmark.
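The "702B A36B" and "10B A1.8B" labels give total versus active parameters: in a mixture-of-experts model each token is routed through only a subset of experts, so only the active parameters take part in a forward pass, while all of the weights still have to sit in memory. The back-of-the-envelope sketch below is a rough estimate, not the authors' sizing: it computes the active fraction and the raw weight memory at BF16 (2 bytes per parameter) and FP8 (1 byte), ignores KV cache, activations, and framework overhead, and uses hypothetical helper names.

```python
# Back-of-the-envelope MoE sizing for the two GigaChat-3.1 models.
# Total/active parameter counts come from the post; 2 bytes (BF16) and
# 1 byte (FP8) per parameter are the standard widths of those formats.
# KV cache, activations, and runtime overhead are deliberately ignored.

MODELS = {
    # name: (total params, active params per token), both in billions
    "GigaChat-3.1-Ultra": (702.0, 36.0),
    "GigaChat-3.1-Lightning": (10.0, 1.8),
}

BYTES_PER_PARAM = {"BF16": 2, "FP8": 1}


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of the weights that participate in processing one token."""
    return active_b / total_b


def weight_memory_gb(total_b: float, dtype: str) -> float:
    """GB (1e9 bytes) needed just to store the weights in the given format."""
    return total_b * BYTES_PER_PARAM[dtype]


if __name__ == "__main__":
    for name, (total, active) in MODELS.items():
        print(f"{name}: {active_fraction(total, active):.1%} of weights active per token")
        for dtype in BYTES_PER_PARAM:
            print(f"  {dtype} weights: ~{weight_memory_gb(total, dtype):,.0f} GB")
```

By this estimate Ultra activates about 5.1% of its weights per token and Lightning about 18%. Even in FP8, Ultra's ~702 GB of weights exceed a single 8 x 80 GB node (640 GB), which helps explain a multi-node deployment; that node size is an assumption, since the post does not say which HGX platform it means.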

Metrics:

GigaChat-3.1-Ultra:

| Domain | Metric | GigaChat-2-Max | GigaChat-3-Ultra-Preview | GigaChat-3.1-Ultra | DeepSeek V3-0324 | Qwen3-235B-A22B (Non-Thinking) |
|---|---|---|---|---|---|---|
| General Knowledge | MMLU RU | 0.7999 | 0.7914 | 0.8267 | 0.8392 | 0.7953 |
| General Knowledge | RUQ | 0.7473 | 0.7634 | 0.7986 | 0.7871 | 0.6577 |
| General Knowledge | MEPA | 0.6630 | 0.6830 | 0.7130 | 0.6770 | - |
| General Knowledge | MMLU PRO | 0.6660 | 0.7280 | 0.7668 | 0.7610 | 0.7370 |
| General Knowledge | MMLU EN | 0.8600 | 0.8430 | 0.8422 | 0.8820 | 0.8610 |
| General Knowledge | BBH | 0.5070 | - | 0.7027 | - | 0.6530 |
| General Knowledge | SuperGPQA | - | 0.4120 | 0.4892 | 0.4665 | 0.4406 |
| Math | T-Math | 0.1299 | 0.1450 | 0.2961 | 0.1450 | 0.2477 |
| Math | Math 500 | 0.7160 | 0.7840 | 0.8920 | 0.8760 | 0.8600 |
| Math | AIME | 0.0833 | 0.1333 | 0.3333 | 0.2667 | 0.3500 |
| Math | GPQA Five Shot | 0.4400 | 0.4220 | 0.4597 | 0.4980 | 0.4690 |
| Coding | HumanEval | 0.8598 | 0.9024 | 0.9085 | 0.9329 | 0.9268 |
| Agent / Tool Use | BFCL | 0.7526 | 0.7310 | 0.7639 | 0.6470 | 0.6800 |
| Total | Mean | 0.6021 | 0.6115 | 0.6764 | 0.6482 | 0.6398 |

| Arena | GigaChat-2-Max | GigaChat-3-Ultra-Preview | GigaChat-3.1-Ultra | DeepSeek V3-0324 |
|---|---|---|---|---|
| Arena Hard Logs V3 | 64.9 | 50.5 | 90.2 | 80.1 |
| Validator SBS Pollux | 54.4 | 40.1 | 83.3 | 74.5 |
| RU LLM Arena | 55.4 | 44.9 | 70.9 | 72.1 |
| Arena Hard RU | 61.7 | 39.0 | 82.1 | 70.7 |
| Average | 59.1 | 43.6 | 81.63 | 74.4 |
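The "Total / Mean" row can be reproduced by averaging each model's reported scores and skipping the "-" cells, so each model is averaged over 12 or 13 metrics depending on coverage. The post does not describe its aggregation, so treat this as an inferred reading that happens to match all five published means to four decimals, not the authors' script; `SCORES` and `mean_reported` are hypothetical names.

```python
from statistics import mean
from typing import Optional

# Metric order (13 rows): MMLU RU, RUQ, MEPA, MMLU PRO, MMLU EN, BBH,
# SuperGPQA, T-Math, Math 500, AIME, GPQA Five Shot, HumanEval, BFCL.
# None stands for a "-" (not reported) cell in the table.
SCORES: dict[str, list[Optional[float]]] = {
    "GigaChat-2-Max": [0.7999, 0.7473, 0.6630, 0.6660, 0.8600, 0.5070, None,
                       0.1299, 0.7160, 0.0833, 0.4400, 0.8598, 0.7526],
    "GigaChat-3-Ultra-Preview": [0.7914, 0.7634, 0.6830, 0.7280, 0.8430, None, 0.4120,
                                 0.1450, 0.7840, 0.1333, 0.4220, 0.9024, 0.7310],
    "GigaChat-3.1-Ultra": [0.8267, 0.7986, 0.7130, 0.7668, 0.8422, 0.7027, 0.4892,
                           0.2961, 0.8920, 0.3333, 0.4597, 0.9085, 0.7639],
    "DeepSeek V3-0324": [0.8392, 0.7871, 0.6770, 0.7610, 0.8820, None, 0.4665,
                         0.1450, 0.8760, 0.2667, 0.4980, 0.9329, 0.6470],
    "Qwen3-235B-A22B (Non-Thinking)": [0.7953, 0.6577, None, 0.7370, 0.8610, 0.6530, 0.4406,
                                       0.2477, 0.8600, 0.3500, 0.4690, 0.9268, 0.6800],
}


def mean_reported(values: list[Optional[float]]) -> float:
    """Average only the cells that were actually reported (skip None)."""
    return mean(v for v in values if v is not None)


if __name__ == "__main__":
    for model, values in SCORES.items():
        n = sum(v is not None for v in values)
        print(f"{model}: {mean_reported(values):.4f} over {n} metrics")
```

One consequence of this reading: models with missing cells are averaged over different metric sets, so their means are not strictly like-for-like.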

GigaChat-3.1-Lightning:

| Domain | Metric | GigaChat-3-Lightning | GigaChat-3.1-Lightning | Qwen3-1.7B-Instruct | Qwen3-4B-Instruct-2507 | SmolLM3 | gemma-3-4b-it |
|---|---|---|---|---|---|---|---|
| General | MMLU RU | 0.683 | 0.6803 | - | 0.597 | 0.500 | 0.519 |
| General | RUBQ | 0.652 | 0.6646 | - | 0.317 | 0.636 | 0.382 |
| General | MMLU PRO | 0.606 | 0.6176 | 0.410 | 0.685 | 0.501 | 0.410 |
| General | MMLU EN | 0.740 | 0.7298 | 0.600 | 0.708 | 0.599 | 0.594 |
| General | BBH | 0.453 | 0.5758 | 0.3317 | 0.717 | 0.416 | 0.131 |
| General | SuperGPQA | 0.273 | 0.2939 | 0.209 | 0.375 | 0.246 | 0.201 |
| Code | Human Eval Plus | 0.695 | 0.7317 | 0.628 | 0.878 | 0.701 | 0.713 |
| Tool Calling | BFCL V3 | 0.71 | 0.76 | 0.57 | 0.62 | - | - |
| Total | Average | 0.586 | 0.631 | 0.458 | 0.612 | 0.514 | 0.421 |

| Arena | GigaChat-2-Lite-30.1 | GigaChat-3-Lightning | GigaChat-3.1-Lightning | YandexGPT-5-Lite-8B | SmolLM3 | gemma-3-4b-it | Qwen3-4B | Qwen3-4B-Instruct-2507 |
|---|---|---|---|---|---|---|---|---|
| Arena Hard Logs V3 | 23.7 | 14.3 | 46.7 | 17.9 | 18.1 | 38.7 | 27.7 | 61.5 |
| Validator SBS Pollux | 32.5 | 24.3 | 55.7 | 10.3 | 13.7 | 34.0 | 19.8 | 56.1 |
| Total Average | 28.1 | 19.3 | 51.2 | 14.1 | 15.9 | 36.35 | 23.75 | 58.8 |
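In the Lightning arena table, the "Total Average" row matches the plain mean of the two arena scores for every model. A short check, again an inferred reading rather than the authors' own code (`ARENA` and `arena_average` are hypothetical names):

```python
# Each entry: (Arena Hard Logs V3, Validator SBS Pollux, published Total Average).
# Values are copied from the Lightning arena table.
ARENA = {
    "GigaChat-2-Lite-30.1": (23.7, 32.5, 28.1),
    "GigaChat-3-Lightning": (14.3, 24.3, 19.3),
    "GigaChat-3.1-Lightning": (46.7, 55.7, 51.2),
    "YandexGPT-5-Lite-8B": (17.9, 10.3, 14.1),
    "SmolLM3": (18.1, 13.7, 15.9),
    "gemma-3-4b-it": (38.7, 34.0, 36.35),
    "Qwen3-4B": (27.7, 19.8, 23.75),
    "Qwen3-4B-Instruct-2507": (61.5, 56.1, 58.8),
}


def arena_average(hard_logs: float, sbs_pollux: float) -> float:
    """Assumed aggregation: unweighted mean of the two arena scores."""
    return (hard_logs + sbs_pollux) / 2


if __name__ == "__main__":
    for model, (hard, sbs, published) in ARENA.items():
        print(f"{model}: computed {arena_average(hard, sbs):.2f}, published {published}")
```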

Lightning throughput tests:

| Model | Output tokens/s | Total tokens/s | TPOT (ms) | Diff vs Lightning BF16 |
|---|---|---|---|---|
| GigaChat-3.1-Lightning BF16 | 2,866 | 5,832 | 9.52 | +0.0% |
| GigaChat-3.1-Lightning BF16 + MTP | 3,346 | 6,810 | 8.25 | +16.7% |
| GigaChat-3.1-Lightning FP8 | 3,382 | 6,883 | 7.63 | +18.0% |
| GigaChat-3.1-Lightning FP8 + MTP | 3,958 | 8,054 | 6.92 | +38.1% |
| YandexGPT-5-Lite-8B | 3,081 | 6,281 | 7.62 | +7.5% |

(Measured with vLLM 0.17.1rc1.dev158+g600a039f5 at concurrency=32 on a single H100 80GB SXM5. The benchmarking script is linked in the original post.)
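The "Diff vs Lightning BF16" column appears to be the relative change in output tokens/s against the BF16 baseline. Computing it that way reproduces every published percentage, though the post does not state the formula. A minimal sketch with hypothetical names (`OUTPUT_TPS`, `diff_vs_baseline`):

```python
# Output throughput (tokens/s) copied from the Lightning throughput table.
OUTPUT_TPS = {
    "GigaChat-3.1-Lightning BF16": 2866,
    "GigaChat-3.1-Lightning BF16 + MTP": 3346,
    "GigaChat-3.1-Lightning FP8": 3382,
    "GigaChat-3.1-Lightning FP8 + MTP": 3958,
    "YandexGPT-5-Lite-8B": 3081,
}
BASELINE = "GigaChat-3.1-Lightning BF16"


def diff_vs_baseline(tps: float, baseline_tps: float) -> float:
    """Assumed formula: relative change in output throughput, in percent."""
    return (tps / baseline_tps - 1.0) * 100.0


if __name__ == "__main__":
    base = OUTPUT_TPS[BASELINE]
    for name, tps in OUTPUT_TPS.items():
        print(f"{name}: {diff_vs_baseline(tps, base):+.1f}%")
```

Read this way, the combined FP8 + MTP gain (+38.1%) is close to the product of the individual gains (1.180 x 1.167 ≈ 1.378), so the two optimizations roughly compound.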

Once again, the weights and GGUFs are available on our Hugging Face page, and you can read the technical report on Habr (unfortunately it's in Russian, but you can always use a translator).

submitted by /u/netikas