Qwen 3 baseline: 36 trillion tokens. Qwen 3.5 is described as having a *significantly larger scale of visual-text tokens* than Qwen 3. Multimodal factor: the transition from text-only training to native visual-text (multimodal) training increases total token volume, since image-text pairs are encoded as additional token streams alongside richer data representations. **Conservative estimate: 42–48 trillion tokens.** Reasoning: this range stays conservative while avoiding speculative overestimation. **Sources:** [link]
Token Estimate for Qwen 3.5-397B. Based on official sources only :)
Reddit r/LocalLLaMA / 4/20/2026
Key Points
- The article estimates that Qwen 3.5-397B was trained on about 42–48 trillion tokens, starting from an assumed baseline of 36T tokens for Qwen 3.
- It attributes the increase mainly to the shift from text-only training to native multimodal (visual-text) training, which encodes image-text pairs and adds extra token streams.
- The estimated range is described as conservative, reflecting an implied growth of roughly 17–33% over the 36T figure rather than speculative extrapolation.
- It cites official Qwen blog posts for Qwen 3 and Qwen 3.5 as the basis for the estimate.
- The core takeaway is that multimodal training can substantially raise effective token volume even at the same overall model size class.
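The growth implied by the 42–48T estimate over the assumed 36T baseline can be checked with quick arithmetic (a minimal sketch using only the figures quoted in the post):

```python
# Implied growth of the 42-48T token estimate over the 36T baseline.
# All figures are in trillions of tokens, taken from the post above.
baseline_t = 36.0        # assumed Qwen 3 training-token baseline
estimate_low_t = 42.0    # conservative lower bound for Qwen 3.5
estimate_high_t = 48.0   # conservative upper bound for Qwen 3.5

growth_low = (estimate_low_t / baseline_t - 1) * 100    # ~16.7%
growth_high = (estimate_high_t / baseline_t - 1) * 100  # ~33.3%
print(f"Implied growth: {growth_low:.1f}%-{growth_high:.1f}%")
```

This is just a sanity check on the summary's percentages; it does not model how image-text encoding actually inflates token counts.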
Related Articles
Runtime security for AI agents: risk scoring, policy enforcement, and rollback for production agent pipeline [P]
Reddit r/MachineLearning

Anthropic Won't Fix the MCP Vulnerability — Here's How to Protect Your Server
Dev.to

Vercel Hack: Why You Need to Rotate Your "Non-Sensitive" Environment Variables Today
Dev.to

Researchers gave 1,222 people AI assistants, then took them away after 10 minutes. Performance crashed below the control group and people stopped trying. UCLA, MIT, Oxford, and Carnegie Mellon call it the "boiling frog" effect.
Reddit r/artificial

The Brutal Truth About Learning AI Agents: What I Learned from Building 17 Broken Versions
Dev.to