AI Navigate

Meta announces four new MTIA chips, focussed on inference

Reddit r/LocalLLaMA / 3/13/2026

📰 News · Industry & Market Moves

Key Points

  • Meta announced four MTIA chip generations (300, 400, 450, and 500) focused on inference, with development running roughly two years and modular chiplets for swapping components without full redesigns.
  • MTIA 450 and 500 are inference-first designs, contrasting Nvidia's training-first approach, aligned with Meta's scale needs.
  • Memory bandwidth is a central focus, ranging from 6.1 TB/s on MTIA 300 to 27.6 TB/s on MTIA 500, with MTIA 450 said to beat leading commercial products in bandwidth.
  • The stack emphasizes heavy low-precision compute, with MX4 delivering around 30 PFLOPS on the 500 and custom data types intended to preserve model quality while boosting throughput.
  • Software compatibility is PyTorch-native with vLLM support (torch.compile, Triton, vLLM plugin), enabling models to run on GPUs and MTIA without rewrites; MTIA 400 ships to data centers now, with 450/500 slated for 2027.

Meta shared details on four generations of their custom MTIA chips (300–500), all developed in roughly two years.

Meta is building its own silicon and iterating fast: a new chip roughly every six months, using modular chiplets so pieces can be swapped out without redesigning everything.

Notable:

  • Inference-first design. MTIA 450 and 500 are optimized for GenAI inference, not training. Opposite of how Nvidia does it (build for training, apply to everything). Makes sense given their scale.
  • HBM bandwidth scaling hard. 6.1 TB/s on the 300 → 27.6 TB/s on the 500 (4.5x). Memory bandwidth is the LLM inference bottleneck, and they claim MTIA 450 already beats leading commercial products here.
  • Heavy low-precision push. MX4 hits 30 PFLOPS on the 500. Custom data types designed for inference that they say preserve model quality while boosting throughput.
  • PyTorch-native with vLLM support. torch.compile, Triton, vLLM plugin. Models run on both GPUs and MTIA without rewrites.
  • Timeline: MTIA 400 heading to data centers now, 450 and 500 slated for 2027.
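The bandwidth bullet above can be made concrete with a back-of-envelope roofline calculation: in batch-1 autoregressive decode, every generated token streams the full weight set from HBM, so memory bandwidth caps tokens per second. The sketch below uses the post's bandwidth figures; the 70B parameter count and FP8 (1 byte/param) weights are illustrative assumptions, not Meta's numbers.

```python
# Upper bound on decode throughput when memory-bandwidth-bound:
#   tokens/s ~= bandwidth (bytes/s) / weight bytes streamed per token
def max_decode_tokens_per_s(bandwidth_tb_s: float,
                            params_b: float,
                            bytes_per_param: float) -> float:
    bandwidth = bandwidth_tb_s * 1e12               # TB/s -> bytes/s
    weight_bytes = params_b * 1e9 * bytes_per_param  # B params -> bytes
    return bandwidth / weight_bytes

# Bandwidth figures from the post; model size/precision are assumptions.
for name, bw in [("MTIA 300", 6.1), ("MTIA 500", 27.6)]:
    tps = max_decode_tokens_per_s(bw, params_b=70, bytes_per_param=1.0)
    print(f"{name}: ~{tps:.0f} tokens/s ceiling (70B @ FP8, batch 1)")
```

The 4.5x bandwidth jump translates directly into a 4.5x ceiling on batch-1 decode speed, which is why inference-first designs chase bandwidth rather than raw FLOPS.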
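A minimal sketch of what "PyTorch-native, no rewrites" means in practice: the same model code targets different backends purely through the device string plus `torch.compile`. `"mtia"` is the accelerator device name PyTorch exposes via `torch.mtia`; whether it is actually available depends on your build, so the code falls back to CUDA or CPU.

```python
import torch
import torch.nn as nn

# Pick the best available backend; the model code below is unchanged
# regardless of which one is selected.
if hasattr(torch, "mtia") and torch.mtia.is_available():
    device = "mtia"
elif torch.cuda.is_available():
    device = "cuda"
else:
    device = "cpu"

model = nn.Sequential(
    nn.Linear(256, 1024), nn.GELU(), nn.Linear(1024, 256)
).to(device)
model = torch.compile(model)  # backend-specific codegen (e.g. Triton on GPU-like targets)

x = torch.randn(8, 256, device=device)
with torch.no_grad():
    y = model(x)
print(y.shape)
```

This is the portability story the bullet describes: swapping hardware is a one-line device change, with `torch.compile` and the vLLM plugin handling backend-specific kernel generation underneath.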

Source: https://ai.meta.com/blog/meta-mtia-scale-ai-chips-for-billions/

submitted by /u/Balance-