v0.22.1

Ollama Releases / 4/29/2026


Key Points

  • v0.22.1 improves the mlxrunner by batching the sampler across multiple sequences, making generation more efficient.
  • The tokenizer's multi-regex BPE offset handling was fixed, correcting tokenization behavior.
  • The mlx integration now supports importing NVIDIA TensorRT Model Optimizer artifacts, expanding deployment/optimization options.
  • The desktop app/server startup logic was fixed to prevent it from killing active `ollama launch` sessions.
  • Batching support was extended to additional models, further improving execution throughput.
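
The idea behind batching the sampler can be illustrated with a small sketch. This is not Ollama's or mlxrunner's actual code; it is a hypothetical NumPy example showing one softmax and one draw over a whole `(batch, vocab)` logit matrix instead of a per-sequence loop:

```python
import numpy as np

def sample_batched(logits, temperature=1.0, rng=None):
    """Sample one token per sequence from a (batch, vocab) logit matrix.

    Batching means a single vectorized softmax and a single draw for
    the whole batch, rather than sampling each sequence in a loop.
    """
    rng = rng or np.random.default_rng()
    scaled = logits / max(temperature, 1e-6)
    # Numerically stable softmax over the vocab axis for every sequence at once.
    scaled = scaled - scaled.max(axis=-1, keepdims=True)
    probs = np.exp(scaled)
    probs /= probs.sum(axis=-1, keepdims=True)
    # Inverse-CDF sampling, vectorized across the batch dimension.
    cdf = probs.cumsum(axis=-1)
    u = rng.random((logits.shape[0], 1))
    return (cdf < u).sum(axis=-1)
```

With sharply peaked logits the draw collapses to the argmax of each row, which makes the batched behavior easy to check.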

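The tokenizer fix concerns offset bookkeeping when several pretokenization regexes run in sequence: matches from a later pass report offsets relative to the chunk they ran on, so they must be rebased to index the original string. A minimal hypothetical sketch (the two patterns here are illustrative, not Ollama's actual pretokenizer rules):

```python
import re

# Two illustrative pretokenizer passes. Offsets reported by SECOND are
# relative to each chunk produced by FIRST, so they must be shifted by
# the chunk's start offset to point into the original text.
FIRST = re.compile(r"\S+")             # split on whitespace
SECOND = re.compile(r"[A-Za-z]+|\d+")  # letters vs digits within a chunk

def pretokenize(text):
    pieces = []
    for chunk in FIRST.finditer(text):
        base = chunk.start()  # absolute offset of this chunk in `text`
        for m in SECOND.finditer(chunk.group()):
            # Rebase inner match offsets to absolute positions.
            pieces.append((base + m.start(), base + m.end(), m.group()))
    return pieces
```

Without the `base` rebasing, a chunk like `12cd` in `ab 12cd` would report `12` at offsets `(0, 2)` instead of `(3, 5)`, so every span after the first chunk would index the wrong part of the input.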
What's Changed

Full Changelog: v0.21.3-rc0...v0.22.1-rc0