v0.22.1

Ollama Releases / 4/30/2026


Key Points

  • Ollama v0.22.1 includes updates to mlxrunner to batch the sampler across multiple sequences, improving throughput for multi-sequence workloads.
  • The tokenizer has been fixed for multi-regex BPE offset handling, addressing correctness issues in tokenization.
  • The mlx integration now supports importing NVIDIA TensorRT Model Optimizer models, expanding compatibility with TensorRT optimization workflows.
  • The app/server has been patched to prevent the desktop app startup from terminating active `ollama launch` sessions.
  • Batching support was extended to additional models, enabling more efficient inference pipelines where multiple sequences are processed together.

What's Changed

Full Changelog: v0.21.3-rc0...v0.22.1-rc0