v0.22.1

Ollama Releases / 4/30/2026


Key Points

  • Ollama v0.22.1 includes updates to mlxrunner to batch the sampler across multiple sequences, improving throughput for multi-sequence workloads.
  • The tokenizer has been fixed for multi-regex BPE offset handling, addressing correctness issues in tokenization.
  • The mlx integration now supports importing NVIDIA TensorRT Model Optimizer models, expanding compatibility with TensorRT optimization workflows.
  • The app/server has been patched to prevent the desktop app startup from terminating active `ollama launch` sessions.
  • Batching support was extended to additional models, enabling more efficient inference pipelines where multiple sequences are processed together.

What's Changed

Full Changelog: v0.21.3-rc0...v0.22.1-rc0