v0.22.1

Ollama Releases / 4/29/2026


Key Points

  • v0.22.1 improves the mlxrunner by batching the sampler across multiple sequences, making generation more efficient.
  • The tokenizer's multi-regex BPE offset handling was fixed, correcting tokenization behavior.
  • The mlx integration now supports importing NVIDIA TensorRT Model Optimizer artifacts, expanding deployment/optimization options.
  • The desktop app/server startup logic was fixed to prevent it from killing active `ollama launch` sessions.
  • Batching support was extended to additional models, further improving execution throughput.
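
The idea behind batching the sampler can be illustrated with a small sketch. This is not Ollama's or mlxrunner's actual code; it is a hypothetical NumPy example showing one softmax and one draw over a whole `(batch, vocab)` logit matrix instead of a per-sequence loop:

```python
import numpy as np

def sample_batched(logits, temperature=1.0, rng=None):
    """Sample one token per sequence from a (batch, vocab) logit matrix.

    Batching means a single vectorized softmax and a single draw for
    the whole batch, rather than sampling each sequence in a loop.
    """
    rng = rng or np.random.default_rng()
    scaled = logits / max(temperature, 1e-6)
    # Numerically stable softmax over the vocab axis for every sequence at once.
    scaled = scaled - scaled.max(axis=-1, keepdims=True)
    probs = np.exp(scaled)
    probs /= probs.sum(axis=-1, keepdims=True)
    # Inverse-CDF sampling, vectorized across the batch dimension.
    cdf = probs.cumsum(axis=-1)
    u = rng.random((logits.shape[0], 1))
    return (cdf < u).sum(axis=-1)
```

With sharply peaked logits the draw collapses to the argmax of each row, which makes the batched behavior easy to check.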

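The tokenizer fix concerns offset bookkeeping when several pretokenization regexes run in sequence: matches from a later pass report offsets relative to the chunk they ran on, so they must be rebased to index the original string. A minimal hypothetical sketch (the two patterns here are illustrative, not Ollama's actual pretokenizer rules):

```python
import re

# Two illustrative pretokenizer passes. Offsets reported by SECOND are
# relative to each chunk produced by FIRST, so they must be shifted by
# the chunk's start offset to point into the original text.
FIRST = re.compile(r"\S+")             # split on whitespace
SECOND = re.compile(r"[A-Za-z]+|\d+")  # letters vs digits within a chunk

def pretokenize(text):
    pieces = []
    for chunk in FIRST.finditer(text):
        base = chunk.start()  # absolute offset of this chunk in `text`
        for m in SECOND.finditer(chunk.group()):
            # Rebase inner match offsets to absolute positions.
            pieces.append((base + m.start(), base + m.end(), m.group()))
    return pieces
```

Without the `base` rebasing, a chunk like `12cd` in `ab 12cd` would report `12` at offsets `(0, 2)` instead of `(3, 5)`, so every span after the first chunk would index the wrong part of the input.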
What's Changed

Full Changelog: v0.21.3-rc0...v0.22.1-rc0