Breaking change in llama-server?

Reddit r/LocalLLaMA / 3/28/2026

💬 Opinion · Developer Stack & Infrastructure · Tools & Practical Usage · Models & Research

Key Points

  • The latest build of llama-server reportedly migrates models automatically from the legacy llama.cpp cache to the HuggingFace cache directory.
  • After the migration, all local .gguf models are stored in a different layout (as blobs), so existing launch scripts and local-path references fail.
  • As an example of the failure, models can no longer be loaded from their previously expected paths, and srv load_model errors out.
  • The change was reportedly added in ggml-org/llama.cpp commit b8498 (four days ago), and it is being criticized for making irreversible changes to user files with no way to stop the process beforehand.

Here's one less-than-helpful result from HuggingFace's takeover of ggml.

When I launched the latest build of llama-server, it automatically did this:

```
================================================================================
WARNING: Migrating cache to HuggingFace cache directory
Old cache: /home/user/.cache/llama.cpp/
New cache: /home/user/GEN-AI/hf_cache/hub

This one-time migration moves models previously downloaded with -hf from the
legacy llama.cpp cache to the standard HuggingFace cache.
Models downloaded with --model-url are not affected.
================================================================================
```

And all of my .gguf models were moved and converted into blobs. That means that my launch scripts all fail since the models are no longer where they were supposed to be...

```
srv  load_model: failed to load model, '/home/user/GEN-AI/hf_cache/models/ggml-org_gpt-oss-20b-GGUF_gpt-oss-20b-mxfp4.gguf'
```
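For anyone hitting the same wall: the standard HuggingFace hub cache stores file contents as content-addressed blobs, with per-revision symlinks under `snapshots/`, so the real `.gguf` path is no longer the flat filename your scripts expect. A minimal sketch of how to locate a model in that layout (the directory and file names below are hypothetical, and a mock cache is built so the snippet is self-contained):

```shell
set -eu

# Build a mock HF hub cache so this sketch is self-contained and runnable.
# Real layout: <hub>/models--<org>--<repo>/{blobs,snapshots/<rev>/}
HF_HUB="$(mktemp -d)"
repo="$HF_HUB/models--ggml-org--gpt-oss-20b-GGUF"
mkdir -p "$repo/blobs" "$repo/snapshots/abc123"
echo "fake-gguf-bytes" > "$repo/blobs/deadbeef"
# snapshots/<rev>/ holds symlinks pointing back into blobs/
ln -s ../../blobs/deadbeef "$repo/snapshots/abc123/gpt-oss-20b-mxfp4.gguf"

# Find the snapshot symlink for a .gguf and resolve it to the real blob.
link="$(find "$HF_HUB" -path '*/snapshots/*' -name '*.gguf' | head -n 1)"
real="$(readlink -f "$link")"
echo "symlink: $link"
echo "blob:    $real"
```

Pointing a launch script at the resolved snapshot path (rather than the old flat cache filename) is one way to keep it working after the migration.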

It also breaks all my model management scripts for distributing ggufs around to various machines.
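The blob layout is also why naive copying breaks: each `.gguf` in a snapshot directory is just a symlink into `blobs/`, so copying the snapshot alone ships dangling links. One workaround is to dereference symlinks while copying, e.g. with GNU `cp -L` (a sketch with hypothetical paths, using a mock layout so it runs standalone):

```shell
set -eu

# Mock a snapshot dir whose .gguf is a symlink into a separate blobs dir,
# mimicking the HF cache layout after migration.
src="$(mktemp -d)"    # stand-in for models--*/snapshots/<rev>/
dst="$(mktemp -d)"    # stand-in for the target machine's model dir
blobs="$(mktemp -d)"
echo "model-bytes" > "$blobs/blob1"
ln -s "$blobs/blob1" "$src/model.gguf"

# -L follows symlinks, so the destination gets a real file, not a link.
cp -rL "$src/." "$dst/"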

The change was added in commit b8498 four days ago. Who releases a breaking change like this without the ability to stop the process before making irreversible changes to user files? I knew the HuggingFace takeover would screw things up.

submitted by /u/hgshepherd