I've noticed an issue when I'm using Pi as a coding agent with llama-cpp, and I'm wondering if there's an issue with Pi or how I have it configured, or if this is just expected behavior.
I'm using Qwen3.5 122B with thinking enabled. When it's doing a series of agentic edits, it interleaves a lot of thinking blocks and tool calls. That all works fine.
But then, when it's my next turn to provide input, a large chunk of the context cache gets invalidated, because it looks like Pi is no longer sending the thinking blocks from previous turns. I can see this in the llama-cpp log, where the prompt diverges exactly where the thinking block was dropped:
```
srv  params_from_: Chat format: peg-native
slot get_availabl: id 3 | task -1 | selected slot by LCP similarity, sim_best = 0.736 (> 0.100 thold), f_keep = 0.703
slot launch_slot_: id 3 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 3 | task 29044 | processing task, is_child = 0
slot update_slots: id 3 | task 29044 | new prompt, n_ctx_slot = 262144, n_keep = 0, task.n_tokens = 48112
slot update_slots: id 3 | task 29044 | old: ... <|im_start|>assistant | <think> The user is saying
slot update_slots: id 3 | task 29044 | new: ... <|im_start|>assistant | You're right - ball-to
slot update_slots: id 3 | task 29044 | 198 248045 74455 198 248068 198 760 1156 369 5315
slot update_slots: id 3 | task 29044 | 198 248045 74455 198 2523 2224 1245 471 4776 4534
slot update_slots: id 3 | task 29044 | n_past = 35407, slot.prompt.tokens.size() = 50377, seq_id = 3, pos_min = 50376, n_swa = 0
```

It then goes on to invalidate a bunch of the context checkpoints and recompute the cache from the point where the history diverged, i.e. where the thinking context was dropped.
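To illustrate the mechanism as I understand it: llama-cpp's prefix cache matches the new prompt token-by-token against the previous one, so stripping a `<think>` block from an earlier assistant turn forces recomputation of everything after that point. A minimal sketch of the effect (the template and helper names here are hypothetical, not Pi's or llama-cpp's actual code; the real Qwen template differs, but the prefix-matching argument is the same):

```python
import re

def serialize(messages, keep_thinking):
    """Flatten a chat history into a single prompt string (toy ChatML-style template)."""
    parts = []
    for role, content in messages:
        if not keep_thinking:
            # Strip <think>...</think> spans, as Pi appears to do on later turns.
            content = re.sub(r"<think>.*?</think>\s*", "", content, flags=re.DOTALL)
        parts.append(f"<|im_start|>{role}\n{content}<|im_end|>\n")
    return "".join(parts)

def common_prefix_len(a, b):
    """Length of the shared prefix - the portion of the cache that survives."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

history = [
    ("user", "Rename the helper function."),
    ("assistant", "<think> The user is saying we should rename it. </think> Done - renamed."),
    ("user", "Now update the call sites."),
]

with_think = serialize(history, keep_thinking=True)
without_think = serialize(history, keep_thinking=False)

# The two prompts diverge right where the assistant's <think> block starts,
# so everything cached after that point must be recomputed.
divergence = common_prefix_len(with_think, without_think)
print(divergence, repr(with_think[divergence:divergence + 7]))
```

The divergence lands at the start of the first stripped `<think>` block, which matches what the `old:`/`new:` lines in the log above show.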
Now, I haven't dug into this too deeply yet, but I wanted to check: is this behavior expected? Do I have something misconfigured, or is Pi buggy in not sending the thinking context from previous turns?
Here's the model config from my models.json in my Pi config:
```json
{
  "id": "unsloth/Qwen3.5-122B-A10B-GGUF:UD-Q4_K_XL",
  "name": "Qwen3.5 122B-A10B (local)",
  "reasoning": true,
  "input": ["text", "image"],
  "contextWindow": 262144,
  "maxTokens": 65536,
  "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 },
  "compat": { "thinkingFormat": "qwen-chat-template" }
}
```


