Following my latest post about setting up Claude Code with local models, I received a recommendation in the comments to try **Pi**. The suggestion was based on its customizability and its superior harness for local models. Unlike Claude Code, which is tuned specifically for Anthropic model formats (much as OpenAI Codex is tuned for OpenAI's), Pi offers more flexibility.
**TL;DR:** Think of Pi as the Arch Linux of agentic harnesses.
In this post, I want to share my setup, ideas, impressions, and experiments. I am not going to try to convince you to use Pi;
for that, you can check other posts like *Pi: The Minimal Agent Within OpenClaw*.
### Bringing Claude Code Functionality to Pi
I wanted to bring some of the productive functionality from Claude Code into Pi and run some experiments. Specifically, I wanted to track the working time of the current prompt and session, similar to how Claude Code displays `Working... {time}`.
I asked Pi to read its own documentation and create an extension to track time and display it. Pi includes references to its documents within its ~1k-token system prompt, so it knows how to modify or create extensions.
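The core of such an extension is just a stopwatch that accumulates per-prompt and per-session time. Here is a minimal sketch of that logic in TypeScript; the `WorkTimer` class and its method names are my own illustration, not Pi's actual extension API:

```typescript
// Minimal stopwatch logic behind a "Working... {time}" display.
// WorkTimer and its method names are illustrative, not Pi's real API.
class WorkTimer {
  private sessionMs = 0; // accumulated time across all prompts
  private promptStart: number | null = null;

  // Call when a prompt starts running.
  start(): void {
    this.promptStart = Date.now();
  }

  // Call when the prompt finishes; returns the elapsed ms for this prompt.
  stop(): number {
    if (this.promptStart === null) return 0;
    const elapsed = Date.now() - this.promptStart;
    this.sessionMs += elapsed;
    this.promptStart = null;
    return elapsed;
  }

  // Format milliseconds as "1m 05s" for a status line.
  static format(ms: number): string {
    const totalSec = Math.floor(ms / 1000);
    const min = Math.floor(totalSec / 60);
    const sec = totalSec % 60;
    return min > 0 ? `${min}m ${String(sec).padStart(2, "0")}s` : `${sec}s`;
  }

  get sessionTotal(): number {
    return this.sessionMs;
  }
}
```

The actual extension just wires this kind of timer into Pi's prompt lifecycle and renders `WorkTimer.format(...)` in the UI.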
ANNNNDDD
Qwen did it well in a single shot. Judging by this kind of sub-agent task, it feels around Sonnet 4.5 or GPT-5.4-mini level on small tasks. For bigger tasks, I recommend Qwen Coder Next or larger models.
### Resource Usage and Speed
In my past post, I was using a 64k context window, which in practice was not enough. I switched to 131k, and I am glad that Qwen's reasoning doesn't drop significantly at long context.
* **VRAM usage:** 29GB at maximum context.
* **Speed:** As you know, prompt processing and token generation slow down as context grows. Even so, Pi feels slightly faster than Claude Code, thanks to its lower RAM and CPU usage and the fact that it loads a minimalist system prompt instead of an enormous ~20k-token one.
* **Customization:** If you want to add details to the system prompt, you can check the leaked Claude Code prompt, grab everything you need, and plug it into Pi. Even skills are not configured out of the box; I had to load my own Brave Search skill.
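To build intuition for why VRAM climbs with context, here is a back-of-the-envelope KV-cache calculator. The layer/head/dimension numbers in the example are placeholder assumptions for illustration, not the real specs of the model I ran:

```typescript
// Rough KV-cache size: 2 tensors (K and V) per layer, each of shape
// kvHeads * headDim per token, stored for every token in the context.
// All model dimensions below are illustrative assumptions, not real Qwen specs.
function kvCacheBytes(
  layers: number,
  kvHeads: number,
  headDim: number,
  ctxTokens: number,
  bytesPerElement: number // 2 for FP16 cache, 1 for Q8 cache
): number {
  return 2 * layers * kvHeads * headDim * ctxTokens * bytesPerElement;
}

const gib = (bytes: number) => bytes / 1024 ** 3;

// Example: 48 layers, 8 KV heads, head dim 128, 131072-token context, FP16 cache.
const cache = kvCacheBytes(48, 8, 128, 131072, 2);
console.log(`${gib(cache).toFixed(1)} GiB KV cache`); // → 24.0 GiB KV cache
```

Doubling the context from 64k to 131k roughly doubles this cache on top of the model weights, which is why the jump is so expensive in VRAM.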
### Energy Efficiency
I tested this on an **Asus ROG Flow Z13** running on battery, without a power connection.
* **Battery drain:** A single prompt session took about 30% of the battery.
* **Power usage:** GPU power draw dropped from 60W to 52W on battery, a negligible difference.
* **Performance:** I did not notice any significant drop in token generation or prompt processing speed.
### Harness Performance
In the past, Pi performed well on **Terminal Bench**, but I am not sure why it is no longer on the leaderboard (maybe someone can explain why?).
My personal feeling is that out-of-the-box Pi is about 5% worse than Claude Code and Codex for "production"-grade applications and usage. I haven't tested "ForgeCode" yet and have no clue how it even works. However, for local models, Pi is a must-have. You will "build" your own harness in the process of configuring it.
### The Adaptation Layer
The most important takeaway from the last post, for me, was the **Adaptation Layer**: the idea that you need to adapt your local model to the harness you are using, because each model expects a different style of tool calls and chat templates.
When I was configuring Pi, it had a field to set the chat template, so I configured it for Qwen. This was the biggest win for Pi.
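Setting the template correctly matters because Qwen models are trained on ChatML-style turns. A minimal sketch of what that formatting looks like (the harness or inference server normally does this for you; `toChatML` is my own helper name):

```typescript
// Qwen-family models use the ChatML template: each message is wrapped in
// <|im_start|>{role}\n{content}<|im_end|>, and the prompt ends by opening
// the assistant turn so the model continues from there.
interface Message {
  role: "system" | "user" | "assistant";
  content: string;
}

function toChatML(messages: Message[]): string {
  const body = messages
    .map((m) => `<|im_start|>${m.role}\n${m.content}<|im_end|>\n`)
    .join("");
  return body + "<|im_start|>assistant\n";
}

const prompt = toChatML([
  { role: "system", content: "You are a helpful coding assistant." },
  { role: "user", content: "List the files in this repo." },
]);
console.log(prompt);
```

If the harness sends a different template than the one the model was trained on, tool calls are the first thing to break, which is why this one field made such a difference.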
I will continue to configure Pi until it reaches the perfect harness state for me!