Creating Pi Extension with Pi and Qwen3.5 27B

Reddit r/LocalLLaMA / 4/11/2026

💬 Opinion · Developer Stack & Infrastructure · Tools & Practical Usage · Models & Research

Key Points

  • The author explains their decision to try Pi after community recommendations, highlighting Pi’s greater flexibility for local agentic “harness” workflows compared with Claude Code’s more Anthropic/format-specific tuning.
  • They describe an experiment to recreate Claude Code–style functionality in Pi by having Pi generate a custom extension that tracks and displays prompt/session working time.
  • Using Qwen3.5 27B, the author reports that the extension creation works well “in a single shot” and suggests model sizing tradeoffs (Qwen Coder Next or larger for bigger tasks).
  • They share practical performance and resource observations when increasing context (from 64k to 131k), including approximate VRAM usage (~29GB at max context) and noting Pi’s slightly faster feel versus Claude Code due to a more minimal system prompt.
  • They also evaluate energy efficiency on a battery-powered Asus ROG Flow Z13, reporting about 30% battery drain per session with relatively stable GPU power draw and no major speed regression.

Following up on my latest post about setting up Claude Code with local models: I received a recommendation in the comments to try **Pi**. The suggestion was based on its customizability and superior harness for local models. Unlike Claude Code, which is tuned specifically for Anthropic model formats (much like OpenAI Codex is for OpenAI's), Pi offers more flexibility.

**TL;DR:** You can assume Pi is like Arch Linux in the world of agentic harnesses.

In this post, I want to share my setup, ideas, impressions, and experiments. I am not going to try to convince you to use Pi; for that, you can check other write-ups like "Pi: The Minimal Agent Within OpenClaw" on the creators' blog.

### Bringing Claude Code Functionality to Pi

I wanted to bring some of Claude Code's productive functionality into Pi and run some experiments. Specifically, I wanted to track the working time of the current prompt and session, similar to how Claude Code displays `Working... {time}`.

I asked Pi to read its own documentation and create an extension that tracks time and displays it. Pi includes references to its docs within its ~1k-token system prompt, so it knows how to modify or create extensions.
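The core of such an extension is really just a pair of timers. Here is a minimal sketch of the idea in Python; the `on_prompt_start`/`on_prompt_end` hook names and the `WorkTimer` class are my own invention for illustration, not Pi's actual extension API:

```python
import time

class WorkTimer:
    """Tracks elapsed time for the current prompt and the whole session."""

    def __init__(self):
        self.session_start = time.monotonic()
        self.prompt_start = None

    def on_prompt_start(self):
        # Hypothetical hook: called when the model starts working on a prompt.
        self.prompt_start = time.monotonic()

    def on_prompt_end(self):
        # Hypothetical hook: called when the prompt finishes; builds the status line.
        prompt_s = time.monotonic() - self.prompt_start
        session_s = time.monotonic() - self.session_start
        return f"Working... {prompt_s:.1f}s (session: {session_s:.0f}s)"
```

`time.monotonic()` is used instead of `time.time()` so the measurements can't jump if the system clock changes mid-session.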

ANNNNDDD

Qwen did it well in a single shot. Going by this kind of small sub-agent task, it feels Sonnet 4.5 level, or on par with GPT-5.4-mini. For bigger tasks, I recommend Qwen Coder Next or larger models.

### Resource Usage and Speed

In my past post, I was using a 64k context window, which in practice was not really enough. I switched to 131k, and I am glad that Qwen's reasoning doesn't degrade significantly at high context lengths.

* **VRAM Usage:** 29GB on max context usage.

* **Speed:** As you know, prompt processing and token generation speeds drop as context grows. Even so, compared to Claude Code, Pi feels slightly faster. This comes down to its smaller RAM and CPU usage, and the fact that it is not loading an enormous ~20k-token system prompt, just a minimalist one.
* **Customization:** If you want to add details to the system prompt, you can check the leaked Claude Code prompt, grab everything you need, and plug it into Pi. Even skills are not configured out of the box; I had to load my own Brave Search skill.
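For a rough sanity check on why long contexts eat VRAM: the KV cache grows linearly with context length, as 2 (K and V) × layers × KV heads × head dim × context tokens × bytes per value. The architecture numbers below are placeholders I picked for illustration, not Qwen3.5 27B's actual config:

```python
def kv_cache_gib(layers, kv_heads, head_dim, context, bytes_per=2):
    """KV cache size in GiB: 2 (K and V) * layers * kv_heads * head_dim * context * bytes."""
    return 2 * layers * kv_heads * head_dim * context * bytes_per / 2**30

# Placeholder architecture numbers (assumptions, not the real model config), fp16 cache:
print(kv_cache_gib(layers=48, kv_heads=4, head_dim=128, context=131072))  # → 12.0
```

Stack a cache on that order on top of the quantized weights and a figure in the neighborhood of ~29GB at max context is plausible; plug in the real config (and a quantized cache, if your runtime supports it) for an exact number.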

### Energy Efficiency

I tested this on an **Asus ROG Flow Z13** without a power connection, running on battery.

* **Battery drain:** A single prompt session took about 30% of the battery.
* **Power usage:** GPU power draw dropped from 60W to 52W, a negligible difference.
* **Performance:** I did not see any significant drop in token generation or prompt processing speed.
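The 30%-per-session figure implies a rough session length once you assume a battery capacity. I'm assuming ~70Wh for the Flow Z13 here (check your exact spec), so treat this as a back-of-the-envelope estimate:

```python
# Assumed battery capacity (not from the post) and the observed GPU draw on battery.
battery_wh = 70
drain_fraction = 0.30  # ~30% of battery per session, as observed
gpu_watts = 52         # sustained GPU power on battery

# Energy consumed divided by power gives the implied session length.
session_minutes = battery_wh * drain_fraction / gpu_watts * 60
print(round(session_minutes, 1))  # → 24.2
```

Real sessions would run a bit shorter than this, since the 52W figure covers only the GPU and ignores CPU, RAM, and display draw.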

### Harness Performance

In the past, Pi was performing well on **Terminal Bench**, but I am not sure why it is not currently available on the leaderboard (maybe someone can explain why??).

Going by personal feel, out-of-the-box Pi is about 5% worse than Claude Code and Codex for "production"-grade applications and usage. I haven't tested "ForgeCode" yet and have no clue how it even works. However, for local models, Pi is a must-have. You will "build" your own harness in the process of configuring it.

### The Adaptation Layer

The most important takeaway from the last post, for me, was the **Adaptation Layer**: the idea that you need to adapt your local model to the harness you are using, because each model expects a different style of tool calls and chat templates.

When I was configuring Pi, it had a field to set the chat template, so I configured it for Qwen. This was the biggest win for Pi.
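Qwen models expect the ChatML chat format, so that is what the template field needs to produce. Here is a minimal sketch of a ChatML formatter (the `chatml` function is illustrative, not Pi's actual template mechanism):

```python
def chatml(messages):
    """Render a message list in ChatML, the chat format Qwen models are trained on."""
    out = ""
    for m in messages:
        out += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
    # Open the assistant turn so the model continues from here.
    out += "<|im_start|>assistant\n"
    return out

prompt = chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "hello"},
])
```

Feed a model a template other than the one it was trained on and tool calls are usually the first thing to break, which is why having this as a first-class config field matters.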

I will continue to configure Pi until it reaches the perfect harness state for me!

submitted by /u/FeiX7