Anyone using coding agents like Codex knows the pain of strict token limits. I'm on the Plus plan, and the 5-hour limits are short enough to be genuinely annoying. I came across a package called distill - https://github.com/samuelfaj/distill - which claims to reduce token usage by 99%. Behind the scenes, it works by compressing command output with an LLM (local or hosted) before that output ever reaches the main model. I started using it and it has noticeably extended my sessions, so I'm sharing it in case it helps others stretch their limits a bit further.
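For anyone curious how that works in practice, here's a minimal sketch of the general idea in TypeScript - this is not distill's actual code. Run the command, have a cheap model summarize the output, and hand the agent the summary instead of the raw log. The endpoint, model name, and prompt below are all my assumptions; any OpenAI-compatible API, local or hosted, would do.

import { execSync } from "node:child_process";

// Summarize noisy command output with a cheap model before the agent sees it.
async function compressOutput(raw: string): Promise<string> {
  // Assumption: an OpenAI-compatible endpoint (here, a local Ollama server).
  const res = await fetch("http://localhost:11434/v1/chat/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "llama3.2", // assumption: whichever small local model you run
      messages: [
        { role: "system", content: "Summarize this command output. Keep errors, file paths, and line numbers; drop repetition." },
        { role: "user", content: raw },
      ],
    }),
  });
  const data = await res.json();
  return data.choices[0].message.content;
}

// Capture output even when the command fails - failures are what the agent cares about.
let raw: string;
try {
  raw = execSync("npm test", { encoding: "utf8" });
} catch (e: any) {
  raw = (e.stdout ?? "") + (e.stderr ?? "");
}

// The agent reads this short summary instead of thousands of raw log tokens.
console.log(await compressOutput(raw));

The saving comes from build and test logs being mostly repetition - a small local model can shrink them before a single token hits your metered plan.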
Known Issues -
The latest build supports Windows but hasn't been published to npm yet. If you run into this like I did, build it from source using the Windows install commands at the end of this post.
It silently fails with newer reasoning models like the GPT-5 family. I have filed an issue to add support for them and am working on a fix (hopefully the maintainer is open to contributions!). In the meantime, use it with non-reasoning models such as gpt-4o, gpt-4.1, or gpt-5-chat-latest.
Windows Install Commands -
- Install bun -
npm install -g bun
- Clone and install -
git clone https://github.com/samuelfaj/distill.git; cd distill; npm install
- Build the binary -
npm run build:bins
- Add to path -
New-Item -ItemType Directory -Force packages\distill-win32-x64\bin | Out-Null
Copy-Item .\dist\bun-windows-x64\distill.exe packages\distill-win32-x64\bin\distill.exe -Force
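The last step only stages the binary; if distill still isn't found afterwards, the staging directory may also need to be on your PATH. A hedged one-liner, assuming you are still in the cloned repo root (adjust the directory if yours differs) - it appends to the user-level PATH, so it takes effect in new shells only:

[Environment]::SetEnvironmentVariable("Path", "$env:Path;$((Resolve-Path 'packages\distill-win32-x64\bin').Path)", "User")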
Once that's done, verify the install -
PS D:\custom_pnpm\distill> distill --version
1.4.1