gemma-4-26B-A4B with my coding agent Kon

Reddit r/LocalLLaMA / 4/10/2026

💬 Opinion · Developer Stack & Infrastructure · Signals & Early Trends · Tools & Practical Usage

Key Points

  • The post introduces “Kon,” a GitHub coding agent project designed to work smoothly with local LLMs for straightforward coding tasks.
  • Kon emphasizes simplicity (small system prompt under 270 tokens), no telemetry, and broad compatibility with local models tested against several GGUF options.
  • The agent supports multiple provider backends (e.g., OpenAI/Anthropic-compatible APIs including OpenAI, Anthropic, Copilot, Azure, etc.), enabling flexible deployment choices.
  • It offers common coding-agent workflow features such as attachments, commands, AGENTS.md, skills, session resuming, model switching, and “forking”/handoff capabilities.
  • The author reports local testing using llama-server on an NVIDIA 3090, documenting model performance and setup in separate repo docs.

Wanted to share my coding agent, which has been working great with these local models for simple tasks. https://github.com/0xku/kon

It takes lots of inspiration from pi (simple harness), opencode (sparing little UI real estate for tool calls, mostly), amp code (/handoff), and Claude Code of course.

I hope the community finds it useful. It should check a lot of boxes:
- small system prompt, under 270 tokens; you can change this as well
- no telemetry
- works without any hassle with all the best local models, tested with zai-org/glm-4.7-flash, unsloth/Qwen3.5-27B-GGUF and unsloth/gemma-4-26B-A4B-it-GGUF
- works with most popular providers like openai, anthropic, copilot, azure, zai etc. (anything that's compatible with the openai/anthropic APIs)
- simple codebase (<150 files)
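Since anything OpenAI-compatible works, pointing the agent at a local llama-server is mostly a matter of the base URL. A minimal sketch of the request shape such a backend expects — the endpoint, port, and model name below are hypothetical placeholders, not Kon's actual configuration:

```python
def chat_completion_request(model: str, prompt: str,
                            base_url: str = "http://localhost:8080/v1") -> dict:
    """Build an OpenAI-style /chat/completions request for a local server.

    base_url, port, and model name are illustrative, not Kon's real config.
    """
    return {
        "url": f"{base_url}/chat/completions",
        "headers": {
            "Content-Type": "application/json",
            # local servers typically ignore the key, but clients send one anyway
            "Authorization": "Bearer not-needed-locally",
        },
        "body": {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
            "stream": True,  # coding agents usually stream tokens for responsiveness
        },
    }

req = chat_completion_request("gemma-4-26B-A4B", "Explain this diff.")
print(req["url"])  # http://localhost:8080/v1/chat/completions
```

The same shape works against a hosted provider by swapping the base URL and key, which is what makes "OpenAI-compatible" backends interchangeable.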

It's not just a toy implementation but a full-fledged coding agent now (almost). All the common options are supported: @ attachments, / commands, AGENTS.md, skills, compaction, forking (/handoff), exports, resuming sessions, model switching, and more.
Take a look at https://github.com/0xku/kon/blob/main/README.md for the full feature list.

All the local models were tested with llama-server build b8740 on my 3090 - see https://github.com/0xku/kon/blob/main/docs/local-models.md for more details.
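For reference, a typical llama-server launch for a single 24 GB GPU looks roughly like this — the model path, context length, and port are illustrative placeholders, not the author's exact settings (those are in the linked docs/local-models.md):

```shell
# Illustrative llama-server invocation for a 24 GB GPU (e.g. a 3090).
# -ngl 99  : offload all layers to the GPU
# -c 16384 : context window; trade off against VRAM headroom
# --port   : serves an OpenAI-compatible API at http://localhost:8080/v1
./llama-server \
  -m ./models/gemma-4-26B-A4B-it-Q4_K_M.gguf \
  -ngl 99 \
  -c 16384 \
  --port 8080
```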

submitted by /u/Weird_Search_4723