gemma-4-26B-A4B with my coding agent Kon

Reddit r/LocalLLaMA / 4/10/2026

💬 Opinion · Developer Stack & Infrastructure · Signals & Early Trends · Tools & Practical Usage

Key Points

  • The post introduces “Kon,” a GitHub coding agent project designed to work smoothly with local LLMs for straightforward coding tasks.
  • Kon emphasizes simplicity (small system prompt under 270 tokens), no telemetry, and broad compatibility with local models tested against several GGUF options.
  • The agent supports multiple provider backends (e.g., OpenAI/Anthropic-compatible APIs including OpenAI, Anthropic, Copilot, Azure, etc.), enabling flexible deployment choices.
  • It offers common coding-agent workflow features such as attachments, commands, AGENTS.md, skills, session resuming, model switching, and “forking”/handoff capabilities.
  • The author reports local testing using llama-server on an NVIDIA 3090, documenting model performance and setup in separate repo docs.

Wanted to share my coding agent, which has been working great with these local models for simple tasks. https://github.com/0xku/kon

It takes lots of inspiration from pi (simple harness), opencode (sparing little UI real estate for tool calls, mostly), amp code (/handoff), and Claude Code of course.

I hope the community finds it useful. It should check a lot of boxes:
- small system prompt, under 270 tokens; you can change this as well
- no telemetry
- works without any hassle with all the best local models, tested with zai-org/glm-4.7-flash, unsloth/Qwen3.5-27B-GGUF and unsloth/gemma-4-26B-A4B-it-GGUF
- works with most popular providers like openai, anthropic, copilot, azure, zai etc. (anything that's compatible with the openai/anthropic APIs)
- simple codebase (<150 files)
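Since anything OpenAI-compatible works, pointing the agent at a local llama-server is mostly a matter of the base URL. A minimal sketch of the request shape such a backend expects — the endpoint, port, and model name below are hypothetical placeholders, not Kon's actual configuration:

```python
def chat_completion_request(model: str, prompt: str,
                            base_url: str = "http://localhost:8080/v1") -> dict:
    """Build an OpenAI-style /chat/completions request for a local server.

    base_url, port, and model name are illustrative, not Kon's real config.
    """
    return {
        "url": f"{base_url}/chat/completions",
        "headers": {
            "Content-Type": "application/json",
            # local servers typically ignore the key, but clients send one anyway
            "Authorization": "Bearer not-needed-locally",
        },
        "body": {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
            "stream": True,  # coding agents usually stream tokens for responsiveness
        },
    }

req = chat_completion_request("gemma-4-26B-A4B", "Explain this diff.")
print(req["url"])  # http://localhost:8080/v1/chat/completions
```

The same shape works against a hosted provider by swapping the base URL and key, which is what makes "OpenAI-compatible" backends interchangeable.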

It's not just a toy implementation but a full-fledged coding agent now (almost). All the common options are supported: @ attachments, / commands, AGENTS.md, skills, compaction, forking (/handoff), exports, resuming sessions, model switching, and more.
Take a look at https://github.com/0xku/kon/blob/main/README.md for the full feature list.

All the local models were tested with llama-server build b8740 on my 3090 - see https://github.com/0xku/kon/blob/main/docs/local-models.md for more details.
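For reference, a typical llama-server launch for a single 24 GB GPU looks roughly like this — the model path, context length, and port are illustrative placeholders, not the author's exact settings (those are in the linked docs/local-models.md):

```shell
# Illustrative llama-server invocation for a 24 GB GPU (e.g. a 3090).
# -ngl 99  : offload all layers to the GPU
# -c 16384 : context window; trade off against VRAM headroom
# --port   : serves an OpenAI-compatible API at http://localhost:8080/v1
./llama-server \
  -m ./models/gemma-4-26B-A4B-it-Q4_K_M.gguf \
  -ngl 99 \
  -c 16384 \
  --port 8080
```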

submitted by /u/Weird_Search_4723