Quoting Georgi Gerganov

Simon Willison's Blog / 3/31/2026


Key Points

  • Georgi Gerganov argues that the biggest problems with locally run models often stem from the “harness” layer and subtle issues in chat templates, prompt construction, and inference behavior.
  • He highlights that local AI workflows involve a long chain of components—often built by different parties—that are fragile and hard to reason about end-to-end.
  • The quote emphasizes that observed failures may be caused by subtle bugs anywhere in the stack, not just the model itself.
  • Overall, the post frames local LLM success for coding agents as an integration and reliability challenge rather than purely a model-quality problem.

30th March 2026

Note that the main issues that people currently unknowingly face with local models mostly revolve around the harness and some intricacies around model chat templates and prompt construction. Sometimes there are even pure inference bugs. From typing the task in the client to the actual result, there is a long chain of components that at the moment are not only fragile but are also developed by different parties. So it's difficult to consolidate the entire stack, and you have to keep in mind that what you are currently observing is, with very high probability, still broken in some subtle way along that chain.

Georgi Gerganov, explaining why it's hard to find local models that work well with coding agents
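To make the chat-template point concrete, here is a minimal Python sketch of how the same conversation can reach the model as two very different prompt strings depending on which template the harness applies. The template formats and function names here are illustrative assumptions, not taken from any specific inference stack:

```python
# Illustrative sketch: the same messages rendered through two chat templates.
messages = [
    {"role": "system", "content": "You are a coding assistant."},
    {"role": "user", "content": "Write a sort function."},
]

def render_chatml(msgs):
    # ChatML-style rendering, with role markers and special tokens.
    rendered = "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in msgs
    )
    return rendered + "<|im_start|>assistant\n"

def render_naive(msgs):
    # A buggy/naive rendering that silently drops the system prompt
    # and all special tokens -- the kind of subtle harness mismatch
    # that degrades output without any visible error.
    return "\n".join(m["content"] for m in msgs if m["role"] == "user") + "\n"

# Same user request, two different prompts actually seen by the model:
print(render_chatml(messages))
print(render_naive(messages))
```

Both renderings look plausible in isolation, which is exactly why a mismatch anywhere in the client-to-inference chain is hard to spot from the final output alone.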

Posted 30th March 2026 at 9:31 pm
