"Second Thoughts" Been playing with adding a small transformer that reads output near the end of generation, and feeds it back near the top as a refinement loop. A quick test of 1.7B model showed drastic improvement in focused tasks (like coding)

Reddit r/LocalLLaMA / 5/4/2026

💬 Opinion · Ideas & Deep Analysis · Tools & Practical Usage · Models & Research

Key Points

  • The author reports experiments with a small transformer that reads generated text near the end and feeds that information back near the beginning as a refinement loop, improving focused tasks such as coding.
  • A quick test with a 1.7B model showed drastic gains, and the author is now training a 9B model to reproduce and extend the results.
  • The approach was inspired by neuroanatomy findings from Repeat Yourself, leading the author to build a “reverse LLM” sidecar that injects its output back into the prompt context.
  • The author plans to re-run HumanEval, including the full dataset (not just the first 20), and post cleaned-up materials and code to GitHub after further verification.
  • The work emphasizes syntax-focused refinement, aiming to get much better performance from a very small model by adding a bidirectional-style loop mechanism.

A 1.7B model can actually turn out some code, so I'm running the training for a 9B model, and then I'll re-run HumanEval (the full set this time). I've shown most of my homework in the article, but I'll post the code to GitHub after I clean things up.

It was inspired by the neuroanatomy findings in Repeat Yourself (dnhkng.github.io/posts/rys/)... those gave me a start and end point to attach my "reverse LLM" sidecar model (it reads from the end, then injects its output back at the top, in a loop), in this case focusing on syntax - drastically improving a very tiny model.
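
To make the loop concrete, here's a rough sketch of the control flow as described (model IDs, tail length, and loop count are all placeholders - the real sidecar is a trained reverse model, and the exact injection format will be in the repo, so treat this as a shape, not the implementation):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MAIN_ID = "path/to/1.7b-model"     # placeholder: base model isn't named in the post
SIDECAR_ID = "path/to/reverse-lm"  # placeholder: the small "reverse LLM" sidecar

TAIL_TOKENS = 256  # how much of the end of the draft the sidecar reads (assumed)
N_LOOPS = 2        # number of refinement passes (assumed)

tok = AutoTokenizer.from_pretrained(MAIN_ID)
main = AutoModelForCausalLM.from_pretrained(MAIN_ID)
side_tok = AutoTokenizer.from_pretrained(SIDECAR_ID)
sidecar = AutoModelForCausalLM.from_pretrained(SIDECAR_ID)


def generate(model, tokenizer, prompt, max_new_tokens=512):
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=max_new_tokens, do_sample=False)
    # decode only the newly generated tokens
    return tokenizer.decode(out[0, ids.shape[1]:], skip_special_tokens=True)


def refine(prompt: str) -> str:
    draft = generate(main, tok, prompt)
    for _ in range(N_LOOPS):
        # the sidecar reads from the *end* of the current draft...
        tail = tok.decode(tok(draft).input_ids[-TAIL_TOKENS:])
        hints = generate(sidecar, side_tok, tail, max_new_tokens=64)
        # ...and its output gets injected back at the *top* of the context
        draft = generate(main, tok, f"{hints}\n\n{prompt}")
    return draft
```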

I'll also go back and run the full HumanEval dataset on both models, instead of just the first 20 problems.
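
For the full run, OpenAI's human-eval harness makes this straightforward - roughly like this, reusing the refine() sketch above (in practice you'd also strip the model output down to just the completion body):

```python
# pip install human-eval
from human_eval.data import read_problems, write_jsonl

problems = read_problems()  # all 164 problems, not just the first 20
samples = [
    dict(task_id=task_id, completion=refine(problems[task_id]["prompt"]))
    for task_id in problems
]
write_jsonl("samples.jsonl", samples)
# then score with: evaluate_functional_correctness samples.jsonl
```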

submitted by /u/bigattichouse