A 1.7B model can actually turn out some code, so I'm now running the training for a 9B model and will then re-run HumanEval (the full set this time). I've shown most of my homework in the article, but I'll post the code to GitHub after I clean things up.
It was inspired by the neuroanatomy findings in Repeat Yourself (dnhkng.github.io/posts/rys/). Those gave me a start and end point to attach my "reverse LLM" sidecar model: it reads from the end of the main model's output and injects its own output back at the top, in a loop, in this case focusing on syntax. That drastically improves a very tiny model.
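To make "reads from the end, injects back at the top" concrete, here's a minimal sketch of the loop topology as described above. The model IDs, the prompt wiring, and the feedback prompt are hypothetical placeholders (the real details will be in the repo); this only illustrates the forward-model-plus-sidecar cycle, not how the reverse model itself is trained.

```python
# Minimal sketch of the forward model + reverse "sidecar" loop.
# Model IDs and prompts are placeholders, not the article's actual setup.
from transformers import AutoModelForCausalLM, AutoTokenizer

FORWARD_ID = "your-1.7b-code-model"      # placeholder: the tiny code model
SIDECAR_ID = "your-reverse-sidecar-model"  # placeholder: the syntax-focused sidecar

def load(model_id):
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    return tok, model

def generate(tok, model, prompt, max_new_tokens=256):
    inputs = tok(prompt, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Return only the newly generated text, not the echoed prompt.
    return tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

def sidecar_loop(task_prompt, rounds=3):
    fwd_tok, fwd = load(FORWARD_ID)
    rev_tok, rev = load(SIDECAR_ID)

    feedback = ""  # sidecar output, injected "at the top" of the next forward pass
    code = ""
    for _ in range(rounds):
        prompt = f"{feedback}\n{task_prompt}" if feedback else task_prompt
        code = generate(fwd_tok, fwd, prompt)      # forward pass produces code
        tail = code[-2000:]                        # sidecar reads from the end
        feedback = generate(
            rev_tok, rev,
            "Point out syntax problems in this code, briefly:\n" + tail,
            max_new_tokens=128,
        )
    return code
```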
I'll also go back and run the full HumanEval dataset on both models, instead of just the first 20 problems.
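For anyone who wants to reproduce the eval side, this is roughly what the first-20 vs. full-set distinction looks like with OpenAI's human-eval harness; `generate_one_completion` is a stand-in for however the 1.7B/9B model actually gets called.

```python
# Sketch: HumanEval on a 20-problem subset vs. the full 164-problem set,
# using the openai/human-eval harness. generate_one_completion is a
# placeholder for calling whichever model is being evaluated.
from itertools import islice
from human_eval.data import read_problems, write_jsonl

def generate_one_completion(prompt: str) -> str:
    raise NotImplementedError  # call the model here

problems = read_problems()  # all 164 HumanEval tasks

# Quick version: only the first 20 tasks.
subset = dict(islice(problems.items(), 20))

# Full re-run: iterate over everything instead of `subset`.
samples = [
    {"task_id": task_id,
     "completion": generate_one_completion(problems[task_id]["prompt"])}
    for task_id in problems
]
write_jsonl("samples.jsonl", samples)

# Then score with:  evaluate_functional_correctness samples.jsonl
```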




