Made a tool that builds its own training data and improves each cycle by learning from what it got wrong

Reddit r/artificial / 5/5/2026

💬 Opinion · Developer Stack & Infrastructure · Tools & Practical Usage · Models & Research

Key Points

  • The post describes a workflow where a tool takes a few seed prompts, generates instruction–response pairs, and has an LLM judge each one; the good examples go into the training set, while the bad ones are fed back as seeds for the next iteration.
  • It emphasizes an iterative “practice on what it failed at” approach, effectively creating a self-improving curriculum by focusing on failure cases.
  • The author notes that the judging step can run fully locally using Ollama to avoid sending data to external APIs.
  • The fine-tuning stage is implemented using Unsloth on a free Colab GPU, making the whole process accessible without paid resources.
  • The project is framed as a practical tool rather than a research paper, with an invitation for others to share similar work.

The basic idea is pretty simple. You give it a few seed prompts. It generates instruction-response pairs, an LLM scores each one, the good ones go into your training set, and the bad ones become the seeds for the next round. Each cycle the model is essentially practicing on what it failed at before.
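As a rough sketch of one cycle of that loop (the function names and the 1–10 scoring threshold are my assumptions, not from the tool itself):

```python
# One generate -> judge -> split cycle. generate() and judge() are stubs
# standing in for the real LLM calls (hypothetical names and behavior).
SCORE_THRESHOLD = 7  # assumed cutoff on a 1-10 judge scale

def generate(seed: str) -> dict:
    # Placeholder: a real implementation would prompt an LLM with the seed.
    return {"instruction": seed, "response": f"answer to: {seed}"}

def judge(pair: dict) -> int:
    # Placeholder: a real judge would score the pair with an LLM.
    return 9 if pair["response"] else 0

def run_cycle(seeds: list[str]) -> tuple[list[dict], list[str]]:
    """Return (accepted training examples, seeds for the next cycle)."""
    train, next_seeds = [], []
    for seed in seeds:
        pair = generate(seed)
        if judge(pair) >= SCORE_THRESHOLD:
            train.append(pair)        # good pairs join the training set
        else:
            next_seeds.append(seed)   # failures become next round's seeds
    return train, next_seeds

train, next_seeds = run_cycle(["explain recursion", "summarize a diff"])
```

Each call returns the accepted pairs plus the seeds to retry, so chaining `run_cycle` on `next_seeds` gives the "practice on what it failed at" behavior.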

You can run the judge completely locally with Ollama if you do not want to send data to any API.
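A local judge via the `ollama` Python client could look roughly like this; the model name and scoring prompt are illustrative, and it assumes an Ollama server is running locally with the model pulled:

```python
import re

def parse_score(text: str) -> int:
    """Pull the first integer out of the judge's reply; 0 if none found."""
    match = re.search(r"\d+", text)
    return int(match.group()) if match else 0

def judge_locally(instruction: str, response: str, model: str = "llama3") -> int:
    # Lazy import so the sketch loads without the client installed
    # (`pip install ollama`); nothing leaves the machine.
    import ollama
    reply = ollama.chat(model=model, messages=[{
        "role": "user",
        "content": (
            "Rate this instruction-response pair from 1 to 10. "
            "Reply with just the number.\n\n"
            f"Instruction: {instruction}\nResponse: {response}"
        ),
    }])
    return parse_score(reply["message"]["content"])
```

Parsing the score defensively matters in practice, since small local models don't always reply with a bare number.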

The fine-tuning at the end uses Unsloth on a free Colab GPU so the whole thing is doable without spending money.
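Before the fine-tuning step, the accepted pairs have to land in a format the trainer can load. One common option (an assumption on my part — the post doesn't specify the format) is Alpaca-style JSONL, which Unsloth's Colab notebooks can read via `datasets.load_dataset("json", data_files=...)`:

```python
import json
from pathlib import Path

def write_training_file(pairs: list[dict], path: str = "train.jsonl") -> int:
    """Dump accepted instruction-response pairs as JSONL, one record per line."""
    with Path(path).open("w", encoding="utf-8") as f:
        for pair in pairs:
            record = {
                "instruction": pair["instruction"],
                "input": "",  # no extra context in this workflow
                "output": pair["response"],
            }
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return len(pairs)

n = write_training_file(
    [{"instruction": "explain recursion", "response": "A function that calls itself..."}]
)
```

From there the free-Colab part is just the standard Unsloth recipe: load a 4-bit base model, attach LoRA adapters, and train on the file above.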

It is more of a practical tool than a research project but the idea of using failure cases as curriculum is something I find genuinely interesting.

Would love to hear if anyone has done something similar.

GitHub project link is in the comments below 👇

submitted by /u/gvij