I've been going down a rabbit hole thinking about what actually happens after you ship an LLM-powered app, and I'd love to hear how others here handle it…
A few things I keep getting stuck on:
Continuous optimization. Once your app is in users' hands, how often are you tweaking prompts, swapping models, retraining adapters, or rebuilding RAG pipelines? Is it a constant grind or do you reach a good-enough plateau?
Model bugs vs engineering bugs. When something breaks, how do you even tell whether it's the model hallucinating or regressing vs a plain old code or infra issue? Do you have evals catching it, or is it mostly user reports?
Do you also regularly update your evals, or is it a build-once-and-forget workflow? (Rough sketch of the kind of check I mean just below the list.)
Your dev loop. Are you debugging and iterating with local models using harnesses like Pi, Hermes, Aider, or Cline? Or are you just leaning on Claude Code or Cursor and calling it a day? Anyone running a hybrid setup?
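On the evals point, here's roughly what I have in mind, just as a sketch: a frozen set of prompts with cheap substring checks, compared against the last saved pass rate. `call_model` and the cases are placeholders for whatever client and prompts you actually use.

```python
# Minimal eval regression sketch: run frozen cases, compare pass rate to a saved baseline.
import json
from pathlib import Path
from typing import Callable

CASES = [
    {"prompt": "Summarize: the meeting is moved to Friday.", "must_contain": "Friday"},
    {"prompt": "What is 12 * 7? Answer with just the number.", "must_contain": "84"},
]

def run_evals(call_model: Callable[[str], str], baseline_path: str = "eval_baseline.json") -> None:
    results = []
    for case in CASES:
        output = call_model(case["prompt"])
        results.append({"prompt": case["prompt"], "passed": case["must_contain"] in output})

    pass_rate = sum(r["passed"] for r in results) / len(results)
    baseline = Path(baseline_path)
    if baseline.exists():
        old = json.loads(baseline.read_text())
        if pass_rate < old["pass_rate"]:
            print(f"REGRESSION: pass rate dropped {old['pass_rate']:.2f} -> {pass_rate:.2f}")
    baseline.write_text(json.dumps({"pass_rate": pass_rate, "results": results}, indent=2))
    print(f"pass rate: {pass_rate:.2f}")
```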
Curious whether the local-first crowd here has fundamentally different workflows from the API-only folks, especially around catching model regressions when you swap weights or quantizations.
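For the weight/quantization swap case specifically, the crude thing I've been doing is running the same prompts at temperature 0 against both model tags and flagging any divergence. This assumes your local runner exposes an OpenAI-compatible endpoint (llama.cpp server, vLLM, and Ollama all can); the base_url and model names here are placeholders.

```python
# Sketch: diff two quantizations of the same model on a fixed prompt set.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed-locally")

PROMPTS = [
    "Extract the city from: 'Order shipped to Lisbon on Tuesday.'",
    "Answer yes or no: is 91 a prime number?",
]

def ask(model: str, prompt: str) -> str:
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return (resp.choices[0].message.content or "").strip()

for prompt in PROMPTS:
    a = ask("my-model-q8", prompt)
    b = ask("my-model-q4", prompt)
    if a != b:
        print(f"DIVERGENCE on: {prompt!r}\n  q8: {a}\n  q4: {b}")
```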
What's working, what's painful, what would you change?