What do you implement after Llama.cpp?

Reddit r/LocalLLaMA / 3/29/2026

💬 Opinion · Signals & Early Trends · Ideas & Deep Analysis · Tools & Practical Usage

Key Points

  • A Reddit user describes experimenting with llama-server, flags, models, and runtimes and then asks what to build next for a local “Claude-like” homelab AI stack.
  • They consider adopting Open WebUI for RAG/search capabilities as a next step after llama.cpp.
  • They also float building a workflow using frameworks such as LangGraph to orchestrate multi-step AI behaviors.
  • The post frames the decision as choosing a practical architecture for local hardware rather than focusing on a single model/runtime.

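The RAG/search capability mentioned above boils down to one core step: given a query, retrieve the most relevant local documents to feed into the model's context. A minimal sketch of that retrieval step, using a toy bag-of-words similarity in place of a real embedding model (all names here are illustrative, not part of Open WebUI's actual internals):

```python
import math
from collections import Counter

def bow_vector(text):
    """Toy bag-of-words 'embedding'; a real stack would use an embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=1):
    """Return the k documents most similar to the query."""
    qv = bow_vector(query)
    ranked = sorted(docs, key=lambda d: cosine(qv, bow_vector(d)), reverse=True)
    return ranked[:k]

docs = [
    "llama-server exposes an OpenAI-compatible HTTP API",
    "LangGraph orchestrates multi-step agent workflows",
    "Open WebUI adds a chat front end with RAG and search",
]
print(retrieve("which tool handles RAG and search?", docs))
```

In a real homelab setup the retrieved chunks would be prepended to the prompt before it is sent to llama-server; Open WebUI packages this whole loop (chunking, embedding, vector search) behind its UI.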
I'm having a lot of fun playing with llama-server, testing various flags, models, and runtimes. I'm starting to wonder what's next to build out my homelab AI stack. Do I use Open WebUI for RAG/search? Should I take a stab at something like LangGraph? My goal is to create something as close to Claude as I can on local hardware.
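Whichever layer comes next, it can talk to llama-server over its OpenAI-compatible HTTP API, so Open WebUI, a LangGraph graph, or hand-rolled glue code all plug in the same way. A minimal sketch, assuming a server listening on localhost:8080 (the default); the helper names are illustrative:

```python
import json
import urllib.request

SERVER = "http://localhost:8080"  # assumed llama-server address; adjust to your setup

def build_chat_request(messages, temperature=0.7):
    """Build an OpenAI-style chat-completion payload accepted by llama-server."""
    return {"messages": messages, "temperature": temperature}

def chat(messages):
    """POST to llama-server's OpenAI-compatible endpoint and return the reply text."""
    payload = build_chat_request(messages)
    req = urllib.request.Request(
        f"{SERVER}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Calling `chat([{"role": "user", "content": "hello"}])` requires a running llama-server instance; the payload construction itself needs nothing but the standard library, which keeps the glue layer easy to swap out later for LangGraph or another orchestrator.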

submitted by /u/ShaneBowen