Looking for feedback on my Agentic RAG System

Reddit r/LocalLLaMA / 3/29/2026

💬 OpinionDeveloper Stack & InfrastructureSignals & Early TrendsTools & Practical Usage

Key Points

  • The author is seeking feedback on an “agentic RAG” system designed to be shippable rather than a basic upload-and-ask demo.
  • The system includes authenticated users with document ownership and document-scoped retrieval to reduce cross-document data leakage.
  • It uses an agent loop with tool calling where the retriever is exposed as a tool, along with query refinement, semantic caching, and pluggable embeddings with optional reranking.
  • The stack combines FastAPI, SQLAlchemy, Postgres (pgvector), and Chroma, with OpenAI/HuggingFace embeddings and an optional Cohere reranker, packaged via Docker.
  • It also features an evaluation pipeline with run history and case inspection plus a built-in UI for asking questions and running evaluations, with the repo provided for review.

Hey everyone,

I've been working on a RAG system and would really appreciate some feedback from people who have built or scaled similar systems.

This isn't just a basic "upload + ask" demo — I tried to design it more like something you'd actually ship.

What it does

  • Authenticated users with document ownership
  • Document-scoped retrieval (to avoid cross-doc leakage)
  • Agent loop with tool calling (retriever as a tool)
  • Query refinement + semantic cache
  • Pluggable embeddings + optional reranking
  • Evaluation pipeline with run history and case inspection
  • Built-in UI for asking questions and running evals

Tech stack

  • FastAPI + SQLAlchemy + Postgres (pgvector)
  • Chroma for vector storage
  • OpenAI / HuggingFace embeddings
  • Optional Cohere reranker
  • Dockerized setup

github repo : https://github.com/mahmoudsamy7729/agentic-rag

submitted by /u/Icy_Ant4265
[link] [comments]