AI Navigate

Agnostic RAG system for full control over security and privacy

Reddit r/LocalLLaMA / 3/11/2026


Key Points

  • The author developed a modular Retrieval-Augmented Generation (RAG) system that enables seamless switching between local large language models (LLMs) for privacy and cloud APIs for power without changing core logic.
  • The system uses an orchestration layer powered by n8n to achieve provider-agnostic workflows, backed by a NestJS backend for user permissions and FastAPI for local embedding model handling.
  • The search infrastructure employs Qdrant for vector similarity search combined with PostgreSQL for metadata and document integrity, supporting hybrid searches.
  • The routing from embedding to LLM generation is flexible, allowing the use of local instances like Ollama or cloud providers like Gemini or OpenAI, with outputs including clickable citations referencing original documents.
  • The author seeks community feedback on whether using an orchestration tool like n8n offers valuable visual debugging and modularity benefits or constitutes over-engineering in the RAG pipeline context.

Hi everyone,

I’ve been working on a RAG (Retrieval-Augmented Generation) implementation where the core goal is total modularity. I wanted a system that doesn't care which LLM or Vector Store you use, allowing a seamless switch between local models (for privacy) and cloud APIs (for power) without refactoring the core logic.
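To make the "doesn't care which LLM" idea concrete, here is a minimal sketch of that decoupling in Python. The class and provider names are illustrative (not from the actual repo), and the network calls are stubbed out; the point is that the core logic only sees one interface:

```python
from abc import ABC, abstractmethod


class LLMProvider(ABC):
    """Any backend (local or cloud) must satisfy this one method."""

    @abstractmethod
    def generate(self, prompt: str) -> str: ...


class OllamaProvider(LLMProvider):
    def generate(self, prompt: str) -> str:
        # A real implementation would POST to the local Ollama HTTP API.
        return f"[local/ollama] {prompt}"


class OpenAIProvider(LLMProvider):
    def generate(self, prompt: str) -> str:
        # A real implementation would call the OpenAI chat completions API.
        return f"[cloud/openai] {prompt}"


def answer(query: str, provider: LLMProvider) -> str:
    # Core RAG logic never knows which backend it is talking to.
    return provider.generate(f"Answer using the retrieved context: {query}")


# Switching providers becomes a config change, not a refactor:
print(answer("What is RAG?", OllamaProvider()))
print(answer("What is RAG?", OpenAIProvider()))
```

Swapping vector stores works the same way: hide Qdrant (or anything else) behind a small retrieval interface and inject the concrete implementation at startup.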

I used my own RAD (Rapid Application Development) methodology to keep it lean. I’m curious to get some architectural feedback from this community.

The Stack:

  • Orchestration: n8n (This is the "brain" that makes it provider-agnostic).
  • Backend: NestJS (Handles user permissions and secure context access).
  • Vector Store: Qdrant + PostgreSQL for metadata/document integrity.
  • Embedding/Processing: FastAPI (Used to bridge local embedding models).
  • Frontend: Angular.

The Workflow:

  1. Secure Entry: NestJS validates the user and their specific data access permissions.
  2. Orchestration: Request triggers an n8n workflow.
  3. Local Embedding: n8n calls a FastAPI service to convert the query into a vector (supports local models to keep data private).
  4. Hybrid Search: Search in Qdrant (top_k=5) and fetch the actual text/metadata from Postgres.
  5. Agnostic Routing: n8n routes the prompt to the configured LLM (be it a local Ollama instance or a cloud provider like Gemini/OpenAI).
  6. Reliability: The LLM generates the answer with clickable citations (e.g., [REF-1]) mapped back to the source docs.
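Steps 4–6 can be sketched in a few lines of Python. This is a hypothetical illustration, not the project's code: stubbed retrieval results stand in for Qdrant (top_k=5) + Postgres, and the focus is how the [REF-n] tags in the answer map back to source documents for clickable citations:

```python
def build_context(chunks: list[dict]) -> tuple[str, dict[str, str]]:
    """Number each retrieved chunk and remember which document it came from."""
    ref_map: dict[str, str] = {}
    lines: list[str] = []
    for i, chunk in enumerate(chunks, start=1):
        ref = f"REF-{i}"
        ref_map[ref] = chunk["doc_id"]  # clickable citation target
        lines.append(f"[{ref}] {chunk['text']}")
    return "\n".join(lines), ref_map


# Stub for what the Qdrant hit + Postgres text/metadata fetch would return:
hits = [
    {"doc_id": "security-policy.pdf", "text": "Data never leaves the VPC."},
    {"doc_id": "architecture.md", "text": "Embeddings are computed locally."},
]

context, refs = build_context(hits)
# `context` is injected into the LLM prompt; the model answers with tags
# like [REF-1], and the frontend resolves each tag to its source document:
print(refs["REF-1"])  # security-policy.pdf
```

The same ref_map is what lets the Angular frontend turn [REF-1] into a link without the LLM ever seeing document IDs.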

Community Question: Does using a workflow orchestrator like n8n for the RAG logic seem like over-engineering, or is the benefit of "visual" debugging and provider-agnosticism worth the extra layer?

Full diagrams and the logic are documented here: www.nospace.net

Feedback on the decoupling strategy is welcome!

submitted by /u/Apprehensive_Pear432