AI Navigate

Agnostic RAG system for full control over security and privacy

Reddit r/LocalLLaMA / 3/11/2026


Key Points

  • The author developed a modular Retrieval-Augmented Generation (RAG) system that enables seamless switching between local large language models (LLMs) for privacy and cloud APIs for power without changing core logic.
  • The system uses an orchestration layer powered by n8n to achieve provider-agnostic workflows, backed by a NestJS backend for user permissions and FastAPI for local embedding model handling.
  • The search infrastructure employs Qdrant for vector similarity search combined with PostgreSQL for metadata and document integrity, supporting hybrid searches.
  • The routing from embedding to LLM generation is flexible, allowing the use of local instances like Ollama or cloud providers like Gemini or OpenAI, with outputs including clickable citations referencing original documents.
  • The author seeks community feedback on whether using an orchestration tool like n8n offers valuable visual debugging and modularity benefits or constitutes over-engineering in the RAG pipeline context.

Hi everyone,

I’ve been working on a RAG (Retrieval-Augmented Generation) implementation where the core goal is total modularity. I wanted a system that doesn't care which LLM or Vector Store you use, allowing a seamless switch between local models (for privacy) and cloud APIs (for power) without refactoring the core logic.
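To make the "doesn't care which LLM" idea concrete, here is a minimal sketch of that decoupling in Python. The class and provider names are illustrative (not from the actual repo), and the network calls are stubbed out; the point is that the core logic only sees one interface:

```python
from abc import ABC, abstractmethod


class LLMProvider(ABC):
    """Any backend (local or cloud) must satisfy this one method."""

    @abstractmethod
    def generate(self, prompt: str) -> str: ...


class OllamaProvider(LLMProvider):
    def generate(self, prompt: str) -> str:
        # A real implementation would POST to the local Ollama HTTP API.
        return f"[local/ollama] {prompt}"


class OpenAIProvider(LLMProvider):
    def generate(self, prompt: str) -> str:
        # A real implementation would call the OpenAI chat completions API.
        return f"[cloud/openai] {prompt}"


def answer(query: str, provider: LLMProvider) -> str:
    # Core RAG logic never knows which backend it is talking to.
    return provider.generate(f"Answer using the retrieved context: {query}")


# Switching providers becomes a config change, not a refactor:
print(answer("What is RAG?", OllamaProvider()))
print(answer("What is RAG?", OpenAIProvider()))
```

Swapping vector stores works the same way: hide Qdrant (or anything else) behind a small retrieval interface and inject the concrete implementation at startup.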

I used my own RAD (Rapid Application Development) methodology to keep it lean. I’m curious to get some architectural feedback from this community.

The Stack:

  • Orchestration: n8n (This is the "brain" that makes it provider-agnostic).
  • Backend: NestJS (Handles user permissions and secure context access).
  • Vector Store: Qdrant + PostgreSQL for metadata/document integrity.
  • Embedding/Processing: FastAPI (Used to bridge local embedding models).
  • Frontend: Angular.

The Workflow:

  1. Secure Entry: NestJS validates the user and their specific data access permissions.
  2. Orchestration: Request triggers an n8n workflow.
  3. Local Embedding: n8n calls a FastAPI service to convert the query into a vector (supports local models to keep data private).
  4. Hybrid Search: Search in Qdrant (top_k=5) and fetch the actual text/metadata from Postgres.
  5. Agnostic Routing: n8n routes the prompt to the configured LLM (be it a local Ollama instance or a cloud provider like Gemini/OpenAI).
  6. Reliability: The LLM generates the answer with clickable citations (e.g., [REF-1]) mapped back to the source docs.
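Steps 4–6 can be sketched in a few lines of Python. This is a hypothetical illustration, not the project's code: stubbed retrieval results stand in for Qdrant (top_k=5) + Postgres, and the focus is how the [REF-n] tags in the answer map back to source documents for clickable citations:

```python
def build_context(chunks: list[dict]) -> tuple[str, dict[str, str]]:
    """Number each retrieved chunk and remember which document it came from."""
    ref_map: dict[str, str] = {}
    lines: list[str] = []
    for i, chunk in enumerate(chunks, start=1):
        ref = f"REF-{i}"
        ref_map[ref] = chunk["doc_id"]  # clickable citation target
        lines.append(f"[{ref}] {chunk['text']}")
    return "\n".join(lines), ref_map


# Stub for what the Qdrant hit + Postgres text/metadata fetch would return:
hits = [
    {"doc_id": "security-policy.pdf", "text": "Data never leaves the VPC."},
    {"doc_id": "architecture.md", "text": "Embeddings are computed locally."},
]

context, refs = build_context(hits)
# `context` is injected into the LLM prompt; the model answers with tags
# like [REF-1], and the frontend resolves each tag to its source document:
print(refs["REF-1"])  # security-policy.pdf
```

The same ref_map is what lets the Angular frontend turn [REF-1] into a link without the LLM ever seeing document IDs.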

Community Question: Does using a workflow orchestrator like n8n for the RAG logic seem like over-engineering, or is the benefit of "visual" debugging and provider-agnosticism worth the extra layer?

Full diagrams and the logic are documented here: www.nospace.net

Feedback on the decoupling strategy is welcome!

submitted by /u/Apprehensive_Pear432