Hi everyone,
I’ve been working on a RAG (Retrieval-Augmented Generation) implementation whose core goal is total modularity: a system that doesn't care which LLM or vector store you use, so you can switch seamlessly between local models (for privacy) and cloud APIs (for power) without refactoring the core logic.
I used my own RAD (Rapid Application Development) methodology to keep it lean. I’m curious to get some architectural feedback from this community.
The Stack:
- Orchestration: n8n (This is the "brain" that makes it provider-agnostic).
- Backend: NestJS (Handles user permissions and secure context access).
- Vector Store: Qdrant + PostgreSQL for metadata/document integrity.
- Embedding/Processing: FastAPI (Used to bridge local embedding models).
- Frontend: Angular.
The Workflow:
- Secure Entry: NestJS validates the user and their specific data access permissions.
- Orchestration: Request triggers an n8n workflow.
- Local Embedding: n8n calls a FastAPI service to convert the query into a vector (supports local models to keep data private).
- Hybrid Search: Vector search in Qdrant (top_k=5), with the matching text/metadata fetched from Postgres by ID.
- Agnostic Routing: n8n routes the prompt to the configured LLM (be it a local Ollama instance or a cloud provider like Gemini/OpenAI).
- Reliability: The LLM generates the answer with clickable citations [REF-1] mapped back to the source docs.
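To make steps 4 and 6 concrete, here's roughly how the retrieval results can be stitched into a [REF-n]-tagged context and the citations resolved afterwards. This is a hand-rolled sketch, not the actual workflow: the function names, data shapes, and the stubbed Postgres lookup are all illustrative assumptions.

```python
# Sketch: merge Qdrant hits with Postgres rows, build a [REF-n]-tagged
# context for the prompt, then map citations in the answer back to sources.
# All names and data shapes here are illustrative, not the real schema.
import re

def build_context(qdrant_hits, fetch_row):
    """qdrant_hits: list of (point_id, score); fetch_row: point_id -> row dict."""
    context_lines, ref_map = [], {}
    for i, (point_id, score) in enumerate(qdrant_hits, start=1):
        row = fetch_row(point_id)          # e.g. a SELECT by id in Postgres
        ref_map[f"REF-{i}"] = row["source"]
        context_lines.append(f"[REF-{i}] {row['text']}")
    return "\n".join(context_lines), ref_map

def resolve_citations(answer, ref_map):
    """Turn [REF-n] tags in the LLM answer into links to the source docs."""
    return re.sub(
        r"\[(REF-\d+)\]",
        lambda m: f"[{m.group(1)}]({ref_map.get(m.group(1), '#')})",
        answer,
    )

# Toy usage with a stubbed Postgres lookup:
rows = {
    "a1": {"text": "Qdrant stores vectors.", "source": "docs/qdrant.md"},
    "b2": {"text": "Postgres keeps metadata.", "source": "docs/postgres.md"},
}
context, refs = build_context([("a1", 0.91), ("b2", 0.87)], rows.__getitem__)
print(resolve_citations("Vectors live in Qdrant [REF-1].", refs))
# → Vectors live in Qdrant [REF-1](docs/qdrant.md).
```

Keeping the vector store ID-only and pulling the canonical text from Postgres is what makes the citation mapping reliable: the [REF-n] tag always resolves to a row you control.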
Community Question: Does using a workflow orchestrator like n8n for the RAG logic seem like over-engineering, or is the benefit of "visual" debugging and provider-agnosticism worth the extra layer?
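For context on what that extra layer buys: stripped of n8n, the provider-agnostic routing in step 5 boils down to a dispatch table like the one below. This is a hand-rolled sketch with made-up provider stubs and signatures, not the actual workflow; the real calls would hit Ollama's or OpenAI's HTTP APIs.

```python
# Minimal provider-agnostic dispatch — the logic the n8n routing node
# encapsulates. Provider names and call signatures are illustrative stubs.
from typing import Callable, Dict

def call_ollama(prompt: str) -> str:
    # Would POST to a local Ollama instance (e.g. http://localhost:11434).
    return f"[ollama] {prompt}"

def call_openai(prompt: str) -> str:
    # Would call a cloud API with the configured model.
    return f"[openai] {prompt}"

PROVIDERS: Dict[str, Callable[[str], str]] = {
    "ollama": call_ollama,
    "openai": call_openai,
}

def generate(prompt: str, provider: str = "ollama") -> str:
    """Route the prompt to whichever backend the config names."""
    try:
        return PROVIDERS[provider](prompt)
    except KeyError:
        raise ValueError(f"Unknown provider: {provider!r}")

print(generate("Summarize the docs.", provider="openai"))
# → [openai] Summarize the docs.
```

The trade-off in the question above is essentially whether this dispatch lives in code (testable, versioned) or in a visual workflow (inspectable, editable without a deploy).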
Full diagrams and the logic are documented here: www.nospace.net
Feedback on the decoupling strategy is welcome!