[P] Unix philosophy for ML pipelines: modular, swappable stages with typed contracts

Reddit r/MachineLearning / 3/31/2026

💬 OpinionDeveloper Stack & InfrastructureIdeas & Deep AnalysisTools & Practical Usage

Key Points

  • The post introduces an open-source prototype that applies Unix philosophy to ML retrieval (RAG) pipelines by breaking processing into modular, swappable stages with typed contracts.
  • It treats each pipeline step—PII redaction, chunking, deduplication, embeddings, and evaluation—as an independent plugin, similar to Unix tools connected by pipes.
  • The motivation is improved debuggability: swapping a single component (e.g., the chunker) and re-running evaluation makes it easier to attribute changes in precision/recall to the correct stage.
  • The design encodes stage boundaries into a feature name convention using separators (e.g., `docs__...__evaluated`), ensuring that downstream components remain consistent while upstream options vary.
  • The authors emphasize the project is still a prototype and are seeking feedback on whether the underlying design assumptions hold up in practice.
We built an open-source prototype that applies Unix philosophy to retrieval pipelines. Each stage (PII redaction, chunking, dedup, embeddings, eval) is its own plugin with a typed contract, like pipes between Unix tools. The motivation: we swapped a chunker and retrieval got worse, but could not isolate whether it was the chunking or something breaking downstream. With each stage independently swappable, you change one option, re-run eval, and compare precision/recall directly. ```python Feature("docs__pii_redacted__chunked__deduped__embedded__evaluated", options={ "redaction_method": "presidio", "chunking_method": "sentence", "embedding_method": "tfidf", }) ``` Each `__` is a stage boundary. Swap any piece, the rest stays the same. Still a prototype, not production. Looking for feedback on whether the design assumptions hold up. Repo: [https://github.com/mloda-ai/rag_integration](https://github.com/mloda-ai/rag_integration) 
submitted by /u/coldoven
[link] [comments]