Reducing LLM context from ~80K tokens to ~2K without embeddings or vector DBs

Reddit r/artificial / 4/19/2026

💬 Opinion · Developer Stack & Infrastructure · Ideas & Deep Analysis · Tools & Practical Usage

Key Points

  • The article describes an experiment to reduce LLM input context from ~80K tokens to ~2K for large codebases without using embeddings or a vector database.
  • Instead of RAG, it extracts structural signals (functions, classes, routes), builds a lightweight local index, and ranks files per query using token overlap, structural matches, and simple heuristics like recency and dependencies.
  • The “context layer” generated this way is small enough to fit typical model limits while still surfacing relevant files.
  • Reported observations across multiple repositories include about a 97% context reduction, relevant files appearing in the top-5 roughly 70–80% of the time, and a noticeable drop in the number of retries needed.
  • The author concludes that structured context can matter more than raw model size in many practical coding scenarios, while raising open questions about when heuristics break down and how to validate grounding in the provided context.

I’ve been experimenting with a problem I kept hitting when using LLMs on real codebases:

Even with good prompts, large repos don’t fit into context, so models:

  • miss important files
  • reason over incomplete information
  • require multiple retries


Approach I explored

Instead of embeddings or RAG, I tried something simpler:

  1. Extract only structural signals:

    • functions
    • classes
    • routes
  2. Build a lightweight index (no external dependencies)

  3. Rank files per query using:

    • token overlap
    • structural signals
    • basic heuristics (recency, dependencies)
  4. Emit a small “context layer” (~2K tokens instead of ~80K)
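The post doesn’t share code, but steps 1–2 can be sketched with nothing beyond the standard library. This is my own minimal interpretation, assuming a Python-only repo and using the stdlib `ast` module for parsing (a real multi-language repo would need per-language parsers, and route extraction would need framework-specific rules):

```python
import ast
from pathlib import Path

def extract_signals(path: Path) -> dict:
    """Pull structural signals (function and class names) from one Python file."""
    try:
        tree = ast.parse(path.read_text(encoding="utf-8"))
    except (SyntaxError, UnicodeDecodeError):
        # Unparseable files contribute no signals rather than crashing the index.
        return {"functions": [], "classes": []}
    functions = [n.name for n in ast.walk(tree)
                 if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))]
    classes = [n.name for n in ast.walk(tree) if isinstance(n, ast.ClassDef)]
    return {"functions": functions, "classes": classes}

def build_index(repo_root: str) -> dict:
    """Map each .py file to its structural signals -- the lightweight local index."""
    return {str(p): extract_signals(p) for p in Path(repo_root).rglob("*.py")}
```

Because the index stores only names, not file contents, it stays tiny even for large repos and can be rebuilt on every run or cached to disk.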


Observations

Across multiple repos:

  • context size dropped ~97%
  • relevant files appeared in top-5 ~70–80% of the time
  • number of retries per task dropped noticeably

The biggest takeaway:

Structured context mattered more than model size in many cases.


Interesting constraint

I deliberately avoided:

  • embeddings
  • vector DBs
  • external services

Everything runs locally with simple parsing + ranking.
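The ranking step (step 3 above) can likewise stay local and dependency-free. Here is one way it might look; the weights and the recency formula are illustrative guesses of mine, not the author’s actual tuning, and `signals` is the per-file dict produced by whatever indexer you use:

```python
import os
import re
import time

def tokenize(text: str) -> set:
    """Lowercased alphanumeric tokens; splitting on punctuation keeps overlap loose."""
    return {t for t in re.split(r"[^a-zA-Z0-9]+", text.lower()) if t}

def score_file(query: str, path: str, signals: dict, now=None) -> float:
    """Heuristic rank: token overlap with the query, a bonus for structural
    matches (function/class names), and a small recency boost."""
    now = now or time.time()
    q = tokenize(query)
    name_tokens = tokenize(os.path.basename(path))
    struct_tokens = tokenize(" ".join(signals["functions"] + signals["classes"]))
    overlap = len(q & (name_tokens | struct_tokens))
    structural_hits = len(q & struct_tokens)
    age_days = (now - os.path.getmtime(path)) / 86400 if os.path.exists(path) else 365
    recency = 1.0 / (1.0 + age_days)  # newer files score slightly higher
    return overlap + 2.0 * structural_hits + recency
```

A query’s context layer is then just the top-k files by this score (e.g. `sorted(index, key=lambda p: score_file(query, p, index[p]), reverse=True)[:5]`), with their signatures or relevant snippets concatenated until the ~2K token budget is spent.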


Open questions

  • How far can heuristic ranking go before embeddings become necessary?
  • Has anyone tried hybrid approaches (structure + embeddings)?
  • What’s the best way to verify that answers are grounded in provided context?
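On the last question, one crude local check, my assumption rather than anything from the post, is to measure what fraction of an answer’s distinctive tokens actually appear in the supplied context, and flag answers below some threshold for review:

```python
import re

def grounding_ratio(answer: str, context: str) -> float:
    """Fraction of distinctive tokens (length > 3, underscores preserved) in the
    answer that also occur in the context. A low ratio flags answers that may
    reference symbols the context never mentioned. Crude, but fully local."""
    tok = lambda s: {t for t in re.split(r"[^a-zA-Z0-9_]+", s.lower()) if len(t) > 3}
    a, c = tok(answer), tok(context)
    return len(a & c) / len(a) if a else 1.0
```

This would miss paraphrased hallucinations, so it is at best a cheap first filter before heavier verification.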

submitted by /u/Independent-Flow3408