Reducing LLM context from ~80K tokens to ~2K without embeddings or vector DBs

Reddit r/artificial / 4/19/2026

💬 Opinion · Developer Stack & Infrastructure · Ideas & Deep Analysis · Tools & Practical Usage

Key Points

  • The article describes an experiment to reduce LLM input context from ~80K tokens to ~2K for large codebases without using embeddings or a vector database.
  • Instead of RAG, it extracts structural signals (functions, classes, routes), builds a lightweight local index, and ranks files per query using token overlap, structural matches, and simple heuristics like recency and dependencies.
  • The “context layer” generated this way is small enough to fit typical model limits while still surfacing relevant files.
  • Reported observations across multiple repositories include about a 97% context reduction, relevant files appearing in the top-5 roughly 70–80% of the time, and a noticeable drop in the number of retries needed.
  • The author concludes that structured context can matter more than raw model size in many practical coding scenarios, while raising open questions about when heuristics break down and how to validate grounding in the provided context.

I’ve been experimenting with a problem I kept hitting when using LLMs on real codebases:

Even with good prompts, large repos don’t fit into context, so models:

  • miss important files
  • reason over incomplete information
  • require multiple retries


Approach I explored

Instead of embeddings or RAG, I tried something simpler:

  1. Extract only structural signals:

    • functions
    • classes
    • routes
  2. Build a lightweight index (no external dependencies)

  3. Rank files per query using:

    • token overlap
    • structural signals
    • basic heuristics (recency, dependencies)
  4. Emit a small “context layer” (~2K tokens instead of ~80K)
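The post doesn’t share code, but steps 1–2 can be sketched with nothing beyond the standard library. This is my own minimal interpretation, assuming a Python-only repo and using the stdlib `ast` module for parsing (a real multi-language repo would need per-language parsers, and route extraction would need framework-specific rules):

```python
import ast
from pathlib import Path

def extract_signals(path: Path) -> dict:
    """Pull structural signals (function and class names) from one Python file."""
    try:
        tree = ast.parse(path.read_text(encoding="utf-8"))
    except (SyntaxError, UnicodeDecodeError):
        # Unparseable files contribute no signals rather than crashing the index.
        return {"functions": [], "classes": []}
    functions = [n.name for n in ast.walk(tree)
                 if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))]
    classes = [n.name for n in ast.walk(tree) if isinstance(n, ast.ClassDef)]
    return {"functions": functions, "classes": classes}

def build_index(repo_root: str) -> dict:
    """Map each .py file to its structural signals -- the lightweight local index."""
    return {str(p): extract_signals(p) for p in Path(repo_root).rglob("*.py")}
```

Because the index stores only names, not file contents, it stays tiny even for large repos and can be rebuilt on every run or cached to disk.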


Observations

Across multiple repos:

  • context size dropped ~97%
  • relevant files appeared in top-5 ~70–80% of the time
  • number of retries per task dropped noticeably

The biggest takeaway:

Structured context mattered more than model size in many cases.


Interesting constraint

I deliberately avoided:

  • embeddings
  • vector DBs
  • external services

Everything runs locally with simple parsing + ranking.
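The ranking step (step 3 above) can likewise stay local and dependency-free. Here is one way it might look; the weights and the recency formula are illustrative guesses of mine, not the author’s actual tuning, and `signals` is the per-file dict produced by whatever indexer you use:

```python
import os
import re
import time

def tokenize(text: str) -> set:
    """Lowercased alphanumeric tokens; splitting on punctuation keeps overlap loose."""
    return {t for t in re.split(r"[^a-zA-Z0-9]+", text.lower()) if t}

def score_file(query: str, path: str, signals: dict, now=None) -> float:
    """Heuristic rank: token overlap with the query, a bonus for structural
    matches (function/class names), and a small recency boost."""
    now = now or time.time()
    q = tokenize(query)
    name_tokens = tokenize(os.path.basename(path))
    struct_tokens = tokenize(" ".join(signals["functions"] + signals["classes"]))
    overlap = len(q & (name_tokens | struct_tokens))
    structural_hits = len(q & struct_tokens)
    age_days = (now - os.path.getmtime(path)) / 86400 if os.path.exists(path) else 365
    recency = 1.0 / (1.0 + age_days)  # newer files score slightly higher
    return overlap + 2.0 * structural_hits + recency
```

A query’s context layer is then just the top-k files by this score (e.g. `sorted(index, key=lambda p: score_file(query, p, index[p]), reverse=True)[:5]`), with their signatures or relevant snippets concatenated until the ~2K token budget is spent.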


Open questions

  • How far can heuristic ranking go before embeddings become necessary?
  • Has anyone tried hybrid approaches (structure + embeddings)?
  • What’s the best way to verify that answers are grounded in provided context?
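On the last question, one crude local check, my assumption rather than anything from the post, is to measure what fraction of an answer’s distinctive tokens actually appear in the supplied context, and flag answers below some threshold for review:

```python
import re

def grounding_ratio(answer: str, context: str) -> float:
    """Fraction of distinctive tokens (length > 3, underscores preserved) in the
    answer that also occur in the context. A low ratio flags answers that may
    reference symbols the context never mentioned. Crude, but fully local."""
    tok = lambda s: {t for t in re.split(r"[^a-zA-Z0-9_]+", s.lower()) if len(t) > 3}
    a, c = tok(answer), tok(context)
    return len(a & c) / len(a) if a else 1.0
```

This would miss paraphrased hallucinations, so it is at best a cheap first filter before heavier verification.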

submitted by /u/Independent-Flow3408