I’ve been experimenting with a problem I kept hitting when using LLMs on real codebases:
Even with good prompts, large repos don’t fit into context, so models:

- miss important files
- reason over incomplete information
- require multiple retries
Approach I explored
Instead of embeddings or RAG, I tried something simpler:
Extract only structural signals:
- functions
- classes
- routes
Build a lightweight index (no external dependencies)
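A minimal sketch of the extract-and-index step, assuming Python source files and the stdlib `ast` module (the actual parser and the treatment of routes are my assumptions; here decorated functions stand in for routes):

```python
import ast
from pathlib import Path

def extract_signals(path):
    """Pull structural signals out of one Python file: function names,
    class names, and decorated functions as a rough proxy for routes.
    Stdlib-only, no external dependencies."""
    tree = ast.parse(Path(path).read_text(), filename=str(path))
    signals = {"functions": [], "classes": [], "routes": []}
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            signals["functions"].append(node.name)
            for dec in node.decorator_list:
                # e.g. @app.get("/users") decorates a handler; treat
                # call-style decorators as route markers (an assumption)
                if isinstance(dec, ast.Call):
                    signals["routes"].append(node.name)
        elif isinstance(node, ast.ClassDef):
            signals["classes"].append(node.name)
    return signals

def build_index(repo_root):
    """Map each .py file under repo_root to its structural signals."""
    return {str(p): extract_signals(p) for p in Path(repo_root).rglob("*.py")}
```

Keeping only names (not bodies) is what keeps the index lightweight.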
Rank files per query using:
- token overlap
- structural signals
- basic heuristics (recency, dependencies)
Emit a small “context layer” (~2K tokens instead of ~80K)
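The rank-and-emit steps above can be sketched as follows, assuming an index shaped like the one described (file path → lists of signal names). The recency weight and the ~4-characters-per-token budget estimate are illustrative assumptions, not the post's actual numbers:

```python
import os
import re
import time

def tokenize(text):
    """Lowercase and split on non-alphanumerics (including underscores,
    so snake_case names match query words). Crude but dependency-free."""
    return {t for t in re.split(r"[^a-zA-Z0-9]+", text.lower()) if t}

def rank_files(query, index, top_k=5):
    """Score each file by token overlap between the query and its
    structural signals, plus a mild recency bonus (the 0.1 weight is
    arbitrary, for illustration). Return the top-k file paths."""
    q = tokenize(query)
    now = time.time()
    scored = []
    for path, signals in index.items():
        names = tokenize(" ".join(sum(signals.values(), [])))
        overlap = len(q & names)
        age_days = (now - os.path.getmtime(path)) / 86400
        scored.append((overlap + 0.1 / (1 + age_days), path))
    scored.sort(reverse=True)
    return [p for _, p in scored[:top_k]]

def emit_context(query, index, budget_tokens=2000):
    """Concatenate per-file signal summaries for the top-ranked files
    until a rough token budget (~4 chars per token) is exhausted."""
    out, used = [], 0
    for path in rank_files(query, index, top_k=20):
        summary = f"{path}: " + ", ".join(sum(index[path].values(), []))
        cost = len(summary) // 4
        if used + cost > budget_tokens:
            break
        out.append(summary)
        used += cost
    return "\n".join(out)
```

The emitted string is the small “context layer” that gets prepended to the prompt instead of raw file contents.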
Observations
Across multiple repos:
- context size dropped ~97%
- relevant files appeared in top-5 ~70–80% of the time
- number of retries per task dropped noticeably
The biggest takeaway:
Structured context mattered more than model size in many cases.
Interesting constraint
I deliberately avoided:

- embeddings
- vector DBs
- external services
Everything runs locally with simple parsing + ranking.
Open questions
- How far can heuristic ranking go before embeddings become necessary?
- Has anyone tried hybrid approaches (structure + embeddings)?
- What’s the best way to verify that answers are grounded in provided context?