Like many here, I kept running into Claude usage limits when building anything non-trivial. I was working on a job search automation pipeline (based on the Career-Ops project), and the naive flow was burning ~16k tokens per application, which was completely unsustainable. So I spent some time reworking it with token efficiency as a first-class concern, not an afterthought.

🚀 Results
Token usage dropped by roughly 85%, from ~16k to ~900 tokens per application, and the pipeline stopped tripping usage limits.
⚡ What actually helped (practical takeaways)

1. Prompt caching (biggest win)
👉 If you're re-sending the same context every time, you're wasting tokens.

2. Model routing instead of defaulting to Sonnet/Opus
👉 Most steps don't need expensive models.

3. Precompute anything reusable
👉 Eliminated ~94% of LLM calls during form filling.

4. Avoid duplicate work
👉 Prevents burning tokens on the same content repeatedly.

5. Reduce "over-intelligence"
👉 Not everything needs full LLM reasoning.

🧠 Key insight
Most Claude workflows hit limits not because they're complex, but because they redo the same work over and over.

🧩 Curious about others' setups
https://github.com/maddykws/jubilant-waddle

Inspired by Santiago Fernández's Career-Ops; this is a fork focused on efficiency and scaling under usage limits.
Cut Claude usage by ~85% in a job search pipeline (16k → 900 tokens/app) — here’s what worked
Reddit r/artificial / 4/8/2026
💬 Opinion · Developer Stack & Infrastructure · Ideas & Deep Analysis · Tools & Practical Usage
Key Points
- The article describes how a job search automation pipeline initially consumed about 16k tokens per application and became unsustainable under Claude usage limits.
- By treating token efficiency as a first-class design constraint, the author reduced usage by roughly 85%, reaching about 900 tokens per application with improved stability.
- The biggest improvement came from prompt caching, specifically caching repeated system and profile context (with cache_control: ephemeral), which reduced repeated-operation token spend.
- It also improved cost/performance by routing tasks to different Claude models (Haiku/Sonnet/Opus) based on workload, rather than defaulting to the most expensive model.
- Additional savings were achieved through precomputing reusable “answer bank” responses, deduplicating repeated work (e.g., semantic TF-IDF filtering), and adding a lightweight classifier to avoid unnecessary deep reasoning.
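The caching point above can be made concrete. A minimal sketch of marking the large, repeated system/profile context with `cache_control: ephemeral`, as in the Anthropic Messages API; the model ID, function name, and context text are placeholders, not the repo's actual code:

```python
# Hypothetical request builder: the static profile/resume context is the
# same on every application, so it is marked for ephemeral caching and
# billed at the cached rate on subsequent calls.
PROFILE_CONTEXT = "Long resume and profile text reused on every application."

def build_request(job_posting: str) -> dict:
    """Build a messages-API payload whose static context is cache-marked."""
    return {
        "model": "claude-haiku",  # placeholder model ID
        "max_tokens": 512,
        "system": [
            {
                "type": "text",
                "text": PROFILE_CONTEXT,
                # Cached once, then reused across calls instead of re-billed.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": job_posting}],
    }

req = build_request("Job: backend engineer at ExampleCo")
```

Only the stable prefix (system prompt, profile) should be cache-marked; the per-job posting stays in the ordinary user message so it doesn't invalidate the cache.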
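The model-routing point can be sketched as a simple task-to-tier table. The task labels and tier mapping here are illustrative assumptions, not the original repo's routing:

```python
# Hypothetical router: mechanical steps go to the cheapest tier; only
# genuinely hard reasoning is sent to an expensive model.
ROUTES = {
    "extract_fields": "haiku",        # parsing/extraction is mechanical
    "fill_form": "haiku",
    "tailor_cover_letter": "sonnet",  # some judgment needed
    "assess_fit_edge_case": "opus",   # rare, genuinely hard reasoning
}

def pick_model(task: str) -> str:
    # Unknown tasks default to the cheapest tier, not the most expensive.
    return ROUTES.get(task, "haiku")
```

The key design choice is the default: falling back to the cheap tier means new pipeline steps have to earn an upgrade, rather than silently burning Opus-level tokens.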
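The precomputed "answer bank" could look like a normalized lookup that only falls through to the model on a miss; the questions, answers, and normalization rule are made up for illustration:

```python
import re

# Hypothetical answer bank: recurring application questions are answered
# once, then served from the bank with zero LLM calls.
ANSWER_BANK = {
    "are you authorized to work": "Yes",
    "years of python experience": "7",
    "willing to relocate": "No",
}

def normalize(question: str) -> str:
    # Lowercase and strip punctuation so near-identical phrasings match.
    return re.sub(r"[^a-z0-9 ]", "", question.lower()).strip()

def answer(question: str):
    """Return a precomputed answer, or None to signal an LLM fallback."""
    return ANSWER_BANK.get(normalize(question))
```

If most form questions repeat across applications (the post's ~94% figure suggests they do), nearly every fill becomes a dictionary hit.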
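The dedup point mentions semantic TF-IDF filtering. One plausible stdlib-only implementation (the post doesn't show the real one): vectorize job posts with smoothed TF-IDF and skip any post whose cosine similarity to an already-processed one crosses a threshold:

```python
import math
from collections import Counter

def tfidf(docs):
    """Smoothed TF-IDF vectors (dicts of term -> weight) for each doc."""
    toks = [d.lower().split() for d in docs]
    n = len(docs)
    df = Counter(t for ts in toks for t in set(ts))
    return [
        {t: (c / len(ts)) * (math.log((1 + n) / (1 + df[t])) + 1)
         for t, c in Counter(ts).items()}
        for ts in toks
    ]

def cosine(a, b):
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def is_duplicate(new_doc, seen_docs, threshold=0.9):
    """True if new_doc is near-identical to any already-processed doc."""
    if not seen_docs:
        return False
    vecs = tfidf(seen_docs + [new_doc])
    return any(cosine(vecs[-1], v) >= threshold for v in vecs[:-1])
```

The threshold is a tuning knob: too low and distinct postings get skipped, too high and reposted listings slip through and burn tokens again.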
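Finally, the lightweight classifier that curbs "over-intelligence" can be as small as a keyword gate in front of the model; the patterns here are illustrative assumptions:

```python
# Hypothetical pre-filter: routine questions are flagged so a template or
# answer-bank lookup handles them, and only open-ended ones reach the LLM.
SIMPLE_PATTERNS = (
    "how many years",
    "salary expectation",
    "start date",
    "authorized to work",
    "willing to relocate",
)

def needs_llm(question: str) -> bool:
    """Return False for routine questions a lookup/template can handle."""
    q = question.lower()
    return not any(p in q for p in SIMPLE_PATTERNS)
```

Even a crude gate like this pays for itself: a misclassified question merely falls back to the model, while every correct hit costs zero tokens.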
💡 Insights using this article
This article is featured in our daily AI news digest: key takeaways and action items at a glance.




