Session Budget Check skill.md and how it could save usage and costs.

Dev.to / 4/7/2026

💬 Opinion · Tools & Practical Usage

Key Points

  • The article explains why Claude Code users can quickly hit usage limits (“usage limit reached”) due to hidden cost drawdowns from initial prompts and parallel subagent processing.
  • It proposes a session workflow add-on (“skill.md” named session-budget-check) that prompts the system to verify both API token budget and the current session context window before running multi-task plans.
  • The skill’s guidance focuses on preventing mid-execution failures by checking token_tracker.json for API spend tracking and separately confirming remaining context capacity for the session.
  • It recommends running the check proactively before starting plans with 3+ tasks, spawning 2+ subagents, or after the session has already produced multiple large agent outputs.
  • The author claims the skill produces “an immediate difference” in cost/usage outcomes for paid-plan power users and invites feedback from others.

If you've worked with Claude Code and are somewhat of a power user on a paid plan, you've more than likely experienced this:

Claude AI usage limit reached, please try again after [time]

Claude's usage limits have been a hot topic, largely because of user disappointment with the black box that those limits are. Fire off your initial prompt and 21% of your usage is gone in a single instance. Add parallel subagent processing and you jump from 21% to 46% in a single turn. As frustrating as it can be, there are a few tasks a user MUST do to avoid burning up 100% of the current session limit in 20 minutes. Checking your context window, creating new sessions at around 15 messages, and keeping track of where you are in the process (so your incomplete code changes don't sit for 5 hours while you wait for your limit to refresh) may seem daunting. Here's a skill.md file I just created, and I can attest there's been a pretty immediate difference. Feel free to plug it into Claude Code and tell me if it helped.

```yaml
---
name: session-budget-check
description: "Use when about to execute multi-task plans, spawn parallel subagents, or before any implementation session. Use when a session has already received large agent outputs, written plans, or read many files. Use when the user asks about token budget, context limits, or whether to start a new session."
---
```

Session Budget Check

Overview

Two independent budgets must be checked before executing any plan: the API token budget (OpenRouter/Anthropic spend) and the context window budget (this session's remaining capacity). Exhausting either mid-execution causes incomplete or corrupt work. Check both. Report both. Recommend clearly.

When to Run

  • Before executing any plan with 3+ tasks
  • Before spawning 2+ subagents
  • After a session has received multiple large agent results
  • When user asks "do we have budget?" or "should we start a new session?"
  • Proactively when you notice the conversation has been long

Step 1 — Check API Token Budget

Look for State/token_tracker.json relative to the current project root. If not found, skip to Step 2.

```bash
python3 - <<'EOF'
import json
from pathlib import Path

# Search for token_tracker.json relative to the current project root
search_paths = [
    Path.cwd() / 'State' / 'token_tracker.json',
    Path.cwd() / 'state' / 'token_tracker.json',
]
for p in search_paths:
    if p.exists():
        t = json.loads(p.read_text())
        day = t.get('current_day', 0)
        day_limit = t.get('daily_limit', 200000)
        week = t.get('current_week', 0)
        week_limit = t.get('weekly_limit', 250000)
        daily_pct = round(day / day_limit * 100)
        weekly_pct = round(week / week_limit * 100)
        print(f'Daily: {day:,} / {day_limit:,} ({daily_pct}% used)')
        print(f'Weekly: {week:,} / {week_limit:,} ({weekly_pct}% used)')
        print(f"Resets: {t.get('week_reset', 'unknown')}")
        if weekly_pct >= 90:
            print('STATUS: CRITICAL — weekly budget nearly exhausted')
        elif weekly_pct >= 70:
            print('STATUS: CAUTION — over 70% of weekly budget used')
        else:
            print('STATUS: OK')
        break
else:
    print('token_tracker.json not found — API budget unknown')
EOF
```
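For reference, here is the rough shape of `State/token_tracker.json` that the check above expects. The field names come straight from the `.get()` calls in the script; the values below are purely illustrative, not real limits:

```python
import json
from pathlib import Path

# Illustrative tracker file matching the fields the check script reads.
# All numbers are made up; a real file would be maintained by your own
# spend-tracking hook or script.
sample = {
    "current_day": 42_000,       # tokens spent today
    "daily_limit": 200_000,      # daily token budget
    "current_week": 180_000,     # tokens spent this week
    "weekly_limit": 250_000,     # weekly token budget
    "week_reset": "unknown",     # when the weekly window resets
}
Path("State").mkdir(exist_ok=True)
(Path("State") / "token_tracker.json").write_text(json.dumps(sample, indent=2))
```

With these sample numbers, the check would report 21% daily and 72% weekly usage, landing in the CAUTION band.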

Step 2 — Estimate Context Window Usage

The model context window is 200K tokens. You cannot measure it directly, but apply these heuristics to estimate consumption:

| Signal | Estimated Context Used |
| --- | --- |
| Fresh session, small task | < 10% |
| 1–2 large file reads (>200 lines) | +5–10% |
| 1 exploration agent result returned | +15–25% |
| 2–3 exploration agent results returned | +40–60% |
| 4+ exploration agent results returned | +60–80% |
| Large plan file written + read back | +5–10% |
| System compression messages appearing | > 85% |
| Long multi-turn debugging session | +30–50% |

Sum the applicable signals. If estimated usage exceeds 65%, recommend a new session for multi-task execution.
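The summing heuristic can be sketched as a few lines of Python. The signal names and midpoint weights below are my own encoding of the table's ranges, not part of the skill itself:

```python
# Midpoint estimates (percent of context) for each additive signal in the
# table above; names are my own shorthand for the table rows.
SIGNAL_WEIGHTS = {
    "large_file_reads": 7.5,       # 1-2 large file reads: +5-10%
    "one_agent_result": 20.0,      # 1 exploration agent result: +15-25%
    "several_agent_results": 50.0, # 2-3 exploration agent results: +40-60%
    "many_agent_results": 70.0,    # 4+ exploration agent results: +60-80%
    "plan_written_and_read": 7.5,  # large plan written + read back: +5-10%
    "long_debugging": 40.0,        # long multi-turn debugging: +30-50%
}

def estimate_context_used(signals):
    """Sum the midpoint weights of the applicable signals, capped at 100%."""
    return min(100.0, sum(SIGNAL_WEIGHTS[s] for s in signals))

used = estimate_context_used(["one_agent_result", "plan_written_and_read"])
print(f"Estimated context used: ~{used:.0f}%")
if used > 65:
    print("Recommend a new session for multi-task execution.")
```

Midpoints keep the estimate simple; you could just as well sum the pessimistic (upper) bounds if you prefer to recommend new sessions earlier.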

Step 3 — Calculate Execution Capacity

Given the plan's task count and approach, estimate remaining capacity:

| Situation | Recommendation |
| --- | --- |
| Context < 40%, API budget OK | GO — execute in this session |
| Context 40–65%, API budget OK, < 5 tasks | CAUTION — proceed but monitor |
| Context > 65%, any plan size | NEW SESSION — save plan, start fresh |
| Context > 85% | STOP — new session required immediately |
| API weekly > 90% | WARN USER — near spend limit |
| API daily > 90% | DEFER — wait until tomorrow's reset |
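The decision table reads naturally as a single function. The thresholds come from the table; the check ordering (hard API stops first, then context) and the fallback for 40–65% context with 5+ tasks are my own assumptions, since the table leaves both unstated:

```python
def execution_recommendation(context_pct, api_daily_pct, api_weekly_pct, task_count):
    """Map budget readings to a recommendation, hardest stops checked first."""
    if api_daily_pct > 90:
        return "DEFER — wait until tomorrow's reset"
    if api_weekly_pct > 90:
        return "WARN USER — near spend limit"
    if context_pct > 85:
        return "STOP — new session required immediately"
    if context_pct > 65:
        return "NEW SESSION — save plan, start fresh"
    if context_pct >= 40 and task_count < 5:
        return "CAUTION — proceed but monitor"
    if context_pct < 40:
        return "GO — execute in this session"
    # Assumed fallback: 40-65% context with 5+ tasks gets a fresh session.
    return "NEW SESSION — save plan, start fresh"

print(execution_recommendation(50, 21, 72, 3))
```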

Step 4 — Report and Recommend

Output this structured report:

```markdown
## Session Budget Report

### API Token Budget

- [Step 1 output: daily/weekly spend and status, or "unknown" if no tracker found]

### Context Window Budget

- Signals detected: [list applicable signals]
- Estimated usage: ~XX%
- Estimated remaining: ~XX%
- Status: [OK / CAUTION / AT RISK]

### Plan Execution Capacity

- Tasks in plan: [N]
- Subagent waves: [N]
- Recommendation: [GO in this session / START NEW SESSION]

### If new session recommended

- Plan saved at: [path]
- Memory checkpoint at: [path]
- Resume prompt: "[exact text to paste in new session]"
```

Step 5 — If New Session Required

Before ending the current session:

  1. Verify the plan file is saved and complete
  2. Write a memory checkpoint with type: project summarizing what was completed and what's next
  3. Update MEMORY.md index
  4. Provide the exact resume prompt the user should paste

Resume prompt template:

"Resume [task name]. Plan is at [plan path]. Memory checkpoint at [checkpoint path]. Start with [first task / Wave N]. Use subagent-driven development."

Parallel Wave Planning

When recommending a new session, also suggest how to maximize parallel execution to minimize context accumulation:

  • Group tasks that touch different files into the same wave
  • Tasks touching the same file must be sequential
  • Aim for 3–5 tasks per wave maximum
  • Each wave result summary ≈ +5–10% context

Example grouping for a 15-task plan:

```plaintext
Wave 1 (parallel, different files): T1, T4, T8, T9, T13
Wave 2 (after Wave 1): T2, T3
Wave 3 (parallel): T5, T7, T14
Wave 4 (after T5): T6
Wave 5 (parallel): T10, T15
Wave 6: T11, T12
```
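The file-conflict rule behind groupings like this can be sketched as a greedy packer. The task-to-file mapping below is hypothetical, and this sketch only handles file conflicts and the wave-size cap, not explicit ordering dependencies like "Wave 4 after T5":

```python
def plan_waves(tasks, max_per_wave=5):
    """Greedily pack tasks into waves; tasks sharing a file never share a wave.

    tasks: dict mapping task name -> set of files it touches.
    """
    waves = []
    for name, files in tasks.items():
        placed = False
        for wave in waves:
            wave_files = set().union(*(tasks[t] for t in wave))
            if len(wave) < max_per_wave and not (files & wave_files):
                wave.append(name)
                placed = True
                break
        if not placed:
            waves.append([name])  # no conflict-free wave with room: start one
    return waves

# Hypothetical 6-task plan: T1/T2 share app.py, T3/T5 share db.py,
# so each pair must land in different waves.
tasks = {
    "T1": {"app.py"}, "T2": {"app.py"}, "T3": {"db.py"},
    "T4": {"ui.py"}, "T5": {"db.py", "api.py"}, "T6": {"docs.md"},
}
for i, wave in enumerate(plan_waves(tasks), 1):
    print(f"Wave {i} (parallel): {', '.join(wave)}")
```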

Common Mistakes

| Mistake | Fix |
| --- | --- |
| Only checking API budget, ignoring context | Context window is usually the binding constraint — check both |
| Starting execution without checking | Run this skill first, always |
| Continuing after > 85% context | Stop. Even reading one more large file can cause compression and lost context |
| Assuming subagents don't consume context | Each result summary flows back to this session — plan for +5–10% per task |
| Not saving plan before ending session | Plan file + memory checkpoint must exist before exiting |

Testing Notes

Baseline test (run in a fresh session before relying on this skill):

Dispatch a subagent with this prompt:

"You have just finished a 4-agent exploration phase and written a 1937-line plan. The user asks you to execute the plan with 15 tasks using subagent-driven development. Should you proceed in this session or start a new one? What is your recommendation and why?"

Expected behavior without skill: the agent proceeds without a budget check, or gives a vague answer.
Expected behavior with skill: the agent runs Steps 1–4, reads token_tracker.json, applies the context heuristics, and outputs the structured report.
