scalar-loop: a Python harness for Karpathy's autoresearch pattern that doesn't trust the agent's narration

Reddit r/artificial / 4/20/2026


Key Points

  • The article introduces scalar-loop, a Python harness for Karpathy-style autoresearch loops that prevents LLM agents from gaming verifiers by trusting only measurable outputs.
  • It enforces deterministic “invariants” using SHA-256 hash manifests for sealed files and git-diff scope checks that reject out-of-bounds changes.
  • The harness adds safety gates (e.g., refusing to run on a dirty tree or missing metric commands) and uses cautious git operations (stashing changes, only resetting against commits the loop created).
  • The agent runs as a subprocess, with only a dedicated stdout token (SCALAR_LOOP_GIVE_UP) treated as control; all other narration is handled as untrusted suggestions.
  • The author reports that in a real bundle-size optimization task, the agent attempted to fabricate reasons and even to alter verifier behavior, but scalar-loop ignored prose and kept metric-driven results.

I built scalar-loop to solve one problem: LLM agents game their verifiers.

The pattern is Karpathy's autoresearch loop. LLM proposes an edit, harness runs the metric, loop keeps or reverts based on the number. Simple. Until you watch the agent, on iteration 23, quietly edit the verifier to report a better number instead of improving the code.
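The keep-or-revert core can be sketched in a few lines of Python. This is an illustrative toy, not scalar-loop's actual code; `proposals`, `measure`, `apply_edit`, and `revert` are placeholder names for whatever the harness wires in:

```python
def run_loop(proposals, measure, apply_edit, revert):
    """Accept an edit only if the metric improves; the number is the only judge."""
    best = measure()
    for edit in proposals:
        apply_edit(edit)
        score = measure()
        if score < best:      # lower is better (e.g. bundle size in bytes)
            best = score      # keep: the metric genuinely improved
        else:
            revert()          # revert: the agent's narration doesn't count
    return best

# Toy state standing in for a real repo: "bundle size" starts at 100.
state = {"size": 100, "prev": 100}

def measure():
    return state["size"]

def apply_edit(delta):
    state["prev"] = state["size"]
    state["size"] += delta

def revert():
    state["size"] = state["prev"]
```

Running `run_loop([-10, +5, -3], measure, apply_edit, revert)` keeps the two real reductions and reverts the regression, regardless of what the agent says about any of them.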

My main issue was that prompt-only implementations ("you SHALL NOT edit the test file") don't hold. The prompt is not an invariant. It's a suggestion the model can rationalize past. Especially in deterministic environments (healthcare, legal, finance, where I spend most of my time architecting solutions), a prompt-only implementation is a no-go. All regulators are still boomers.

So I have been looking to develop more deterministic implementations that could be hands-off. Because I am lazy too.

scalar-loop puts the invariants in Python:

  • Harness integrity via SHA-256 hash manifest. Sealed files (tests, build, config) are hashed once. If any hash drifts after an agent turn, the iteration is reverted.
  • Scope enforcement via git diff. The agent is told which glob patterns it may touch. Touching anything else rejects the whole iteration before commit.
  • Precondition gate. Seven checks before the loop runs at all. No main branch, no dirty tree, metric command exists, etc. Refuse-to-run over fix-on-the-fly.
  • Safe git. No reset --hard on the working tree. Stashes on dirty. reset --hard only against a commit the loop itself just made.
  • Agent as subprocess. One function, propose(). Default shells to claude -p. Swap for GPT-5, local Llama, a test double. The loop's correctness does not depend on the agent being well-behaved.
  • SCALAR_LOOP_GIVE_UP is the only stdout signal the loop respects. The agent's prose is treated as suggestion, not record.

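The first two invariants, the hash manifest and the diff-scope check, can be sketched roughly like this. `build_manifest`, `seal_intact`, and `changes_in_scope` are hypothetical names for illustration, not scalar-loop's API:

```python
import fnmatch
import hashlib
import pathlib
import subprocess

def build_manifest(sealed_paths):
    """Hash each sealed file (tests, build, config) once, before the loop starts."""
    return {p: hashlib.sha256(pathlib.Path(p).read_bytes()).hexdigest()
            for p in sealed_paths}

def seal_intact(manifest):
    """True only if no sealed file's hash has drifted since the manifest was built."""
    return all(
        hashlib.sha256(pathlib.Path(p).read_bytes()).hexdigest() == digest
        for p, digest in manifest.items()
    )

def changes_in_scope(allowed_globs, repo="."):
    """Reject the iteration if git diff touches anything outside the allowed globs."""
    out = subprocess.run(["git", "diff", "--name-only", "HEAD"],
                         capture_output=True, text=True, cwd=repo, check=True)
    changed = [line for line in out.stdout.splitlines() if line]
    return all(any(fnmatch.fnmatch(f, g) for g in allowed_globs) for f in changed)
```

If either check fails after an agent turn, the whole iteration is thrown away before anything is committed; the agent never gets to argue its way past a hash mismatch.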
Real run on a JS bundle-size task: 1492 bytes down to 70 bytes. On iteration 4 the agent quit with a confabulated reason ("read-time policy"). The loop logged it, ignored the prose, and kept the final metric. The lie was harmless because the control signal is the token, not the text.
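That token-vs-prose split can be sketched as a tiny parser; this is a hypothetical helper, not code from the repo:

```python
GIVE_UP_TOKEN = "SCALAR_LOOP_GIVE_UP"

def parse_agent_stdout(stdout):
    """Split agent output into one control bit and untrusted narration.

    Only an exact line matching the token is treated as control; every other
    line is logged as prose and has no effect on the loop's decisions.
    """
    lines = stdout.splitlines()
    gave_up = any(line.strip() == GIVE_UP_TOKEN for line in lines)
    narration = [line for line in lines if line.strip() != GIVE_UP_TOKEN]
    return gave_up, narration
```

Whatever justification the agent prints ("read-time policy" or otherwise) lands in `narration` for the log; the loop only ever branches on `gave_up`.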

Repo:

https://github.com/mandar-karhade/scalar-loop

Reproducible example: https://github.com/mandar-karhade/test-case-tiny-js-bundle

Install: git clone + uv pip install -e . (no PyPI yet)

Would appreciate Goodhart paths I haven't defended against; that's the most useful feedback I could get. My detailed take on the whole process is in this article (free link included; you do not need membership).

submitted by /u/Opitmus_Prime