My Team Tracks AI-Generated Code. The Number Shocked Us.

Dev.to / 4/12/2026

💬 Opinion · Developer Stack & Infrastructure · Signals & Early Trends · Ideas & Deep Analysis · Tools & Practical Usage

Key Points

  • The article describes how a team used Buildermark, an open-source Git-history scanner, to measure the share of AI-generated code and found striking levels (40% overall, with some files at ~90%).
  • The authors highlight three operational risks from heavy AI coding: unclear ownership when bugs appear, a review gap where AI changes get rubber-stamped, and increased “bus factor” risk when an AI provider’s quality changes.
  • After seeing the results, the team added a pre-commit hook to tag AI-generated lines and requires extra review when a PR exceeds 50% AI-coded content.
  • They also introduced an “AI debt” concept to track lines understood by only one person, often because the original prompt context isn’t documented.
  • The piece argues that raw AI-generated line counts are a vanity metric and proposes a more meaningful measure: the percentage of AI-written lines that reach production without human understanding (reported as 12% in their case).


We deployed Buildermark last week. It's an open-source tool that scans Git history and flags AI-written lines.
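The article doesn't show Buildermark's actual interface, so here is a minimal sketch of the general approach such a scanner can take: walk Git history, treat commits carrying an AI co-author trailer as AI-generated (an assumed classification rule), and count the lines those commits added. The trailer strings and function names are illustrative, not Buildermark's.

```python
import subprocess

# Assumed convention: AI coding tools add a co-author trailer to commits.
# These strings are illustrative; a real scanner would make them configurable.
AI_TRAILERS = ("co-authored-by: claude", "co-authored-by: copilot")

def is_ai_commit(message):
    """Classify a commit as AI-assisted by its message trailers."""
    msg = message.lower()
    return any(trailer in msg for trailer in AI_TRAILERS)

def added_lines(numstat_text):
    """Sum the 'added' column of `git show --numstat` output.

    Binary files show '-' in that column and are skipped.
    """
    total = 0
    for row in numstat_text.splitlines():
        parts = row.split("\t")
        if len(parts) == 3 and parts[0].isdigit():
            total += int(parts[0])
    return total

def git(args, repo="."):
    return subprocess.run(["git", "-C", repo] + args,
                          capture_output=True, text=True, check=True).stdout

def ai_line_share(repo="."):
    """Fraction of all lines ever added that came from AI-trailered commits."""
    ai = total = 0
    for sha in git(["log", "--format=%H"], repo).split():
        message = git(["show", "-s", "--format=%B", sha], repo)
        stats = git(["show", "--numstat", "--format=", sha], repo)
        added = added_lines(stats)
        total += added
        if is_ai_commit(message):
            ai += added
    return ai / total if total else 0.0
```

This counts lines at commit time rather than lines surviving in HEAD, which is one of several reasonable definitions of "percent AI-generated"; a blame-based scan would give a different number.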

Why We Started Measuring

Every startup has that moment.

You're reviewing a PR and realize you can't tell who wrote it. The human or the AI.

We hit 40% AI-generated code by volume. Some files were 90%.

The CTO asked for the report. Then asked what it meant.

Nobody had an answer.

The Three Problems Nobody Talks About

Problem 1: Ownership blur

When AI writes the fix, who owns the bug?

We found junior devs treating Claude output as gospel. They'd copy-paste without understanding.

Senior engineers would approve because "it looks fine."

Problem 2: The review gap

Human-written code gets scrutinized. AI-written code gets rubber-stamped.

We caught security issues in AI-generated config files. Stuff a human would never write.

Problem 3: The bus factor

If your AI provider degrades (like Claude did last month), your velocity tanks overnight.

We're now vendor-locked to Codeium's style. Claude's patterns. GitHub Copilot's idioms.

What We Changed This Week

We added a pre‑commit hook that tags AI‑generated lines.

Every PR shows the percentage in the description.

If it's over 50%, it needs extra review. No shortcuts.
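The post doesn't include the team's hook, so this is a hypothetical sketch of the shape such a pre-commit check can take. It assumes AI-generated lines carry an `ai-gen` marker in a trailing comment (an invented convention), computes the AI share of the staged diff, and flags anything over the 50% threshold the article mentions.

```python
#!/usr/bin/env python3
"""Hypothetical pre-commit sketch: report the AI-tagged share of a commit."""
import subprocess
import sys

AI_TAG = "ai-gen"   # assumed line marker, e.g. `retry = 3  # ai-gen`
THRESHOLD = 0.50    # extra review above 50%, per the team's rule

def ai_share(diff_text):
    """Fraction of added lines in a unified diff that carry the AI tag."""
    added = [line for line in diff_text.splitlines()
             if line.startswith("+") and not line.startswith("+++")]
    if not added:
        return 0.0
    return sum(AI_TAG in line for line in added) / len(added)

def main():
    diff = subprocess.run(["git", "diff", "--cached"],
                          capture_output=True, text=True).stdout
    share = ai_share(diff)
    print(f"AI-generated share of this commit: {share:.0%}")
    if share > THRESHOLD:
        print("Over 50% AI-generated: flag this PR for extra review.")
    return 0  # informational; return 1 instead to hard-block the commit

if __name__ == "__main__":
    sys.exit(main())
```

Saved as `.git/hooks/pre-commit` (and made executable), this prints the number the team could paste into the PR description; the same `ai_share` function could run in CI against the merge diff.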

We also started tracking "AI debt" – lines that only one person understands because they came from a prompt nobody wrote down.

The Real Metric That Matters

Lines of AI code is vanity.

The real metric is: How many AI‑generated lines survive to production without a human understanding them?

We're at 12%.

That's 12% of our AI-generated code, live in production, that could break and nobody would know why.
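Taking the post's two figures at face value (the article doesn't show this calculation, so it's an inference), the percentages compose: 12% of the 40% that's AI-written puts the blind spot at roughly 5% of the whole codebase.

```python
def production_blind_spot(ai_share_of_codebase, ununderstood_ai_share):
    """Fraction of the whole codebase that is AI-written and understood by no one."""
    return ai_share_of_codebase * ununderstood_ai_share

# The post's figures: 40% AI-generated by volume, 12% of AI lines not understood.
print(f"{production_blind_spot(0.40, 0.12):.1%}")  # → 4.8%
```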

Is your team measuring AI code?

What percentage would surprise you?
