The enforcement gap: why finding issues was never the problem

Dev.to / 4/8/2026


Key Points

  • Eightfold reports using AI agents to reach WCAG 2.2 AA compliance in two months, but the article argues the real differentiator is post-detection enforcement: human review, criteria-based verification, and tracking to completion.
  • The piece notes a rapid shift where accessibility tooling is embedded directly into coding workflows (e.g., accessibility agents for IDE/chat assistants, real-time DevTools linting, and MCP integrations), reducing reliance on separate after-the-fact scans.
  • It claims “finding issues” is increasingly easy for AI, while the major challenge is proving that findings were fully fixed, verified, and auditable across time and contributors.
  • The article highlights the “enforcement gap” when agents fix partially, lack memory/context across sessions, produce inconsistent finding sets, and fail to generate reliable records of what was reviewed and when.
  • It frames the gap as three observable failure modes: fixes applied without verification, findings that don't persist across sessions, and evidence without structure, underscoring that compliance requires governance, not just detection.

Eightfold, a talent intelligence platform, recently shared something remarkable: they used AI agents to achieve WCAG 2.2 AA compliance in two months. The same work would have taken six to ten months manually.

The headline is impressive. But the interesting part isn't how fast they found the issues. It's what happened after they found them: every fix was reviewed by humans, verified against criteria, and tracked through to completion. The AI did the finding. A system did the enforcement.

Most teams trying the same approach today will get the first part right and miss the second entirely.

Everyone is shipping accessibility agents

The landscape changed fast. In the last few months alone:

  • An open-source project called Community Access released 57 accessibility agents for Claude Code, GitHub Copilot, and Claude Desktop. They enforce WCAG 2.2 AA by intercepting every prompt and delegating to specialist reviewers.
  • BrowserStack launched accessibility DevTools that lint code in real time, detecting WCAG violations before a commit even happens.
  • Deque shipped an axe MCP server, connecting their scanning engine directly to AI coding assistants.
  • Siteimprove published a framework for what they call "agentic accessibility," where AI agents handle compliance tasks autonomously.

The pattern is clear: accessibility tooling is moving into the places where code is written. The era of running a separate scan after the fact is ending.

This is genuinely good. AI coding agents are better at finding structural accessibility issues than most developers working from memory. They don't forget to check heading hierarchy. They don't skip alt text on the fifteenth image. They apply rules consistently, across every file, every time.

Finding issues is a solved problem. The hard part was never finding.

What happens after finding

Here's the question nobody is answering well: once the AI finds 40 accessibility issues in your codebase, then what?

The AI fixes them. Some of them. In this session. And tomorrow, a new conversation starts with no memory of what happened. A different developer runs a different agent and gets a different set of findings. Nobody knows which issues from Tuesday were actually resolved. Nobody knows if the fixes introduced new problems. Nobody can produce a list of what was reviewed and when.

This is the enforcement gap: the space between "an AI found issues" and "we can prove those issues were fixed, verified, and tracked."

It shows up in three specific ways.

Fixes without verification

An AI agent adds aria-label="navigation" to a <nav> element. Is that the right label? Does it conflict with an existing aria-labelledby? Does the computed accessible name make sense in context? The AI made a change and moved on. Nobody checked whether the change actually improved anything.
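
The precedence problem here can be sketched as a simplified accessible-name resolver. This is a rough approximation of the W3C Accessible Name computation, not a real browser implementation, and the `ElementLike` shape is invented for illustration:

```typescript
// Simplified accessible-name resolution for a single element.
// Real browsers follow the full W3C "Accessible Name and Description
// Computation" spec; this sketch covers only the common precedence:
// aria-labelledby > aria-label > visible text content.
interface ElementLike {
  ariaLabelledbyText?: string; // resolved text of aria-labelledby targets
  ariaLabel?: string;
  textContent?: string;
}

function accessibleName(el: ElementLike): string {
  // aria-labelledby wins even when aria-label is also present,
  // which is exactly the conflict an unverified fix can introduce.
  if (el.ariaLabelledbyText?.trim()) return el.ariaLabelledbyText.trim();
  if (el.ariaLabel?.trim()) return el.ariaLabel.trim();
  return el.textContent?.trim() ?? "";
}

// A <nav> that already had aria-labelledby: the agent's new aria-label
// is silently ignored, so "nobody checked" really means "nobody saw".
const nav: ElementLike = {
  ariaLabelledbyText: "Main menu",
  ariaLabel: "navigation", // added by the agent, has no effect here
};
console.log(accessibleName(nav)); // → "Main menu"
```

The fix looks plausible in a diff and does nothing at runtime, which is why checking the computed name, not the attribute, is the verification that matters.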

Automated scanning tools can verify some of these fixes. APCA can check contrast ratios against perceptual thresholds. axe-core can validate the rendered DOM. But if these verification steps aren't connected to the fix workflow, they don't happen.
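
Some of these checks are mechanical enough to wire directly into a fix workflow. As a minimal sketch, here is the classic WCAG 2.x contrast-ratio formula (note: this is the older luminance-based formula, not APCA's perceptual model) used as a pass/fail gate:

```typescript
// WCAG 2.x relative luminance for an sRGB channel value in [0, 255].
function relativeLuminance(r: number, g: number, b: number): number {
  const lin = (c: number) => {
    const s = c / 255;
    return s <= 0.03928 ? s / 12.92 : Math.pow((s + 0.055) / 1.055, 2.4);
  };
  return 0.2126 * lin(r) + 0.7152 * lin(g) + 0.0722 * lin(b);
}

// Contrast ratio between foreground and background colors, 1:1 to 21:1.
function contrastRatio(
  fg: [number, number, number],
  bg: [number, number, number]
): number {
  const l1 = relativeLuminance(...fg);
  const l2 = relativeLuminance(...bg);
  const [hi, lo] = l1 >= l2 ? [l1, l2] : [l2, l1];
  return (hi + 0.05) / (lo + 0.05);
}

// Verification, not vibes: the fix passes only if the ratio clears
// the WCAG 2.2 AA threshold for normal-size text (4.5:1).
const ratio = contrastRatio([0, 0, 0], [255, 255, 255]);
console.log(ratio.toFixed(1)); // → "21.0"
console.log(ratio >= 4.5);     // → true
```

A check like this is cheap to run after every fix; the hard part, as the rest of the article argues, is making sure the workflow actually runs it and records the result.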

Findings without persistence

Every AI coding session starts from zero. The agent doesn't know that it found a keyboard trap in the modal component last Wednesday, that a developer partially fixed it on Thursday, or that the fix broke focus management in a different component on Friday.

Without persistent tracking, the same issues get rediscovered, re-reported, and re-fixed. Or worse: they get found once, partially addressed, and forgotten.

Evidence without structure

A compliance officer asks: "Which components were reviewed for accessibility? When? What issues were found? Were they resolved?"

If the answer lives in chat transcripts scattered across multiple AI sessions, it's not evidence. It's archaeology.

Why this matters now

Since June 2025, the European Accessibility Act has been in active enforcement across Europe. The penalties are real and vary by country: up to €250,000 in France, up to €600,000 in Spain, with some member states allowing consumers to initiate civil proceedings directly. In the US, ADA lawsuits continue to climb. The UK Equality Act already covers digital services.

The regulatory question has shifted from "should products be accessible?" to "can you demonstrate that they are?"

"We use AI agents that follow WCAG rules" is a statement about intent. Regulators want evidence of outcomes: what was checked, what was found, what was fixed, how it was verified. That requires a system, not a prompt.

What closing the loop looks like

The teams getting this right share a common pattern. They don't just find issues. They enforce a cycle:

Find. AI agents and automated scanners identify accessibility barriers in the code. Static analysis catches structural issues. Runtime scanning catches rendered DOM issues. Guided review handles the 40% that resists automation entirely: focus order, cognitive load, content reflow, color vision accessibility.

Report. Every finding goes into a persistent tracking system with the specific WCAG criterion, the affected component, the severity, and a plain-language explanation of who is impacted. Not a chat transcript. Not a terminal log. A structured record.

Fix. The developer (or the AI, with guidance) addresses the issue. The fix is scoped to the specific finding, not a vague "make it accessible" pass.

Verify. The fix is checked against quality criteria. Did the contrast ratio actually improve? Does the screen reader announce the correct label? Does keyboard navigation work through the component without traps? If the verification fails, the issue stays open.

Evidence. The entire cycle is preserved. What was found, when, by whom, how it was fixed, whether it passed verification. This is what an auditor can review. This is what survives team changes, deadline pressure, and the six months between audits.
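
The cycle above can be sketched as a small state machine over a persistent finding record. All field names, status names, and IDs here are illustrative assumptions, not any particular tool's schema:

```typescript
// Hypothetical schema for a persistent finding record. The statuses
// mirror the find -> report -> fix -> verify -> evidence cycle.
type Status = "found" | "reported" | "fixed" | "verified" | "evidenced";

interface Finding {
  id: string;
  wcagCriterion: string; // e.g. "2.1.2 No Keyboard Trap"
  component: string;
  severity: "critical" | "serious" | "moderate" | "minor";
  status: Status;
  history: { status: Status; at: string; by: string }[]; // the audit trail
}

const ORDER: Status[] = ["found", "reported", "fixed", "verified", "evidenced"];

// Advance a finding one step. A failed verification sends it back to
// "reported" so the issue stays open instead of silently closing.
function advance(f: Finding, by: string, verificationPassed = true): Finding {
  const i = ORDER.indexOf(f.status);
  const next: Status =
    f.status === "fixed" && !verificationPassed
      ? "reported"
      : ORDER[Math.min(i + 1, ORDER.length - 1)];
  return {
    ...f,
    status: next,
    history: [...f.history, { status: next, at: new Date().toISOString(), by }],
  };
}

let finding: Finding = {
  id: "A11Y-042",
  wcagCriterion: "2.1.2 No Keyboard Trap",
  component: "ModalDialog",
  severity: "critical",
  status: "found",
  history: [{ status: "found", at: "2026-04-01T09:00:00Z", by: "scanner" }],
};

finding = advance(finding, "scanner");         // found -> reported
finding = advance(finding, "dev");             // reported -> fixed
finding = advance(finding, "verifier", false); // verification failed
console.log(finding.status);                   // → "reported"
```

The `history` array is the point: it is the structured record an auditor can read, as opposed to the chat-transcript archaeology described earlier.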

Skip any step and the gap reappears. Find without report means invisible work. Fix without verify means false confidence. Verify without evidence means unprovable compliance.

The WCAG 3.0 signal

The W3C published a new Working Draft of WCAG 3.0 in March 2026, introducing 174 new outcomes. The shift from "success criteria" to "outcomes" isn't just naming. It reflects a move toward measuring results, not just checking boxes.

WCAG 3.0 won't be finalized until 2028 at the earliest. But the direction is clear: the next version of the world's accessibility standard will ask not just "does this element have an alt attribute?" but "does this content achieve an accessible outcome for the people who use it?"

That's an enforcement question, not a finding question. And it makes the gap between "AI found issues" and "we can prove outcomes" even wider for teams without a system to close it.

A converging landscape

Something interesting is happening. The accessibility tool market and the AI coding tool market are merging.

Deque, the biggest name in accessibility scanning, ships an MCP server. Community Access builds agents that run inside coding assistants. BrowserStack adds real-time accessibility checking to their DevTools. TestParty embeds remediation into GitHub workflows.

At the same time, MCP itself is maturing. The ecosystem hit 97 million monthly SDK downloads. Marketplace platforms like MCP-Hive are adding billing layers for commercial tool servers. The infrastructure for connecting AI agents to specialized tools is becoming production-ready.

The teams that treat accessibility as something the AI handles "by default" will discover the gap when a regulator or client asks for evidence. The teams that connect their AI to an enforcement system will have the answer ready.

Try it

Ask your AI coding assistant to review a component for accessibility. It will probably find real issues. Now ask it: which components have already been reviewed? What was found last week? Can you show me the evidence?

The silence after those questions is the enforcement gap.

Closing it is what turns "we care about accessibility" from a claim into a fact.

I'm building Jeikin, an accessibility compliance tool that works inside AI coding agents. Instead of overlays or separate audits, it checks your actual code and tracks evidence on a dashboard. Try it with npx jeikin.