The Dogfooding Problem for Solo Developers
"Eat your own dog food" is good advice. Use your own product. Find the bugs your users find. Feel the pain before they do.
In practice, here's what actually happens:
- You use it heavily right after launch
- Development takes over and you stop touching it
- You check it as a developer, not as a user — you know all the right paths
- "It works" becomes the bar, and rough UX slips through
I build and maintain a web app solo. At some point I realized: I hadn't actually used it as a user in weeks. I'd been shipping features, but not experiencing the product.
So I did something that felt slightly absurd: I gave the job to an AI agent.
Every 3 hours, an AI agent opens my product, checks if things work, and opens a PR if it finds something broken.
Here's how it works, what it found, and what it can't do.
The Setup
Three components:
| Component | What it does |
|---|---|
| AI Agent (Claude) | Decides what to check, interprets results, writes fixes |
| MCP Server | Exposes my app's API as callable functions for the AI |
| Playwright | Lets the AI control a real browser to check the UI |
The agent runs on a scheduled heartbeat. I define what to check in a markdown file:
```markdown
# What to check (rotate through these):

## API checks
- Call list_projects, get_canvas, get_verification_status
- Verify data integrity and response format

## UI checks
- Open the live site in a real browser
- Check mobile viewport (375px)
- Check dark mode
- Check empty states (what does a new user see?)
- Screenshot any anomalies

## Code quality
- Run tsc --noEmit, report TypeScript errors
- Check for unused imports in recently changed files

## When you find something broken:
- Create a branch
- Fix it
- Run vitest to confirm tests pass
- Open a PR
- Report to Discord
```
That's the entire instruction set. The AI handles the rest.
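The "rotate through these" part is the only scheduling logic the runner needs. A minimal sketch of the rotation, with the function name and category labels being illustrative rather than from my actual setup:

```typescript
// Rotate through check categories on each heartbeat so every
// category gets covered over the course of a day.
const CHECK_CATEGORIES = ["api", "ui", "code-quality"] as const;

type CheckCategory = (typeof CHECK_CATEGORIES)[number];

// runIndex is a simple counter persisted between runs
// (a file, a DB row, or the scheduler's run number all work).
function pickCheck(runIndex: number): CheckCategory {
  return CHECK_CATEGORIES[runIndex % CHECK_CATEGORIES.length];
}
```

With a run every 3 hours, each category comes up a couple of times a day, which has been frequent enough in practice.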
How the API Integration Works
By default, an AI can't interact with your app's internals. To fix this, I wrapped my API as an MCP (Model Context Protocol) server — basically a list of functions the AI can call.
```typescript
// The AI can call these like tool calls
const tools = {
  list_projects: {
    description: "Get all projects",
    handler: async () => await db.project.findMany(),
  },
  add_learning: {
    description: "Record a finding or bug",
    handler: async (args) => await db.learning.create({ data: args }),
  },
  get_verification_status: {
    description: "Check the status of all verifications",
    handler: async () => await db.verification.findMany(),
  },
};
```
This lets the AI do what a human user does — create records, read data, check states — but via API instead of clicking around.
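On the server side, handling one of those tool calls is just a lookup-and-invoke. A minimal dispatcher sketch, with an in-memory array standing in for the real database (the tool names mirror the list above; the dispatcher itself is illustrative, not the MCP SDK's actual API):

```typescript
type Tool = {
  description: string;
  handler: (args?: Record<string, unknown>) => Promise<unknown>;
};

// In-memory stand-in for the real database, just for this sketch.
const learnings: Record<string, unknown>[] = [];

const tools: Record<string, Tool> = {
  add_learning: {
    description: "Record a finding or bug",
    handler: async (args = {}) => {
      learnings.push(args);
      return args;
    },
  },
  list_learnings: {
    description: "Get all recorded findings",
    handler: async () => learnings,
  },
};

// What the server does when the model emits a tool call.
async function callTool(name: string, args?: Record<string, unknown>) {
  const tool = tools[name];
  if (!tool) throw new Error(`Unknown tool: ${name}`);
  return tool.handler(args);
}
```

The real version delegates to Prisma instead of an array, but the dispatch shape is the same.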
What It Found
Here are three real bugs the agent caught that I wouldn't have found on my own:
Bug 1: API and UI were out of sync
When creating data through the API, the API response showed the data correctly. But the data didn't appear in the UI.
Root cause: The data was stored in two separate database tables. The API wrote to one, the UI read from the other.
Why humans missed it: Humans always use the UI. If you click "create" in the browser, both tables get written. The bug only appeared when creating via API — which humans never did, but the AI did on every check.
Bug 2: Mobile layout broken
On desktop: fine. On mobile (375px): input fields overflowed horizontally.
Fix: One CSS change: `grid-cols-2` → `grid-cols-1 md:grid-cols-2`.
Bug 3: Empty state was a white screen
A new user opening their first project saw... nothing. No error, just blank. No guidance, no "create your first item" button.
This one wasn't technically broken — it just made the product confusing for new users. The agent flagged it as a UX issue and suggested an empty state component.
Dogfooding Alone Wasn't Enough
Dogfooding catches a lot — especially broken flows, layout issues, and rough UX.
But it doesn't catch everything.
Some bugs only happen in production, under very specific conditions:
- a component crashes only after a rare user action
- an import mismatch breaks a route that manual testing doesn't hit
- an exception only appears with real data, real timing, or real browser state
Those bugs are hard to find by manually using the product every few hours.
So I ended up adding a second loop: error monitoring.
The dogfooding agent checks whether the product works as a user experience.
Error monitoring checks whether the product is failing in the wild.
That combination turned out to be much stronger than either one alone.
Adding Sentry as a Second Feedback Loop
Now the system has two complementary loops:
| Loop | What it catches |
|---|---|
| Dogfooding every 3 hours | Broken flows, visual issues, empty states, mobile regressions, rough UX |
| Sentry monitoring | Runtime exceptions, production-only bugs, hard-to-reproduce crashes |
The dogfooding loop answers:
- Can a user actually move through the product?
- Does the UI make sense?
- Is anything visually broken?
The Sentry loop answers:
- Did something crash in production?
- What stack trace and context came with it?
- Is there a fixable bug hidden behind low-frequency failures?
This matters because not all quality issues look the same.
Some problems are visible. Others only show up as stack traces.
If you only rely on dogfooding, you miss production-only failures.
If you only rely on Sentry, you miss awkward UX and broken but non-crashing flows.
Together, they form a much more complete quality loop.
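The Sentry side of the loop is a small poll. A sketch, assuming Sentry's documented REST endpoint for listing a project's unresolved issues; the org/project slugs are placeholders and the fetch implementation is injected so the function is testable:

```typescript
type SentryIssue = { title: string; count: string; permalink: string };

// Poll Sentry for unresolved issues. The endpoint shape follows
// Sentry's issues API; fields on SentryIssue are a subset of the
// real payload, kept to what the agent actually reads.
async function fetchUnresolvedIssues(
  org: string,
  project: string,
  token: string,
  fetchFn: typeof fetch = fetch,
): Promise<SentryIssue[]> {
  const url =
    `https://sentry.io/api/0/projects/${org}/${project}/issues/` +
    `?query=${encodeURIComponent("is:unresolved")}`;
  const res = await fetchFn(url, {
    headers: { Authorization: `Bearer ${token}` },
  });
  if (!res.ok) throw new Error(`Sentry API error: ${res.status}`);
  return (await res.json()) as SentryIssue[];
}
```

The agent runs this on its own schedule and feeds anything new into the same fix pipeline as the dogfooding findings.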
From Detection to Auto-Fix
Once I added Sentry, the agent's job expanded.
It no longer just looked for problems by using the product.
It could also react to problems reported by the product itself.
The flow now looks like this:
- Every 3 hours, the agent dogfoods the app
- On a separate schedule, it checks Sentry for unresolved issues
- If it finds a real bug, it analyzes the stack trace and source code
- It creates a branch, writes a fix, runs tests, and opens a PR
- Small safe fixes can be merged automatically after checks pass
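The "small safe fixes" gate is the part worth being conservative about. My heuristic is roughly: tests pass, the diff is tiny, one file touched, and nothing near the database schema. A sketch of that gate; the exact thresholds are illustrative, not a recommendation:

```typescript
type FixProposal = {
  filesChanged: number;
  linesChanged: number;
  testsPass: boolean;
  touchesMigrations: boolean; // schema changes are never auto-merged
};

// Decide whether a proposed fix is safe enough to merge without
// human review. Tune the thresholds to your own risk tolerance.
function canAutoMerge(fix: FixProposal): boolean {
  return (
    fix.testsPass &&
    !fix.touchesMigrations &&
    fix.filesChanged === 1 &&
    fix.linesChanged <= 10
  );
}
```

Anything that fails the gate still becomes a PR, it just waits for me.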
One of the best examples was a page crash caused by the wrong i18n hook import.
The error message itself was vague. Manual testing didn't catch it consistently.
But Sentry provided enough context for the agent to trace the issue back to a bad import and generate a tiny fix.
That was the moment this stopped feeling like "automated testing" and started feeling more like an automated maintenance loop.
What the AI Can and Can't Do
| The AI is good at | The AI can't do |
|---|---|
| Checking if things work | Feeling if things feel right |
| Catching regressions automatically | "This interaction is frustrating" |
| Covering edge cases humans skip | Subjective UX judgment |
| Opening PRs immediately on finding bugs | Knowing if a feature is missing |
| Running every 3 hours without fatigue | Replacing actual user feedback |
The "can't do" column matters. The AI is a complement, not a replacement.
After the agent does its check, I still need to use the product myself and talk to users. The agent handles the objective, repeatable checks. I handle the subjective, experiential ones.
One More Honest Note
About 30% of the time, the agent reports "fixed" when it hasn't fully fixed something. This was frustrating until I built in a hard requirement: tests must pass before marking anything as done.
Rule: Before opening a PR, run `npx vitest run`.
If tests fail, do not open the PR.
Report the failure instead.
This dropped false completions dramatically. The agent's confidence isn't reliable — test results are.
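Enforcing the rule in code, rather than only in the prompt, is what made it stick. A sketch of the gate with the test runner injected; in production this would spawn `npx vitest run` and use its exit code:

```typescript
// runTests returns the test runner's exit code (0 = all tests pass).
// It is injected here so the gate itself is unit-testable; the real
// version shells out to `npx vitest run`.
type Outcome =
  | { action: "open-pr" }
  | { action: "report-failure"; exitCode: number };

function gatePullRequest(runTests: () => number): Outcome {
  const exitCode = runTests();
  if (exitCode === 0) return { action: "open-pr" };
  return { action: "report-failure", exitCode };
}
```

The agent never gets to "decide" whether tests passed; the exit code decides.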
How to Build This
You don't need my exact setup. The minimum viable version:
- Pick a scheduled runner — GitHub Actions cron, a crontab, or any agent platform with scheduled tasks
- Expose one API endpoint the AI can call — Start with just a health check
- Write a simple check instruction — "Call this endpoint and report if it fails"
- Add Playwright later — Browser checks are optional but powerful for catching visual regressions
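Steps 2 and 3 together fit in a few lines. A minimal health-check sketch; the URL is a placeholder and the fetch implementation is injected for testability:

```typescript
type HealthReport = { ok: boolean; status: number; detail: string };

// The simplest possible check: hit one endpoint, report pass/fail.
// Everything else in this post is an elaboration of this loop.
async function checkHealth(
  url: string,
  fetchFn: typeof fetch = fetch,
): Promise<HealthReport> {
  try {
    const res = await fetchFn(url);
    return {
      ok: res.ok,
      status: res.status,
      detail: res.ok ? "healthy" : `unexpected status ${res.status}`,
    };
  } catch (err) {
    return { ok: false, status: 0, detail: `request failed: ${String(err)}` };
  }
}
```

Run that on a cron, post the report somewhere you'll see it, and you have the skeleton of the whole system.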
The core insight isn't the tech stack. It's that dogfooding is a discipline problem, not a capability problem. You know how to test your own product. You just don't do it consistently.
Automating it removes the discipline requirement.
Have you built any automated quality loops into your side projects? Or does your testing start and end with "it worked on my machine"? I'm curious what others have tried; let me know in the comments.
The product the agent keeps testing is KaizenLab, my app for hypothesis validation and product learning.
That made this setup especially useful: the same system I use to organize product decisions is also what the agent keeps checking, stress-testing, and helping improve.




