Building a Shopify app with Claude Code — spec-driven development and pricing design

Dev.to / 5/3/2026


Key Points

  • The author describes a solo-developer workflow for building a Shopify app using Claude Code, where they write detailed 200+ line spec documents and then review every line of the generated code.
  • The post emphasizes spec-driven rather than prompt-driven development, covering architecture decisions like which GraphQL mutations to use, billing logic, where usage tracking lives, and retry/error-handling behavior.
  • For the billing system, the spec included plan comparison details, a Prisma schema for usage tracking, the exact GraphQL query for subscription detection, explicit error-handling/idempotency rules, and UI pseudocode.
  • The billing implementation required multiple review iterations, catching issues such as quota bypass via mismatched counting, lack of idempotency causing double-counting on retries, and incorrect “soft cap” behavior instead of the intended hard-cap enforcement.
  • The overall goal is to enable users to properly trial the app before committing by designing pricing and throttling/billing behavior carefully based on real usage and edge cases.

This is part 3 of a series on building MetaBulkify, a Shopify app for bulk editing metaobjects via CSV.

  • Part 1: Excel data corruption and GID/handle resolution
  • Part 2: Shopify platform traps (scopes, throttling, dev store billing)

This post is about two things: how I build software as a solo developer with AI, and how I designed pricing to let users try the app properly before committing.

How I actually build things

I have a day job. I code side projects on evenings and weekends. Time is the scarcest resource.

My workflow: I write detailed spec documents, then delegate implementation to Claude Code. Not prompts. Not "build me an app." Actual specs — 200+ line instruction documents covering data models, edge cases, error handling, security considerations.

Then Claude Code writes the code, and I review every line.

This is not "AI wrote my app." I make every architecture decision:

  • Which GraphQL mutations to use
  • How billing works
  • Where to put the usage tracking table
  • What happens when Cloud Tasks retries a failed job
  • Whether to fail-open or fail-closed on API errors

Claude Code is a fast pair programmer who never gets tired and never pushes back when I ask for a third rewrite.

What a spec doc looks like

For the billing system, my spec included:

  • A plan comparison table (Free vs Pro, exact limits)
  • Prisma schema for the usage tracking table
  • The exact GraphQL query for subscription detection
  • Error handling rules (fail-closed, count attempts not successes)
  • Idempotency requirements (Cloud Tasks retries must not double-count)
  • UI mockups in pseudocode (which Polaris Web Components to use, where to show banners)
  • Test steps (including "manually set DB value to 9999 and verify block")
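As an illustration, the "fail-closed, count attempts" rules above could be sketched like this. This is a hypothetical sketch, not the app's actual source; `guardImport` and `getUsedRows` are made-up names:

```typescript
// Hypothetical sketch of the spec's quota rules: the check runs against
// *attempted* rows, and any error while reading current usage blocks the
// import instead of letting it through.
function guardImport(
  getUsedRows: () => number, // e.g. a usage-table lookup; may throw
  limit: number,
  attemptRows: number,
): boolean {
  let used: number;
  try {
    used = getUsedRows();
  } catch {
    return false; // fail-closed: if we can't verify usage, we don't import
  }
  return used + attemptRows <= limit;
}
```

The point of failing closed is that a database or API outage degrades to "imports temporarily blocked," never to "quota silently bypassed."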

Claude Code implemented the whole thing. I reviewed, found three issues, sent detailed feedback. Claude Code revised. I reviewed again, found one more issue, sent feedback. Third revision was clean. Shipped.

Three review rounds for billing alone. The bugs I caught:

  1. Counting basis mismatch — blocking based on total rows, but recording only successful rows. This lets users retry failed imports to bypass the quota.
  2. No idempotency guard — Cloud Tasks retries could double-count usage. Fixed with an atomic `updateMany` WHERE clause.
  3. Soft cap instead of hard cap — "warning only" on Pro tier quota exceeded. Changed to hard block because a limit you don't enforce isn't a limit.
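The idempotency fix (bug 2) boils down to a claim-then-count pattern. In the real app this is a Prisma `updateMany` whose WHERE clause only matches jobs not yet counted; the sketch below uses a hypothetical in-memory store so the shape of the guard is visible:

```typescript
// Hypothetical sketch of the atomic claim-then-count pattern.
interface UsageStore {
  // Atomically flips usageRecorded from false to true for one job and
  // returns how many rows were updated (0 means another attempt won).
  claimUsageFlag(jobId: string): number;
  addUsage(rows: number): void;
}

function recordUsageOnce(store: UsageStore, jobId: string, rows: number): boolean {
  if (store.claimUsageFlag(jobId) === 0) {
    return false; // a Cloud Tasks retry already recorded this job
  }
  store.addUsage(rows);
  return true;
}
```

Because the flag flip and the "did I win?" check are one database statement, two concurrent retries of the same job can never both increment usage.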

Each of these was a product decision, not a coding error. Claude Code implemented what I asked for. I asked for the wrong thing. The review process caught it.

The spec-driven advantage

Writing specs before coding forces you to think about edge cases upfront. "What happens if the import worker crashes after recording usage but before updating the job status?" is a question that naturally comes up when writing the spec. It rarely comes up when coding.

It also creates a paper trail. When something breaks in production, I can look at the spec and see whether the implementation deviated from the plan, or whether the plan itself was wrong.

And honestly, writing specs is faster than writing code. A 200-line spec takes me 2-3 hours. The implementation takes Claude Code 20 minutes. My review takes another hour. Total: ~4 hours for a feature that would take me 2-3 days to code manually.

Pricing design: letting users try before they commit

Bulk operation apps have a specific challenge. People don't use them every day — they need them when they need them. The pricing has to reflect that.

My solution:

|            | Free               | Pro ($9.99/mo)    |
|------------|--------------------|-------------------|
| CSV Export | Unlimited          | Unlimited         |
| CSV Import | 50 rows (lifetime) | 10,000 rows/month |
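In code, the two plans reduce to a small limits table plus a hard-cap check. This is a sketch with made-up names, not the app's actual source:

```typescript
type Plan = "free" | "pro";

// Limits mirroring the table above; only the quota window differs per plan.
const LIMITS: Record<Plan, { importRows: number; window: "lifetime" | "month" }> = {
  free: { importRows: 50, window: "lifetime" },
  pro: { importRows: 10_000, window: "month" },
};

// Hard cap: the whole batch is blocked if it would cross the limit.
function canImport(plan: Plan, usedRows: number, batchRows: number): boolean {
  return usedRows + batchRows <= LIMITS[plan].importRows;
}
```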

The thinking behind each decision:

Exports are free and unlimited. I want people to try the app with zero risk. Export your data, check the CSV format, see if it fits your workflow. If it doesn't work for you, you haven't paid anything.

50 free import rows. Enough to run a real test — pick a metaobject type, import a few entries, verify everything works in your store. You can confirm the app handles your field types, reference fields, and data format correctly before spending anything.

No free trial on Pro. Instead of a time-limited trial where you're racing the clock, the free tier gives you a permanent sandbox. Take your time. Test on Monday, come back Thursday, test again. No pressure.

$9.99/month for Pro. This is a focused tool that does one thing. The price reflects that — it's not trying to be a Swiss Army knife.

The numbers

Real talk:

  • Revenue: $9.99 (one subscription — they uninstalled after 2 hours due to a billing detection bug, not a pricing problem)
  • Infrastructure cost: effectively $0/month at current scale
  • Development time: a few weeks of evenings and weekends
  • Spec docs written: probably more words than actual code

Tech stack

For the curious:

  • React Router v7 + Shopify App Remix
  • Polaris Web Components (`<s-banner>`, `<s-button>` — not the React library)
  • Neon Postgres + Prisma
  • GCP Cloud Run + Cloud Tasks
  • JSON-based i18n (English + Japanese)

The infrastructure cost is near zero at current scale. When that changes, it'll mean the app is making money — which is a good problem to have.

What I'd do differently

  1. Test with Excel-edited CSVs from day one. The normalization issue (Part 1) would have surfaced immediately.
  2. Test billing on a real store, not a dev store. The dev store billing trap (Part 2) cost me a customer.
  3. Ship with fewer features. I18n could have waited. The dry-run preview was essential; everything else was nice-to-have.
  4. Start marketing before the app is "done." It's never done. I should have posted in Shopify community forums and dev.to weeks earlier.

Try MetaBulkify

Export and import Shopify metaobjects via CSV with dry-run preview.

MetaBulkify on the Shopify App Store

Free plan includes unlimited exports and 50 import rows.
I'm the developer — questions and feedback welcome in the comments.