I Built a 100% Browser-Based OCR That Never Uploads Your Documents — Here's How

Dev.to / 4/11/2026

💬 Opinion · Developer Stack & Infrastructure · Tools & Practical Usage · Models & Research

Key Points

  • The article introduces DoctorDocs, a free OCR web platform designed to perform all document processing in the user’s browser so images and sensitive files are never uploaded to external servers.
  • It explains a “thick-client / thin-server” architecture where Next.js mainly serves the web app, while OpenCV.js and Tesseract.js run as WebAssembly modules on the client.
  • The OCR pipeline uses client-side preprocessing (binarization, shadow removal, contrast enhancement) followed by Tesseract.js LSTM-based OCR executed with multi-threading via Web Workers.
  • A key feature, “Enhance,” applies adaptive Gaussian thresholding (local 31×31 neighborhood processing) to handle uneven lighting and improve readability before OCR.
  • The author shares practical development lessons from shipping a production-grade WebAssembly OCR workflow, emphasizing privacy and data minimization.

Your medical prescriptions, passports, and bank statements deserve better than being uploaded to someone else's server.

I'm a developer from India, and I built DoctorDocs — a free OCR platform where every single byte of processing happens in your browser. No uploads. No servers. No data collection. Your documents never leave your device.

Here's why I built it, how it works under the hood, and what I learned shipping a WebAssembly-powered app to production.

The Problem That Made Me Angry

My grandmother needed to read a doctor's prescription. The handwriting was illegible — even the pharmacist squinted at it. I thought, "surely there's a free tool online for this."

There is. Dozens of them. And every single one requires you to upload your medical prescription to their server. Think about that — your name, your medications, your diagnosis, sitting on some random company's S3 bucket.

Google Lens works great, but it sends your image to Google's servers. Adobe Scan requires an account. Every "free OCR" tool I found was actually "free to upload your sensitive documents to our cloud."

I decided to build one that works differently.

The Architecture: Zero Server Processing

DoctorDocs runs on a thick-client / thin-server architecture built with Next.js 15. The "thin server" part? It just serves the static HTML/JS. All the actual OCR processing runs in your browser using WebAssembly.

Here's the pipeline:

User drops image
    ↓
OpenCV.js (WASM) → Binarization, shadow removal, contrast enhancement
    ↓
Tesseract.js (WASM) → LSTM neural network OCR, multi-threaded via Web Workers
    ↓
Custom text formatter → Noise reduction, error correction
    ↓
Monaco editor → Edit, copy, or export to PDF

Every step runs on the client's CPU. The server never sees the image.
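The pipeline above is just a chain of async stages, each consuming the previous stage's output. A minimal sketch of how such a staged pipeline can be wired (stage names and outputs are illustrative stand-ins for the real WASM-backed steps, not DoctorDocs' actual code):

```javascript
// Generic async stage runner: each stage takes the previous stage's
// output and returns a promise of the next value.
async function runPipeline(input, stages) {
  let value = input;
  for (const stage of stages) {
    value = await stage(value);
  }
  return value;
}

// Illustrative stages standing in for OpenCV.js, Tesseract.js, and the formatter.
const preprocess = async (img) => ({ ...img, binarized: true });
const recognize  = async (img) => ({ text: '  Rx: Amoxicillin 500mg ', source: img });
const format     = async (res) => ({ ...res, text: res.text.trim() });

// Usage: await runPipeline(droppedImage, [preprocess, recognize, format])
```

Because every stage is an async function, slow WASM steps can yield back to the browser's event loop between stages, keeping the UI responsive.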

The Magic Enhance Feature

The #1 problem with phone camera OCR is uneven lighting. You photograph a prescription under a desk lamp, and half the page is bright while the other half is in shadow.

Most tools just crank up the brightness globally. That makes the bright parts white and the dark parts... still dark.

I used OpenCV.js to run adaptive Gaussian thresholding — it breaks the image into 31×31 pixel neighborhoods and adjusts each one relative to its local area. Shadows disappear. Text becomes crisp. It's the same algorithm used in industrial document scanners, running in your browser via WebAssembly.

// This runs entirely in the browser via OpenCV.js WASM
cv.adaptiveThreshold(
  grayMat,
  binaryMat,
  255,
  cv.ADAPTIVE_THRESH_GAUSSIAN_C,
  cv.THRESH_BINARY,
  31,  // block size
  15   // constant
);
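To make the local-neighborhood idea concrete, here is a toy adaptive threshold over a 1-D grayscale array, using a plain window mean instead of Gaussian weighting. This is a simplification of what cv.adaptiveThreshold does, not the OpenCV implementation:

```javascript
// Toy adaptive threshold: each pixel is compared against the mean of its
// local window minus a constant C, rather than one global cutoff. This is
// why text in a shadowed region still comes out crisp: the cutoff adapts
// to the local brightness.
function adaptiveThreshold1D(pixels, blockSize, C) {
  const half = Math.floor(blockSize / 2);
  return pixels.map((p, i) => {
    const lo = Math.max(0, i - half);
    const hi = Math.min(pixels.length - 1, i + half);
    const window = pixels.slice(lo, hi + 1);
    const mean = window.reduce((a, b) => a + b, 0) / window.length;
    return p > mean - C ? 255 : 0; // THRESH_BINARY behavior
  });
}

// The bright pixels in the dark half of [40, 90, 40 | 200, 250, 200]
// survive, because each is judged against its own neighborhood.
```

A global threshold at, say, 128 would flatten the entire left half to black; the adaptive version keeps the locally-bright pixel at index 1.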

Multi-Threaded OCR

Tesseract.js is powerful but slow on a single thread. So I query navigator.hardwareConcurrency to detect CPU cores and spin up a worker pool:

import { createWorker, OEM } from 'tesseract.js';

const cores = navigator.hardwareConcurrency || 2;
const workerCount = Math.min(Math.max(cores - 1, 1), 4);

// Each worker loads the eng_best LSTM model
const worker = await createWorker('eng', OEM.LSTM_ONLY, {
  corePath: 'tesseract-core-lstm.wasm.js',
  langPath: '4.0.0_best',  // deep-learning model, not the fast one
});

On a modern laptop, this cuts processing time by 60-70% compared to single-threaded OCR.
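Spreading work across that pool is plain round-robin scheduling. Tesseract.js ships a Scheduler that handles this for you; the sketch below just shows the assignment idea for a list of jobs (e.g. image regions), with illustrative names:

```javascript
// Round-robin assignment of OCR jobs to a fixed-size worker pool.
// queues[k] ends up as the job list for worker k.
function assignJobs(jobs, workerCount) {
  const queues = Array.from({ length: workerCount }, () => []);
  jobs.forEach((job, i) => queues[i % workerCount].push(job));
  return queues;
}
```

Each queue could then be handed to one Tesseract.js worker, with the recognized text concatenated back in the original job order.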

144 Tool Pages, One Engine

DoctorDocs has 144 statically generated tool pages — /tools/handwriting-to-text, /tools/prescription-ocr, /tools/receipt-scanner, etc. They all use the same Tesseract.js engine under the hood.

"Isn't that cheating?" — No. It's the exact strategy Smallpdf and ILovePDF use. The OCR engine doesn't change, but the SEO metadata, titles, FAQs, and use-case descriptions do. Each page targets a different search keyword.

// generateStaticParams() statically generates all 144 pages at build time
export async function generateStaticParams() {
  return TOOLS_CATALOG.map((tool) => ({ slug: tool.slug }));
}

Every tool page auto-generates a "You Might Also Like" section linking to 6 related tools, creating an internal link mesh across all pages.
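The "You Might Also Like" mesh can be derived from the same catalog that drives generateStaticParams(). A sketch, assuming each catalog entry carries a category field (the field names here are illustrative):

```javascript
// Pick up to `limit` related tools: same-category tools first,
// then fill the remaining slots from the rest of the catalog.
function relatedTools(catalog, current, limit = 6) {
  const others = catalog.filter((t) => t.slug !== current.slug);
  const sameCategory = others.filter((t) => t.category === current.category);
  const rest = others.filter((t) => t.category !== current.category);
  return [...sameCategory, ...rest].slice(0, limit);
}
```

Because the selection is deterministic, the links get baked into the static HTML at build time, so crawlers see the full internal mesh without any client-side JavaScript.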

Beyond OCR: The Tools That Run Locally

DoctorDocs isn't just OCR. It includes 9 PDF utilities and 5 image editing tools, all client-side:

PDF Tools (powered by pdf-lib + pdf.js):

  • Merge, Split, Compress, Watermark, Rotate PDFs
  • Extract/Remove pages
  • Image to PDF, PDF to JPG

Image Tools (powered by HTML Canvas):

  • Crop, Brighten, Black & White, AI Upscale

AI Tools (powered by @xenova/transformers):

  • AI Text Detector — runs a 300MB RoBERTa model in the browser via WebGL
  • AI Text Writer
  • AI Summarizer

Every single one runs without uploading anything.

The Self-Learning OCR Pipeline

This is the part I'm most excited about. DoctorDocs implements a three-tier OCR system that learns from every user interaction:

Tier 1: Gemini 2.5 Flash — When available, the image is sent to Google's Gemini API for enterprise-grade accuracy. This is opt-in and only used when API keys are configured.

Tier 2: TrOCR Vision Transformer — Runs entirely in the browser as a "shadow model." It processes the same image in the background, and its output is compared against Tier 1 for training purposes.

Tier 3: Tesseract.js — The offline fallback. Always works, even without internet.

When a user copies or downloads the text, the system captures the diff between the AI output and the user's corrected version. This ground truth data feeds future model training — making the OCR better over time.
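Capturing that ground truth only requires diffing the model's output against what the user exported. A minimal word-level sketch (a real implementation would use an LCS/edit-distance alignment to handle inserted or deleted words; this simplified version assumes the word counts line up):

```javascript
// Word-level substitution capture: records the positions where the
// user's corrected text differs from the raw OCR output.
function captureCorrections(ocrText, userText) {
  const a = ocrText.trim().split(/\s+/);
  const b = userText.trim().split(/\s+/);
  const corrections = [];
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    if (a[i] !== b[i]) corrections.push({ index: i, from: a[i], to: b[i] });
  }
  return corrections;
}
```

Each { from, to } pair is exactly the kind of labeled example an OCR model can be fine-tuned on later.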

What I Learned

WebAssembly is production-ready for heavy compute. Running a C++ OCR engine in the browser via WASM sounds crazy, but it works reliably across all modern browsers. The eng_best LSTM model uses ~500MB RAM but delivers vastly better results than the fast model.

Privacy is a real feature, not just marketing. When I tell people "your prescription never leaves your phone," they visibly relax. In India especially, where data privacy concerns are high but digital literacy varies, this matters.

SEO takes time. The site has been live for 3+ months and traffic is still building. If you're building a tool site, start promoting it on day one — don't wait until it's "perfect."

Client-side architecture eliminates your biggest cost. My hosting bill is $0. Vercel free tier serves the static assets. All compute runs on the user's device. I could handle 100,000 users without paying a cent for servers.

Try It

doctordocs.in — completely free, no sign-up required.

Drop a photo of a handwritten prescription, an old letter, a receipt, or any document. Watch the text appear — processed entirely on your device.

The entire project is built with Next.js 15, TailwindCSS, Tesseract.js, OpenCV.js, and Transformers.js. If you're interested in the technical architecture, I've documented everything in a detailed project report.

What do you think? Have you built anything with WebAssembly in the browser? I'd love to hear about your experiences in the comments.
