Web use agent harness w/ 30x token reduction, 12x TTFT reduction w/ Qwen 3.5 9B on potato device (And no, I did not use vision capabilities)

Reddit r/LocalLLaMA / 3/28/2026


Key Points

  • Browser-use agents can consume excessive tokens and context when given raw rendered pages, so the author proposes compressing the rendered DOM into a markdown-like format before sending it to the agent.
  • In experiments with Qwen 3.5 9B on limited “potato” hardware, the approach reportedly cuts token consumption by ~32x versus the raw DOM and reduces TTFT from ~106s to ~8.4s, while adding only ~30ms of parsing time.
  • The project (“@tidesurf/core”, v0.3) includes 18 interactive page tools that work across models as long as the model supports tool calling, and it supports both CLI and MCP integrations.
  • The post emphasizes that results are based on early testing/experiments and invites additional community feedback to validate and improve the harness.

Browser-use agents tend to rely on the model's native multimodality instead of the concrete page source, and even the ones that do use the source tend to consume so much context that they barely function.

I kept running into this problem when using LLM agents, then I came up with an idea: what if I just... send the rendered DOM to the agent, but with markdown-like compression?

Turns out, it works! At least according to my experiments, it reduces token consumption by ~32x on GitHub (vs. the raw DOM), while taking only ~30ms to parse.
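To make the idea concrete, here's a minimal sketch of DOM-to-markdown-like compression, using Python's stdlib `html.parser` for self-containment. This is not TideSurf's actual algorithm (the post doesn't describe it); it only illustrates the shape of the technique: drop scripts/styles/metadata, turn headings into `#` lines, and give interactive elements numeric refs an agent can pass back to page tools.

```python
from html.parser import HTMLParser

SKIP = {"script", "style", "noscript", "svg", "head", "meta", "link"}
INTERACTIVE = {"a", "button", "input", "select", "textarea"}

class MarkdownishCompressor(HTMLParser):
    # Hypothetical sketch, not the real @tidesurf/core implementation:
    # flattens rendered HTML into a compact markdown-like outline.
    def __init__(self):
        super().__init__()
        self.out = []       # compressed output lines
        self.skip = 0       # depth inside SKIP elements
        self.ref = 0        # running index for interactive elements
        self.prefix = ""    # pending prefix for the next text node

    def handle_starttag(self, tag, attrs):
        if tag in SKIP:
            self.skip += 1
        elif tag in INTERACTIVE:
            self.ref += 1
            self.prefix = f"[{self.ref}:{tag}] "
        elif len(tag) == 2 and tag[0] == "h" and tag[1].isdigit():
            self.prefix = "#" * int(tag[1]) + " "  # h2 -> "## "

    def handle_endtag(self, tag):
        if tag in SKIP and self.skip:
            self.skip -= 1

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse whitespace
        if text and not self.skip:
            self.out.append(self.prefix + text)
            self.prefix = ""

def compress(html: str) -> str:
    p = MarkdownishCompressor()
    p.feed(html)
    return "\n".join(p.out)
```

Because the compressed text drops markup, scripts, and styling entirely and keeps mostly visible text plus short element refs, the token count scales with page content rather than page source, which is where a large reduction on markup-heavy pages like GitHub would come from.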

Also, it comes with 18 tools that let LLMs work with pages interactively, and they work with whatever model you're using, as long as it has tool calling capabilities. It works with both CLI and MCP.
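The post doesn't enumerate the 18 tools, but since they target any model with tool calling, they'd be declared in the usual function-calling schema. As a purely hypothetical illustration (the real TideSurf tool names and parameters may differ), a "click" tool keyed on the refs from a compressed page snapshot might look like:

```python
# Hypothetical tool declaration in the OpenAI-style function-calling
# schema most tool-capable models accept; not taken from @tidesurf/core.
click_tool = {
    "type": "function",
    "function": {
        "name": "click",
        "description": "Click the interactive element with the given "
                       "ref from the compressed page snapshot.",
        "parameters": {
            "type": "object",
            "properties": {
                "ref": {
                    "type": "integer",
                    "description": "Element index, e.g. the 1 in [1:a]",
                },
            },
            "required": ["ref"],
        },
    },
}
```

Keying tool arguments on small integer refs instead of CSS selectors or raw HTML keeps both the tool definitions and the model's tool calls cheap in tokens.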

It's still an early project though, v0.3, so I'd like to hear more feedback.

npm: https://www.npmjs.com/package/@tidesurf/core
Brief explanation: https://tidesurf.org
GitHub: https://github.com/TideSurf/core
Docs: https://tidesurf.org/docs

Experiment metrics
Model: https://huggingface.co/MercuriusDream/Qwen3.5-9B-MLX-lm-nvfp4
- Reasoning off
- Q8 KV Cache quant
- Other configs to default

Tested HW:
- MacBook Pro 14" Late 2021
- macOS Tahoe 26.2
- M1 Pro, 14C GPU
- 16GB LPDDR5 Unified Memory

Tested env:
- LM Studio 0.4.7-b2
- LM Studio MLX runtime

Numbers (raw DOM vs. TideSurf)
- Tok/s: 24.788 vs 26.123
- TTFT: 106.641s vs 8.442s
- Gen: 9.117s vs 6.163s
- PromptTok: 17,371 vs 3,312 // including tool defs here, raw tokens < 1k
- InfTok: 226 vs 161
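For reference, the headline ratios fall straight out of these measurements. Note the raw prompt-token ratio is smaller than the ~32x claim because, per the note above, both prompts include the same tool definitions; the ~32x figure compares the page content alone.

```python
# Sanity-checking the headline ratios from the raw measurements above.
ttft_raw_dom, ttft_tidesurf = 106.641, 8.442
print(f"TTFT reduction: {ttft_raw_dom / ttft_tidesurf:.1f}x")  # 12.6x

prompt_raw_dom, prompt_tidesurf = 17_371, 3_312
# Only ~5.2x here, since both counts carry the identical tool defs;
# the ~32x claim is for the page content by itself.
print(f"Prompt tokens (incl. tool defs): "
      f"{prompt_raw_dom / prompt_tidesurf:.1f}x")  # 5.2x
```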

edit: numbers
