kreuzcrawl, an open source Rust crawling engine with 11 language bindings

Reddit r/LocalLLaMA / 4/26/2026


Key Points

  • kreuzcrawl is a high-performance open-source web crawling engine built in Rust. It is designed to extract structured data reliably and can be used from many programming languages.
  • It ships with an integrated MCP server and targets AI-agent use cases. Streaming crawl events provide real-time progress tracking.
  • The engine can run batch crawls concurrently across hundreds of URLs while tolerating partial failures, making it more robust for large-scale crawling.
  • It includes browser rendering for JavaScript-heavy SPAs and detects WAFs (web application firewalls) on protected sites.
  • It offers direct language bindings for Rust, Python, TypeScript/Node.js, Go, Ruby, Java, C#, PHP, Elixir, WASM, and C FFI, each wired directly to the core engine.

kreuzcrawl is a high-performance web crawling engine. It was designed to extract structured data reliably and runs natively from multiple programming languages without requiring a specific runtime. See here: https://github.com/kreuzberg-dev/kreuzcrawl

An MCP server is built in from the start, which makes web-crawling AI agents a primary use case. Streaming crawl events allow real-time progress tracking. Batch operations crawl hundreds of URLs concurrently and tolerate partial failures. Browser rendering supports JavaScript-heavy SPAs and includes WAF detection.
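The post does not show kreuzcrawl's API. The following is therefore only a generic Python sketch of two ideas from the paragraph above: streaming one event per URL as it finishes, and stopping a single URL's failure from aborting the whole batch. `crawl_batch`, `CrawlEvent`, and `fake_fetch` are hypothetical names invented for illustration. They are not part of kreuzcrawl or its bindings, and the fetch is simulated so the example runs offline.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, Awaitable, Callable, Optional


@dataclass
class CrawlEvent:
    """One progress event (hypothetical shape, not kreuzcrawl's actual type)."""
    url: str
    ok: bool
    body: Optional[str] = None
    error: Optional[str] = None


async def crawl_batch(
    urls: list[str],
    fetch: Callable[[str], Awaitable[str]],
    max_concurrency: int = 16,
) -> AsyncIterator[CrawlEvent]:
    """Fetch URLs concurrently and yield an event as each one finishes.

    A failure on one URL becomes an error event instead of aborting the batch.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(url: str) -> CrawlEvent:
        async with semaphore:  # cap the number of in-flight fetches
            try:
                return CrawlEvent(url=url, ok=True, body=await fetch(url))
            except Exception as exc:  # isolate the failure to this one URL
                return CrawlEvent(url=url, ok=False, error=repr(exc))

    tasks = [asyncio.create_task(run_one(u)) for u in urls]
    for finished in asyncio.as_completed(tasks):
        yield await finished  # stream results in completion order


async def fake_fetch(url: str) -> str:
    """Simulated fetch (hypothetical): fails for any URL containing 'bad'."""
    await asyncio.sleep(0.01)
    if "bad" in url:
        raise RuntimeError("HTTP 503")
    return f"<html>{url}</html>"


async def main() -> tuple[int, int]:
    urls = [f"https://example.com/page/{i}" for i in range(8)]
    urls.append("https://example.com/bad")
    ok = failed = 0
    async for event in crawl_batch(urls, fake_fetch, max_concurrency=4):
        # Report progress as soon as each URL completes.
        print("done  " if event.ok else "failed", event.url)
        ok += event.ok
        failed += not event.ok
    return ok, failed


ok, failed = asyncio.run(main())
print(f"{ok} succeeded, {failed} failed")
```

In this sketch, the semaphore bounds concurrency so a batch of hundreds of URLs does not open every connection at once. `asyncio.as_completed` reports progress as soon as each fetch finishes rather than in input order. How kreuzcrawl itself schedules work or shapes its events may differ.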

Supported language interfaces are Rust, Python, TypeScript/Node.js, Go, Ruby, Java, C#, PHP, Elixir, WASM, and C FFI, and each binding connects directly to the core engine.
Kreuzcrawl is part of the Kreuzberg org: https://kreuzberg.dev/

Feedback and contributions are welcome :)

submitted by /u/Eastern-Surround7763