Built a zero-allocation, header-only C++ Qwen tokenizer that is nearly 20x faster than OpenAI's Tiktoken

Reddit r/LocalLLaMA / 4/4/2026

💬 Opinion · Developer Stack & Infrastructure · Signals & Early Trends · Ideas & Deep Analysis · Tools & Practical Usage

Key Points

  • A developer created “Frokenizer,” a header-only, zero-allocation C++ tokenizer specifically hardcoded for the Qwen tokenizer format and aimed at LLM developers.
  • The author claims it achieves throughput around 1009 MB/s on a 12-thread Ryzen 5 3600 versus about 50 MB/s for OpenAI’s Tiktoken in a 1 GB English corpus test.
  • The project is positioned as an educational/HPC-focused benchmark effort to understand BPE tokenization and optimization techniques rather than as a production necessity (tokenization is noted as often under ~2% of inference time).
  • Benchmarks and implementation details are provided via the GitHub repository linked in the post for others to test and validate.
Built a zero-allocation, header-only C++ Qwen tokenizer that is nearly 20x faster than OpenAI's Tiktoken

I'm into HPC and C++ static, zero-allocation, zero-dependency software. I was studying BPE tokenizers and how they work, so I decided to build this project. I hardcoded the Qwen tokenizer for LLM developers.
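
Since the post is about learning how BPE works, here is a minimal, illustrative sketch of the core BPE merge loop (an assumption about the general algorithm, not frokenizer's actual code; the toy rank table `kRanks` is invented for the example, while a real Qwen vocabulary has roughly 150k entries loaded from disk):

```cpp
// Minimal greedy BPE sketch. A rank table maps merged pairs to a
// priority; the lowest-ranked adjacent pair is merged first, repeatedly,
// until no merge applies -- this is the essence of byte-pair encoding.
#include <limits>
#include <map>
#include <string>
#include <vector>

// Toy merge ranks (hypothetical); real vocabularies are far larger.
static const std::map<std::string, int> kRanks = {
    {"lo", 0}, {"low", 1}, {"er", 2}, {"lower", 3}};

std::vector<std::string> bpe(const std::string& word) {
    // Start from individual bytes/characters.
    std::vector<std::string> parts;
    for (char c : word) parts.emplace_back(1, c);
    while (parts.size() > 1) {
        int best = std::numeric_limits<int>::max();
        size_t at = 0;
        // Find the adjacent pair with the best (lowest) merge rank.
        for (size_t i = 0; i + 1 < parts.size(); ++i) {
            auto it = kRanks.find(parts[i] + parts[i + 1]);
            if (it != kRanks.end() && it->second < best) {
                best = it->second;
                at = i;
            }
        }
        if (best == std::numeric_limits<int>::max()) break;  // no merge left
        parts[at] += parts[at + 1];
        parts.erase(parts.begin() + at + 1);
    }
    return parts;
}
```

With the toy ranks above, `bpe("lower")` merges `l+o`, then `lo+w`, then `e+r`, then `low+er`, collapsing the whole word into a single token.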

I know that the whole tokenization phase of LLM inference is worth less than 2% of total time, so it's practically negligible, but I just "love" doing that kind of programming. It's an educational project for me to learn and build some intuition.

Surprisingly, after combining multiple different optimization techniques, it scored really high numbers in benchmarks. I thought it was a fluke at first, but I tried different tests, and so far it completely holds up.
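
The post doesn't detail the techniques, but one common zero-allocation pattern the title hints at can be sketched as follows: the hot loop views the input through `std::string_view` and writes token ids into a caller-provided buffer, so nothing is heap-allocated per token. The `lookup` vocab here is a made-up stand-in (a real tokenizer would use a precomputed trie or hash table over the merge ranks):

```cpp
// Zero-allocation encode sketch: no new/malloc/std::string in the loop.
#include <cstddef>
#include <string_view>

// Toy longest-match "vocab" (hypothetical ids): returns a token id and
// writes the matched byte length into *len.
static int lookup(std::string_view s, size_t pos, size_t* len) {
    if (s.substr(pos, 3) == "the") { *len = 3; return 1; }
    if (s.substr(pos, 1) == " ")   { *len = 1; return 2; }
    *len = 1;
    return 3;  // fallback: emit the single byte as its own token
}

// Encode into a fixed caller-owned buffer of capacity `cap`;
// returns the number of tokens written.
size_t encode(std::string_view text, int* out, size_t cap) {
    size_t n = 0, pos = 0;
    while (pos < text.size() && n < cap) {
        size_t len = 0;
        out[n++] = lookup(text, pos, &len);
        pos += len;
    }
    return n;
}
```

Because the caller owns both the input view and the output buffer, the encoder itself never touches the allocator, which keeps the hot path cache-friendly and trivially parallelizable across input chunks.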

On a 12-thread Ryzen 5 3600 desktop CPU, with 1 GB of English text corpus:
- My Frokenizer: 1009 MB/s
- OpenAI Tiktoken: ~50 MB/s

For code, tests and benchmarking:
https://github.com/yassa9/frokenizer

submitted by /u/yassa9