Peek2: Regex-free Byte-level Byte-Pair Encoding Pretokenizer for LLM Inference on Edge Devices

arXiv cs.CL / 5/4/2026


Key Points

  • Peek2 is a regex-free, byte-level BPE pretokenizer designed as a drop-in replacement for the cl100k-like pretokenizers used by GPT-3, LLaMA-3, and Qwen-2.5.
  • The work analyzes the existing cl100k pretokenization logic and proposes a new algorithm with linear time complexity and constant, minimal memory usage to better fit edge-device inference.
  • Microbenchmarks show up to a 2.48× increase in pretokenization throughput.
  • In end-to-end Byte-level BPE encoding, Peek2 improves overall throughput by about 1.14×, depending on the dataset, while producing output identical to the baseline's.

Abstract

Pretokenization is a crucial, sequential pass in Byte-level BPE tokenizers, yet little work has been done to optimize it for edge-side inference. Our new implementation, Peek2, serves as a drop-in replacement for the cl100k-like pretokenizers used in GPT-3, LLaMA-3, and Qwen-2.5. After breaking down and analyzing the logic of the original cl100k pretokenizer, we introduce a new pretokenization algorithm with linear time complexity and constant, trivial memory usage, suited to edge scenarios. Test results show that it increases microbenchmark throughput by up to 2.48× and delivers a 1.14× improvement in overall throughput across the entire Byte-level BPE encoding process, depending on the dataset, while producing results identical to those of the baseline regex-based tokenizer.
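To make the idea concrete, here is a minimal sketch of what a regex-free pretokenizer can look like: a single left-to-right scan that groups characters by class instead of running a backtracking regex engine. This is an illustrative simplification, not the actual Peek2 algorithm from the paper; the character classes below (letters, digit runs capped at 3, whitespace, other) only loosely mimic cl100k-style categories.

```python
# Illustrative sketch of a regex-free pretokenizer: one linear pass,
# constant extra memory per step, no regex engine. NOT the paper's
# actual Peek2 algorithm; the classes are a rough cl100k-like stand-in.

def pretokenize(text: str) -> list[str]:
    """Split `text` into pretokens with a single left-to-right scan."""

    def cls(c: str) -> str:
        # Map each character to a coarse class.
        if c.isalpha():
            return "L"  # letters
        if c.isdigit():
            return "N"  # digits
        if c.isspace():
            return "S"  # whitespace
        return "P"      # punctuation / everything else

    tokens: list[str] = []
    i, n = 0, len(text)
    while i < n:
        c = cls(text[i])
        j = i + 1
        # Extend the run while the class stays the same.
        while j < n and cls(text[j]) == c:
            # Mimic cl100k's {1,3} cap on digit runs.
            if c == "N" and j - i == 3:
                break
            j += 1
        tokens.append(text[i:j])
        i = j
    return tokens
```

Because the scan never looks back, the runtime is linear in the input length and the working state is a handful of indices, which is the property that matters for memory-constrained edge inference. The concatenation of the pretokens always reproduces the input, so the downstream BPE merge pass sees every byte exactly once.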