SSG: Logit-Balanced Vocabulary Partitioning for LLM Watermarking
arXiv cs.AI / 4/27/2026
Key Points
- The paper studies how LLM watermarking schemes like KGW can lose effectiveness in low-entropy generation tasks such as code generation and mathematical reasoning.
- It identifies that watermark strength is governed by the next-token probability distribution: when that distribution is peaked (low entropy), a randomly chosen vocabulary partition may leave the favored subset with little probability mass, leaving little room to bias token selection.
- The authors propose SSG (Sort-then-Split by Groups), which partitions the vocabulary into two logit-balanced subsets to increase the per-token lower bound of watermark strength.
- Experiments on code and math reasoning datasets show that SSG improves watermark detectability compared with prior KGW-style partitioning approaches.
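The core contrast between random (KGW-style) and logit-balanced partitioning can be sketched as follows. This is a minimal illustration, not the paper's exact SSG algorithm: it uses a greedy assignment (each token, from most to least probable, goes to whichever subset currently holds less probability mass) to approximate a balanced split, and all names below are hypothetical.

```python
import numpy as np

def balanced_partition(probs: np.ndarray) -> tuple[list[int], list[int]]:
    # Greedy illustration of a logit-balanced split (an assumption, not
    # necessarily the paper's SSG procedure): walk tokens from most to
    # least probable and assign each to the subset with less total mass.
    order = np.argsort(probs)[::-1]
    green, red = [], []
    g_mass = r_mass = 0.0
    for t in order:
        if g_mass <= r_mass:
            green.append(int(t)); g_mass += probs[t]
        else:
            red.append(int(t)); r_mass += probs[t]
    return green, red

def random_partition(n: int, seed: int = 0) -> tuple[list[int], list[int]]:
    # KGW-style random halving of the vocabulary, for comparison.
    perm = np.random.default_rng(seed).permutation(n)
    return list(perm[: n // 2]), list(perm[n // 2 :])

# Toy next-token distribution over a 6-token vocabulary.
logits = np.array([2.0, 1.8, 1.5, 1.0, 0.5, 0.0])
probs = np.exp(logits) / np.exp(logits).sum()

g_bal, _ = balanced_partition(probs)
g_rand, _ = random_partition(len(probs))
bal_green_mass = float(probs[g_bal].sum())
rand_green_mass = float(probs[g_rand].sum())
print(f"balanced green mass: {bal_green_mass:.3f}")
print(f"random   green mass: {rand_green_mass:.3f}")
```

A balanced split keeps the green subset's probability mass near 0.5 regardless of how the logits are ordered, which raises the per-token lower bound on how strongly sampling can be biased toward green tokens; a random split offers no such guarantee.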