Efficient Provably Secure Linguistic Steganography via Range Coding

arXiv cs.CL / 4/10/2026


Key Points

  • The paper addresses provably secure linguistic steganography for language-model-generated text, aiming to keep the security guarantee while improving on the embedding capacity and efficiency of earlier methods that achieve zero KL divergence.
  • It uses range coding as the core mechanism and introduces an additional rotation mechanism to yield an efficient, provably secure steganographic scheme.
  • Experiments across multiple language models show roughly 100% entropy utilization (high embedding efficiency) and better performance than baseline provably secure approaches.
  • Reported embedding speeds reach up to 1554.66 bits/s on GPT-2, indicating the approach is practical in addition to being theoretically grounded.
  • The authors have released their code on GitHub to enable replication and further experimentation.
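
To make the "entropy utilization" figure concrete, here is a minimal sketch of how such a ratio could be computed. It assumes the common definition in this literature (embedded bits divided by the summed Shannon entropy of the model's per-step token distributions); the summary itself does not spell out the formula, and the function names are hypothetical, not taken from the released code.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy in bits of one next-token distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def entropy_utilization(embedded_bits, step_distributions):
    """Assumed definition: embedded bits / total entropy of all generation steps."""
    total_entropy = sum(shannon_entropy(d) for d in step_distributions)
    return embedded_bits / total_entropy

# Toy example: two generation steps carrying 1 bit and 2 bits of entropy.
steps = [[0.5, 0.5], [0.25, 0.25, 0.25, 0.25]]
print(entropy_utilization(3, steps))
```

Under this definition, a value near 1.0 (100%) would mean that almost all of the randomness available in the model's sampling is being used to carry message bits.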

Abstract

Linguistic steganography involves embedding secret messages within seemingly innocuous texts to enable covert communication. Provable security, which is a long-standing goal and key motivation, has been extended to language-model-based steganography. Previous provably secure approaches have achieved perfect imperceptibility, measured by zero Kullback-Leibler (KL) divergence, but at the expense of embedding capacity. In this paper, we attempt to directly use a classic entropy coding method (range coding) to achieve secure steganography, and then propose an efficient and provably secure linguistic steganographic method with a rotation mechanism. Experiments across various language models show that our method achieves around 100% entropy utilization (embedding efficiency) for embedding capacity, outperforming the existing baseline methods. Moreover, it achieves high embedding speeds (up to 1554.66 bits/s on GPT-2). The code is available at github.com/ryehr/RRC_steganography.
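
The general idea of using an entropy coder for steganography can be illustrated with a toy sketch: the secret bits are read as a binary fraction in [0, 1), and at each step the sender emits the token whose sub-interval contains that fraction, narrowing the interval just as an arithmetic or range decoder would. Everything below is an assumption made for illustration: the fixed three-token `toy_model`, the function names, and the use of exact fractions in place of fixed-precision range arithmetic. It does not include the paper's rotation mechanism and makes no security claim.

```python
import math
from fractions import Fraction

def toy_model(prefix):
    """Hypothetical stand-in for a language model: fixed next-token distribution."""
    return [("a", Fraction(1, 2)), ("b", Fraction(1, 4)), ("c", Fraction(1, 4))]

def token_intervals(probs, low, high):
    """Split [low, high) into sub-intervals proportional to token probabilities."""
    width = high - low
    start = low
    for token, p in probs:
        end = start + width * p
        yield token, start, end
        start = end

def embed(bits, model, steps):
    """Pick, at each step, the token whose interval contains the message point."""
    point = Fraction(int(bits, 2), 2 ** len(bits))
    low, high = Fraction(0), Fraction(1)
    tokens = []
    for _ in range(steps):
        for token, start, end in token_intervals(model(tokens), low, high):
            if start <= point < end:
                tokens.append(token)
                low, high = start, end
                break
    return tokens

def extract(tokens, model, n_bits):
    """Replay the interval narrowing, then read off the n-bit point inside it."""
    low, high = Fraction(0), Fraction(1)
    for i, observed in enumerate(tokens):
        for token, start, end in token_intervals(model(tokens[:i]), low, high):
            if token == observed:
                low, high = start, end
                break
    # Unique once the interval width is at most 2**-n_bits.
    k = math.ceil(low * 2 ** n_bits)
    return format(k, f"0{n_bits}b")

stego = embed("1011", toy_model, steps=4)
print(stego, extract(stego, toy_model, n_bits=4))
```

In this toy, every token has probability at most 1/2, so four steps always shrink the interval enough to recover four bits. A real system would use the language model's actual distribution, encrypted (uniformly random) message bits, and finite-precision range arithmetic.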