Lossless Compression via Chained Lightweight Neural Predictors with Information Inheritance

arXiv cs.LG / 4/20/2026

💬 Opinion · Models & Research

Key Points

  • The paper presents a neural-network-based probability estimation architecture for lossless data compression, using a chain of lightweight predictors with the minimum number of weights needed for Markov sources of a specified order.
  • It argues that the chained design reduces the total number of parameters used in probability estimation by adapting to the statistical characteristics of the input data.
  • To further improve compression, the authors introduce “information inheritance,” where probability estimates produced by a lower-order predictor are passed to the next higher-order predictor.
  • Experiments show that the resulting lossless compressor achieves compression ratios close to those of the state-of-the-art PAC compressor, while delivering significantly higher encoding and decoding throughput on a consumer GPU.
  • Overall, the work combines efficient neural parameterization with hierarchical reuse of probabilistic information to deliver both competitive compression and faster processing.
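The chained design above can be illustrated with a toy sketch. The class and update rule below are hypothetical, not the paper's exact parameterization: each unit is a minimal logistic predictor over the last k bits of a binary stream, and "information inheritance" is modeled by feeding the lower-order unit's probability estimate as an extra input feature to the next higher-order unit.

```python
import numpy as np

class OrderKPredictor:
    """Toy predictor for a binary Markov source of order k (illustrative only).

    One logistic weight vector over the k most recent bits, plus an optional
    input carrying the probability inherited from the lower-order unit.
    """
    def __init__(self, order, inherit=False, lr=0.1):
        self.order = order
        self.inherit = inherit
        # weights: k context bits (+1 inherited probability) + 1 bias
        self.w = np.zeros(order + (1 if inherit else 0) + 1)
        self.lr = lr

    def _features(self, context, inherited_p=None):
        x = list(context[-self.order:]) if self.order else []
        if self.inherit:
            x.append(inherited_p)
        x.append(1.0)  # bias term
        return np.array(x, dtype=float)

    def predict(self, context, inherited_p=None):
        x = self._features(context, inherited_p)
        return 1.0 / (1.0 + np.exp(-self.w @ x))  # P(next bit = 1)

    def update(self, context, bit, inherited_p=None):
        x = self._features(context, inherited_p)
        p = 1.0 / (1.0 + np.exp(-self.w @ x))
        self.w += self.lr * (bit - p) * x  # online logistic-regression step
        return p

def chained_estimate(units, context):
    """Pass the lower-order estimate up the chain (information inheritance)."""
    p = 0.5  # uninformed prior entering the lowest-order unit
    for u in units:
        p = u.predict(context, inherited_p=p if u.inherit else None)
    return p
```

The point of the sketch is the wiring, not the model: each unit stays tiny (order-proportional weight count), and higher-order units refine, rather than recompute, what lower-order units already estimated.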

Abstract

This paper addresses lossless data compression with probability estimation performed by neural networks. First, we propose a probability estimation architecture based on a chain of neural predictors, in which each unit of the chain is a neural network with the minimum number of weights sufficient for efficient compression of data generated by a Markov source of a given order. We show that this architecture minimizes the overall number of weights participating in probability estimation, adapting to the statistical properties of the input data. Second, to further improve compression efficiency, we introduce an information inheritance mechanism, where the probability estimate obtained by a low-order unit is reused by the next higher-order unit. Experimental results show that the proposed lossless compressor, equipped with the chained probability estimation architecture, provides compression ratios close to those of the state-of-the-art PAC compressor, while outperforming PAC by a factor of 1.2 to 6.3 in encoding throughput and 2.8 to 12.3 in decoding throughput on a consumer GPU.
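Compressors of this kind pair a probability model with an entropy coder (typically an arithmetic coder), so compression quality reduces to how many bits the model's predictions cost. As a hedged illustration of that link (the `predict` interface here is an assumption, not the paper's API), the ideal code length is the sum of -log2 of the probability the model assigned to each symbol that actually occurred; an arithmetic coder gets within about two bits of this total.

```python
import math

def ideal_code_length(bits, predict):
    """Total bits an ideal entropy coder would spend, given a predictor.

    `predict(prefix)` returns P(next bit = 1) given the bits seen so far.
    Better predictions -> higher probability on the true bit -> fewer bits.
    """
    total = 0.0
    for i, b in enumerate(bits):
        p1 = predict(bits[:i])
        p = p1 if b == 1 else 1.0 - p1
        total += -math.log2(max(p, 1e-12))  # clamp avoids log(0)
    return total
```

For example, a fixed estimate P(1) = 0.9 on a stream that is 90% ones costs about 0.47 bits per symbol, well under the 1 bit/symbol of storing the stream raw.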