Efficient Logic Gate Networks for Video Copy Detection

arXiv cs.CV / 4/24/2026


Key Points

  • The paper addresses large-scale video copy detection by improving similarity estimation across diverse visual distortions while meeting strict compute and memory constraints.
  • It introduces a framework using differentiable Logic Gate Networks (LGNs) to replace floating-point feature extractors with compact, logic-based representations.
  • The method combines frame miniaturization and binary preprocessing with a trainable LGN embedding model that learns logical operations and network interconnections.
  • After training, the model can be discretized into a purely Boolean circuit for fast, memory-efficient inference, with reported speeds exceeding 11k samples per second and descriptors several orders of magnitude smaller than those of prior models.
  • Extensive experiments compare similarity strategies, binarization schemes, and LGN architectures, showing competitive or superior accuracy and ranking performance versus prior approaches across multiple datasets and difficulty levels.
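The core idea behind differentiable LGNs is that each neuron is a soft, trainable mixture over the 16 possible two-input Boolean functions; after training, the argmax gate is kept, yielding a hard logic circuit. The sketch below illustrates this relaxation in NumPy. It is a minimal illustration of the general LGN mechanism, not the paper's implementation; all function names here are hypothetical.

```python
import numpy as np

def soft_gates(a, b):
    """Real-valued relaxations of the 16 two-input Boolean functions,
    evaluated on inputs a, b in [0, 1]."""
    return np.array([
        np.zeros_like(a),         # FALSE
        a * b,                    # AND
        a - a * b,                # A AND NOT B
        a,                        # A
        b - a * b,                # NOT A AND B
        b,                        # B
        a + b - 2 * a * b,        # XOR
        a + b - a * b,            # OR
        1 - (a + b - a * b),      # NOR
        1 - (a + b - 2 * a * b),  # XNOR
        1 - b,                    # NOT B
        1 - b + a * b,            # A OR NOT B
        1 - a,                    # NOT A
        1 - a + a * b,            # NOT A OR B
        1 - a * b,                # NAND
        np.ones_like(a),          # TRUE
    ])

def soft_logic_neuron(a, b, logits):
    """One differentiable logic neuron: a softmax-weighted mixture over
    all 16 gates. Discretization keeps only the argmax gate."""
    w = np.exp(logits - logits.max())
    w /= w.sum()
    return np.tensordot(w, soft_gates(a, b), axes=1)

# Logits strongly favouring XOR (index 6) make the neuron behave like XOR.
logits = np.full(16, -10.0)
logits[6] = 10.0
a = np.array([0.0, 0.0, 1.0, 1.0])
b = np.array([0.0, 1.0, 0.0, 1.0])
out = soft_logic_neuron(a, b, logits)  # → approximately [0, 1, 1, 0]
```

Because every gate relaxation is a polynomial in `a` and `b`, gradients flow through both the gate-selection logits and the wiring during training.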

Abstract

Video copy detection requires robust similarity estimation under diverse visual distortions while operating at very large scale. Although deep neural networks achieve strong performance, their computational cost and descriptor size limit practical deployment in high-throughput systems. In this work, we propose a video copy detection framework based on differentiable Logic Gate Networks (LGNs), which replace conventional floating-point feature extractors with compact, logic-based representations. Our approach combines aggressive frame miniaturization, binary preprocessing, and a trainable LGN embedding model that learns both logical operations and interconnections. After training, the model can be discretized into a purely Boolean circuit, enabling extremely fast and memory-efficient inference. We systematically evaluate different similarity strategies, binarization schemes, and LGN architectures across multiple dataset folds and difficulty levels. Experimental results demonstrate that LGN-based models achieve competitive or superior accuracy and ranking performance compared to prior models, while producing descriptors several orders of magnitude smaller and delivering inference speeds exceeding 11k samples per second. These findings indicate that logic-based models offer a promising alternative for scalable and resource-efficient video copy detection.
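One reason binary descriptors scale so well is that comparing them reduces to XOR plus a bit count. The sketch below shows a Hamming-style similarity on packed binary descriptors; this is a generic illustration of bitwise matching, not necessarily the similarity strategy the paper settles on (it evaluates several), and the 256-bit descriptor size is an assumption for the example.

```python
import numpy as np

def hamming_similarity(desc_a, desc_b):
    """Similarity between two packed binary descriptors (uint8 arrays):
    the fraction of matching bits, via XOR and a popcount."""
    diff_bits = np.unpackbits(np.bitwise_xor(desc_a, desc_b)).sum()
    total_bits = desc_a.size * 8
    return 1.0 - diff_bits / total_bits

# Two hypothetical 256-bit descriptors (32 bytes each) — far smaller
# than typical float32 embeddings of hundreds of dimensions.
rng = np.random.default_rng(0)
d1 = rng.integers(0, 256, size=32, dtype=np.uint8)
d2 = d1.copy()
d2[0] ^= 0xFF  # flip the first 8 of 256 bits
sim = hamming_similarity(d1, d2)  # → 1 - 8/256 = 0.96875
```

On modern hardware this comparison maps to vectorized XOR and popcount instructions, which is what makes throughputs in the thousands of samples per second plausible for Boolean-circuit descriptors.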