Efficient Logic Gate Networks for Video Copy Detection

arXiv cs.CV / 4/24/2026


Key Points

  • The paper addresses large-scale video copy detection by improving similarity estimation across diverse visual distortions while meeting strict compute and memory constraints.
  • It introduces a framework using differentiable Logic Gate Networks (LGNs) to replace floating-point feature extractors with compact, logic-based representations.
  • The method combines frame miniaturization and binary preprocessing with a trainable LGN embedding model that learns logical operations and network interconnections.
  • After training, the model can be discretized into a purely Boolean circuit for fast, memory-efficient inference, with reported speeds exceeding 11k samples per second and descriptors several orders of magnitude smaller than those of prior models.
  • Extensive experiments compare similarity strategies, binarization schemes, and LGN architectures, showing competitive or superior accuracy and ranking performance versus prior approaches across multiple datasets and difficulty levels.
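The core idea behind differentiable LGNs is that each neuron is a soft, trainable mixture over the 16 possible two-input Boolean functions; after training, the argmax gate is kept, yielding a hard logic circuit. The sketch below illustrates this relaxation in NumPy. It is a minimal illustration of the general LGN mechanism, not the paper's implementation; all function names here are hypothetical.

```python
import numpy as np

def soft_gates(a, b):
    """Real-valued relaxations of the 16 two-input Boolean functions,
    evaluated on inputs a, b in [0, 1]."""
    return np.array([
        np.zeros_like(a),         # FALSE
        a * b,                    # AND
        a - a * b,                # A AND NOT B
        a,                        # A
        b - a * b,                # NOT A AND B
        b,                        # B
        a + b - 2 * a * b,        # XOR
        a + b - a * b,            # OR
        1 - (a + b - a * b),      # NOR
        1 - (a + b - 2 * a * b),  # XNOR
        1 - b,                    # NOT B
        1 - b + a * b,            # A OR NOT B
        1 - a,                    # NOT A
        1 - a + a * b,            # NOT A OR B
        1 - a * b,                # NAND
        np.ones_like(a),          # TRUE
    ])

def soft_logic_neuron(a, b, logits):
    """One differentiable logic neuron: a softmax-weighted mixture over
    all 16 gates. Discretization keeps only the argmax gate."""
    w = np.exp(logits - logits.max())
    w /= w.sum()
    return np.tensordot(w, soft_gates(a, b), axes=1)

# Logits strongly favouring XOR (index 6) make the neuron behave like XOR.
logits = np.full(16, -10.0)
logits[6] = 10.0
a = np.array([0.0, 0.0, 1.0, 1.0])
b = np.array([0.0, 1.0, 0.0, 1.0])
out = soft_logic_neuron(a, b, logits)  # → approximately [0, 1, 1, 0]
```

Because every gate relaxation is a polynomial in `a` and `b`, gradients flow through both the gate-selection logits and the wiring during training.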

Abstract

Video copy detection requires robust similarity estimation under diverse visual distortions while operating at very large scale. Although deep neural networks achieve strong performance, their computational cost and descriptor size limit practical deployment in high-throughput systems. In this work, we propose a video copy detection framework based on differentiable Logic Gate Networks (LGNs), which replace conventional floating-point feature extractors with compact, logic-based representations. Our approach combines aggressive frame miniaturization, binary preprocessing, and a trainable LGN embedding model that learns both logical operations and interconnections. After training, the model can be discretized into a purely Boolean circuit, enabling extremely fast and memory-efficient inference. We systematically evaluate different similarity strategies, binarization schemes, and LGN architectures across multiple dataset folds and difficulty levels. Experimental results demonstrate that LGN-based models achieve competitive or superior accuracy and ranking performance compared to prior models, while producing descriptors several orders of magnitude smaller and delivering inference speeds exceeding 11k samples per second. These findings indicate that logic-based models offer a promising alternative for scalable and resource-efficient video copy detection.
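One reason binary descriptors scale so well is that comparing them reduces to XOR plus a bit count. The sketch below shows a Hamming-style similarity on packed binary descriptors; this is a generic illustration of bitwise matching, not necessarily the similarity strategy the paper settles on (it evaluates several), and the 256-bit descriptor size is an assumption for the example.

```python
import numpy as np

def hamming_similarity(desc_a, desc_b):
    """Similarity between two packed binary descriptors (uint8 arrays):
    the fraction of matching bits, via XOR and a popcount."""
    diff_bits = np.unpackbits(np.bitwise_xor(desc_a, desc_b)).sum()
    total_bits = desc_a.size * 8
    return 1.0 - diff_bits / total_bits

# Two hypothetical 256-bit descriptors (32 bytes each) — far smaller
# than typical float32 embeddings of hundreds of dimensions.
rng = np.random.default_rng(0)
d1 = rng.integers(0, 256, size=32, dtype=np.uint8)
d2 = d1.copy()
d2[0] ^= 0xFF  # flip the first 8 of 256 bits
sim = hamming_similarity(d1, d2)  # → 1 - 8/256 = 0.96875
```

On modern hardware this comparison maps to vectorized XOR and popcount instructions, which is what makes throughputs in the thousands of samples per second plausible for Boolean-circuit descriptors.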