Inference-Path Optimization via Circuit Duplication in Frozen Visual Transformers for Marine Species Classification

arXiv cs.CV / 4/7/2026


Key Points

  • The paper explores improving label-efficient marine species classification that uses frozen embeddings from self-supervised vision foundation models (e.g., DINOv3) without any fine-tuning or weight changes.
  • It applies “Circuit Duplication,” an inference-time technique from LLMs, duplicating a chosen range of transformer layers during the forward pass to strengthen representations.
  • On the class-imbalanced AQUA20 benchmark, both global and class-specific circuit selection outperform the standard single-pass frozen forward, with class-specific selection performing best.
  • At the highest label budget, class-specific selection achieves a macro F1 of 0.875, narrowing the gap to the fully supervised ConvNeXt benchmark (0.889) to 1.4 points without any gradient-based training.
  • The results indicate strong class-dependent gains (about 75% of classes benefit from class-specific circuits), and the work claims the first application of Circuit Duplication to computer vision.
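
The layer-duplication idea in the points above can be illustrated with a minimal sketch. It assumes the duplicated range is replayed immediately after its first traversal, which the summary does not spell out. The function names and the toy layers are hypothetical and are not taken from the paper or from any DINOv3 code.

```python
from typing import Callable, List, Sequence

# A "layer" here is any function mapping a hidden state to a hidden state.
# Real transformer blocks would take tensors; lists keep this sketch dependency-free.
Layer = Callable[[list], list]


def duplicated_order(num_layers: int, start: int, end: int) -> List[int]:
    """Indices visited when layers[start:end] are traversed twice.

    Assumption: the second traversal follows the first one directly.
    """
    if not 0 <= start < end <= num_layers:
        raise ValueError("need 0 <= start < end <= num_layers")
    return (
        list(range(end))                 # layers 0 .. end-1, as in a normal pass
        + list(range(start, end))        # replay the chosen circuit once more
        + list(range(end, num_layers))   # finish with the remaining layers
    )


def forward(x: list, layers: Sequence[Layer], order: Sequence[int]) -> list:
    """Apply the frozen layers in the given order; no weights are changed."""
    for i in order:
        x = layers[i](x)
    return x


# Toy layers that record their own index, so the traversal order is visible.
toy_layers = [lambda x, i=i: x + [i] for i in range(6)]
order = duplicated_order(num_layers=6, start=2, end=4)
trace = forward([], toy_layers, order)
print(trace)
```

Because only the traversal order changes, the same frozen weights serve both the standard pass and every candidate circuit; each candidate is defined by its `(start, end)` pair alone.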

Abstract

Automated underwater species classification is constrained by annotation cost and environmental variation that limits the transferability of fully supervised models. Recent work has shown that frozen embeddings from self-supervised vision foundation models already provide a strong label-efficient baseline for marine image classification. Here we investigate whether this frozen-embedding regime can be improved at inference time, without fine-tuning or changing model weights. We apply Circuit Duplication, an inference-time method originally proposed for Large Language Models, in which a selected range of transformer layers is traversed twice during the forward pass. We evaluate on the class-imbalanced AQUA20 benchmark using frozen DINOv3 embeddings under two settings: global circuit selection, where a single duplicated circuit is chosen for the full dataset, and class-specific circuit selection, where each species may receive a different optimal circuit. Both settings use simple semi-supervised downstream classifiers. Circuit Duplication consistently improves over the standard frozen forward pass. At the maximum label budget, class-specific selection reaches a macro F1 of 0.875, closing the gap to the fully supervised ConvNeXt benchmark (0.889) to 1.4 points without any gradient-based training. Four species exceed their fully supervised reference, with octopus improving by +12.1 F1 points. Across all budgets, roughly 75% of classes prefer a class-specific circuit, indicating a genuinely class-dependent benefit. To our knowledge, this is the first application of Circuit Duplication to computer vision.
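
The difference between the two selection settings can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the scores are made-up validation F1 values, the class names and the `(start, end)` circuit encoding are hypothetical, and the abstract does not say how class-specific circuits are combined into a final prediction, so only the selection step is shown. Macro F1 is computed as the unweighted mean of per-class F1.

```python
from statistics import mean
from typing import Dict, Tuple

# Hypothetical encoding: (start, end) of the duplicated layer range;
# (0, 0) stands for the standard forward pass with no duplication.
Circuit = Tuple[int, int]

# Made-up per-class validation F1 scores for three candidate circuits.
scores: Dict[Circuit, Dict[str, float]] = {
    (0, 0): {"species_a": 0.60, "species_b": 0.90, "species_c": 0.80},
    (4, 6): {"species_a": 0.70, "species_b": 0.88, "species_c": 0.85},
    (8, 10): {"species_a": 0.75, "species_b": 0.85, "species_c": 0.78},
}


def macro_f1(per_class: Dict[str, float]) -> float:
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    return mean(per_class.values())


def select_global(table: Dict[Circuit, Dict[str, float]]) -> Circuit:
    """Global setting: one circuit for the whole dataset, chosen by macro F1."""
    return max(table, key=lambda c: macro_f1(table[c]))


def select_per_class(table: Dict[Circuit, Dict[str, float]]) -> Dict[str, Circuit]:
    """Class-specific setting: each class keeps the circuit with its own best F1."""
    classes = next(iter(table.values())).keys()
    return {k: max(table, key=lambda c: table[c][k]) for k in classes}


best_global = select_global(scores)
best_per_class = select_per_class(scores)

# Classes whose best circuit differs from the global choice.
differs = [k for k, c in best_per_class.items() if c != best_global]
print(best_global, best_per_class, differs)
```

In this toy table the global choice is a compromise: it is the best circuit for only one class, while the other two classes each prefer a different circuit. This mirrors the kind of class-dependent preference the abstract reports.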