ProtoTTA: Prototype-Guided Test-Time Adaptation

arXiv cs.LG / April 20, 2026


Key Points

  • ProtoTTA is a new test-time adaptation framework for prototypical (prototype-interpretable) deep networks that targets robustness under distribution shift.
  • Instead of adapting using only model outputs, ProtoTTA leverages intermediate prototype-similarity signals and reduces entropy of the prototype-similarity distribution to drive confident, prototype-specific activations on shifted data.
  • To avoid unstable updates, it uses geometric filtering to update only samples with reliable prototype activations, guided by prototype-importance weights and model-confidence scores.
  • Experiments on multiple prototypical backbones and diverse benchmarks (fine-grained vision, histopathology, and NLP) show improved robustness versus standard output-entropy minimization, and better restoration of semantic focus in prototype activations.
  • The paper also proposes interpretability metrics and a VLM-based evaluation framework to analyze TTA dynamics, indicating ProtoTTA realigns prototype semantics with human expectations and correlates with VLM-rated reasoning quality.
  • Code is publicly available at https://github.com/DeepRCL/ProtoTTA.
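The core idea in the bullets above — minimizing the entropy of the prototype-similarity distribution while updating only on samples with reliable activations — can be sketched as follows. This is an illustrative simplification, not the authors' implementation: the `softmax` normalization, the entropy threshold, and using per-sample entropy itself as the reliability criterion (in place of the paper's geometric filtering with prototype-importance weights and confidence scores) are all assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over prototype similarities.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def prototype_entropy_loss(similarities, entropy_threshold=1.0):
    """Entropy of the prototype-similarity distribution, averaged over
    'reliable' samples only.

    similarities: (batch, n_prototypes) raw prototype-similarity scores.
    entropy_threshold: hypothetical cutoff standing in for the paper's
        geometric filtering of unreliable activations.
    Returns (loss, reliable_mask).
    """
    p = softmax(similarities, axis=-1)              # per-sample distribution
    ent = -(p * np.log(p + 1e-12)).sum(axis=-1)     # per-sample entropy
    reliable = ent < entropy_threshold              # keep confident samples
    if not reliable.any():
        return 0.0, reliable                        # no update this batch
    return float(ent[reliable].mean()), reliable

# Example: one peaked (confident) sample and one near-uniform sample.
sims = np.array([[5.0, 0.0, 0.0],     # strongly activates prototype 0
                 [0.1, 0.0, 0.05]])   # no clear prototype preference
loss, mask = prototype_entropy_loss(sims)
```

In a full TTA loop this loss would be backpropagated through the prototype layer to sharpen prototype-specific activations on shifted data; here the filtering step simply drops the second sample, whose near-uniform similarity distribution would otherwise produce a noisy, destabilizing update.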

Abstract

Deep networks that rely on prototypes (interpretable representations that can be related to the model input) have gained significant attention for balancing high accuracy with inherent interpretability, which makes them suitable for critical domains such as healthcare. However, these models are limited by their reliance on training data, which hampers their robustness to distribution shifts. While test-time adaptation (TTA) improves the robustness of deep networks by updating parameters and statistics, the prototypes of interpretable models have not been explored for this purpose. We introduce ProtoTTA, a general framework for prototypical models that leverages intermediate prototype signals rather than relying solely on model outputs. ProtoTTA minimizes the entropy of the prototype-similarity distribution to encourage more confident and prototype-specific activations on shifted data. To maintain stability, we employ geometric filtering to restrict updates to samples with reliable prototype activations, regularized by prototype-importance weights and model-confidence scores. Experiments across four prototypical backbones on four diverse benchmarks spanning fine-grained vision, histopathology, and NLP demonstrate that ProtoTTA improves robustness over standard output entropy minimization while restoring correct semantic focus in prototype activations. We also introduce novel interpretability metrics and a vision-language model (VLM) evaluation framework to explain TTA dynamics, confirming ProtoTTA restores human-aligned semantic focus and correlates reliably with VLM-rated reasoning quality. Code is available at: https://github.com/DeepRCL/ProtoTTA.