The Surprising Effectiveness of Canonical Knowledge Distillation for Semantic Segmentation

arXiv cs.CV / April 29, 2026


Key Points

  • The paper argues that prior comparisons of knowledge distillation (KD) for semantic segmentation can be misleading because they often use equal iteration counts despite different per-iteration costs, effectively giving models unequal compute budgets.
  • When the authors match wall-clock compute, they find that canonical (logit- and feature-based) KD can outperform more recent, segmentation-specific methods that rely on complex hand-crafted objectives.
  • With extended training, feature-based KD reaches state-of-the-art performance for a ResNet-18 student on Cityscapes and ADE20K.
  • A PSPNet-based ResNet-18 student using only about one quarter of the teacher’s parameters achieves near-teacher accuracy, reaching 99% of the teacher’s mIoU on Cityscapes (79.0 vs. 79.8) and 92% on ADE20K.
  • The findings challenge the assumption that segmentation KD must use task-specific mechanisms, instead suggesting that scaling/training budget matters more than adding complexity to the distillation objectives.
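The "canonical" losses referenced above are the classic KD ingredients: a temperature-scaled KL divergence between teacher and student per-pixel class distributions, plus an L2 feature-matching term. The sketch below illustrates that combination in NumPy under our own assumptions (function names, weights `alpha`/`beta`, and temperature `T` are illustrative, not the paper's exact formulation or code):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the class axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def canonical_kd_loss(student_logits, teacher_logits,
                      student_feats, teacher_feats,
                      T=4.0, alpha=1.0, beta=1.0):
    """Sketch of canonical KD for segmentation: temperature-scaled
    KL(teacher || student) on per-pixel logits plus L2 feature matching.
    Logits: (H, W, num_classes); feats: (H, W, C). Illustrative only."""
    p_t = softmax(teacher_logits / T)
    log_p_s = np.log(softmax(student_logits / T) + 1e-12)
    # KL averaged over pixels; T^2 restores the gradient scale (Hinton et al.)
    kl = (p_t * (np.log(p_t + 1e-12) - log_p_s)).sum(-1).mean() * T * T
    feat = ((student_feats - teacher_feats) ** 2).mean()
    return alpha * kl + beta * feat
```

In practice the feature term usually requires a learned projection when student and teacher channel widths differ; the sketch assumes matched shapes for brevity.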

Abstract

Recent knowledge distillation (KD) methods for semantic segmentation introduce increasingly complex hand-crafted objectives, yet are typically evaluated under fixed iteration schedules. These objectives substantially increase per-iteration cost, meaning equal iteration counts do not correspond to equal training budgets. It is therefore unclear whether reported gains reflect stronger distillation signals or simply greater compute. We show that iteration-based comparisons are misleading: when wall-clock compute is matched, *canonical* logit- and feature-based KD outperform recent segmentation-specific methods. Under extended training, feature-based distillation achieves state-of-the-art ResNet-18 performance on Cityscapes and ADE20K. A PSPNet ResNet-18 student closely approaches its ResNet-101 teacher despite using only one quarter of the parameters, reaching 99% of the teacher's mIoU on Cityscapes (79.0 vs. 79.8) and 92% on ADE20K. Our results challenge the prevailing assumption that KD for segmentation requires task-specific mechanisms and suggest that scaling, rather than complex hand-crafted objectives, should guide future method design.
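The abstract's core complaint is that equal iteration counts hide unequal compute. A simple way to see the distortion: under a fixed wall-clock budget, a cheaper objective affords proportionally more iterations. The arithmetic below is illustrative; the per-iteration timings are hypothetical, not measurements from the paper:

```python
def matched_iterations(budget_seconds, sec_per_iter):
    """Iterations each method gets under an equal wall-clock budget.
    sec_per_iter maps method name -> seconds per training iteration
    (hypothetical numbers; the paper matches measured compute)."""
    return {name: int(budget_seconds // cost)
            for name, cost in sec_per_iter.items()}

# A complex distillation objective that is 3x slower per iteration
# gets only a third of the updates for the same wall-clock time.
iters = matched_iterations(3600, {"canonical_kd": 0.5, "complex_kd": 1.5})
```

Comparing both methods at, say, 2400 iterations each would silently grant the slower method 50% more wall-clock compute, which is exactly the confound the authors control for.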