HAD: Heterogeneity-Aware Distillation for Lifelong Heterogeneous Learning

arXiv cs.CV / 3/30/2026


Key Points

  • The paper introduces “lifelong heterogeneous learning (LHL),” a setting where a model must learn a sequence of tasks with different output structures while retaining prior knowledge.
  • It instantiates LHL in dense prediction (LHL4DP), a realistic and challenging scenario that requires preserving heterogeneous knowledge across tasks with different pixel- and region-level output behaviors.
  • The authors propose Heterogeneity-Aware Distillation (HAD), an exemplar-free self-distillation approach that distills previously learned knowledge at each training phase.
  • HAD includes a distribution-balanced heterogeneity-aware distillation loss to address global prediction imbalance, and a salience-guided loss that emphasizes informative edge pixels identified via the Sobel operator.
  • Experiments reported in the work indicate that HAD significantly outperforms existing methods in the newly formalized LHL4DP setting.
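The salience-guided loss described above weights the distillation signal toward edge pixels found with the Sobel operator. The paper's exact formulation is not given in this summary, so the sketch below is an illustrative assumption: Sobel gradient magnitudes are normalized into per-pixel weights that scale a squared error between teacher and student dense predictions (function names and the weighted-MSE choice are hypothetical).

```python
import numpy as np

def sobel_magnitude(img):
    """Gradient magnitude via 3x3 Sobel filters (zero-padded input)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    H, W = img.shape
    p = np.pad(img, 1)
    gx = np.zeros((H, W))
    gy = np.zeros((H, W))
    for i in range(3):          # explicit correlation, no scipy needed
        for j in range(3):
            window = p[i:i + H, j:j + W]
            gx += kx[i, j] * window
            gy += ky[i, j] * window
    return np.hypot(gx, gy)

def salience_weighted_distill(student, teacher, img, eps=1e-8):
    """Per-pixel squared error between dense predictions, weighted
    toward informative edge pixels (weights sum to 1)."""
    w = sobel_magnitude(img)
    w = w / (w.sum() + eps)
    return float((w * (student - teacher) ** 2).sum())
```

Because the weights form a distribution over pixels, a constant prediction gap of 1 yields a loss of exactly 1, which makes the weighting easy to sanity-check.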

Abstract

Lifelong learning aims to preserve knowledge acquired from previous tasks while incorporating knowledge from a sequence of new tasks. However, most prior work explores only streams of homogeneous tasks (e.g., only classification tasks) and neglects the scenario of learning across heterogeneous tasks that possess different output structures. In this work, we formalize this broader setting as lifelong heterogeneous learning (LHL). Departing from conventional lifelong learning, the task sequence of LHL spans different task types, and the learner needs to retain heterogeneous knowledge for different output space structures. To instantiate LHL, we focus on LHL in the context of dense prediction (LHL4DP), a realistic and challenging scenario. To this end, we propose the Heterogeneity-Aware Distillation (HAD) method, an exemplar-free approach that preserves previously gained heterogeneous knowledge by self-distillation in each training phase. The proposed HAD comprises two complementary components: a distribution-balanced heterogeneity-aware distillation loss that alleviates the global imbalance of the prediction distribution, and a salience-guided heterogeneity-aware distillation loss that concentrates learning on informative edge pixels extracted with the Sobel operator. Extensive experiments demonstrate that the proposed HAD method significantly outperforms existing methods in this new scenario.