Knowledge Distillation for Efficient Transformer-Based Reinforcement Learning in Hardware-Constrained Energy Management Systems

arXiv cs.LG / 3/30/2026


Key Points

  • The paper addresses the challenge that transformer-based reinforcement learning (especially Decision Transformer) is too compute- and memory-intensive for deployment on residential energy-management controllers with strict latency constraints.
  • It proposes using knowledge distillation to transfer policies from high-capacity offline Decision Transformer “teacher” models trained on heterogeneous multi-building data to smaller “student” models for embedded use.
  • Experiments on the Ausgrid dataset show that distillation largely preserves control performance, with occasional small gains of up to about 1%.
  • The approach delivers substantial efficiency improvements, reducing parameters by up to 96%, inference memory by up to 90%, and inference time by up to 63%.
  • The authors conclude that knowledge distillation can make Decision Transformer–based control practically deployable in resource-limited residential energy management systems, and they observe comparable cost improvements even when the student matches the teacher's architectural capacity.

Abstract

Transformer-based reinforcement learning has emerged as a strong candidate for sequential control in residential energy management. In particular, the Decision Transformer can learn effective battery dispatch policies from historical data, thereby increasing photovoltaic self-consumption and reducing electricity costs. However, transformer models are typically too computationally demanding for deployment on resource-constrained residential controllers, where memory and latency constraints are critical. This paper investigates knowledge distillation to transfer the decision-making behaviour of high-capacity Decision Transformer policies to compact models that are more suitable for embedded deployment. Using the Ausgrid dataset, we train teacher models in an offline sequence-based Decision Transformer framework on heterogeneous multi-building data. We then distil smaller student models by matching the teachers' actions, thereby preserving control quality while reducing model size. Across a broad set of teacher-student configurations, distillation largely preserves control performance and even yields small improvements of up to 1%, while reducing the parameter count by up to 96%, the inference memory by up to 90%, and the inference time by up to 63%. Beyond these compression effects, comparable cost improvements are also observed when distilling into a student model of identical architectural capacity. Overall, our results show that knowledge distillation makes Decision Transformer control more applicable for residential energy management on resource-limited hardware.
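The distillation step described in the abstract trains a compact student to match the actions of a frozen teacher policy on offline data. The sketch below illustrates that action-matching (MSE) objective only; it is not the paper's implementation. The teacher here is a hypothetical linear stand-in for a frozen Decision Transformer, and all names, sizes, and hyperparameters (`teacher_action`, `W_teacher`, the 4-dimensional contexts, the learning rate) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the frozen teacher policy: maps a context
# vector (e.g. return-to-go plus state features) to a dispatch action.
# In the paper this role is played by a trained Decision Transformer.
W_teacher = rng.normal(size=(4,))

def teacher_action(context):
    return context @ W_teacher

# Offline dataset of contexts (stand-in for building/battery logs).
X = rng.normal(size=(512, 4))
y_teacher = teacher_action(X)  # soft targets queried from the teacher

# Compact student, distilled by regressing onto the teacher's actions
# (mean-squared-error action matching), via plain gradient descent.
w = np.zeros(4)
lr = 0.1
for _ in range(500):
    pred = X @ w
    grad = X.T @ (pred - y_teacher) / len(X)  # d(MSE)/dw
    w -= lr * grad

mse = float(np.mean((X @ w - y_teacher) ** 2))
print(f"distillation MSE: {mse:.2e}")
```

Because the student only needs the teacher's input–output behaviour, the teacher can remain an arbitrarily large offline model while the deployed student is whatever fits the controller's memory and latency budget.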
