A Dual-Stream Transformer Architecture for Illumination-Invariant TIR-LiDAR Person Tracking

arXiv cs.RO / 4/2/2026

Key Points

The paper introduces a dual-stream Transformer-based architecture for all-weather person tracking using Thermal-Infrared (TIR) and LiDAR/Depth sensors, targeting failure cases of RGB-D tracking under extreme lighting like darkness and backlighting.
It leverages standard SLAM-capable robot sensor suites (LiDAR and TIR cameras) to build a practical TIR-D tracking system intended for autonomous mobile robots performing reliable human-following.
A key bottleneck is the scarcity of annotated multi-modal TIR-D datasets, which the authors address with a sequential knowledge transfer method that carries structural priors from a large-scale thermal-trained model into the TIR-D domain.
The method uses a “Fine-grained Differential Learning Rate Strategy” to retain pre-trained feature extraction while rapidly adapting to geometric depth cues for the tracking task.
Experiments report improved performance over RGB-transfer and single-modality baselines, including an Average Overlap (AO) of 0.700 and a Success Rate (SR) of 58.7%.
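The differential learning rate idea from the key points can be illustrated with a small sketch. The paper's actual grouping and rates are not given here, so the module prefixes (`thermal_backbone.`, `depth_stream.`, `fusion_head.`) and the multipliers below are hypothetical, chosen only to show pre-trained layers receiving a smaller rate than newly added depth-related layers.

```python
from typing import Any, Dict, Iterable, List, Sequence, Tuple

# Hypothetical prefixes and multipliers (assumptions, not from the paper):
# pre-trained thermal layers get a small rate, new depth/fusion layers the full rate.
LR_SCALES: Sequence[Tuple[str, float]] = (
    ("thermal_backbone.", 0.1),
    ("depth_stream.", 1.0),
    ("fusion_head.", 1.0),
)


def build_param_groups(
    named_params: Iterable[Tuple[str, Any]],
    base_lr: float,
    lr_scales: Sequence[Tuple[str, float]] = LR_SCALES,
    default_scale: float = 1.0,
) -> List[Dict[str, Any]]:
    """Bucket parameters by name prefix and give each bucket its own learning rate."""
    buckets: Dict[float, List[Any]] = {}
    for name, param in named_params:
        scale = default_scale
        for prefix, prefix_scale in lr_scales:
            if name.startswith(prefix):
                scale = prefix_scale
                break
        buckets.setdefault(scale, []).append(param)
    # One dict per distinct rate, each holding "params" and "lr".
    return [{"params": params, "lr": base_lr * scale} for scale, params in buckets.items()]


if __name__ == "__main__":
    # Strings stand in for real tensors so the sketch runs without any framework.
    demo = [
        ("thermal_backbone.block0.weight", "p0"),
        ("depth_stream.embed.weight", "p1"),
        ("fusion_head.weight", "p2"),
    ]
    for group in build_param_groups(demo, base_lr=1e-4):
        print(group)
```

The returned list has the shape PyTorch optimizers accept as per-parameter options, so with a real model one could pass `build_param_groups(model.named_parameters(), base_lr)` to an optimizer such as `torch.optim.AdamW`.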

Abstract

Robust person tracking is a critical capability for autonomous mobile robots operating in diverse and unpredictable environments. While RGB-D tracking has shown high precision, its performance severely degrades under challenging illumination conditions, such as total darkness or intense backlighting. To achieve all-weather robustness, this paper proposes a novel Thermal-Infrared and Depth (TIR-D) tracking architecture that leverages the standard sensor suite of SLAM-capable robots, namely LiDAR and TIR cameras. A major challenge in TIR-D tracking is the scarcity of annotated multi-modal datasets. To address this, we introduce a sequential knowledge transfer strategy that evolves structural priors from a large-scale thermal-trained model into the TIR-D domain. By employing a differential learning rate strategy, referred to as the "Fine-grained Differential Learning Rate Strategy", we effectively preserve pre-trained feature extraction capabilities while enabling rapid adaptation to geometric depth cues. Experimental results demonstrate that our proposed TIR-D tracker achieves superior performance, with an Average Overlap (AO) of 0.700 and a Success Rate (SR) of 58.7%, significantly outperforming conventional RGB-transfer and single-modality baselines. Our approach provides a practical and resource-efficient solution for robust human-following in all-weather robotics applications.
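For readers unfamiliar with the reported metrics, the sketch below shows how Average Overlap and Success Rate are commonly computed from per-frame bounding boxes. It assumes `(x, y, width, height)` boxes and treats the success threshold as a parameter, since the summary does not state which threshold the authors used; the function names are illustrative.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height), an assumed convention


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def average_overlap(preds: Sequence[Box], gts: Sequence[Box]) -> float:
    """AO: mean IoU between predicted and ground-truth boxes over all frames."""
    overlaps = [iou(p, g) for p, g in zip(preds, gts)]
    return sum(overlaps) / len(overlaps)


def success_rate(preds: Sequence[Box], gts: Sequence[Box], threshold: float = 0.5) -> float:
    """SR: fraction of frames whose IoU exceeds the threshold (threshold is an assumption)."""
    overlaps = [iou(p, g) for p, g in zip(preds, gts)]
    return sum(o > threshold for o in overlaps) / len(overlaps)


if __name__ == "__main__":
    predictions = [(0.0, 0.0, 10.0, 10.0), (0.0, 0.0, 10.0, 10.0)]
    ground_truth = [(0.0, 0.0, 10.0, 10.0), (5.0, 0.0, 10.0, 10.0)]
    print("AO:", average_overlap(predictions, ground_truth))
    print("SR@0.5:", success_rate(predictions, ground_truth, threshold=0.5))
```

Under this definition AO is a value in [0, 1], while SR is usually quoted as a percentage, which matches the way the two figures are reported above.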
