Physics-Informed Tracking (PIT)

arXiv cs.CV / 4/21/2026


Key Points

  • The paper introduces Physics-Informed Tracking (PIT), a video-based framework that tracks a single particle by combining a neural network autoencoder with physics-based constraints.
  • An autoencoder produces a particle heatmap peak (landmark), while a differentiable physics module enforces physically consistent landmark trajectories over time without requiring labels.
  • PIT’s Physics-Informed Landmark Loss (PILL) enables unsupervised training by comparing the predicted trajectory back to the landmarks, ensuring physical consistency.
  • A supervised variant, Physics-Informed Landmark Losses with Simulation Supervision (PILLS), trains end-to-end using ground-truth simulation data for position, velocity, and bounce.
  • Experiments using a replicated 2^6 factorial design show that PILLS achieves sub-pixel tracking accuracy for both bilinear and physics-refined decoder outputs under clean and noisy conditions.
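
To make the label-free idea behind PILL concrete, the following is a minimal sketch and not the paper's implementation. It assumes a soft-argmax readout of the heatmap peak, ballistic motion with a known gravity constant, and no bounce; the names `soft_argmax`, `fit_line`, and `pill_loss` are hypothetical. In a real training setup the same arithmetic would be written in an autodiff framework so the loss can be backpropagated into the autoencoder.

```python
import math


def soft_argmax(heatmap):
    """Return the (x, y) expectation of a softmax-normalised 2D heatmap.

    Assumption: the landmark is read out with a soft-argmax, a common
    differentiable stand-in for the hard heatmap peak.
    """
    peak = max(max(row) for row in heatmap)
    total = x_sum = y_sum = 0.0
    for y, row in enumerate(heatmap):
        for x, value in enumerate(row):
            w = math.exp(value - peak)  # subtract the peak for numerical stability
            total += w
            x_sum += w * x
            y_sum += w * y
    return x_sum / total, y_sum / total


def fit_line(ts, zs):
    """Least-squares fit of z = z0 + v * t; returns (z0, v)."""
    n = len(ts)
    t_mean = sum(ts) / n
    z_mean = sum(zs) / n
    var = sum((t - t_mean) ** 2 for t in ts)
    cov = sum((t - t_mean) * (z - z_mean) for t, z in zip(ts, zs))
    v = cov / var
    return z_mean - v * t_mean, v


def pill_loss(landmarks, dt=1.0, gravity=0.0):
    """Mean squared gap between landmarks and a fitted ballistic trajectory.

    Assumed dynamics (hypothetical, for illustration only):
        x(t) = x0 + vx * t
        y(t) = y0 + vy * t + 0.5 * gravity * t**2
    No ground-truth labels are used: the trajectory is fitted to the
    landmarks themselves and then compared back against them.
    """
    ts = [i * dt for i in range(len(landmarks))]
    xs = [p[0] for p in landmarks]
    # Remove the known gravity term so the remaining motion is linear in t.
    ys = [p[1] - 0.5 * gravity * t * t for p, t in zip(landmarks, ts)]
    x0, vx = fit_line(ts, xs)
    y0, vy = fit_line(ts, ys)
    squared = [
        (x0 + vx * t - x) ** 2 + (y0 + vy * t - y) ** 2
        for t, x, y in zip(ts, xs, ys)
    ]
    return sum(squared) / len(squared)


# Landmarks that follow the assumed dynamics give a loss near zero.
consistent = [(1 + 2 * t, 3 - t + 0.2 * t * t) for t in range(5)]
# Moving one landmark off the trajectory makes the loss positive.
perturbed = list(consistent)
perturbed[2] = (perturbed[2][0] + 3.0, perturbed[2][1])

print(pill_loss(consistent, gravity=0.4))
print(pill_loss(perturbed, gravity=0.4))
```

A supervised variant in the spirit of PILLS would compare the fitted position and velocity against simulation ground truth instead of against the landmarks; that comparison is not shown here.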

Abstract

We propose Physics-Informed Tracking (PIT), a video-based framework for tracking a single particle from video, where a neural network autoencoder localizes a particle as a heatmap peak (landmark) and a differentiable physics module embedded in the autoencoder constrains several landmarks over time (a trajectory) to satisfy known dynamics. The novel Physics-Informed Landmark Loss (PILL) compares this predicted trajectory back against the landmarks, enforcing physical consistency without labels. Its supervised variant (PILLS) instead compares the prediction against ground-truth position, velocity, and bounce from simulation, enabling end-to-end backpropagation. To support supervised and unsupervised learning, we use an autoencoder with a split bottleneck that separates A) tracking-related structure via landmark heatmaps from B) background noise and subsequent image reconstruction. We evaluate a replicated 2^6 factorial design (n = 4 replicates, 64 configurations), showing that PILLS consistently achieves sub-pixel tracking accuracy for the bilinear and physics-refined decoder outputs under both clean and noisy conditions.
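
The experimental layout can be checked with a short enumeration: six two-level factors give 2^6 = 64 configurations, and four replicates of each would give 256 runs. This is a sketch only. The abstract does not name the six factors, so the names `factor_1` to `factor_6` and the 0/1 level coding are placeholders, and `full_factorial` is a hypothetical helper.

```python
from itertools import product

# Placeholder factor names: the abstract does not list the actual factors.
FACTORS = [f"factor_{i}" for i in range(1, 7)]
REPLICATES = 4  # n = 4 replicates, as stated in the abstract


def full_factorial(factors):
    """Enumerate every combination of two-level (0/1) factor settings."""
    return [
        dict(zip(factors, levels))
        for levels in product((0, 1), repeat=len(factors))
    ]


configs = full_factorial(FACTORS)
runs = [(config, replicate) for config in configs for replicate in range(REPLICATES)]

print(len(configs), len(runs))  # 64 256
```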