Star-Fusion: A Multi-modal Transformer Architecture for Discrete Celestial Orientation via Spherical Topology

arXiv cs.AI / 4/30/2026


Key Points

  • The paper introduces Star-Fusion, a multi-modal transformer architecture that estimates spacecraft celestial attitude by casting the problem as discrete topological classification rather than continuous regression.
  • To address the celestial sphere’s non-Euclidean geometry and the periodic wraparound effects of right ascension and declination, it uses spherical K-Means clustering to partition the sphere into K topologically consistent regions.
  • Star-Fusion combines three components in a tripartite fusion design: a SwinV2-Tiny transformer for photometric feature extraction, a convolutional heatmap branch for spatial grounding, and a coordinate-based MLP for geometric anchoring.
  • On a synthetic Hipparcos-derived dataset, the model reports strong accuracy (93.4% Top-1, 97.8% Top-3) and low inference latency (18.4 ms) on resource-constrained commercial off-the-shelf hardware, supporting potential real-time onboard use.
  • The work positions Star-Fusion as a practical candidate to reduce computational overhead and improve robustness compared with traditional “Lost-in-Space” methods that are sensitive to sensor noise.
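
The partitioning step in the points above can be illustrated with a minimal sketch of spherical K-Means. It assumes pointing directions are first mapped from (RA, Dec) to 3D unit vectors, that similarity is the dot product (cosine), and that centroids are re-normalized onto the sphere after each update. The function names, the deterministic farthest-first initialization, and the toy coordinates are illustrative assumptions, not details taken from the paper, which does not state its K or its implementation here.

```python
import math

def radec_to_unit(ra_deg, dec_deg):
    """Map (RA, Dec) in degrees to a 3D unit vector on the celestial sphere."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra),
            math.cos(dec) * math.sin(ra),
            math.sin(dec))

def unit_to_radec(v):
    """Inverse mapping: unit vector back to (RA, Dec) in degrees, RA in [0, 360)."""
    x, y, z = v
    ra = math.degrees(math.atan2(y, x)) % 360.0
    dec = math.degrees(math.asin(max(-1.0, min(1.0, z))))
    return ra, dec

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def assign(points, centroids):
    """Label each point with the centroid of highest cosine similarity."""
    return [max(range(len(centroids)), key=lambda j: dot(p, centroids[j]))
            for p in points]

def farthest_first_init(points, k):
    """Deterministic seeding (an assumption of this sketch, not the paper's)."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Pick the point least similar to its most similar existing centroid.
        centroids.append(min(points,
                             key=lambda p: max(dot(p, c) for c in centroids)))
    return centroids

def spherical_kmeans(points, k, iters=100):
    centroids = farthest_first_init(points, k)
    labels = None
    for _ in range(iters):
        new_labels = assign(points, centroids)
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if not members:
                continue  # leave an empty cluster's centroid unchanged
            mean = [sum(axis) / len(members) for axis in zip(*members)]
            norm = math.sqrt(dot(mean, mean))
            if norm > 1e-12:
                # Re-project the Euclidean mean back onto the unit sphere.
                centroids[j] = tuple(c / norm for c in mean)
    return centroids, assign(points, centroids)

# Toy pointings: four straddle the RA = 0/360 seam, four sit near RA = 180.
sky = [(358.0, 5.0), (359.5, -3.0), (0.5, 2.0), (2.0, -4.0),
       (178.0, 40.0), (181.0, 42.0), (179.0, 38.0), (182.0, 41.0)]
points = [radec_to_unit(ra, dec) for ra, dec in sky]
centroids, labels = spherical_kmeans(points, k=2)
print(labels)
print([unit_to_radec(c) for c in centroids])

# A naive arithmetic mean of the first four RA values lands on the
# opposite side of the sky, which is the wraparound artifact in question.
naive_ra = sum(ra for ra, _ in sky[:4]) / 4
print(naive_ra)
```

Because the clustering operates on unit vectors rather than raw angles, the four pointings on either side of the RA seam fall into one region, whereas averaging their RA values directly would place them half a sky away. Each region index can then serve as a class label for a classifier.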

Abstract

Reliable celestial attitude determination is a critical requirement for autonomous spacecraft navigation, yet traditional "Lost-in-Space" (LIS) algorithms often suffer from high computational overhead and sensitivity to sensor-induced noise. While deep learning has emerged as a promising alternative, standard regression models are often confounded by the non-Euclidean topology of the celestial sphere and by the periodic boundary conditions of Right Ascension (RA) and Declination (Dec). In this paper, we present Star-Fusion, a multi-modal architecture that reformulates orientation estimation as a discrete topological classification task. Our approach leverages spherical K-Means clustering to partition the celestial sphere into K topologically consistent regions, effectively mitigating coordinate wrapping artifacts. The proposed architecture employs a tripartite fusion strategy: a SwinV2-Tiny transformer backbone for photometric feature extraction, a convolutional heatmap branch for spatial grounding, and a coordinate-based MLP for geometric anchoring. Experimental evaluations on a synthetic Hipparcos-derived dataset demonstrate that Star-Fusion achieves a Top-1 accuracy of 93.4% and a Top-3 accuracy of 97.8%. Furthermore, the model exhibits high computational efficiency, maintaining an inference latency of 18.4 ms on resource-constrained commercial off-the-shelf (COTS) hardware, making it a viable candidate for real-time onboard deployment in next-generation satellite constellations.
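
For readers unfamiliar with the Top-1 and Top-3 figures quoted above, the following is a minimal sketch of how top-k accuracy is conventionally computed for a classifier over K sky regions. The function name and the score values are made up for illustration; they are not the paper's evaluation code or data.

```python
def top_k_accuracy(scores, targets, k):
    """Fraction of samples whose true class is among the k highest scores."""
    hits = 0
    for row, target in zip(scores, targets):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if target in ranked[:k]:
            hits += 1
    return hits / len(targets)

# Hypothetical per-region scores for four frames over K = 4 regions.
scores = [
    [0.70, 0.10, 0.15, 0.05],  # true region 0: ranked first
    [0.20, 0.50, 0.25, 0.05],  # true region 2: ranked second
    [0.10, 0.20, 0.30, 0.40],  # true region 3: ranked first
    [0.40, 0.30, 0.20, 0.10],  # true region 3: ranked last
]
targets = [0, 2, 3, 3]

print(top_k_accuracy(scores, targets, 1))  # 0.5
print(top_k_accuracy(scores, targets, 3))  # 0.75
```

Top-3 is always at least as high as Top-1, since a prediction counts as correct whenever the true region appears anywhere in the three best guesses.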