M3GCLR: Multi-View Mini-Max Infinite Skeleton-Data Game Contrastive Learning For Skeleton-Based Action Recognition

arXiv cs.CV / 3/11/2026

Key Points

  • The paper introduces M3GCLR, a novel game-theoretic contrastive learning framework designed to improve self-supervised skeleton-based action recognition by addressing limitations in view discrepancy modeling, adversarial mechanisms, and augmentation control.
  • M3GCLR establishes the Infinite Skeleton-data Game (ISG) model and equilibrium theorem, enabling mini-max optimization for multi-view mutual information to better capture action-discriminative features.
  • The approach generates normal-extreme data pairs using multi-view rotation augmentation and uses a temporally averaged neutral anchor for structural alignment, explicitly controlling perturbation strength.
  • A dual-loss equilibrium optimizer is proposed to achieve an optimal balance by maximizing action-relevant information and minimizing encoding redundancy, with theoretical proof supporting its equivalence to the ISG model.
  • Extensive experiments on the NTU RGB+D 60, NTU RGB+D 120, and PKU-MMD datasets show that M3GCLR matches or surpasses state-of-the-art performance, and ablation studies validate the contribution of each component.
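
The normal-extreme pairs and the neutral anchor mentioned in the key points can be pictured with a small NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: the array layout `(T, J, 3)`, the choice of the vertical axis, the function names, and the two angles are all hypothetical, and the paper's actual augmentation may differ.

```python
import numpy as np

def rotate_y(seq, angle_deg):
    """Rotate a skeleton sequence of shape (T, J, 3) about the y axis."""
    theta = np.deg2rad(angle_deg)
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, 0.0, s],
                    [0.0, 1.0, 0.0],
                    [-s, 0.0, c]])
    # Each joint is a row vector, so multiply by the transpose.
    return seq @ rot.T

def neutral_anchor(seq):
    """Average the pose over time and repeat it for every frame."""
    mean_pose = seq.mean(axis=0, keepdims=True)          # (1, J, 3)
    return np.broadcast_to(mean_pose, seq.shape).copy()  # (T, J, 3)

def make_views(seq, normal_deg=15.0, extreme_deg=60.0):
    """Return (normal view, extreme view, neutral anchor).

    The angles are assumed values chosen only to make the
    perturbation strength of the two views visibly different.
    """
    return rotate_y(seq, normal_deg), rotate_y(seq, extreme_deg), neutral_anchor(seq)

# Toy input: 64 frames, 25 joints (the joint count used by NTU RGB+D).
rng = np.random.default_rng(0)
seq = rng.normal(size=(64, 25, 3))
normal, extreme, anchor = make_views(seq)

# Perturbation strength measured as distance from the unrotated input.
print(np.linalg.norm(normal - seq), np.linalg.norm(extreme - seq))
```

Because a rotation preserves bone lengths, the rotation angle acts as a single knob for perturbation strength, which is one plausible reading of "explicitly controlling perturbation strength".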

Computer Science > Computer Vision and Pattern Recognition

arXiv:2603.09367 (cs)
[Submitted on 10 Mar 2026]

Title: M3GCLR: Multi-View Mini-Max Infinite Skeleton-Data Game Contrastive Learning For Skeleton-Based Action Recognition

Authors: Yanshan Li and 3 other authors
Abstract: In recent years, contrastive learning has drawn significant attention as an effective approach to reducing reliance on labeled data. However, existing methods for self-supervised skeleton-based action recognition still face three major limitations: insufficient modeling of view discrepancies, lack of effective adversarial mechanisms, and uncontrollable augmentation perturbations. To tackle these issues, we propose the Multi-view Mini-Max infinite skeleton-data Game Contrastive Learning for skeleton-based action Recognition (M3GCLR), a game-theoretic contrastive framework. First, we establish the Infinite Skeleton-data Game (ISG) model and the ISG equilibrium theorem, and provide a rigorous proof, enabling mini-max optimization based on multi-view mutual information. Then, we generate normal-extreme data pairs through multi-view rotation augmentation and adopt the temporally averaged input as a neutral anchor to achieve structural alignment, thereby explicitly characterizing perturbation strength. Next, leveraging the proposed equilibrium theorem, we construct a strongly adversarial mini-max skeleton-data game to encourage the model to mine richer action-discriminative information. Finally, we introduce the dual-loss equilibrium optimizer to optimize the game equilibrium, allowing the learning process to maximize action-relevant information while minimizing encoding redundancy, and we prove the equivalence between the proposed optimizer and the ISG model. Extensive experiments show that, in the three-stream setting, M3GCLR achieves 82.1% and 85.8% accuracy on NTU RGB+D 60 (X-Sub and X-View) and 72.3% and 75.0% accuracy on NTU RGB+D 120 (X-Sub and X-Set). On PKU-MMD Part I and Part II, it attains 89.1% and 45.2%, respectively, in the three-stream setting. All results match or outperform state-of-the-art performance. Ablation studies confirm the effectiveness of each component.
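
The abstract does not state the form of the dual-loss equilibrium optimizer. As a generic illustration of "maximize action-relevant information while minimizing encoding redundancy", the sketch below pairs an InfoNCE term (a standard contrastive lower bound on mutual information between two views) with a feature-decorrelation penalty. Both terms are stand-ins chosen for this example, the function names and the `weight` parameter are hypothetical, and the sketch does not model the adversarial mini-max game itself.

```python
import numpy as np

def info_nce(z_a, z_b, temperature=0.1):
    """InfoNCE loss for embeddings of shape (N, D); row i of z_a pairs with row i of z_b."""
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

def redundancy_penalty(z):
    """Mean squared off-diagonal entry of the feature correlation matrix."""
    z = (z - z.mean(axis=0)) / (z.std(axis=0) + 1e-8)
    corr = z.T @ z / z.shape[0]
    off_diag = corr - np.diag(np.diag(corr))
    d = z.shape[1]
    return np.sum(off_diag ** 2) / (d * (d - 1))

def dual_loss(z_a, z_b, weight=0.5):
    """Information term plus weighted redundancy term (assumed combination)."""
    return info_nce(z_a, z_b) + weight * (redundancy_penalty(z_a) + redundancy_penalty(z_b))

# Toy embeddings: two noisy views of the same 32 samples.
rng = np.random.default_rng(0)
z_a = rng.normal(size=(32, 16))
z_b = z_a + 0.05 * rng.normal(size=(32, 16))
print(dual_loss(z_a, z_b))
```

Lowering the first term pulls matching views together relative to the other samples in the batch, while the second term discourages embedding dimensions from duplicating one another.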
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as: arXiv:2603.09367 [cs.CV]
  (or arXiv:2603.09367v1 [cs.CV] for this version)
  https://doi.org/10.48550/arXiv.2603.09367
arXiv-issued DOI via DataCite

Submission history

From: Ke Ma
[v1] Tue, 10 Mar 2026 08:45:14 UTC (3,517 KB)