FusionAgent: A Multimodal Agent with Dynamic Model Selection for Human Recognition
arXiv cs.CV / 3/31/2026
Key Points
- FusionAgent is an agentic multimodal framework for whole-body human recognition that performs dynamic, sample-specific model selection instead of static score fusion across all inputs.
- It treats each expert model as a tool and uses Reinforcement Fine-Tuning with a metric-based reward to learn which combination of models to invoke for each test sample.
- To cope with misaligned scores and heterogeneous embeddings across models, it introduces Anchor-based Confidence Top-k (ACT) score fusion, which anchors on the most confident model and fuses complementary predictions from the other models in a confidence-aware way.
- Experiments on multiple whole-body biometric benchmarks report state-of-the-art performance with higher efficiency, attributed to fewer model invocations.
- The work emphasizes dynamic, explainable, and robust model fusion as a key ingredient for real-world recognition in unconstrained settings.
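
The ACT idea described above can be illustrated with a small sketch. This is not the paper's implementation: the summary does not define how confidence is measured or how the fused score is computed, so the sketch assumes z-score normalization to align scores, a top-1 versus top-2 margin as the confidence proxy, and confidence-proportional weights. The names `zscore`, `confidence`, and `act_fuse`, and the example scores, are hypothetical.

```python
from statistics import mean, pstdev


def zscore(scores):
    """Normalize one model's gallery scores so different models are comparable."""
    mu, sigma = mean(scores), pstdev(scores)
    if sigma == 0:
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


def confidence(norm_scores):
    """Assumed confidence proxy: margin between the best and second-best score.

    Requires at least two gallery entries.
    """
    ordered = sorted(norm_scores, reverse=True)
    return ordered[0] - ordered[1]


def act_fuse(model_scores, k=3):
    """Hypothetical anchor-based, confidence-weighted top-k fusion.

    model_scores maps a model name to its similarity scores over the same
    gallery (same order for every model). Returns the anchor model's name and
    a dict {gallery_index: fused_score} limited to the anchor's top-k entries.
    """
    normed = {name: zscore(s) for name, s in model_scores.items()}
    conf = {name: confidence(s) for name, s in normed.items()}

    # Anchor on the model with the largest confidence margin.
    anchor = max(conf, key=conf.get)
    anchor_scores = normed[anchor]

    # Only the anchor's top-k gallery candidates are re-scored.
    candidates = sorted(
        range(len(anchor_scores)), key=lambda i: anchor_scores[i], reverse=True
    )[:k]

    # Weight each model by its share of total confidence (assumption).
    total = sum(conf.values())
    if total > 0:
        weights = {name: c / total for name, c in conf.items()}
    else:
        weights = {name: 1.0 / len(conf) for name in conf}

    fused = {
        i: sum(weights[name] * normed[name][i] for name in normed)
        for i in candidates
    }
    return anchor, fused


# Made-up similarity scores for one probe against a 4-identity gallery.
example_scores = {
    "face": [0.91, 0.20, 0.15, 0.10],
    "gait": [0.40, 0.45, 0.38, 0.41],
    "body": [0.55, 0.60, 0.30, 0.20],
}
anchor, fused = act_fuse(example_scores, k=2)
best_match = max(fused, key=fused.get)
print(anchor, best_match)
```

In this toy setup the sharply peaked model becomes the anchor, and the less decisive models can only reorder the anchor's top-k candidates rather than introduce new ones. Whether the paper restricts fusion in exactly this way is not stated in the summary.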