Autonomous Skeletal Landmark Localization towards Agentic C-Arm Control
arXiv cs.CV / 4/22/2026
Key Points
- The paper addresses positioning delays caused by failures of conventional deep-learning-based C-arm control, proposing an agentic framework that uses multimodal LLMs (MLLMs) to incorporate clinician feedback and reasoning for more accurate positioning.
- It investigates adapting multimodal large language models for autonomous skeletal landmark localization, which is a prerequisite step for C-arm control.
- The authors fine-tuned two MLLMs using both annotated synthetic X-ray data and real X-ray data, training the models to retrieve the closest skeletal landmarks from each image.
- Quantitative results show the fine-tuned MLLMs perform competitively with a leading DL approach across localization tasks, while qualitative experiments demonstrate reasoning-based correction of incorrect predictions and sequential navigation of the C-arm toward a target.
- The study releases code on GitHub, supporting further research toward agentic autonomous C-arm control systems.
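The sequential navigation described in the key points can be pictured as a closed loop: the model localizes the target landmark in the current view, and the controller steps the C-arm toward it until the offset falls below a tolerance. The sketch below is a minimal, hypothetical illustration of that loop, not the paper's implementation; `predict_offset` stands in for the fine-tuned MLLM and is stubbed with a known target so the control logic runs standalone.

```python
# Hypothetical sketch of a sequential C-arm navigation loop.
# predict_offset stands in for the fine-tuned MLLM's landmark localization;
# here it is stubbed with a known target so the loop can run on its own.

def predict_offset(position, target):
    """Stub for the localization model: returns the (dx, dy) offset from
    the current C-arm position to the target landmark in image coordinates."""
    return (target[0] - position[0], target[1] - position[1])

def navigate(position, target, step=0.5, tol=0.1, max_iters=50):
    """Iteratively move the C-arm toward the target landmark, taking a
    bounded step along the predicted offset on each iteration."""
    for _ in range(max_iters):
        dx, dy = predict_offset(position, target)
        dist = (dx * dx + dy * dy) ** 0.5
        if dist < tol:          # close enough: stop navigating
            return position
        scale = min(step, dist) / dist  # cap the step size per move
        position = (position[0] + scale * dx, position[1] + scale * dy)
    return position
```

In the agentic setting, the stubbed `predict_offset` would be replaced by a model call (plus any reasoning-based correction), while the bounded-step loop structure stays the same.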