Boosting Vision-Language-Action Finetuning with Feasible Action Neighborhood Prior
arXiv cs.RO / 4/3/2026
Key Points
- The paper argues that real-world robotic manipulation naturally admits a feasible action neighborhood (FAN): a set of actions that all make effectively similar task progress, rather than a single uniquely correct action.
- It introduces a FAN-guided regularizer for vision-language-action (VLA) fine-tuning that reshapes the model’s output distribution using a Gaussian prior to encourage locally smooth, unimodal predictions near the preferred direction and magnitude.
- Experiments show the method improves sample efficiency and success rate in both reinforced fine-tuning (RFT) and supervised fine-tuning (SFT).
- Results are reported as strong not only in-distribution but also out-of-distribution (OOD), suggesting better generalization for VLA adaptation.
- The approach is presented as a principled way to match model behavior to the physical manipulation tolerances inherent in robotics, improving both practicality and learning efficiency.
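The regularizer described above can be sketched in a minimal form: build a soft target distribution over discretized action bins (a Gaussian centered on the demonstrated action, encoding that nearby actions are also feasible) and penalize the model's predicted distribution with a KL term toward that prior. This is an illustrative reconstruction, not the paper's code; all names (`gaussian_prior`, `fan_regularizer`, `sigma`, the bin grid) are assumptions.

```python
import numpy as np

def gaussian_prior(action_bins: np.ndarray, target: float, sigma: float) -> np.ndarray:
    """Soft target over discretized action bins: a Gaussian centered on the
    demonstrated action, so nearby actions also get probability mass
    (the 'feasible action neighborhood' idea)."""
    logits = -0.5 * ((action_bins - target) / sigma) ** 2
    p = np.exp(logits - logits.max())  # stable softmax over the Gaussian logits
    return p / p.sum()

def fan_regularizer(model_probs: np.ndarray, prior: np.ndarray, eps: float = 1e-12) -> float:
    """KL(prior || model): penalizes predictions that put mass far from the
    feasible neighborhood, encouraging smooth, unimodal outputs near it."""
    return float(np.sum(prior * (np.log(prior + eps) - np.log(model_probs + eps))))

# Illustrative one-dimensional action space, discretized into 21 bins.
bins = np.linspace(-1.0, 1.0, 21)
prior = gaussian_prior(bins, target=0.2, sigma=0.1)

# A smooth prediction near the target is penalized far less than a
# confident prediction centered on a distant, infeasible action.
smooth_near = gaussian_prior(bins, target=0.25, sigma=0.15)
sharp_wrong = gaussian_prior(bins, target=-0.6, sigma=0.05)
assert fan_regularizer(smooth_near, prior) < fan_regularizer(sharp_wrong, prior)
```

In training, a loss of this shape would typically be added to the usual fine-tuning objective with a weighting coefficient; the Gaussian width plays the role of the physical tolerance the Key Points mention.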
