GRITS: A Spillage-Aware Guided Diffusion Policy for Robot Food Scooping Tasks

arXiv cs.RO / 4/16/2026


Key Points

  • The paper introduces GRITS, a spillage-aware guided diffusion policy designed to improve reliability in robotic food scooping under diverse, dynamic food states.
  • GRITS trains a spillage predictor on simulated scenarios built from four primitive shapes (spheres, cubes, cones, and cylinders) with varied mass, friction, and particle size, then uses this predictor as a differentiable guidance signal during diffusion sampling at inference time.
  • The framework explicitly steers sampled robot trajectories toward lower-spillage actions while preserving task success, rather than relying on unguided imitation of demonstrations alone.
  • Real-world experiments on a robotic scooping platform show GRITS achieves 82% task success with a 4% spillage rate, cutting spillage by more than 40% versus baselines without guidance.
  • GRITS is trained on six food categories and evaluated on ten unseen categories with different shapes and quantities, demonstrating generalization beyond the training distribution.
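The guidance mechanism summarized above can be illustrated with standard classifier guidance on DDPM sampling: at each reverse step, the denoising mean is shifted by the gradient of log(1 - p_spill). This is a minimal NumPy sketch under stated assumptions, not the GRITS implementation. The learned noise network is replaced by the closed-form denoiser for Gaussian demonstrations, the spillage predictor is a hypothetical logistic function of a "tilt" channel, and `W_TILT`, `BIAS`, `DEMO_MEAN`, the noise schedule, and `guidance_scale` are all illustrative. The abstract does not say how the guidance term is scaled or whether the predictor sees noisy or denoised actions; here it is evaluated on the noisy sample x_t.

```python
import numpy as np

# Toy sizes (hypothetical): an action chunk of H steps with D dims each.
H, D, T = 8, 2, 100
BETAS = np.linspace(1e-4, 0.1, T)  # linear DDPM noise schedule
ALPHAS = 1.0 - BETAS
ALPHA_BARS = np.cumprod(ALPHAS)

# Hypothetical demonstration distribution: channel 0 ("tilt") centred at 1.0.
DEMO_MEAN = np.zeros((H, D))
DEMO_MEAN[:, 0] = 1.0

# Hypothetical logistic spillage predictor: more tilt -> higher spill probability.
W_TILT, BIAS = 4.0, 4.0


def predict_noise(x_t, t):
    """Stand-in for the learned noise network.

    Returns the exact E[eps | x_t] when demonstrations ~ N(DEMO_MEAN, I).
    """
    ab = ALPHA_BARS[t]
    return np.sqrt(1.0 - ab) * (x_t - np.sqrt(ab) * DEMO_MEAN)


def spill_prob(actions):
    """p(spill | actions) for a batch of action chunks with shape (B, H, D)."""
    z = W_TILT * np.mean(actions[..., 0] ** 2, axis=-1) - BIAS
    return 1.0 / (1.0 + np.exp(-z))


def grad_log_no_spill(actions):
    """Analytic gradient of log(1 - p_spill) with respect to the actions."""
    p = spill_prob(actions)  # shape (B,)
    grad = np.zeros_like(actions)
    # d/dz log(1 - sigmoid(z)) = -sigmoid(z), and dz/da = 2 * W_TILT * a / H
    grad[..., 0] = -p[..., None] * 2.0 * W_TILT * actions[..., 0] / actions.shape[-2]
    return grad


def sample(n, guidance_scale=0.0, seed=0):
    """DDPM ancestral sampling with optional spillage guidance."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n, H, D))
    for t in reversed(range(T)):
        eps = predict_noise(x, t)
        mean = (x - BETAS[t] / np.sqrt(1.0 - ALPHA_BARS[t]) * eps) / np.sqrt(ALPHAS[t])
        # Guidance: shift the mean by scale * variance * grad log p(no spill | x_t).
        mean = mean + guidance_scale * BETAS[t] * grad_log_no_spill(x)
        noise = rng.standard_normal(x.shape) if t > 0 else 0.0
        x = mean + np.sqrt(BETAS[t]) * noise
    return x


plain = sample(256)
guided = sample(256, guidance_scale=3.0)
print(spill_prob(plain).mean(), spill_prob(guided).mean())
```

With the same seed, the guided batch should show a lower mean predicted spillage than the unguided one, because the guidance term pulls the tilt channel toward zero. Raising `guidance_scale` trades fidelity to the demonstrations for lower predicted spillage, which mirrors the success-versus-spillage balance the paper targets.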

Abstract

Robotic food scooping is a critical manipulation skill for food preparation and service robots. However, existing robot learning algorithms, especially learn-from-demonstration methods, still struggle to handle diverse and dynamic food states, which often results in spillage and reduced reliability. In this work, we introduce GRITS: A Spillage-Aware Guided Diffusion Policy for Robot Food Scooping Tasks. This framework leverages a guided diffusion policy to minimize food spillage during scooping and to ensure reliable transfer of food items from their initial location to the target location. Specifically, we design a spillage predictor that estimates the probability of spillage given the current observation and an action rollout. The predictor is trained on a simulated dataset of food spillage scenarios, constructed from four primitive shapes (spheres, cubes, cones, and cylinders) with varied physical properties such as mass, friction, and particle size. At inference time, the predictor serves as a differentiable guidance signal, steering the diffusion sampling process toward safer trajectories while preserving task success. We validate GRITS on a real-world robotic food scooping platform. GRITS is trained on six food categories and evaluated on ten unseen categories with different shapes and quantities. GRITS achieves an 82% task success rate and a 4% spillage rate, reducing spillage by over 40% compared to baselines without guidance, thereby demonstrating its effectiveness. More details are available on our project website: https://hcis-lab.github.io/GRITS/.
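The simulated training data described above varies four primitive shapes together with mass, friction, and particle size. The following is a minimal sketch of such a scenario sampler, assuming independent uniform sampling of each property. The parameter ranges, the `ScoopScenario` class, and `sample_dataset` are hypothetical, since the abstract gives no values. Each scenario would still need to be rolled out in a physics simulator and labeled as spill or no spill before it could train the predictor; that step is out of scope here.

```python
import random
from dataclasses import dataclass

# The four primitive shapes named in the abstract.
SHAPES = ("sphere", "cube", "cone", "cylinder")

# Hypothetical parameter ranges; the abstract does not state the real ones.
MASS_RANGE_KG = (0.001, 0.02)
FRICTION_RANGE = (0.1, 1.0)
PARTICLE_SIZE_RANGE_M = (0.005, 0.03)


@dataclass(frozen=True)
class ScoopScenario:
    """One simulated food state (hypothetical structure)."""
    shape: str
    mass_kg: float
    friction: float
    particle_size_m: float


def sample_scenario(rng: random.Random) -> ScoopScenario:
    """Draw one scenario by sampling each physical property independently."""
    return ScoopScenario(
        shape=rng.choice(SHAPES),
        mass_kg=rng.uniform(*MASS_RANGE_KG),
        friction=rng.uniform(*FRICTION_RANGE),
        particle_size_m=rng.uniform(*PARTICLE_SIZE_RANGE_M),
    )


def sample_dataset(n: int, seed: int = 0) -> list[ScoopScenario]:
    """Generate n reproducible scenarios from a seeded RNG."""
    rng = random.Random(seed)
    return [sample_scenario(rng) for _ in range(n)]


scenarios = sample_dataset(4, seed=42)
for s in scenarios:
    print(s)
```

Seeding a dedicated `random.Random` instance keeps the dataset reproducible without touching global random state, which is convenient when the same scenario list must be re-simulated.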