Microsoft Research’s World-R1 Uses Flow-GRPO and 3D-Aware Rewards to Inject Geometric Consistency Into Wan 2.1 Without Architectural Changes

MarkTechPost / 5/1/2026

Key Points

  • Microsoft Research’s World-R1 uses reinforcement learning to improve geometric (3D) consistency in text-to-video outputs.
  • The approach employs Flow-GRPO and 3D-aware reward signals to encourage stable 3D structure across generated frames.
  • It reportedly enhances 3D consistency in Wan 2.1 without requiring any architectural changes to the underlying text-to-video model.
  • The work targets a common weakness of generative video models: frame-to-frame inconsistency that breaks geometric plausibility over time.
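
The summary does not describe how World-R1 computes its training signal, so the following is only a minimal sketch of the group-relative advantage step that GRPO-style methods generally rely on: several outputs are sampled for one prompt, each is scored, and every score is normalized against the group's mean and standard deviation. The function name, the epsilon value, and the reward numbers are hypothetical placeholders; the rewards stand in for whatever 3D-aware scores the actual method produces.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of samples drawn for the same prompt.

    Samples that score above the group mean get a positive advantage,
    samples below it get a negative one. `eps` (an assumed value) avoids
    division by zero when every sample receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical 3D-consistency scores for four videos sampled from one prompt.
rewards = [0.2, 0.5, 0.9, 0.4]
advantages = group_relative_advantages(rewards)
```

In a setup like this, the video with the highest consistency score receives the largest positive advantage, so the policy update would push the model toward outputs that resemble it, with no change to the model's architecture.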

Microsoft Research’s World-R1 Uses Reinforcement Learning to Force 3D Consistency Into Text-to-Video Models
