VoxelCodeBench: Benchmarking 3D World Modeling Through Code Generation
arXiv cs.LG / 4/6/2026
Key Points
- The paper introduces VoxelCode, a platform that evaluates code generation models for 3D spatial reasoning by executing generated code in Unreal Engine via an API-driven pipeline.
- It presents VoxelCodeBench, a benchmark covering voxel manipulation tasks across symbolic interpretation, geometric construction, and artistic composition to test different reasoning capabilities.
- The evaluation of leading code generation models finds that generating executable code is substantially easier than generating spatially correct outputs, with geometric construction and multi-object composition being especially difficult.
- The platform combines automated metrics with human assessment in a unified evaluation pipeline, aiming to reflect real-world correctness better than superficial text-match measures.
- The authors open-source both the platform and benchmark to enable the research community to extend infrastructure for future 3D code generation benchmarks and spatial reasoning studies.
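The paper's central finding, that code can execute cleanly yet be spatially wrong, is easy to illustrate with a toy occupancy-grid metric. This is a minimal sketch, not the paper's actual evaluation: the function `voxel_iou` and the example grids are hypothetical, assuming voxel outputs are compared as sets of occupied integer coordinates via intersection-over-union.

```python
# Toy spatial-correctness check: compare the voxel grid a generated program
# produces against a reference grid using intersection-over-union (IoU).
# All names and grids here are illustrative, not from the paper.

def voxel_iou(pred: set[tuple[int, int, int]],
              ref: set[tuple[int, int, int]]) -> float:
    """IoU over occupied voxel coordinates; 1.0 means a perfect match."""
    if not pred and not ref:
        return 1.0
    return len(pred & ref) / len(pred | ref)

# Reference shape: a 2x2 slab of voxels at height z=0.
reference = {(x, y, 0) for x in range(2) for y in range(2)}

# A generated program can run without error yet build the wrong thing:
# here, the same slab placed one layer too high.
misplaced = {(x, y, 1) for x in range(2) for y in range(2)}

print(voxel_iou(reference, reference))  # 1.0 (spatially correct)
print(voxel_iou(misplaced, reference))  # 0.0 (executable, but spatially wrong)
```

The second case is exactly the failure mode the benchmark surfaces: execution succeeds, so a text-match or runs-without-error metric would pass it, while a spatial metric scores it at zero.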