I ported Microsoft's TRELLIS.2 to run on Apple Silicon via PyTorch MPS. The original depends on five CUDA-only compiled extensions (flex_gemm, flash_attn, o_voxel, cumesh, nvdiffrast) that have no Mac equivalent.
I wrote replacement backends from scratch:
- Pure-PyTorch sparse 3D convolution (replacing flex_gemm)
- Python mesh extraction using spatial hashing (replacing CUDA hashmap ops in o_voxel)
- SDPA attention for sparse transformers (replacing flash_attn)
- GPU-accelerated trilinear voxel sampling via torch.grid_sample on MPS
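
To give a feel for the spatial-hashing idea behind the mesh extraction bullet, here is a minimal pure-Python sketch. It is not the port's actual code: `extract_surface` and `FACES` are hypothetical names, a plain `dict` keyed by integer corner coordinates stands in for the CUDA hashmap, and the input is assumed to be a set of occupied voxel coordinates. It emits one quad per exposed voxel face and reuses vertex indices wherever neighbouring faces share a corner.

```python
from typing import Dict, List, Set, Tuple

Coord = Tuple[int, int, int]

# For each of the six face directions: the neighbour offset to test, and the
# four corner offsets of that face (ordered counter-clockwise seen from outside).
FACES = [
    ((1, 0, 0), [(1, 0, 0), (1, 1, 0), (1, 1, 1), (1, 0, 1)]),
    ((-1, 0, 0), [(0, 0, 0), (0, 0, 1), (0, 1, 1), (0, 1, 0)]),
    ((0, 1, 0), [(0, 1, 0), (0, 1, 1), (1, 1, 1), (1, 1, 0)]),
    ((0, -1, 0), [(0, 0, 0), (1, 0, 0), (1, 0, 1), (0, 0, 1)]),
    ((0, 0, 1), [(0, 0, 1), (1, 0, 1), (1, 1, 1), (0, 1, 1)]),
    ((0, 0, -1), [(0, 0, 0), (0, 1, 0), (1, 1, 0), (1, 0, 0)]),
]


def extract_surface(
    occupied: Set[Coord],
) -> Tuple[List[Coord], List[Tuple[int, ...]]]:
    """Return (vertices, quads) for the boundary of a set of occupied voxels.

    Hypothetical helper for illustration only.
    """
    vertex_index: Dict[Coord, int] = {}  # the "spatial hash": corner -> index
    vertices: List[Coord] = []
    quads: List[Tuple[int, ...]] = []

    for x, y, z in sorted(occupied):
        for (dx, dy, dz), corners in FACES:
            # A face is interior if the neighbouring voxel is also occupied.
            if (x + dx, y + dy, z + dz) in occupied:
                continue
            quad = []
            for cx, cy, cz in corners:
                key = (x + cx, y + cy, z + cz)
                idx = vertex_index.get(key)
                if idx is None:
                    # First time this corner is seen: allocate a new vertex.
                    idx = len(vertices)
                    vertex_index[key] = idx
                    vertices.append(key)
                quad.append(idx)
            quads.append(tuple(quad))
    return vertices, quads


if __name__ == "__main__":
    verts, quads = extract_surface({(0, 0, 0), (1, 0, 0)})
    print(len(verts), "vertices,", len(quads), "quads")
```

The hash lookup is what keeps the vertex count down: without it, every face would carry its own four vertices and the mesh would not be connected. A GPU version would need a parallel-safe hash table instead of a Python dict, which is presumably where most of the porting effort goes.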
It generates ~400K-vertex meshes from a single photo in about 3.5 minutes on an M4 Pro (24GB). Texture baking takes about 18 seconds using MPS GPU acceleration. Not as fast as an H100, but it works offline with zero cloud cost.