A developer ported Microsoft’s TRELLIS.2 image-to-3D system to run natively on Apple Silicon using PyTorch MPS, removing the need for NVIDIA GPUs. They replaced five CUDA-only extensions (flex_gemm, flash_attn, o_voxel, cumesh, nvdiffrast) with new backends, including a pure-PyTorch sparse 3D convolution and Python mesh extraction via spatial hashing. The port lets TRELLIS.2 run on Macs without vendor-specific binaries, giving developers, researchers, and creators on-device 3D reconstruction on Apple laptops and desktops.
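To make "mesh extraction via spatial hashing" concrete, here is a minimal sketch of the general idea, not the port's actual code: a Python dict keyed by integer voxel coordinates stands in for a CUDA hashmap, and a voxel face is emitted wherever the neighboring cell is empty. The names `build_spatial_hash` and `boundary_faces` are hypothetical and chosen for illustration only.

```python
from typing import Dict, List, Tuple

Voxel = Tuple[int, int, int]

# The six face-neighbor offsets of a voxel on an integer grid.
FACE_OFFSETS: List[Voxel] = [
    (1, 0, 0), (-1, 0, 0),
    (0, 1, 0), (0, -1, 0),
    (0, 0, 1), (0, 0, -1),
]


def build_spatial_hash(voxels: List[Voxel]) -> Dict[Voxel, int]:
    """Map each occupied voxel coordinate to its index (hypothetical helper)."""
    return {v: i for i, v in enumerate(voxels)}


def boundary_faces(voxels: List[Voxel]) -> List[Tuple[int, Voxel]]:
    """Return (voxel index, outward face direction) for every exposed face.

    A face is exposed when the neighboring cell in that direction is not
    occupied; each lookup is an average O(1) dict probe.
    """
    table = build_spatial_hash(voxels)
    faces: List[Tuple[int, Voxel]] = []
    for (x, y, z), idx in table.items():
        for dx, dy, dz in FACE_OFFSETS:
            if (x + dx, y + dy, z + dz) not in table:
                faces.append((idx, (dx, dy, dz)))
    return faces


if __name__ == "__main__":
    # A solid 2x2x2 block: each of the 8 voxels exposes 3 faces.
    cube = [(x, y, z) for x in range(2) for y in range(2) for z in range(2)]
    print(len(boundary_faces(cube)))
```

A real extractor would go on to turn each exposed face into vertices and triangles; the sketch stops at the neighbor lookup, which is the part the hash table replaces.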
A developer has ported Microsoft’s TRELLIS.2 image-to-3D model to run natively on Apple Silicon using PyTorch MPS, removing the need for NVIDIA CUDA GPUs. The Mac port replaces CUDA-only components (flex_gemm, CUDA hashmaps, flash_attn, cumesh, nvdiffrast) with pure-PyTorch or Python fallbacks, patches .cuda() calls, and reimplements sparse 3D convolution, mesh extraction, and attention to work on M1/M2/M3/M4 chips. On an M4 Pro (24GB), it generates meshes of 400K+ vertices from a single image in about 3.5 minutes and peaks near 18GB of unified memory; output is vertex-colored OBJ/GLB (no texture baking). Limitations include slower performance (roughly 10× slower than CUDA), no texture export, disabled hole-filling, and inference-only support. This makes advanced image-to-3D tooling accessible to Mac users without NVIDIA GPUs.
A developer on Hacker News said they ported Microsoft’s TRELLIS.2, a 4B-parameter image-to-3D model, to run on Apple Silicon Macs using PyTorch’s MPS backend, removing the original CUDA dependency. The upstream TRELLIS.2 relies on CUDA-only components such as flash_attn, nvdiffrast, and custom sparse convolution kernels, which do not work on macOS. The port replaces these with pure-PyTorch implementations, including a gather-scatter sparse 3D convolution, SDPA attention for sparse transformers, and a Python mesh-extraction path in place of CUDA hashmap operations. The author reports generating ~400K-vertex meshes from a single photo in about 3.5 minutes on an M4 Pro with 24GB RAM, enabling offline use without cloud GPUs.
A developer has ported Microsoft’s TRELLIS.2, a 4B-parameter image-to-3D model, to run on Apple Silicon Macs using PyTorch’s MPS backend, removing the original CUDA-only dependencies. TRELLIS.2 previously relied on flash_attn, nvdiffrast, and custom sparse convolution kernels that do not work on macOS. The port replaces these with pure-PyTorch implementations, including a gather-scatter sparse 3D convolution, SDPA attention for sparse transformers, and a Python mesh-extraction path replacing CUDA hashmap operations. The author says the changes span a few hundred lines across nine files. Performance is reported at about 3.5 minutes to generate ~400,000-vertex meshes from a single photo on an M4 Pro with 24GB RAM, enabling offline use without cloud GPUs. Code is on GitHub.
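For readers unfamiliar with the term, a gather-scatter sparse convolution can be sketched in a few lines. The version below is an illustration in plain Python, not the port's PyTorch code. It assumes a submanifold-style convolution (outputs only at the already-active sites), which the summaries do not specify, and the function name `sparse_conv3d` and its argument layout are hypothetical.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int, int]


def sparse_conv3d(
    coords: List[Coord],
    feats: List[List[float]],          # one C_in vector per active site
    weights: List[List[List[float]]],  # weights[k][c_in][c_out] per offset k
    offsets: List[Coord],              # kernel offsets, same order as weights
) -> List[List[float]]:
    """Gather-scatter sparse 3D convolution over active sites only."""
    index: Dict[Coord, int] = {c: i for i, c in enumerate(coords)}
    c_out = len(weights[0][0])
    out = [[0.0] * c_out for _ in coords]

    for k, (dx, dy, dz) in enumerate(offsets):
        # 1) Build (input row, output row) pairs for this kernel offset.
        pairs = []
        for o, (x, y, z) in enumerate(coords):
            i = index.get((x + dx, y + dy, z + dz))
            if i is not None:
                pairs.append((i, o))

        w = weights[k]
        for i, o in pairs:
            src = feats[i]  # 2) gather the neighbor's features
            for co in range(c_out):
                # 3) multiply by this offset's weights, 4) scatter-add
                out[o][co] += sum(src[ci] * w[ci][co] for ci in range(len(src)))
    return out


if __name__ == "__main__":
    coords = [(0, 0, 0), (1, 0, 0), (5, 5, 5)]
    feats = [[1.0], [2.0], [3.0]]
    offsets = [(-1, 0, 0), (0, 0, 0), (1, 0, 0)]
    weights = [[[10.0]], [[1.0]], [[100.0]]]
    print(sparse_conv3d(coords, feats, weights, offsets))
```

In a tensor framework the inner loops would typically become one indexed gather, one matrix multiply, and one scatter-add per kernel offset, which is why the approach needs no custom kernels. The work only touches active voxels, so empty space costs nothing.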