Training stops with "Floating point exception" during stage-2 VAE training #340

@yuriai96

Hello, first of all thanks a lot for your amazing work on this project. I’m currently testing stage-2 training of Trellis using the sample ObjaverseXL_sketchfab dataset generated step-by-step according to the dataset preparation instructions.
Additional details:
– I am running this on the following hardware:

GPU: NVIDIA A100 (PCIe)
CPU: AMD Ryzen Threadripper PRO 5955WX 16-Cores
RAM: 128 GB

– I’m using the default configuration from the codebase without modifications
– My environment (the dependencies installed without serious issues):

# Name                         Version          Build            Channel
_libgcc_mutex                  0.1              main
_openmp_mutex                  5.1              1_gnu
absl-py                        2.3.1            pypi_0           pypi
addict                         2.4.0            pypi_0           pypi
asttokens                      3.0.1            pypi_0           pypi
attrs                          25.4.0           pypi_0           pypi
blinker                        1.9.0            pypi_0           pypi
bzip2                          1.0.8            h5eee18b_6
ca-certificates                2025.11.4        h06a4308_0
ccimport                       0.4.4            pypi_0           pypi
certifi                        2025.11.12       pypi_0           pypi
charset-normalizer             3.4.4            pypi_0           pypi
click                          8.3.1            pypi_0           pypi
coloredlogs                    15.0.1           pypi_0           pypi
comm                           0.2.3            pypi_0           pypi
configargparse                 1.7.1            pypi_0           pypi
contourpy                      1.3.2            pypi_0           pypi
cumm-cu120                     0.4.11           pypi_0           pypi
cycler                         0.12.1           pypi_0           pypi
dash                           3.3.0            pypi_0           pypi
dataclasses-json               0.6.7            pypi_0           pypi
decorator                      5.2.1            pypi_0           pypi
deprecated                     1.3.1            pypi_0           pypi
diff-gaussian-rasterization    0.0.0            pypi_0           pypi
diffoctreerast                 0.0.0            pypi_0           pypi
easydict                       1.13             pypi_0           pypi
einops                         0.8.1            pypi_0           pypi
entrypoints                    0.4              pypi_0           pypi
exceptiongroup                 1.3.1            pypi_0           pypi
executing                      2.2.1            pypi_0           pypi
fastjsonschema                 2.21.2           pypi_0           pypi
filelock                       3.19.1           pypi_0           pypi
fire                           0.7.1            pypi_0           pypi
flash-attn                     2.8.3            pypi_0           pypi
flask                          3.1.2            pypi_0           pypi
flatbuffers                    25.9.23          pypi_0           pypi
fonttools                      4.60.1           pypi_0           pypi
fsspec                         2025.9.0         pypi_0           pypi
ftfy                           6.3.1            pypi_0           pypi
glcontext                      3.0.0            pypi_0           pypi
gputil                         1.4.0            pypi_0           pypi
grpcio                         1.76.0           pypi_0           pypi
hf-xet                         1.2.0            pypi_0           pypi
huggingface-hub                0.36.0           pypi_0           pypi
humanfriendly                  10.0             pypi_0           pypi
idna                           3.11             pypi_0           pypi
igraph                         1.0.0            pypi_0           pypi
imageio                        2.37.2           pypi_0           pypi
imageio-ffmpeg                 0.6.0            pypi_0           pypi
importlib-metadata             8.7.0            pypi_0           pypi
ipycanvas                      0.14.2           pypi_0           pypi
ipyevents                      2.0.4            pypi_0           pypi
ipython                        8.37.0           pypi_0           pypi
ipywidgets                     8.1.8            pypi_0           pypi
itsdangerous                   2.2.0            pypi_0           pypi
jedi                           0.19.2           pypi_0           pypi
jinja2                         3.1.6            pypi_0           pypi
joblib                         1.5.2            pypi_0           pypi
jsonschema                     4.25.1           pypi_0           pypi
jsonschema-specifications      2025.9.1         pypi_0           pypi
jupyter-client                 7.4.9            pypi_0           pypi
jupyter-core                   5.9.1            pypi_0           pypi
jupyterlab-widgets             3.0.16           pypi_0           pypi
kaolin                         0.18.0           pypi_0           pypi
kiwisolver                     1.4.9            pypi_0           pypi
lark                           1.3.1            pypi_0           pypi
lazy-loader                    0.4              pypi_0           pypi
ld_impl_linux-64               2.44             h153f514_2
libffi                         3.3              he6710b0_2
libgcc                         15.2.0           h69a1729_7
libgcc-ng                      15.2.0           h166f726_7
libgomp                        15.2.0           h4751f2c_7
libstdcxx                      15.2.0           h39759b7_7
libstdcxx-ng                   15.2.0           hc03a8fd_7
libuuid                        1.41.5           h5eee18b_0
libxcb                         1.17.0           h9b100fa_0
libzlib                        1.3.1            hb25bd0a_0
llvmlite                       0.45.1           pypi_0           pypi
loguru                         0.7.3            pypi_0           pypi
lpips                          0.1.4            pypi_0           pypi
markdown                       3.10             pypi_0           pypi
markupsafe                     2.1.5            pypi_0           pypi
marshmallow                    3.26.1           pypi_0           pypi
matplotlib                     3.10.7           pypi_0           pypi
matplotlib-inline              0.2.1            pypi_0           pypi
moderngl                       5.12.0           pypi_0           pypi
mpmath                         1.3.0            pypi_0           pypi
mypy-extensions                1.1.0            pypi_0           pypi
narwhals                       2.12.0           pypi_0           pypi
nbformat                       5.10.4           pypi_0           pypi
ncurses                        6.5              h7934f7d_0
nest-asyncio                   1.6.0            pypi_0           pypi
networkx                       3.3              pypi_0           pypi
ninja                          1.13.0           pypi_0           pypi
numba                          0.62.1           pypi_0           pypi
numpy                          2.1.2            pypi_0           pypi
nvdiffrast                     0.3.5            pypi_0           pypi
nvidia-cublas-cu12             12.1.3.1         pypi_0           pypi
nvidia-cuda-cupti-cu12         12.1.105         pypi_0           pypi
nvidia-cuda-nvrtc-cu12         12.1.105         pypi_0           pypi
nvidia-cuda-runtime-cu12       12.1.105         pypi_0           pypi
nvidia-cudnn-cu12              9.1.0.70         pypi_0           pypi
nvidia-cufft-cu12              11.0.2.54        pypi_0           pypi
nvidia-curand-cu12             10.3.2.106       pypi_0           pypi
nvidia-cusolver-cu12           11.4.5.107       pypi_0           pypi
nvidia-cusparse-cu12           12.1.0.106       pypi_0           pypi
nvidia-nccl-cu12               2.20.5           pypi_0           pypi
nvidia-nvjitlink-cu12          12.9.86          pypi_0           pypi
nvidia-nvtx-cu12               12.1.105         pypi_0           pypi
objaverse                      0.1.7            pypi_0           pypi
onnxruntime                    1.23.2           pypi_0           pypi
open-clip-torch                3.2.0            pypi_0           pypi
open3d                         0.19.0           pypi_0           pypi
opencv-python-headless         4.12.0.88        pypi_0           pypi
openssl                        1.1.1w           h7f8727e_0
packaging                      25.0             pypi_0           pypi
pandas                         2.3.3            pypi_0           pypi
parso                          0.8.5            pypi_0           pypi
pccm                           0.4.16           pypi_0           pypi
pexpect                        4.9.0            pypi_0           pypi
pillow                         11.3.0           pypi_0           pypi
pip                            25.3             pyhc872135_0
platformdirs                   4.5.0            pypi_0           pypi
plotly                         6.5.0            pypi_0           pypi
plyfile                        1.1.3            pypi_0           pypi
pooch                          1.8.2            pypi_0           pypi
portalocker                    3.2.0            pypi_0           pypi
prompt-toolkit                 3.0.52           pypi_0           pypi
protobuf                       6.33.1           pypi_0           pypi
psutil                         7.1.3            pypi_0           pypi
pthread-stubs                  0.3              h0ce48e5_1
ptyprocess                     0.7.0            pypi_0           pypi
pure-eval                      0.2.3            pypi_0           pypi
pyarrow                        22.0.0           pypi_0           pypi
pybind11                       3.0.1            pypi_0           pypi
pygltflib                      1.16.5           pypi_0           pypi
pygments                       2.19.2           pypi_0           pypi
pymatting                      1.1.14           pypi_0           pypi
pymeshfix                      0.17.1           pypi_0           pypi
pyparsing                      3.2.5            pypi_0           pypi
pyquaternion                   0.9.9            pypi_0           pypi
python                         3.10.0           h12debd9_5
python-dateutil                2.9.0.post0      pypi_0           pypi
pytz                           2025.2           pypi_0           pypi
pyvista                        0.46.4           pypi_0           pypi
pyyaml                         6.0.3            pypi_0           pypi
pyzmq                          27.1.0           pypi_0           pypi
readline                       8.3              hc2a1206_0
referencing                    0.37.0           pypi_0           pypi
regex                          2025.11.3        pypi_0           pypi
rembg                          2.0.68           pypi_0           pypi
requests                       2.32.5           pypi_0           pypi
retrying                       1.4.2            pypi_0           pypi
rpds-py                        0.29.0           pypi_0           pypi
safetensors                    0.7.0            pypi_0           pypi
scikit-image                   0.25.2           pypi_0           pypi
scikit-learn                   1.7.2            pypi_0           pypi
scipy                          1.15.3           pypi_0           pypi
scooby                         0.11.0           pypi_0           pypi
setuptools                     80.9.0           py310h06a4308_0
six                            1.17.0           pypi_0           pypi
spconv-cu120                   2.3.6            pypi_0           pypi
sqlite                         3.51.0           h2a70700_0
stack-data                     0.6.3            pypi_0           pypi
sympy                          1.14.0           pypi_0           pypi
tensorboard                    2.20.0           pypi_0           pypi
tensorboard-data-server        0.7.2            pypi_0           pypi
termcolor                      3.2.0            pypi_0           pypi
texttable                      1.7.0            pypi_0           pypi
threadpoolctl                  3.6.0            pypi_0           pypi
tifffile                       2025.5.10        pypi_0           pypi
timm                           1.0.22           pypi_0           pypi
tk                             8.6.15           h54e0aa7_0
tokenizers                     0.22.1           pypi_0           pypi
torch                          2.4.0+cu121      pypi_0           pypi
torchaudio                     2.4.0+cu121      pypi_0           pypi
torchvision                    0.19.0+cu121     pypi_0           pypi
tornado                        6.5.2            pypi_0           pypi
tqdm                           4.67.1           pypi_0           pypi
traitlets                      5.14.3           pypi_0           pypi
transformers                   4.57.3           pypi_0           pypi
trimesh                        4.10.0           pypi_0           pypi
triton                         3.0.0            pypi_0           pypi
typing-extensions              4.15.0           pypi_0           pypi
typing-inspect                 0.9.0            pypi_0           pypi
tzdata                         2025.2           pypi_0           pypi
urllib3                        2.5.0            pypi_0           pypi
usd-core                       25.11            pypi_0           pypi
utils3d                        0.0.2            pypi_0           pypi
vox2seq                        0.0.0            pypi_0           pypi
vtk                            9.5.2            pypi_0           pypi
warp-lang                      1.10.0           pypi_0           pypi
wcwidth                        0.2.14           pypi_0           pypi
werkzeug                       3.1.3            pypi_0           pypi
wheel                          0.45.1           py310h06a4308_0
widgetsnbextension             4.0.15           pypi_0           pypi
wrapt                          2.0.1            pypi_0           pypi
xatlas                         0.0.11           pypi_0           pypi
xformers                       0.0.27.post2     pypi_0           pypi
xorg-libx11                    1.8.12           h9b100fa_1
xorg-libxau                    1.0.12           h9b100fa_0
xorg-libxdmcp                  1.1.5            h9b100fa_0
xorg-xorgproto                 2024.1           h5eee18b_1
xz                             5.6.4            h5eee18b_1
zipp                           3.23.0           pypi_0           pypi
zlib                           1.3.1            hb25bd0a_0
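
One way to double-check a listing like the one above is to group packages by the CUDA build tag in their name or version. The sketch below does this for a few rows copied from the listing. The helper name `group_by_cuda_tag` and the tag pattern are illustrative assumptions, and differing tags are only something to look into, not a confirmed cause of the crash.

```python
import re
from collections import defaultdict

# A few (name, version) rows copied from the environment listing above.
PACKAGES = [
    ("cumm-cu120", "0.4.11"),
    ("spconv-cu120", "2.3.6"),
    ("nvidia-cuda-runtime-cu12", "12.1.105"),
    ("torch", "2.4.0+cu121"),
    ("torchvision", "0.19.0+cu121"),
    ("xformers", "0.0.27.post2"),
]

# Assumed convention: a CUDA build tag looks like "-cu120" in a package
# name or "+cu121" in a version string.
CUDA_TAG = re.compile(r"[-+]cu(\d+)")


def group_by_cuda_tag(packages):
    """Map each CUDA tag (e.g. 'cu121') to the packages that carry it."""
    groups = defaultdict(list)
    for name, version in packages:
        match = CUDA_TAG.search(name) or CUDA_TAG.search(version)
        if match:
            groups["cu" + match.group(1)].append(name)
    return dict(groups)


groups = group_by_cuda_tag(PACKAGES)
for tag, names in sorted(groups.items()):
    print(tag, names)
```

Packages without a tag (such as `xformers` here) are skipped, so their CUDA build would have to be checked another way.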

I’m running the following command:

python train.py --config configs/vae/slat_vae_dec_mesh_swin8_B_64l8_fp16.json --output_dir outputs/slat_vae_dec_mesh_swin8_B_64l8_fp16_1node --data_dir datasets/ObjaverseXL_sketchfab

And this is the relevant output:

...
Trainer initialized.
SLatVaeMeshDecoderTrainer
  - Models:
    - decoder: ElasticSLatMeshDecoder
  - Dataset: Slat2RenderGeo
    - Total instances: 114
    - Sources:
      - ObjaverseXL_sketchfab:
        - Total: 168307
        - With latent: 115
        - Aesthetic score >= 4.5: 115
        - Num voxels <= 32768: 114
  - Dataloader:
    - Sampler: ResumableSampler
    - Num workers: 32
  - Number of steps: 1000000
  - Number of GPUs: 1
  - Batch size: 4
  - Batch size per GPU: 4
  - Batch split: 4
  - Optimizer: AdamW
  - Learning rate: 0.0001
  - Elastic memory: LinearMemoryController(target_ratio=0.75, available_memory=79.1510009765625)
  - Gradient clip: AdaptiveGradClipper(max_norm=1.0, clip_percentile=95)
  - EMA rate: [0.9999]
  - FP16 mode: inflat_all
/root/miniconda3/envs/trellis2/lib/python3.10/site-packages/torch/utils/cpp_extension.py:1965: UserWarning: TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation. 
If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'].
  warnings.warn(

Starting training...

Sampling 64 images...Floating point exception

The training stops there and does not proceed beyond the Floating point exception message.
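
Since the message carries no Python traceback, one possible way to narrow down where the signal is raised is the standard-library `faulthandler` module, which dumps the Python stack when the process receives SIGFPE (among other fatal signals). This is a minimal sketch; placing it at the top of `train.py` is an assumption about where it would be useful, not something taken from the codebase.

```python
import faulthandler

# Dump the Python traceback of all threads to stderr if the process
# receives a fatal signal (SIGSEGV, SIGFPE, SIGABRT, SIGBUS or SIGILL).
faulthandler.enable(all_threads=True)

print("faulthandler enabled:", faulthandler.is_enabled())
```

The same handler can be switched on without editing any file by running the interpreter with `-X faulthandler` or by setting the `PYTHONFAULTHANDLER` environment variable. The dump shows only Python frames, so it would point at the Python call that entered the failing native code rather than at the native line itself.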

Could someone please help me understand what causes this issue, and how to resolve it?

Many thanks!
