ROCm with PyTorch and PyTorch Lightning seems to still suck for research [D]

By AI Maestro May 16, 2026 1 min read

  • A British AI enthusiast reported that running a small codebase for training SANA models on ROCm with PyTorch and PyTorch Lightning produced NaNs everywhere, even though the forward passes were fine.
  • The issue persisted after switching between precision modes (bf16, fp32) and tweaking various environment variables. The user concluded that ROCm is still notably fragile with uncommon codebases.
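
For readers facing a similar symptom (finite forward pass, NaNs afterwards), one common first step is to check which parameter gradients are non-finite after `backward()`. The sketch below is a minimal illustration, not the user's code: the `Toy` module and the `non_finite_grads` helper are hypothetical names, and the NaN is triggered deliberately with `sqrt` at zero on CPU, which is a generic autograd pitfall and not a claim about what went wrong on ROCm.

```python
import torch
from torch import nn


def non_finite_grads(model: nn.Module) -> list[str]:
    """Return names of parameters whose gradient contains NaN or Inf."""
    bad = []
    for name, param in model.named_parameters():
        if param.grad is not None and not torch.isfinite(param.grad).all():
            bad.append(name)
    return bad


class Toy(nn.Module):
    """Hypothetical stand-in model; unrelated to the SANA training code."""

    def __init__(self) -> None:
        super().__init__()
        self.scale = nn.Parameter(torch.zeros(3))
        self.linear = nn.Linear(3, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # sqrt(0) is finite in the forward pass, but its derivative at 0
        # is not, so the backward pass yields NaN for `scale`.
        return self.linear(x).mean() + (torch.sqrt(self.scale) * 0.0).sum()


model = Toy()
x = torch.randn(2, 3)

loss = model(x)
print("loss is finite:", torch.isfinite(loss).item())

loss.backward()
print("parameters with non-finite gradients:", non_finite_grads(model))
```

Running a check like this after the first failing step narrows the search to specific layers. PyTorch also provides `torch.autograd.set_detect_anomaly(True)`, which raises an error at the backward operation that produced a NaN, at the cost of slower training.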

