Any good MOE ~60B models? I have 64GB vram

“`html

A user on Reddit is seeking advice regarding the use of a specific hardware configuration for running larger models like MOE. The user owns a system with two NVIDIA A100 GPUs (each with 32GB VRAM) and an additional 64GB of DDR4 memory, totaling approximately 96GB available for model execution.

The primary issue is that the current setup allows running only smaller models due to limited VRAM, which restricts the use of larger models like MOE.
They are looking for a MOE model with around 60 billion parameters but are struggling to find one that can be run within their available resources.
This situation is problematic as running smaller models feels wasteful, while larger models exceed the available VRAM and cause performance issues.

The user has tried a model named Gemma 4 at q4 quantization but finds it too slow for both prompt processing and throughput.
They are open to suggestions on how they might better utilize their hardware or find alternative models that fit within their constraints.

“`

### Takeaways
– The user is facing a challenge with limited VRAM, which restricts the use of larger models like MOE.
– They need guidance on finding an appropriate model for their current setup and are looking for alternatives if larger models are not feasible.
– Their situation highlights the importance of balancing available resources with the requirements of different AI tasks.

Source Read original →