I just bought Asus Ascent : Nvidia GB10 (DGX) and It is slower than my Ryzen Ai Max

“`html

I just read a post on Reddit where someone mentioned they purchased an Asus Ascent with the Nvidia GB10 DGX and are experiencing performance issues. They were expecting it to be 2-4x faster than their current setup but found it only achieving around 6TK/s on the Gemma4-31B model, which is slower than their previous AI-Max system that runs at about 7.10 TK/s.

The user provided details of their inference engine and tested models, including comparisons with another DGX setup running Apex-I-Quality at a rate of 27TK/s versus the user’s Gemma4-31B model which is performing at around 6.2TK/s. They are unsure what might be causing this discrepancy in performance.

The user has built their own LLaMA engine using the ggml.ai/dgx-spark.sh script.
They tested models including Step3.5-Apex-I-Quality and Gemma4-31B-it-UD-Q8_K_XL, with the DGX setup achieving 27TK/s compared to their AI-Max system at 30TK/s for similar models.
The command they used was: `llama-server –models-preset /home/dgx/models/models.ini –models-dir /home/dgx/models/ –host 0.0.0.0 –port 8080 –models-max 1 –parallel 1`.

This situation highlights the variability in performance across different hardware and software setups, especially when dealing with AI models like LLaMA or other language models.

“`

### Takeaways
– The user is seeing significantly lower throughput compared to their previous system despite having a more powerful GPU.
– Performance differences can be attributed to various factors including model architecture, inference engine optimizations, and specific hardware configurations.
– This instance underscores the need for detailed benchmarking when upgrading AI infrastructure to ensure expected performance levels.

Source Read original →