Opinions/improvements for my Qwen3.6-35B-A3B-FP8 + Hermes Agent setup on NVIDIA DGX Spark?

I’ve seen a Reddit user share their setup for running Qwen, a large language model from Alibaba Cloud, with Hermes Agent. It runs in Docker on an NVIDIA DGX Spark, with vLLM as the inference backend.

The user describes the configuration as aggressive: it is tuned for throughput and high memory utilization.
Among the model parameters they list is an attention backend set to FlashInfer.
The setup is meant to handle long-context interactions while staying stable and fast.

The post is mainly a request for feedback from others running similar setups: how the configuration could be improved, and whether the user is running into issues that others have already faced.
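
To make the kind of tuning described above more concrete, here is a minimal sketch of how such a configuration could be assembled into a `vllm serve` command line. It is not the poster's actual configuration: the helper `build_vllm_command` is hypothetical, the numeric values are placeholders, and the model name is taken from the post title rather than from a confirmed repository id. The `attention-backend` option is named in the post; whether it is accepted as a command-line flag, and with which spelling of the value, may depend on the vLLM version.

```python
import shlex


def build_vllm_command(model, options):
    """Turn a dict of option names into a `vllm serve` argument list.

    Hypothetical helper for illustration; it only formats arguments
    and does not validate them against any vLLM version.
    """
    args = ["vllm", "serve", model]
    for name, value in options.items():
        flag = "--" + name.replace("_", "-")
        if value is True:
            # Boolean switch: emit the flag with no value.
            args.append(flag)
        elif value is not False and value is not None:
            args.extend([flag, str(value)])
    return args


# Placeholder values for illustration only; the post's real numbers
# are not reproduced here.
options = {
    "attention_backend": "FLASHINFER",  # backend named in the post
    "gpu_memory_utilization": 0.85,     # assumed fraction of GPU memory
    "max_model_len": 131072,            # assumed long-context limit
}

command = build_vllm_command("Qwen3.6-35B-A3B-FP8", options)
print(shlex.join(command))
```

Keeping the options in a dictionary makes it easy to compare two configurations side by side, which is useful when asking others for feedback on individual settings.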

### Takeaways
- The user is looking for feedback on an aggressive Qwen3.6-35B-A3B-FP8 setup with Hermes Agent.
- This involves optimizing performance parameters like memory utilization and attention backend.
- Users are encouraged to share their experiences or suggestions for improvement.

