Now that MTP is merged... What are the best outputs you're getting on Qwen 3.6 35B on 2x3090s?

We’ve got great outputs for 27B via club 3090, but what about those of us who love the blazing speed of 35B on dual 3090s?

I was getting 1500 t/s prompt processing (p/p) and 120 t/s token generation (t/g) with split layers, but MTP slowed generation down to 80 t/g when I tested last week. I’m sticking with my CPU overflow fallback of 3500 p/p and 80 t/g until someone cooks up something à la the geniuses over at club 3090.
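
For anyone weighing those two setups, here is a rough sketch of how the numbers trade off. It assumes both figures are tokens per second and that total request time is just prompt time plus generation time, ignoring model load, sampling overhead, and any MTP effects. The function names and example workload sizes are made up for illustration.

```python
def request_seconds(prompt_tokens, output_tokens, pp_speed, tg_speed):
    """Rough estimate: time to process the prompt plus time to generate."""
    return prompt_tokens / pp_speed + output_tokens / tg_speed

# Speeds quoted in the post, assumed to be tokens per second.
SPLIT_LAYERS = {"pp_speed": 1500, "tg_speed": 120}
CPU_OVERFLOW = {"pp_speed": 3500, "tg_speed": 80}

def faster_config(prompt_tokens, output_tokens):
    """Return the name of the setup with the lower estimated request time."""
    split = request_seconds(prompt_tokens, output_tokens, **SPLIT_LAYERS)
    overflow = request_seconds(prompt_tokens, output_tokens, **CPU_OVERFLOW)
    return "split layers" if split < overflow else "CPU overflow"

if __name__ == "__main__":
    # Hypothetical workloads: a short chat turn and a long-context request.
    for prompt, output in [(1000, 1000), (20000, 500)]:
        print(prompt, output, faster_config(prompt, output))
```

Under these assumptions the fallback only comes out ahead when the prompt is roughly 11 times longer than the output or more, so it favors long-context work over chatty generation.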

What have you tried so far with the new llama.cpp MTP merge? Any big jump over your previous best build for 35B?

submitted by /u/youcloudsofdoom
