We’ve got great outputs for 27B via club 3090, but what about those of us who love the blazing speed of 35B on dual 3090s?
I was getting 1500 t/s prompt processing (p/p) and 120 t/s token generation (t/g) with split layers, but MTP slowed generation down to 80 t/g when I tested last week. I’m sticking with my CPU overflow fallback of 3500 p/p and 80 t/g until someone cooks up something à la the geniuses over at club 3090.
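For anyone weighing those two setups, here is a rough sketch of the trade-off. It assumes p/p and t/g are both tokens per second, that total time is simply prompt time plus generation time, and that the rates stay constant regardless of context length (which they won't quite do in practice). The function names are made up for illustration and are not part of llama.cpp.

```python
def total_seconds(prompt_tokens, gen_tokens, pp_rate, tg_rate):
    """Estimated wall time: prompt processing plus token generation.

    Rates are in tokens per second and assumed constant (a simplification).
    """
    return prompt_tokens / pp_rate + gen_tokens / tg_rate


def breakeven_ratio(pp_a, tg_a, pp_b, tg_b):
    """Prompt-to-generation token ratio at which config A and config B tie.

    Assumes B has the faster prompt processing and the slower generation.
    Above this ratio B finishes first; below it A does.
    """
    return (1 / tg_b - 1 / tg_a) / (1 / pp_a - 1 / pp_b)


# Numbers quoted in the post.
split_layers = {"pp_rate": 1500, "tg_rate": 120}
cpu_overflow = {"pp_rate": 3500, "tg_rate": 80}

ratio = breakeven_ratio(
    split_layers["pp_rate"], split_layers["tg_rate"],
    cpu_overflow["pp_rate"], cpu_overflow["tg_rate"],
)
print(f"break-even prompt/gen ratio: {ratio:.2f}")

# Example workload: long prompt, short answer.
for name, cfg in (("split layers", split_layers), ("CPU overflow", cpu_overflow)):
    print(name, round(total_seconds(20000, 500, **cfg), 2), "s")
```

Under those assumptions the fallback only comes out ahead when the prompt is roughly 11 times longer than the generated output, so which setup is "faster" depends heavily on the workload.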
What have you tried so far with the new llama.cpp MTP merge? Any big jump over your previous best build for 35B?
submitted by /u/youcloudsofdoom
Originally published at reddit.com. Curated by AI Maestro.