- A user on Reddit described trying to run MTP (multi-token prediction) with unsloth 27B models on an Apple Silicon M2 Max with 96GB of RAM. High throughput proved hard to reach, and results were worse than with no custom optimizations at all.
- The user tried `spec-draft-n-max` values of 2, 3, and 6 without a meaningful performance gain. They shared their model parameters and reported an acceptance rate above 70%, which suggests the models themselves are functional.
### Takeaways
- The user could not reach high throughput with MTP on Apple Silicon.
- Tuning `spec-draft-n-max` did not yield the expected performance improvement.
- An acceptance rate above 70% suggests the models work, so the slowdown likely lies elsewhere than in model functionality.
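
A back-of-the-envelope model of speculative decoding helps show why a high acceptance rate alone does not guarantee a speedup. The sketch below is not from the original post. It assumes each drafted token is accepted independently with probability `alpha` (a simplification), and `draft_cost` and `verify_cost` are hypothetical relative costs chosen for illustration, not measurements from the user's machine.

```python
def expected_tokens_per_step(alpha: float, k: int) -> float:
    """Expected tokens produced per verification pass when k tokens are drafted.

    Assumes each drafted token is accepted independently with probability
    alpha; the pass always yields at least one token from the target model.
    """
    if alpha >= 1.0:
        return float(k + 1)
    return (1 - alpha ** (k + 1)) / (1 - alpha)


def estimated_speedup(alpha: float, k: int, draft_cost: float, verify_cost: float) -> float:
    """Rough speedup relative to plain one-token-at-a-time decoding.

    draft_cost:  cost of drafting one token, in units of one plain decode step
                 (hypothetical value).
    verify_cost: cost of verifying the k drafted tokens in one batched pass,
                 in the same units (hypothetical value; 1.0 means batching is free).
    """
    return expected_tokens_per_step(alpha, k) / (k * draft_cost + verify_cost)


if __name__ == "__main__":
    # The draft lengths the user tried, with an assumed 70% acceptance rate.
    for k in (2, 3, 6):
        cheap = estimated_speedup(0.7, k, draft_cost=0.05, verify_cost=1.0)
        costly = estimated_speedup(0.7, k, draft_cost=0.05, verify_cost=2.5)
        print(f"k={k}: cheap verification {cheap:.2f}x, costly verification {costly:.2f}x")
```

Under these assumptions, the estimate drops below 1x at a draft length of 2 once a batched verification pass costs a few times more than a single decode step, even with 70% acceptance. Whether that is what happened on the M2 Max is not established by the post.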