Anyone running Mimo-v2.5 quants with multimodal and MTP?

**Editorial Brief**

A Reddit user is asking for help running quantized builds of Mimo-v2.5 with both multimodal input and Multi-Token Prediction (MTP) in llama.cpp, a popular open-source LLM inference engine. The request highlights the ongoing difficulty of integrating newer models into existing inference stacks, particularly when support for advanced multimodal features is still under development. The user has limited VRAM but ample system RAM and is looking for guidance on how to proceed.

**Takeaways:**

– There is interest among users in running the Mimo-v2.5 model with both multimodal and MTP capabilities.
– Current llama.cpp support for these features is incomplete for this model, so running them requires draft branches or alternative methods.
– Users want advice on making effective use of limited VRAM and ample system RAM for inference, even where significant performance gains are not expected.

