Pushing the limit: minimax m2.7 q8_0 128k on 2×3090, 256GB DDR4

By AI Maestro May 18, 2026 1 min read

A user shared details of running the MiniMax M2.7 language model at q8_0 quantization with a 128k context on two NVIDIA RTX 3090 GPUs and 256GB of DDR4 RAM.

  • The CPU is an older 10900X, which plays a secondary role in this setup.
  • Context length is set to 128k tokens, and the key-value (KV) cache is left unquantized.
  • Accuracy is prioritized over speed; the user aims for usable performance rather than high throughput.
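
To give a feel for why an unquantized KV cache at 128k context matters for memory, here is a rough sizing sketch. It assumes a standard attention layout with one key and one value tensor per layer and 16-bit cache entries. The layer count, KV head count, and head dimension are hypothetical placeholders, not the model's published configuration.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_tokens, bytes_per_element=2):
    """Estimate KV cache size: one K and one V tensor per layer,
    each holding context_tokens * n_kv_heads * head_dim elements."""
    return 2 * n_layers * n_kv_heads * head_dim * context_tokens * bytes_per_element

# Hypothetical architecture values for illustration only.
size = kv_cache_bytes(
    n_layers=60,
    n_kv_heads=8,
    head_dim=128,
    context_tokens=128 * 1024,  # assumes "128k" means 131072 tokens
    bytes_per_element=2,        # assumes a 16-bit (unquantized) cache
)
print(f"{size / 2**30:.1f} GiB")
```

Halving `bytes_per_element` models an 8-bit cache, which is the memory the user gives up in exchange for accuracy.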

The model runs relatively slowly, at approximately 50 tokens per second (TPS) for prompt processing and around 10 TPS for text generation. Despite this, the user considers it usable for tasks such as coding-agent workflows.
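
The reported rates translate into wall-clock time with simple division. The sketch below reads the 50 TPS figure as the prompt-processing rate and the 10 TPS figure as the generation rate, which is an interpretation of the summary. The 500-token reply length is an illustrative assumption.

```python
def seconds_for(tokens, tokens_per_second):
    """Time needed to handle `tokens` at a steady rate."""
    return tokens / tokens_per_second

PROMPT_TPS = 50   # reported rate, read here as prompt processing
GEN_TPS = 10      # reported generation rate

full_context = 128 * 1024  # assumes "128k" means 131072 tokens
prefill_s = seconds_for(full_context, PROMPT_TPS)
reply_s = seconds_for(500, GEN_TPS)  # hypothetical 500-token reply

print(f"Filling the full context: {prefill_s / 60:.1f} min")
print(f"Generating 500 tokens: {reply_s:.0f} s")
```

Under these assumptions, a cold read of the entire context window takes well over half an hour. This is why agent workflows on such a setup typically depend on reusing already-processed prompt prefixes rather than reprocessing the whole context each turn.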

  • Some users are running similar models on low-end hardware, which might be of interest to the community.
  • The user seeks recommendations for other models within their constraints and suggestions for further optimizations.
  • They also wish a draft model for multi-token prediction (MTP) were available, noting that one was declined for this size class.

Originally published at reddit.com. Curated by AI Maestro.
