we really all are going to make it, aren't we? 2x3090 setup.

I recently came across a post on r/LocalLLaMA in which someone described running an AI model locally on a 2×3090 GPU setup. The key points were that the setup worked and that it performed well.

The writer reported prompt-processing throughput (PP/s) of over 4000 tokens per second and generation speed (TK/s) of 113 tokens per second, both without NVLink, which they claim would make it even faster.
They ran a Qwen 3.6 model with 27 billion parameters on 48 GB of VRAM and described the result as “almost-sonnet level” performance that is much faster than cloud-based models.
The writer is now working on having the local model handle SSH sessions on Linux machines, a practical application of the setup.
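
To put the reported rates in perspective, here is a rough sketch of how they translate into wall-clock time for a single request. The function name and the example token counts are hypothetical, and the model assumes that prompt processing and generation run back to back at the constant rates quoted in the post, which real workloads will only approximate.

```python
def estimate_latency_seconds(prompt_tokens, output_tokens,
                             pp_per_s=4000.0, tk_per_s=113.0):
    """Rough request latency from the rates quoted in the post.

    Assumption: constant throughput and no overlap between the
    prompt-processing phase and the generation phase.
    """
    prefill_seconds = prompt_tokens / pp_per_s   # time to read the prompt
    decode_seconds = output_tokens / tk_per_s    # time to write the answer
    return prefill_seconds + decode_seconds


if __name__ == "__main__":
    # Hypothetical request: a 4000-token prompt and a 113-token reply.
    print(estimate_latency_seconds(4000, 113))
```

Under these assumptions generation dominates for long replies, while even a large prompt is read in about a second.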

The post suggests a promising future for local AI setups as model efficiency and performance improve. The writer also raises the exciting possibility that smaller models could reach frontier-class intelligence in the near term.

### Takeaways
- A local 2×3090 setup can reach over 4000 PP/s and 113 TK/s without NVLink and without relying on cloud services.
- Smaller models like Qwen 3.6 can offer “almost-sonnet level” performance, which was previously only possible with larger models in the cloud.
- Using a local model for tasks such as handling SSH sessions is becoming more feasible and practical.
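
Whether a 27-billion-parameter model fits in 48 GB of VRAM depends mostly on the precision of its weights. The post as summarized here does not say which quantization was used, so the bit widths below are illustrative assumptions, and the helper names are hypothetical. This back-of-the-envelope sketch counts weight storage only and leaves the KV cache, activations, and runtime overhead to a single `overhead_gb` guess.

```python
def weight_memory_gb(params_billion, bits_per_weight):
    """Memory needed for the weights alone, in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


def fits_in_vram(params_billion, bits_per_weight, vram_gb=48.0, overhead_gb=0.0):
    """True if weights plus an assumed overhead fit in the given VRAM."""
    return weight_memory_gb(params_billion, bits_per_weight) + overhead_gb <= vram_gb


if __name__ == "__main__":
    # Assumed precisions; the source does not state the one actually used.
    for bits in (16, 8, 4):
        size = weight_memory_gb(27, bits)
        print(f"{bits}-bit: {size:.1f} GB, fits in 48 GB: {fits_in_vram(27, bits)}")
```

By this estimate, 16-bit weights alone (54 GB) would exceed 48 GB, while 8-bit (27 GB) or 4-bit (13.5 GB) weights leave room for context.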
