Thanks everyone for the advice on my previous post (24/7 Headless AI Server on Xiaomi 12 Pro (Snapdragon 8 Gen 1 + Ollama/Gemma4)). You really inspired me, and I completely redesigned the cooling and power supply for this setup. What’s new:
Here is how it looks now:
https://reddit.com/link/1tlgxms/video/ul2iivua3w2h1/player
https://reddit.com/link/1tlgxms/video/xiuyt9wk3w2h1/player

Benchmarks (gemma-4-E4B):
https://reddit.com/link/1tlgxms/video/v0t8t5n54w2h1/player
https://reddit.com/link/1tlgxms/video/1cbz7rk85w2h1/player
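
For anyone who wants to compare generation speed on their own device, here is a minimal sketch of the arithmetic. It assumes the runs go through Ollama (as in the previous post), whose `/api/generate` response reports `eval_count` (generated tokens) and `eval_duration` (nanoseconds). The function names and the sample numbers are hypothetical and are not taken from the benchmarks in the videos.

```python
# Sketch only: field names follow Ollama's /api/generate response.
# The sample values below are made up for illustration, not from this post.


def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
    """Generation speed from a token count and a duration in nanoseconds."""
    if eval_duration_ns <= 0:
        raise ValueError("eval_duration_ns must be positive")
    return eval_count / (eval_duration_ns / 1e9)


def speed_from_ollama_response(response: dict) -> float:
    """Read eval_count and eval_duration from a parsed Ollama response dict."""
    return tokens_per_second(response["eval_count"], response["eval_duration"])


if __name__ == "__main__":
    # Hypothetical response: 120 tokens generated in 10 seconds.
    sample = {"eval_count": 120, "eval_duration": 10_000_000_000}
    print(f"{speed_from_ollama_response(sample):.1f} tok/s")
```

Runs through llama.cpp or LiteRT report their timings differently, so for those only the `tokens_per_second` arithmetic carries over.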
GPU Struggles

I tried running LiteRT on the GPU, but unfortunately, Google AI Edge hasn’t released an APK for my Snapdragon 8 Gen 1. Swapping library files from the Qualcomm site didn’t work either. I also tried running a Vulkan build of llama.cpp but ran into issues. I’ll post updated benchmarks once I manage to get it working.

Conclusion

If anyone asks if it was worth it: If you have a powerful spare phone lying around and want a great DIY project, definitely yes. But if you just need an LLM server and don’t want the hassle, you’re better off just buying a Mini PC.

Thanks again to this sub for the inspiration. I wouldn’t have committed to such a massive rebuild without your feedback!

Key Takeaways
- The custom Xiaomi 12 Pro server ran llama.cpp slightly faster than LiteRT on the Snapdragon 8 Gen 1.
- LiteRT used more CPU and drew more current than llama.cpp, but it was still significantly quicker in generation speed.
- Running LiteRT on the GPU was not possible because Google AI Edge has not released an APK for the Snapdragon 8 Gen 1, and swapping in library files from the Qualcomm site did not help.
Note: The above key takeaways are based on the benchmarks provided and may not reflect all possible scenarios.
Originally published at reddit.com. Curated by AI Maestro.