Wanted a real head-to-head on the two TTS models that actually run well on CPU. Couldn’t find one with proper numbers, so I ran one. Posting because the result was not what I expected going in.

Quick context for anyone who hasn’t seen Supertonic 3 yet: it’s a flow-matching TTS where you can dial down the number of inference steps to trade quality for speed. The default is 5 steps; "speed mode" is 2. Kokoro 82M everyone here already knows.

Hardware: AMD EPYC 7763, 4 vCPUs, 16GB RAM, no GPU. Roughly comparable to a Ryzen 5600 or a decent N100 box.

Setup: 6 text lengths from 12 to 1712 chars, 4 configurations (Supertonic at 2 and 5 steps, Kokoro on two backends), 5 runs each, 120 timed runs total. CUDA explicitly disabled. Warmup run discarded.

Mean RTF (lower is faster):
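RTF (real-time factor) is synthesis time divided by the duration of the generated audio, so values below 1 mean faster than real time. Below is a minimal sketch of the timing protocol described above: one discarded warmup, then 5 timed runs averaged. It assumes each model is wrapped in a hypothetical `synthesize(text)` function that returns `(samples, sample_rate)`. That wrapper is not the real Supertonic or Kokoro API. Hiding GPUs via `CUDA_VISIBLE_DEVICES` is one common way to force CPU execution; the original harness may have disabled CUDA differently.

```python
import os
import statistics
import time
from typing import Callable, Sequence, Tuple

# One common way to force CPU-only runs: hide CUDA devices before any model
# library is imported. (Assumption: the original harness may do this differently.)
os.environ["CUDA_VISIBLE_DEVICES"] = ""

# Hypothetical wrapper type: text in, (audio samples, sample rate in Hz) out.
# This is NOT the actual Supertonic or Kokoro API.
Synthesize = Callable[[str], Tuple[Sequence[float], int]]


def benchmark(synthesize: Synthesize, text: str, runs: int = 5) -> dict:
    """One discarded warmup call, then `runs` timed calls on the same text."""
    synthesize(text)  # warmup call: result discarded and not timed

    latencies, rtfs = [], []
    for _ in range(runs):
        start = time.perf_counter()
        samples, sample_rate = synthesize(text)
        elapsed = time.perf_counter() - start

        audio_seconds = len(samples) / sample_rate
        latencies.append(elapsed)
        rtfs.append(elapsed / audio_seconds)  # RTF < 1: faster than real time

    mean_latency = statistics.mean(latencies)
    return {
        "mean_latency_s": mean_latency,     # wall-clock seconds per request
        "mean_rtf": statistics.mean(rtfs),  # lower is faster
        "chars_per_s": len(text) / mean_latency,
    }


if __name__ == "__main__":
    def fake_tts(text: str) -> Tuple[Sequence[float], int]:
        # Stand-in model: ~10 ms of "work", returns 1 s of silence at 24 kHz.
        time.sleep(0.01)
        return [0.0] * 24000, 24000

    print(benchmark(fake_tts, "Hello from a CPU-only TTS benchmark."))
```

The same function also yields wall-clock latency and chars per second, so one loop over texts and configurations covers all three metrics.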
Wall-clock latency on the medium text (196 chars, about 13 seconds of audio):
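For longer inputs, the steady-state throughput figures from this post allow a back-of-envelope wall-clock estimate. Those figures are Supertonic 2-step ~111 chars/s, Supertonic 5-step ~55, and Kokoro ~33 to 36. The sketch makes two assumptions: throughput stays constant for the given text length, and Kokoro runs at 34.5, the midpoint of its range. It is arithmetic on the quoted numbers, not a measured result.

```python
# Steady-state throughput quoted in the post, in characters per second.
# Kokoro was reported as 33 to 36 chars/s; the midpoint is used here.
THROUGHPUT_CPS = {
    "Supertonic 2-step": 111.0,
    "Supertonic 5-step": 55.0,
    "Kokoro 82M": 34.5,
}

EXTENDED_TEXT_CHARS = 1712  # longest text in the benchmark


def estimated_seconds(chars: int, chars_per_second: float) -> float:
    """Rough synthesis time, assuming throughput is constant for this length."""
    return chars / chars_per_second


if __name__ == "__main__":
    kokoro = THROUGHPUT_CPS["Kokoro 82M"]
    for name, cps in THROUGHPUT_CPS.items():
        secs = estimated_seconds(EXTENDED_TEXT_CHARS, cps)
        print(f"{name:<18} ~{secs:5.1f} s  ({cps / kokoro:.1f}x Kokoro throughput)")
```

For the 1712-char Extended text, this works out to roughly 15 s, 31 s and 50 s respectively. The two Supertonic settings differ by about 2.0× in throughput (111/55), against a 2.5× difference in step count.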
Results for the Long and Extended texts are in the GitHub repo (linked in the comments below).

Throughput in chars per second at steady state: Supertonic 2-step reaches ~111, Supertonic 5-step ~55, and Kokoro hovers around 33 to 36 regardless of backend.

The quality side, which actually flips the ranking: Supertonic at 2 steps is fast, but the audio is rough. Words slur and prosody is mechanical; not something I’d ship. At 5 steps it cleans up a lot and is genuinely usable. Kokoro, on either backend, still produces the most natural speech of anything I’ve tested in this size class. It’s #1 on the TTS Arena leaderboard for a reason.

So the practical ranking is more like:
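Here is a toy illustration of why fewer steps is both faster and rougher. A flow-matching model generates output by numerically integrating a learned velocity field, and the step count is the number of integration steps. The sketch rests on two loud assumptions. First, it uses plain fixed-step Euler; Supertonic's actual solver may differ. Second, a hand-written velocity dx/dt = x stands in for the trained network, chosen only because the exact answer at t = 1 is e. Each step costs one velocity evaluation (one network forward pass in a real model), so runtime grows with steps while integration error shrinks.

```python
import math


def euler_sample(velocity, x0, n_steps):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed-step Euler.

    Returns the final state and the number of velocity evaluations; in a real
    flow-matching model each evaluation would be one network forward pass.
    """
    x = x0
    dt = 1.0 / n_steps
    evals = 0
    for i in range(n_steps):
        t = i * dt
        x = x + dt * velocity(x, t)
        evals += 1
    return x, evals


def toy_velocity(x, t):
    # Hypothetical stand-in for a trained network: dx/dt = x, so the exact
    # value at t=1 starting from x0=1 is e.
    return x


if __name__ == "__main__":
    for steps in (2, 5, 32):
        x1, evals = euler_sample(toy_velocity, 1.0, steps)
        print(f"steps={steps:2d} evals={evals:2d} result={x1:.4f} "
              f"error={abs(x1 - math.e):.4f}")
```

With x0 = 1, two Euler steps give (1 + 1/2)^2 = 2.25 and five give (1 + 1/5)^5 ≈ 2.488, against e ≈ 2.718. More steps means smaller error and proportionally more compute.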
Two things that surprised me:
The detailed write-up and the GitHub repo, with all 24 audio samples and the benchmarks, are linked in the comments below 👇

This evaluation was run with Neo AI Engineer, which built the eval harness, handled model runtime issues, and consolidated the results. I reviewed everything manually.

If anyone has an N100 or a Pi 5 lying around and runs this, I’d love to see the numbers. That’s the tier I actually want to deploy on.

submitted by /u/gvij
Originally published at reddit.com. Curated by AI Maestro.