How fast is 10 tokens per second really?

On May 20, 2026, British tech blogger Simon Willison highlighted a new tool created by Mike Veerman: an interactive HTML application that simulates the token output speeds of large language models (LLMs), from 5 tokens per second up to 800 tokens per second.

Users can enter different speeds and watch how each one changes the time a response takes to appear. This is particularly useful when evaluating LLMs advertised with a specific output rate, such as a model claimed to run at “30 tokens/second,” because it shows what that figure feels like in practice.

The tool offers a practical way to compare LLM output speeds and what they imply for real-world applications.
It serves as an educational resource for developers and model enthusiasts who want to gauge how a given speed will feel in use.
Its source code is available, which invites the community to inspect, explore, and customize it.
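To make the rates discussed above concrete, here is a minimal Python sketch of the same idea. It is not Mike Veerman's implementation (his tool is an HTML application); the function names `seconds_to_generate` and `stream_at_rate` are hypothetical, the 500-token response length is an arbitrary example, and the model assumes a constant output rate with no delay before the first token.

```python
import time
from typing import Callable, Iterable, Iterator


def seconds_to_generate(num_tokens: int, tokens_per_second: float) -> float:
    """Time to emit num_tokens at a constant rate (ignores time to first token)."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


def stream_at_rate(
    tokens: Iterable[str],
    tokens_per_second: float,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[str]:
    """Yield tokens one at a time, pausing 1/rate seconds before each one.

    The sleep function is injectable so the pacing can be checked
    without actually waiting.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    delay = 1.0 / tokens_per_second
    for token in tokens:
        sleep(delay)
        yield token


# Compare how long a hypothetical 500-token answer takes at several rates.
for rate in (5, 10, 30, 800):
    print(f"{rate:>3} tokens/s: {seconds_to_generate(500, rate):.1f} s")
```

Under these assumptions, a 500-token answer takes 50 seconds at 10 tokens per second and roughly 17 seconds at 30. To see the pacing rather than just compute it, iterate over `stream_at_rate(text.split(), 10)` and print each word as it arrives; whitespace-separated words are only a rough stand-in for real model tokens.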

