Playing One Night Werewolf (Gemma4 & Qwen3.6)

By AI Maestro May 14, 2026 1 min read

A British user shared their experience playing One Night Werewolf (ONW) with several large language models (LLMs): Qwen3.6, Gemma4 26B, and Gemma4 31B. They built a custom UI on top of llama.cpp to switch between the models during the game.

  • The user assigned each model a role (werewolf, seer, villager, or troublemaker) and had it privately write down its observations and thoughts based on that role.
  • During the day phase, the models shared their findings in a public chat. On each turn, a model read its private notes aloud to the group before making decisions and asking questions.
  • The user noted clear differences between the models: Gemma4 31B was the best liar, thanks to its clear thinking; Gemma4 26B struggled with tool usage but had quick insights; Qwen3.6 tended to assume it was playing as a villager and was outsmarted by its peers.
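The flow in the list above (deal roles, collect private notes, then take turns in a public chat) can be sketched roughly as follows. This is a minimal illustration, not the user's actual code: `Player`, `deal_roles`, `night_phase`, `day_phase`, and `stub_reply` are all hypothetical names, and `stub_reply` stands in for a real model call so the sketch runs without llama.cpp. It also assumes that leftover role cards go to the center and that a model's private notes are fed into its own context rather than posted verbatim, which is only one reading of the description.

```python
import random
from dataclasses import dataclass, field

# Roles named in the post; the exact deck the user played with is an assumption.
ROLES = ["werewolf", "seer", "villager", "troublemaker"]


@dataclass
class Player:
    model: str                      # display label only, e.g. "Gemma4 31B"
    role: str
    private_notes: list = field(default_factory=list)


def stub_reply(player, prompt):
    """Hypothetical stand-in for a real model call; returns canned text."""
    return f"[{player.model}] saw {len(prompt)} chars of context"


def deal_roles(models, rng):
    """Shuffle the deck, give one role per model, leave the rest in the center."""
    deck = ROLES[:]
    rng.shuffle(deck)
    players = [Player(model=m, role=r) for m, r in zip(models, deck)]
    center = deck[len(models):]
    return players, center


def night_phase(players, reply=stub_reply):
    """Each model writes notes that only it will see."""
    for p in players:
        note = reply(p, f"You are the {p.role}. Write your private observations.")
        p.private_notes.append(note)


def day_phase(players, rounds, reply=stub_reply):
    """Models take turns; each sees its own notes plus the public chat so far."""
    public_chat = []
    for _ in range(rounds):
        for p in players:
            transcript = [f"{who}: {said}" for who, said in public_chat]
            context = "\n".join(p.private_notes + transcript)
            public_chat.append((p.model, reply(p, context)))
    return public_chat


models = ["Qwen3.6", "Gemma4 26B", "Gemma4 31B"]
players, center = deal_roles(models, random.Random(0))
night_phase(players)
chat = day_phase(players, rounds=2)
for who, said in chat:
    print(f"{who}: {said}")
```

Keeping `private_notes` on each player and `public_chat` as a separate shared list is what stops one model's hidden reasoning from leaking into another model's prompt. Swapping `stub_reply` for a function that queries a locally served model would be the natural next step.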

The post concludes by asking for suggestions of other models to add to the game, and suggests that this may not be an effective way of using LLMs.

### Takeaways
– Performance in One Night Werewolf varied noticeably from one large language model to another.
– Some models excelled at specific skills such as lying or using tools, while others struggled with basic interactions.
– The setup separated private notes from public chat, but it did not seem to be an efficient way of leveraging these models' strengths in such games.
