UK gov's Mythos AI tests help separate cybersecurity threat from hype

Last week, Anthropic announced it was restricting the initial release of its Mythos Preview model to a limited group of critical industry partners. The UK government’s AI Security Institute (AISI) has now published an initial evaluation of Mythos’ cyberattack capabilities, adding independent public verification to Anthropic’s claims.

AISI Evaluates Mythos’ Cyber Capabilities

The AISI has been running a series of Capture the Flag (CTF) challenges since early 2023, and on those it found that Mythos Preview performs at a similar level to other recent models. Where Mythos stood out was in chaining multiple tasks together into complex multi-step attacks, which could potentially bypass defenses.

Mythos Outshines Competitors

In “The Last Ones” (TLO), a test designed to simulate a 32-step data extraction attack on a corporate network, Mythos Preview completed significantly more steps than previous models. Although Anthropic’s new model succeeded in only 3 out of 10 attempts, its average run still completed 22 of the 32 required infiltration steps.
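To put those two figures on the same scale, the short sketch below converts them to percentages. It uses only the numbers reported above; the variable and function names are illustrative, and it assumes "succeeded" means completing the whole attack chain, which the summary does not spell out.

```python
from fractions import Fraction

# Figures taken from the TLO results described above.
TOTAL_STEPS = 32          # steps in the simulated attack
AVG_STEPS_COMPLETED = 22  # steps completed by the average Mythos Preview run
SUCCESSFUL_ATTEMPTS = 3   # attempts counted as successes (assumed: full chain)
TOTAL_ATTEMPTS = 10       # attempts made


def as_percent(numerator: int, denominator: int) -> float:
    """Return numerator/denominator as a percentage, computed exactly."""
    return float(Fraction(numerator, denominator) * 100)


success_rate = as_percent(SUCCESSFUL_ATTEMPTS, TOTAL_ATTEMPTS)
avg_completion = as_percent(AVG_STEPS_COMPLETED, TOTAL_STEPS)

print(f"Successful attempts: {success_rate:.2f}%")
print(f"Average share of steps completed: {avg_completion:.2f}%")
```

Read this way, a typical run gets roughly two-thirds of the way through the attack even though fewer than a third of attempts succeed outright.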

Limitations and Future Potential

Despite its impressive performance, Mythos still struggles with more complex tests like “Cooling Tower,” which is designed to simulate an attempted disruption of a power plant. The AISI expects improvements as it continues to evaluate the model with increased compute budgets. However, the group emphasizes that real-world systems often have active defenders and defensive tools that are not present in simulated environments.

Key Takeaways

Mythos Preview performs similarly to other recent AI models but excels at chaining multiple tasks together for complex attacks.
The TLO test highlights Mythos’ potential as an autonomous attacker of small, vulnerable systems.
AISI suggests that future models could outperform Mythos, prompting cybersecurity professionals to utilize AI models in their defense strategies.
