Cloudflare just published what they found after running Anthropic's Mythos Preview against 50+ of their own repos and the results are worth reading

If you missed the Project Glasswing announcement last month: Anthropic built a security-focused model that autonomously found thousands of high-severity vulnerabilities across every major OS and web browser, then decided it was too dangerous to release publicly. Instead, they gave ~40 organizations access to use it defensively.

Cloudflare just posted their honest breakdown of the experience.

The genuinely impressive part:

The model can take several exploit primitives and reason about how to chain them into a working proof. The reasoning looks like the work of a senior researcher, not an automated scanner.

The catch:

Its built-in guardrails aren’t consistent. The same task framed differently could produce completely different outcomes. Cloudflare’s point is that this inconsistency is exactly why any future public release needs hardened safeguards layered on top.

They also acknowledge the same capabilities that helped them find bugs in their own code will, in the wrong hands, accelerate attacks against every application on the internet.

Worth a read if you’ve been following the Glasswing story.

submitted by /u/Direct-Attention8597
