What happened after 2,000 people tried to hack my AI assistant

Fernando Irarrázaval published a challenge on hackmyclaw.com inviting the public to breach his OpenClaw test instance by sending it malicious emails. The setup ran a single model, Anthropic's Opus 4.6, under strict constraints that forbade it from revealing secrets or executing external commands. After 6,000 attempts, which cost roughly $500 in tokens and got a Google account suspended, no participant managed to leak the protected data.

This outcome suggests recent training efforts by major labs to resist prompt injection are yielding tangible results. Security teams now observe that frontier models are significantly harder to trick than they were only months ago. However, the lack of success here does not guarantee safety for critical production systems where a breach could cause irreversible harm. Experts remain cautious about relying solely on model training to prevent sophisticated attacks.

* 6,000 failed attempts cost roughly $500 in tokens
* A suspended Google account resulted from the testing volume
* The underlying model was Opus 4.6 with strict anti-injection rules
