For makers and artists navigating the digital landscape, the release of Anthropic’s latest model, Fable, signals a troubling shift in which legitimate security work is treated as a threat. While billed as a public iteration of the powerful Mythos model, the tool is currently so restrictive that it blocks innocuous requests, such as reading a blog post, simply because they touch tangentially on cybersecurity. When a prompt triggers these safety protocols, the chat halts immediately with a generic warning that “safety measures flagged this message for cybersecurity or biology topics.”
The friction between safety and utility
These guardrails were implemented to mitigate the risk of the model being weaponised to develop malware or compromise software, a longstanding anxiety for Anthropic. The biological restrictions stem from parallel fears regarding the creation of biological weapons. However, the implementation has drawn sharp criticism from security researchers.
Valentina “Chompie” Palmiotti, a senior security researcher at IBM X-Force, highlighted the absurdity of the current setup. “Fable rejects any request that could be tangentially cyber related,” she noted. “Even innocuous tasks like reading a blog post.” This over-aggressive filtering forces the AI to default to a less capable version, Claude Opus 4.8, whenever a guardrail is hit.
Another veteran, Matt Suiche, who works with the AI cybersecurity startup Tolmo, explained the practical consequences for developers. “If you ask it to write secure code, it assumes it is cybersecurity related work instead of software engineering best practices, and you get downgraded,” he said. Suiche described the detection mechanism as overly reliant on keywords; anything within the lexical field of “cybersecurity” triggers the block. Even a simple request for a code review sets off the alarm.
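To illustrate why a purely keyword-based check would behave the way Suiche describes, here is a minimal Python sketch. It is not Anthropic's implementation, which has not been published; the term list, the function names `is_flagged` and `route`, and the "primary"/"fallback" labels are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical term list: an assumption for illustration only,
# not taken from any real filter.
FLAGGED_TERMS = {
    "secure", "security", "cybersecurity",
    "vulnerability", "exploit", "malware",
}


def is_flagged(prompt: str) -> bool:
    """Return True if any word in the prompt is in the flagged-term list."""
    words = set(re.findall(r"[a-z]+", prompt.lower()))
    return not words.isdisjoint(FLAGGED_TERMS)


def route(prompt: str) -> str:
    """Send flagged prompts to a fallback model, everything else to the primary one."""
    return "fallback" if is_flagged(prompt) else "primary"


if __name__ == "__main__":
    for prompt in (
        "Please write secure code for this login form",
        "Summarise this blog post about a security conference",
        "Refactor this function to use a dictionary",
    ):
        print(f"{route(prompt):8} <- {prompt}")
```

Under these assumptions, the first two prompts are routed to the fallback even though neither asks for anything harmful: a word-level match cannot tell a request for secure coding practices apart from a request to build an exploit.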
Context from the Mythos launch
The backlash against Fable mirrors issues seen when Anthropic launched Mythos in April. That model was initially restricted to a select group of entities under Project Glasswing, a controlled rollout aimed at securing critical infrastructure. Last week, Anthropic widened access to Mythos, allowing hundreds of organisations across 15 countries to use it. Despite these efforts to control distribution, the haphazard nature of the safety filters remains a deterrent for professionals.
“It seems to be keyword based, so anything in the lexical field of ‘cybersecurity’ triggers the guardrails,” said Suiche.
A path forward for collaboration
Anthropic has not yet responded to requests for comment regarding these specific complaints. To access the model for security work, professionals must apply to the Cyber Verification Program; approved applicants face fewer limitations than the general public. This mirrors OpenAI’s Trusted Access for Cyber initiative.
While the current friction is frustrating, Suiche believes the system will mature. “It is understandable as we are still in the early days and they are still adapting their guardrails,” he said. “I am sure they are going to evolve over time as Anthropic and other frontier model companies will collaborate more with the current new generation of cybersecurity companies.” He added that it is better to cast a wide net initially and relax the constraints later, rather than releasing a tool that cannot be trusted.
Key takeaways
- Fable currently blocks legitimate tasks, including code reviews and reading a blog post, if they touch even tangentially on cybersecurity.
- When a guardrail is triggered, the request falls back to the less capable Claude Opus 4.8, lowering the quality of secure coding assistance.
- Security professionals can apply to the Cyber Verification Program; approved applicants face fewer limitations than the general public.