Import AI 453: Breaking AI agents; MirrorCode; and ten views on gradual disempowerment

Disclosure: Some links in this article are affiliate links. AI Maestro may earn a commission if you make a purchase, at no…

By AI Maestro May 10, 2026 3 min read
Import AI 453: Breaking AI agents; MirrorCode; and ten views on gradual disempowerment

“`html




Import AI 453: Breaking AI agents and MirrorCode

Welcome to Import AI

MirrorCode: AI Can Reimplement Complex Software

AI can reverse engineer software containing thousands of lines of code. MirrorCode, a benchmark designed to test AI models’ ability to autonomously reimplement complex existing software, has shown that AI systems are capable at certain types of coding tasks.

The MirrorCode benchmark includes over 20 target programs across various areas of computing. For example, Claude Opus 4.6 successfully reimplemented `gotree`, a bioinformatics toolkit with approximately 16,000 lines of Go and 40+ commands. This task would take a human engineer without AI assistance between 2-17 weeks.

However, it’s important to note that this benchmark isn’t like traditional coding tests. The tasks involve cloning programs which produce canonical outputs, and some may have memorization on basic programs. This only covers a slice of the large universe of potential software projects.

Policy Atlas: Tools to Navigate AI Policy Responses

The Windfall Trust has published a “Windfall Policy Atlas” to help people understand and navigate various policy proposals related to transformative AI. The atlas contains 48 distinct ideas, grouped into five categories:

  • Public & Social Investments: Funding initiatives that benefit society as a whole.
  • Labor Market Adaptation: Programs designed to help workers adapt to new job markets.
  • Regulation and Market Design: Rules and mechanisms aimed at regulating AI behavior.
  • Global Coordination: Efforts to coordinate international responses to the challenges posed by transformative AI.

The atlas provides a navigable interface that helps users explore these different proposals. For instance, “long term” solutions for labor might include shortened workweeks, while “medium term” ones could involve workforce training and reskilling programs.

How to Break AI Agents: Six Genres of Attack

A new paper from Google DeepMind outlines six genres of attack that can be mounted against AI agents. These attacks exploit various vulnerabilities in the way these intelligences interact with their environment.

  • Content Injection: Embed commands into metadata like CSS or HTML to alter how an agent perceives its world.
  • Semantic Manipulation: Inject malicious instructions within educational materials or hypothetical scenarios to confuse the AI and steer its behavior.
  • Cognitive State: Place fabricated statements in retrieval corpora to trigger specific behaviors or leaks sensitive data.
  • Behavioral Control: Embed adversarial prompts into external resources to trick an agent into performing harmful actions.
  • Systemic: Disrupt the equilibrium of a system by sending multiple agents on unrelated tasks, causing cascading failures.
  • Human-in-the-Loop: Use cognitive biases in human overseers to manipulate their decisions and guide an AI agent’s behavior towards harmful outcomes.

To mitigate these attacks, several technical, ecosystem-level, legal, and ethical frameworks are recommended. These include pre-training models for robustness, standardizing digital ecosystems, refining liability laws, and conducting systematic evaluations of agents.

AI Forecast: Higher Probability of Full Automation by 2028

AI forecaster Ryan Greenblatt has updated his forecast. He now believes there is a higher probability—30%—that AI research will be fully automated by the end of 2028, based on model performance and reliability over time.

  • Model Performance: Models like Opus 4.5 and Codex have exceeded his expectations in tasks that were previously considered complex or lengthy for humans.
  • Time Factors: He notes improvements in the ability of AI to perform reliable tasks over time, such as those taking between a month and several years.

This suggests that with continued advancements, it may become possible to fully automate AI research within a few years.

© 2026 Import AI. All rights reserved.

“`

This HTML is structured as per the provided text, maintaining key facts and figures while rephrasing where necessary to fit a British English style.

Stay ahead of AI. Get the most important stories delivered to your inbox — no spam, no noise.

Name
Scroll to Top