Microsoft pits more than 100 AI agents against each other to find Windows vulnerabilities


By AI Maestro | May 14, 2026 | 3 min read

Key Points

  • A Microsoft security system called MDASH employs over 100 specialized AI agents to automatically discover software vulnerabilities in Windows.
  • MDASH has identified and reported 16 new security issues, with four classified as critical. These include remote code execution vulnerabilities in key components like tcpip.sys and netlogon.dll.
  • The system scored 88.45 percent on the CyberGym benchmark, the highest result on the leaderboard, but Microsoft has not disclosed which specific AI models MDASH uses.

Microsoft’s multi-model security system, MDASH, is designed to automatically find vulnerabilities in software. Unlike simpler approaches that rely on a single AI model like Claude Mythos, MDASH uses over 100 specialized agents across an ensemble of models and techniques.

MDASH's detection pipeline first analyzes source code to map attack surfaces. Specialized auditor agents then scan for suspicious areas. A second group of agents, the debaters, evaluates each finding by arguing whether the vulnerability is actually exploitable. Finally, evidence leaders attempt to trigger the surviving findings with specific inputs.
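The four stages can be sketched as a simple orchestration loop. Everything in the sketch below is an assumption for illustration: the `Finding` class, the agent interfaces, and the majority-vote rule are not details Microsoft has disclosed.

```python
# Illustrative sketch of a four-stage multi-agent audit pipeline in the
# style described above. All names and interfaces are hypothetical;
# Microsoft has not published MDASH's actual design.
from dataclasses import dataclass


@dataclass
class Finding:
    location: str       # code region flagged by an auditor
    description: str


def run_pipeline(attack_surface, auditors, debaters, evidence_agents):
    """Takes stage 1's output (the mapped attack surface) and returns
    confirmed findings paired with the inputs that triggered them."""
    # Stage 2: auditor agents scan the attack surface for suspicious areas.
    findings = [f for auditor in auditors for f in auditor(attack_surface)]

    confirmed = []
    for finding in findings:
        # Stage 3: debater agents vote on exploitability; only findings
        # with a majority of "yes" votes proceed (assumed decision rule).
        votes = [debater(finding) for debater in debaters]
        if sum(votes) <= len(votes) // 2:
            continue
        # Stage 4: evidence agents try to trigger the finding with a
        # concrete input; the first successful reproduction confirms it.
        for reproduce in evidence_agents:
            trigger = reproduce(finding)
            if trigger is not None:
                confirmed.append((finding, trigger))
                break
    return confirmed
```

In a real system each agent would wrap a model call; for testing, they can be stubbed with plain functions, which is what makes the pipeline model-agnostic.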

The system’s pipeline is model-agnostic: new models can be integrated easily by changing configuration settings. Plugins allow experts to feed in domain-specific knowledge like kernel calling conventions or IPC trust boundaries, which cannot be assumed by foundation models alone.
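A model-agnostic setup like this is often expressed as plain configuration. The fragment below is purely hypothetical: the keys, model identifiers, and agent counts are placeholders, not anything Microsoft has published; it only illustrates the idea of swapping models and attaching domain-knowledge plugins without touching pipeline code.

```python
# Hypothetical configuration for a model-agnostic agent pipeline.
# None of these keys, model names, or counts come from MDASH; they are
# placeholders showing how roles, models, and plugins could be declared
# separately from the orchestration logic.
PIPELINE_CONFIG = {
    "roles": {
        # Heavy SOTA reasoners handle deep analysis; swapping the model
        # here would not require changing the pipeline itself.
        "auditor": {"model": "sota-reasoner-v1", "count": 40},
        # Distilled models keep the many debate rounds cheap.
        "debater": {"model": "distilled-small-v2", "count": 60},
        "evidence": {"model": "sota-reasoner-alt", "count": 8},
    },
    # Expert-supplied domain knowledge that foundation models alone
    # cannot be assumed to have.
    "plugins": ["kernel_calling_conventions", "ipc_trust_boundaries"],
}


def agents_for(role):
    """Look up (model name, agent count) for a role from configuration."""
    spec = PIPELINE_CONFIG["roles"][role]
    return spec["model"], spec["count"]
```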

More than 100 agents debate whether vulnerabilities are real


Microsoft attributes MDASH's 88.45 percent score on the CyberGym benchmark, well above other entries, to this layered approach of combining many AI agents and techniques rather than relying on any single model.

A top benchmark score, but not an apples-to-apples comparison

On the public CyberGym benchmark of 1,507 real vulnerabilities, MDASH scored 88.45 percent, the highest result on the leaderboard. The comparison is not apples-to-apples, however: Microsoft is measuring its full framework against individual models rather than against other frameworks. The company has also not revealed which specific AI models it uses.

MDASH employs "SOTA models" as heavy reasoners, "distilled models" as low-cost debaters, and a second, separate SOTA model as an independent check. The identities of these models remain undisclosed; they could come from OpenAI, Anthropic, Microsoft's own labs, or third-party providers.

MDASH was developed by Microsoft's Autonomous Code Security Team, some of whose members come from Team Atlanta, winners of the DARPA AI Cyber Challenge. The system is currently available to external customers in a limited private preview, and Microsoft has published a detailed technical report on its capabilities.

Other companies like OpenAI and Anthropic also engage in AI cybersecurity research

As other major players such as OpenAI and Anthropic continue to explore using their models to defend against threats, Microsoft's approach with MDASH represents one path forward. The same capabilities that help defenders find flaws could equally serve attackers, and this dual-use nature highlights the complex landscape of AI-driven security.


Key Takeaways

  • Microsoft has introduced MDASH, a multi-model system using over 100 specialized agents to find vulnerabilities in Windows.
  • The system identified and reported 16 new security issues with four classified as critical, including remote code execution vulnerabilities in key components like tcpip.sys and netlogon.dll.
  • MDASH posted the highest score on the CyberGym benchmark (88.45 percent), though the result compares a full framework against individual models rather than against other frameworks.



Originally published at the-decoder.com. Curated by AI Maestro.
