For the makers and artists watching the generative AI space, Build 2026 signals a shift from simple prompt-response interactions to persistent, autonomous work partners. Microsoft has moved beyond static image generators to unveil a suite of seven in-house models, including its first dedicated reasoning engine, alongside a new operating system and hardware designed to run these agents locally. The focus is no longer just on creating content, but on embedding AI directly into the workflow to handle scheduling, coding, and complex multi-step logic without constant human intervention.
The reasoning leap and the cost of tuning
The headline announcement is MAI-Thinking-1, a 1-trillion-parameter model with 35 billion active parameters. It has a 128,000-token context window and is built specifically for multi-step instructions, long-context analysis, and code generation. Mustafa Suleyman, Microsoft’s AI chief, said it matches leading models on key software engineering benchmarks and was preferred over Anthropic’s Sonnet 4.6 in internal blind tests. Notably, the model was trained from scratch on what Microsoft describes as clean data, without distillation from third-party models.
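The gap between total and active parameters usually points to a sparse design, such as a mixture-of-experts, in which only part of the network runs for each token. Microsoft's exact architecture is not described here, so the sketch below is only back-of-envelope arithmetic on the two published figures. It assumes the common rule of thumb of roughly 2 FLOPs per parameter used in a forward pass. That rule, and the helper names `active_fraction` and `approx_forward_flops`, are illustrative assumptions and do not come from Microsoft.

```python
# Back-of-envelope arithmetic for a sparse model: only a subset of the
# total weights is used for each token. Parameter counts are from the
# article; the ~2 FLOPs-per-parameter rule is a common approximation,
# not a Microsoft-published figure.

TOTAL_PARAMS = 1_000_000_000_000   # 1 trillion total parameters
ACTIVE_PARAMS = 35_000_000_000     # 35 billion active parameters per token


def active_fraction(active: float, total: float) -> float:
    """Share of the model's weights used for a single token."""
    return active / total


def approx_forward_flops(params: float) -> float:
    """Rough forward-pass cost per token: ~2 FLOPs per parameter used."""
    return 2 * params


if __name__ == "__main__":
    frac = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
    sparse = approx_forward_flops(ACTIVE_PARAMS)
    dense = approx_forward_flops(TOTAL_PARAMS)
    print(f"active share: {frac:.1%}")
    print(f"~{sparse:.1e} FLOPs/token vs ~{dense:.1e} if every weight were used")
    print(f"ratio: ~{dense / sparse:.0f}x")
```

Run as a script, it reports an active share of 3.5% and a per-token compute ratio of roughly 29x against a hypothetical dense model of the same total size. Real-world cost also depends on memory, since all 1 trillion weights still have to be stored.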
In independent benchmarking, however, the model sits roughly on par with DeepSeek V3.2.
Beyond reasoning, the MAI family extends into six more specialised areas, among them coding, images, transcription, and voice. MAI-Code-1-Flash is a 5-billion-parameter agentic coding model integrated into GitHub Copilot and Visual Studio Code, pitched as cheaper to run than Anthropic’s Haiku. MAI-Image-2.5 handles text-to-image generation and editing, taking second place on the Arena-Score benchmark behind GPT-Image-2 but ahead of Google’s Nano-Banana models. MAI-Transcribe-1.5 is billed as offering the fastest transcription across 43 languages, while MAI-Voice-2 generates speech in 15 languages and can clone voices from short audio samples.
All of the models share a common data foundation and evaluation pipeline and are available via Azure Foundry, where, for the first time, developers can fine-tune the weights directly.
Frontier Tuning
Microsoft is introducing “Frontier Tuning,” a method allowing companies to adapt models to their specific workflows using reinforcement learning. The company argues that the most valuable training data comes from the actual work traces an agent leaves behind within an organisation. In internal testing, a MAI model tuned for Excel matched GPT-5.4 performance while running up to ten times more efficiently. At McKinsey, a customised MAI model achieved the highest win rate of any system tested, again at roughly one-tenth the cost.
Scout: the always-on autonomous agent
Microsoft is also launching “Scout,” its first “always-on” background agent. Scout belongs to a new category called “Autopilots”: persistent agents that have their own identity and work autonomously in the background across Teams, Outlook, OneDrive, and SharePoint. It is designed to coordinate meetings across time zones, prepare briefing materials, schedule deliverables, and flag stalled decisions before they become blockers.
Using a component called Work IQ, the agent builds a context memory of how you work and what you prioritise. Security measures include running under a unique Entra identity, sandboxed execution via Microsoft Execution Containers, and mandatory human approval for sensitive actions. Credentials are scoped to each task and scrubbed from logs. Whether these safeguards are sufficient remains to be seen: previous agent systems have repeatedly proven vulnerable at the point where language models process untrusted external data, for example through prompt injection.
Scout is currently available as an experimental release through the Frontier program, requiring an Intune configuration and a GitHub Copilot license.
Hardware, OS, and clinical expansion
Accompanying the software is Project Solara, an Android-based operating system co-developed with Qualcomm and MediaTek. It is designed to run agents across devices, with Microsoft showing a desktop hub and digital badge as potential form factors. For local development, the Surface RTX Spark Dev Box is launching, equipped with Nvidia’s Arm-based Spark RTX chip and 128 GB of unified memory, though pricing and full specs are yet to be announced.
In healthcare, Microsoft has partnered with the Mayo Clinic to co-develop a clinical foundation model. The model will first be deployed within the Mayo Clinic’s own operations before becoming available on Azure Foundry, with the Mayo Clinic retaining ownership of it.
Microsoft frames this overarching goal as “Humanist Superintelligence,” which it defines as AI systems that remain tools under human control. Suleyman said Microsoft plans to scale compute and capabilities rapidly over the coming year, supported by its own Maia 200 chips.
Key takeaways
- Microsoft has released MAI-Thinking-1, its first in-house reasoning model. Independent benchmarks place it roughly on par with DeepSeek V3.2, and it was preferred over Anthropic’s Sonnet 4.6 in Microsoft’s internal blind tests.
- The new Frontier Tuning method lets organisations align models with their specific workflows using reinforcement learning; in Microsoft’s internal testing, an Excel-tuned model matched GPT-5.4 while running up to ten times more efficiently.
- “Scout” introduces a new class of persistent, always-on agents integrated into core productivity tools, designed to autonomously manage scheduling, meetings, and decision tracking.