Google bakes computer control directly into Gemini 3.5 Flash, letting the model see and operate your screen

Google has integrated Computer Use directly into Gemini 3.5 Flash, allowing the model to observe and operate screens on computers, browsers, and mobile devices without separate external setup. Previously, the capability was available only as a standalone Gemini 2.5 model; it is now built into the standard Flash variant. Developers can combine it with existing tools such as function calling, Search, and Maps to build agents that automate tasks across desktop, mobile, and web environments. On the OSWorld benchmark, Gemini 3.5 Flash scores 78.4, ahead of Gemini 3 Flash at 65.1 and GPT-5.4 mini at 72.1. GPT-5.5 remains slightly ahead at 78.7, and Anthropic's Opus 4.8 leads at 83.4. Even so, the integration simplifies access for teams building software testing or office automation solutions.

The release addresses security concerns by incorporating adversarial training and two optional enterprise safeguards to prevent prompt injection attacks. One safeguard requires user confirmation for sensitive or irreversible actions, while the other halts tasks upon detecting indirect injections. Google recommends sandboxing, human oversight, and strict access controls alongside these built-in measures. The feature is available through the Gemini API and the Gemini Enterprise Agent Platform, with supporting demos and code on GitHub.
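To make the two safeguards concrete, here is a minimal Python sketch of how an agent loop might gate actions on its own side. It does not call the Gemini API and is not Google's implementation: the `Step` type, the `SENSITIVE_KINDS` set, and the `confirm` and `detect_injection` callbacks are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

# Hypothetical set of action kinds treated as sensitive or irreversible.
# The article does not say how the real safeguard classifies actions.
SENSITIVE_KINDS = {"delete", "purchase", "send_email"}


@dataclass(frozen=True)
class Step:
    kind: str              # e.g. "click", "type", "delete"
    target: str            # UI element or resource the step acts on
    screen_text: str = ""  # text visible on screen before the step runs


def run_task(
    steps: Iterable[Step],
    execute: Callable[[Step], None],
    confirm: Callable[[Step], bool],
    detect_injection: Callable[[str], bool],
) -> list[str]:
    """Run steps in order, applying two illustrative safeguards."""
    outcomes: list[str] = []
    for step in steps:
        # Safeguard 2 (illustrative): stop the whole task if the observed
        # screen content looks like an indirect prompt injection.
        if detect_injection(step.screen_text):
            outcomes.append("halted")
            break
        # Safeguard 1 (illustrative): ask the user before sensitive steps.
        if step.kind in SENSITIVE_KINDS and not confirm(step):
            outcomes.append("skipped")
            continue
        execute(step)
        outcomes.append("executed")
    return outcomes
```

In this sketch a declined confirmation skips only that step, while a suspected injection ends the task, mirroring the difference between the two safeguards as the article describes them. A real deployment would still need the sandboxing, human oversight, and access controls that Google recommends.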

* Scores 78.4 on OSWorld benchmark
* Includes user confirmation for sensitive actions
* Accessible via Gemini API and Enterprise Agent Platform

