I/O 2026: Welcome to the agentic Gemini era
It has been an extraordinary year since our last I/O. We have seen relentless progress in technology and innovation. Now, we are at a stage where people want to see how AI is delivering value in their daily lives. This focus is reflected in the products and features we are announcing today.
AI momentum across the full stack
The stories of how people are using AI are our best measure of progress. To understand the scale at which people are adopting AI, another great proxy is tokens — the fundamental units of data that our models process, many representing a problem being solved.
Two years ago, we were processing 9.7 trillion tokens per month across our surfaces. Last year, this number grew to roughly 480 trillion tokens. Today, it has grown roughly sevenfold to over 3.2 quadrillion tokens processed monthly.
This tells an important story about our products and how others are building as well — especially developers and enterprises:
- Over 8.5 million developers are now building new apps and experiences with our models each month.
- Our model APIs process roughly 19 billion tokens per minute.
- In the past year, over 375 Google Cloud customers processed more than one trillion tokens, demonstrating significant demand for AI across various industries.
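As a rough sanity check, the arithmetic behind these figures can be sketched in Python. The token counts are the ones quoted in this post. The 30-day month and the constant per-minute rate are simplifying assumptions of this sketch, not figures from the source, and the function names are illustrative.

```python
# Figures quoted in the post (tokens per month across all surfaces).
TOKENS_TWO_YEARS_AGO = 9.7e12   # 9.7 trillion
TOKENS_LAST_YEAR = 480e12       # roughly 480 trillion
TOKENS_TODAY = 3.2e15           # over 3.2 quadrillion

# Figure quoted for the model APIs (tokens per minute).
API_TOKENS_PER_MINUTE = 19e9    # roughly 19 billion


def growth_multiple(before: float, after: float) -> float:
    """Return how many times larger `after` is than `before`."""
    return after / before


def per_minute_to_per_month(rate: float, days: int = 30) -> float:
    """Scale a per-minute rate to a month, assuming the rate stays constant.

    The 30-day month is an assumption made for this sketch.
    """
    return rate * 60 * 24 * days


if __name__ == "__main__":
    first = growth_multiple(TOKENS_TWO_YEARS_AGO, TOKENS_LAST_YEAR)
    second = growth_multiple(TOKENS_LAST_YEAR, TOKENS_TODAY)
    api_monthly = per_minute_to_per_month(API_TOKENS_PER_MINUTE)
    print(f"Two years ago to last year: {first:.1f}x")
    print(f"Last year to today: {second:.1f}x")
    print(f"API tokens per 30-day month: {api_monthly:.3e}")
```

Under these assumptions, the multiple from last year to today comes out a little under seven, and the API rate extrapolates to roughly 0.8 quadrillion tokens per month.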
Momentum with our products
Today we have 13 products with over a billion users each. Five of these products now have more than 3 billion users.
AI Overviews has grown to over 2.5 billion monthly active users, and AI Mode has surpassed one billion monthly active users in just a year. People have loved the upgrade to Search ever since it was introduced.
AI Overviews now brings generative AI benefits to more people than any other product in the world. It makes Search feel like an ongoing conversation, providing deeper insights and connecting you with vast information on the web.
Natural, conversational AI in products
We have been rapidly innovating in places where we can bring more natural conversations to our products. For example, Maps got its biggest upgrade with a new feature called Ask Maps. People use Ask Maps for longer, more complex questions.
Ask YouTube
People come to YouTube every day to ask lots of questions. Sometimes it can be hard to know where to start.
Ask YouTube reimagines this experience, making information much more digestible and easy to navigate. It shows you videos that best match your interests, and most importantly, jumps right to the part of the video most relevant to you.
We are now testing Ask YouTube in the U.S., and it will be broadly available for users this summer.
Voice-powered Docs Live
Having more natural conversations with Gemini directly inside our products is a big step. A new feature called Docs Live takes this to another level. To create a doc, you can now verbally “brain dump” what’s on your mind and let Gemini handle the rest.
In the future, you’ll be able to create new docs and edit them directly with your voice. This feature is rolling out for subscribers this summer, and it will also come to Gmail and Keep.
Infrastructure supporting innovation at scale
To support all of this scale for our users while serving enterprises and developers around the world, we need massive investments in infrastructure. In 2022, we were spending $31 billion annually on capex. This year, that number is expected to be about six times that — approximately $180 to $190 billion.
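The "about six times" figure can be checked against the quoted range with a few lines of Python. This is only a sketch of the arithmetic on the numbers stated above, and the function name is illustrative.

```python
# Capex figures quoted in the post, in billions of US dollars.
CAPEX_2022 = 31.0
CAPEX_THIS_YEAR_LOW = 180.0
CAPEX_THIS_YEAR_HIGH = 190.0


def capex_multiples(base: float, low: float, high: float) -> tuple[float, float]:
    """Return the low and high ends of the range as multiples of `base`."""
    return low / base, high / base


low_multiple, high_multiple = capex_multiples(
    CAPEX_2022, CAPEX_THIS_YEAR_LOW, CAPEX_THIS_YEAR_HIGH
)
print(f"{low_multiple:.2f}x to {high_multiple:.2f}x the 2022 level")
```

The range works out to roughly 5.8 to 6.1 times the 2022 figure, which matches the stated multiple.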
A key part of this investment is our custom silicon. A decade ago, we announced our first commercial tensor processing unit (TPU) at I/O. Since then, we have transformed how the industry builds for AI. We recently announced the eighth generation of TPUs at Cloud Next.
- TPU 8t is optimized for large-scale pretraining and has nearly tripled in raw computing power compared to our previous generation. This allows us to distribute training across multiple sites, scaling it globally with more than 1 million TPUs. For model builders, this means creating the largest training cluster in the world.
- TPU 8i is designed for inference and has dramatically improved speed at every step. If we learned anything over 27 years of working on Search, it’s that latency matters.
In addition to speed, we’re also thinking about scaling sustainably. Both chips are more energy efficient, delivering up to two times better performance per watt.
Gemini Omni
This progress with TPUs is how we can make compute advances across models and coding agents. With world models, AI is moving from predicting text to simulating reality. We have been working to push the boundaries of what these models can do.
Gemini Omni is our new model capable of generating samples in any output modality from any input. We’re starting with video outputs and over time we’ll enable image and text. This new model combines Gemini’s intelligence with our generative media models — a huge leap forward in world understanding. We are launching the first model in the Omni family: Gemini Omni Flash.
Gemini Omni Flash is available today. You will be able to try it on the Gemini app, Google Flow, and on YouTube Shorts. It will also be made available via APIs for developers and enterprise customers in the coming weeks.
New SynthID updates and partners
As generative AI gets better, the need for transparency grows. Research shows that people can correctly identify high-quality deepfake videos only about a quarter of the time. Three years ago, we launched SynthID, our watermark that is invisible to the naked eye.
We have now watermarked over one hundred billion images and videos with SynthID, along with sixty thousand years of audio assets. Millions of people are using our SynthID detector in the Gemini app to verify AI-generated content. Now we’re going a step further by adding Content Credentials verification across products. This will show you if the origin of the content was AI or a camera and if it’s been edited with generative AI tools.
We are thrilled to announce that OpenAI, Kakao, and Eleven Labs have adopted SynthID. It’s great to see cross-industry collaboration. We’re looking forward to expanding this to more partners and setting the standard of transparency for the AI era.
Gemini 3.5 Flash
Gemini 3 launched a few months ago, with a full family of models. It’s our most adopted series yet. We’ve loved seeing developers use Flash as their daily driver and build incredible experiences with Pro’s deep reasoning and multimodal capabilities.
We have been hard at work improving these models, especially focused on agentic coding, long-horizon tasks, and real-world workflows. Today, we’re introducing Gemini 3.5 Flash, our first in a series of models combining frontier intelligence with action.
- Compared to the previous version (3.1 Pro), Gemini 3.5 Flash is better across almost all benchmarks, particularly in coding tasks and on GDPVal, which captures many real-world, economically valuable tasks.
- Gemini 3.5 Flash is a very capable model at the frontier, comparable to the best models available today but much faster. It stands out in output tokens per second, where it is four times faster than other frontier models.
The new model has been a game changer for us internally at Google. We’ve been using 3.5 Flash with our reimagined agent-first development platform Antigravity, which dramatically accelerated how we build AI tools. In March, we were processing half a trillion tokens per day across our AI developer tools, and now we’re processing more than three trillion tokens per day. This scale created a powerful feedback loop helping us improve 3.5.
What’s amazing about Flash is how it delivers frontier-level capabilities at less than half the price of comparable frontier models. Many companies are already blowing through their annual token budgets, and the year is not yet half over. If companies used a mix of Flash and other frontier models, they could potentially reduce costs without compromising on performance.
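The cost argument can be illustrated with a simple blended-price calculation. This is a minimal sketch, not a pricing model. The relative price of 0.5 is an upper bound taken from "less than half the price". The routing shares and the function name `blended_cost` are hypothetical, and the sketch assumes both models use the same number of tokens per task.

```python
def blended_cost(flash_share: float, flash_relative_price: float = 0.5) -> float:
    """Cost of a Flash/frontier mix relative to using only the frontier model.

    flash_share: fraction of tokens routed to Flash (hypothetical, 0.0 to 1.0).
    flash_relative_price: Flash price divided by the frontier model's price.
        0.5 is an upper bound implied by "less than half the price".
    """
    if not 0.0 <= flash_share <= 1.0:
        raise ValueError("flash_share must be between 0 and 1")
    # Flash tokens are billed at the relative price, the rest at full price.
    return flash_share * flash_relative_price + (1.0 - flash_share)


for share in (0.0, 0.5, 0.8, 1.0):
    print(f"{share:.0%} on Flash: {blended_cost(share):.2f}x the all-frontier cost")
```

With these assumed inputs, routing half of the tokens to Flash would cut the bill to about three quarters of the all-frontier cost. Actual savings depend on real prices and on how much work can move to Flash without losing quality.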
Key Takeaways
- The number of tokens processed by Google’s models has grown dramatically in recent years, reflecting the increasing adoption of AI.
- Gemini 3.5 Flash and Gemini Omni are key advancements: Flash pairs frontier-level capabilities with faster, cheaper output, and Omni enables new types of media generation.
- Transparency features like SynthID and Content Credentials verification are being expanded to provide users with greater assurance about the authenticity of content.
- The investment in custom silicon, such as TPUs, is enabling significant improvements in both training and inference speeds for AI models.
Originally published at blog.google. Curated by AI Maestro.