Google’s Gemini Omni turns images, audio, and text into video — and that’s just the start

By AI Maestro, May 19, 2026

Key Takeaways

  • Google has released Gemini Omni, a new family of multimodal models designed to generate any kind of content from a mix of inputs.
  • The first release, Gemini Omni Flash, generates short (10-second) videos from text and image inputs. Longer videos are planned.
  • Generated videos carry a digital watermark that can be used to verify whether a video was produced by Google's Gemini products.
  • The models have broad applications, such as generating images from audio or audio from video, and could be transformative for advertising and filmmaking.

When Google first announced Gemini, it aimed to create a single neural network capable of handling text, image, audio, and video. Today, at its I/O conference, the company has taken a step closer to this goal with Gemini Omni.

The new models are designed to reason across all of their inputs to produce coherent outputs. In one example, a model generates a stop-motion animation explaining protein folding from a text prompt alone. The example is meant to show that the models can understand and simulate complex scenarios, not just process images or audio.

Google already offers Veo for creating videos from text and images, but Omni goes further by supporting more natural interaction across modalities. Users can also edit photos with plain-text commands, similar to Google's Nano Banana tool.

Longer term, Google wants to extend these models so they can generate content in other forms, such as audio from video or images, and even work within existing media such as avatars and digital personas.

For consumers, Omni Flash is designed to make short videos easy to create. For businesses, Google plans to release the Omni API soon so companies can build these capabilities into their own workflows. That could change how content creators work by enabling end-to-end multimodal pipelines.