Google Gemini Omni Leaked — Native Video Model Shocks AI World

Right now, inside the Gemini app, there’s a new option. A tab that says “Video Generation.” And if you tap it, you see: “Start with an idea or try a template. Powered by Omni.”

That’s the leak. And if it holds up, it changes the video AI game more than any model release this year.

What Actually Happened

It started on Reddit and X around May 6-7. Users opening the Gemini app were greeted with a full-screen prompt: “Create with Gemini Omni — Meet our new video generation model. Remix your videos, edit directly in chat, try a template, and more.”

The prompt offered three options: Create a Video, Create with a Template, or browse pre-made examples. Some users actually got access to the model before it was pulled back, and the results they posted are a clear step up from what current video models produce.

The Demos That Matter

One user asked for: “A scene with two men at a seaside restaurant eating spaghetti.” The output wasn’t just a video — it showed two distinct characters approaching a table, exchanging greetings, sitting down, and eating together. Multiple subjects interacting naturally. AI video’s biggest hurdle has always been keeping characters consistent across frames. Omni apparently handles it.

Another demo: “A professor writes out a mathematical proof for trigonometric identities on a chalkboard.” The generated video shows the professor actually writing on the board, explaining step by step while the text appears behind him. Text generation in AI video has always been terrible — garbled nonsense that looks like alien script. This one apparently gets it right.

Why This Isn’t Just Another Veo Update

Google’s current generative AI setup is fragmented. Veo handles video generation. Image generation tools handle images. Gemini handles text. They’re separate systems duct-taped together.

“Omni” is literally in the name. The industry reading is that this is Google’s attempt at a truly unified model, one that generates text, images, AND video natively. Think of the way GPT-4o unified text and images, but with video added.

If true, no current competitor offers this. ByteDance’s Seedance 2.0? Specialised video model. Alibaba’s Wan 2.7? Video only. Kling, Sora, Runway — all video specialists. Omni would be the first major unified model with video output.

What Users Actually Saw

The UI leak showed a usage tab alongside Omni. Two video generations consumed 86% of the daily allowance on an AI Pro plan. Split evenly, that’s roughly 43% per video, which leaves room for just two generations a day. That suggests Google is rationing compute, and that this model is expensive to run.

Google’s description in the app: “Remix your videos, edit directly in chat, try a template, and more.” That’s not just generation. That’s editing. That’s templates. That’s an entire video production pipeline inside a chat interface.

What This Means

Google I/O 2026 opens on May 19. The timing of this leak — two weeks out — follows Google’s standard playbook. A UI string surfaces, the community speculates, the keynote provides the big reveal.

But here’s what makes this different from previous leaks: users actually got to use it. The model went live briefly, generated real demos, and demonstrated capabilities that current video models struggle with — multiple consistent characters, legible text, natural interaction.

If Google ships Omni at I/O, standalone AI video tools are going to have a bad time. When high-quality video generation is just another feature inside a chat app you already use, paying extra for a specialised tool becomes harder to justify.

One week to go. The demos are real. The UI is live. Now we wait for Google to confirm what everyone already knows.

