Google Responds To OpenAI’s Sora Launch

The DeepMind team at Google introduced its latest video generation model, Veo 2, on Monday. The new model can create videos up to two minutes long at resolutions as high as 4K, a significant step beyond OpenAI's rival model, Sora, which can produce only 20-second clips at up to 1080p resolution.

It’s important to note that these figures represent the maximum potential of Veo 2. Currently, the model is only accessible via VideoFX, Google’s experimental video generation platform, which restricts clips to a maximum length of eight seconds and a resolution of 720p. Access to VideoFX is also on a waitlist basis, limiting who can experiment with Veo 2 for now. However, the company plans to broaden access in the near future. A Google spokesperson mentioned that Veo 2 will also be integrated into the Vertex AI system once the model’s capabilities can be effectively scaled.

“In the upcoming months, we will keep refining the model based on user input,” Eli Collins, VP of product at DeepMind, told TechCrunch. “We aim to incorporate Veo 2’s enhanced features into exciting applications throughout the Google ecosystem… Expect to see more updates next year.”

Today, we’re announcing Veo 2: our state-of-the-art video generation model which produces realistic, high-quality clips from text or image prompts. 🎥

We’re also releasing an improved version of our text-to-image model, Imagen 3 – available to use in ImageFX through… pic.twitter.com/h6ejHaMUM4

— Google DeepMind (@GoogleDeepMind) December 16, 2024

Veo 2 is said to have several advantages over earlier versions. It boasts a better grasp of physical dynamics, producing more realistic fluid movements and improved lighting and shadow effects. Additionally, it generates video clips with increased clarity, meaning that textures and images appear sharper and are less likely to blur during movement. The updated model also enhances camera controls, allowing users to maneuver the virtual camera with greater finesse.

However, TechCrunch points out that while Veo 2 has made strides, there is still room for improvement in the video generation process compared to competitors like Sora, Kling, Movie Gen, or Gen 3 Alpha, particularly in areas like coherence and consistency. “While Veo can maintain adherence to prompts for a couple of minutes, it struggles with more complex requests over longer durations,” Collins explained. “Moreover, maintaining character consistency and intricate detailing, especially during fast and dynamic motions, remains a challenge as well.”

On the same day, Google announced enhancements to Imagen 3, allowing the commercial image generation model to produce “brighter and better-composed” visuals. This model, available in ImageFX, will introduce additional descriptive suggestions based on keywords from user prompts, thereby enriching the user experience.
