Google has announced Veo 3.1, an updated version of its generative AI video model, released on October 15, 2025. The new version features enhanced audio output, more granular editing controls, and improved image-to-video results. According to Google, Veo 3.1 builds on the Veo 3 model released in May, producing more realistic video clips and adhering more closely to user prompts. The model is being integrated into Google's Flow video editor, the Gemini app, and the Vertex and Gemini APIs.
A key new capability in Veo 3.1 lets users insert an object into a generated video clip, with the model blending it into the clip's existing aesthetic. Google also said it will soon let users remove objects from videos within the Flow editor, expanding the suite of direct-manipulation tools. These features aim to give creators more precise control over their generated content.
Veo 3.1 also extends several features introduced with Veo 3: adding reference images to guide character movement, supplying first and last frames from which the model generates a clip, and extending the duration of an existing video based on its final frames. With Veo 3.1, these features now generate audio as well, which Google says makes the resulting clips more dynamic and immersive.
Veo 3.1 is rolling out across multiple Google platforms: the AI-powered Flow video editor, the Gemini app, and the developer-facing Vertex and Gemini APIs. Google reported substantial engagement with Flow since its May launch, with users having created more than 275 million videos on the application to date.
The release underscores the ongoing push in generative AI video toward greater realism, finer user control, and integration across established digital ecosystems.