Google DeepMind has unveiled a research preview of SIMA 2, its next-generation generalist AI agent, which integrates the language and reasoning capabilities of Google's Gemini large language model. This integration enables SIMA 2 to move beyond simple instruction following, demonstrating advanced understanding and interaction within virtual 3D environments.
SIMA 2 builds on its predecessor, SIMA 1, which was trained on video game data to play multiple 3D games. SIMA 1, launched in March 2024, could follow basic instructions but completed complex tasks only 31% of the time, compared with 71% for humans. According to Joe Marino, senior research scientist at DeepMind, SIMA 2 marks a "step change and improvement in capabilities," doubling SIMA 1's performance by uniting Gemini's language and reasoning with embodied skills developed through training.
The agent is powered by the Gemini 2.5 Flash-Lite model and features self-improvement capabilities. DeepMind states that SIMA 2 can complete complex tasks in previously unseen environments and improve its performance based on its own experience. Marino described this as a step toward "more general-purpose robots and AGI systems more generally." DeepMind defines Artificial General Intelligence (AGI) as a system capable of a wide range of intellectual tasks, learning new skills, and generalizing knowledge across different areas.
DeepMind researchers emphasize that work on "embodied agents," which interact with a world through a body by observing inputs and taking actions, is crucial for generalized intelligence. Jane Wang, a research scientist at DeepMind, noted that SIMA 2 aims to understand situations and user requests, then respond with common sense. In demonstrations, SIMA 2 described its surroundings in "No Man's Sky," determined its next steps by recognizing a distress beacon, and interpreted instructions like "go to the house that's the color of a ripe tomato" by reasoning that ripe tomatoes are red.
The agent also uses Gemini to generate its own tasks and employs a separate reward model to score attempts, allowing it to learn from mistakes and develop new behaviors with AI-based feedback. DeepMind views SIMA 2 as a step toward unlocking more general-purpose robots, with Frederic Besse, senior staff research engineer, highlighting its contribution to high-level understanding and reasoning. DeepMind did not provide a specific timeline for implementing SIMA 2 in physical robotics systems or for a general release of the preview.
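The self-improvement loop described above (a model proposes tasks, the agent attempts them, a separate reward model scores each attempt, and the scored experience feeds further learning) can be illustrated with a toy sketch. This is not DeepMind's implementation, whose details are not given here. Every name below (`propose_task`, `attempt_task`, `score_attempt`, `self_improve`), the task list, the single `skill` number, and the update rule are hypothetical stand-ins chosen only to show the shape of the loop.

```python
import random
from dataclasses import dataclass

# Hypothetical task pool. In the article, Gemini generates the tasks.
TASKS = ["collect wood", "find the red house", "reach the distress beacon"]


@dataclass
class Attempt:
    task: str
    succeeded: bool
    score: float


def propose_task(rng: random.Random) -> str:
    """Stand-in for the task generator: picks a task at random."""
    return rng.choice(TASKS)


def attempt_task(task: str, skill: float, rng: random.Random) -> bool:
    """Stand-in for the agent acting in an environment.

    Assumption: success is a coin flip weighted by a single `skill` value.
    """
    return rng.random() < skill


def score_attempt(task: str, succeeded: bool) -> float:
    """Stand-in for the separate reward model: 1.0 on success, else 0.0."""
    return 1.0 if succeeded else 0.0


def self_improve(rounds: int, skill: float = 0.2, lr: float = 0.05, seed: int = 0):
    """Run the propose / attempt / score / update loop for `rounds` steps."""
    rng = random.Random(seed)
    experience: list[Attempt] = []
    for _ in range(rounds):
        task = propose_task(rng)
        succeeded = attempt_task(task, skill, rng)
        score = score_attempt(task, succeeded)
        experience.append(Attempt(task, succeeded, score))
        # Toy update (assumption): nudge skill upward in proportion to the
        # score, capped at 1.0. A real system would retrain the agent on
        # the scored experience instead.
        skill = min(1.0, skill + lr * score)
    return skill, experience


if __name__ == "__main__":
    final_skill, log = self_improve(rounds=50)
    print(f"attempts: {len(log)}, final skill: {final_skill:.2f}")
```

The starting `skill` of 0.2 and the learning rate are arbitrary and unrelated to the success rates reported for SIMA 1 or SIMA 2. The point of the sketch is only that feedback comes from a model-based scorer rather than from human labels.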