Google DeepMind has revealed SIMA 2, the newest version of its Scalable Instructable Multiworld Agent. It marks a significant step forward in training AI systems to reason through tasks, adapt to new environments and respond naturally to human instructions. The upgrade builds on the first SIMA model introduced in March 2024 and is powered by Google’s Gemini models, with an emphasis on planning and continual learning.
DeepMind says SIMA 2 can now analyse its actions and determine the steps needed to complete a given task. The agent receives a visual feed from a three-dimensional game world along with a user-defined objective such as “build a shelter” or “locate the red house”. It then breaks that goal into smaller actions and executes them using inputs similar to a keyboard and mouse. This approach allows the system to map instructions to meaningful behaviour based on what it observes on screen.
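The goal-to-action flow described above can be illustrated with a toy sketch. This is not DeepMind's implementation: the lookup tables `PLANS` and `CONTROLS`, the `Action` type and the `run_episode` function are all hypothetical stand-ins, with a fixed table taking the place of the Gemini-based reasoning.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    device: str   # "keyboard" or "mouse"
    command: str  # e.g. "W" or "pan_right"


# Hypothetical planner table: stands in for the model's reasoning step.
PLANS = {
    "locate the red house": ["scan surroundings", "walk toward red object"],
}

# Hypothetical mapping from each sub-goal to keyboard/mouse-style inputs.
CONTROLS = {
    "scan surroundings": [Action("mouse", "pan_right")],
    "walk toward red object": [Action("keyboard", "W"), Action("keyboard", "W")],
}


def plan(goal: str) -> List[str]:
    """Break a user goal into smaller sub-goals (empty if the goal is unknown)."""
    return PLANS.get(goal.lower(), [])


def run_episode(
    goal: str,
    observe: Callable[[], object],
    act: Callable[[Action], None],
) -> List[Action]:
    """Read a frame, then emit the low-level inputs for each sub-goal in order."""
    executed: List[Action] = []
    for subgoal in plan(goal):
        _frame = observe()  # a real agent would condition its choice on this frame
        for action in CONTROLS[subgoal]:
            act(action)
            executed.append(action)
    return executed


if __name__ == "__main__":
    log = run_episode("Locate the red house", lambda: None, lambda a: None)
    print([f"{a.device}:{a.command}" for a in log])
```

In this sketch the screen frame is read but ignored. In the system the article describes, the visual feed is what lets the agent map an instruction to behaviour that fits what is on screen.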
One of the standout advances is its improved performance in unfamiliar games. DeepMind tested SIMA 2 in environments it had never encountered before, including MineDojo, a research-focused adaptation of Minecraft, and ASKA, a Viking-themed survival game. In both cases, SIMA 2 outperformed the original version, showing better adaptability and higher task success rates. The system also handles multimodal prompts, allowing users to give it instructions through sketches, emojis or different languages. Concepts learned in one game can transfer into another, enabling more efficient learning across varied virtual worlds.
Training the model involves a blend of human demonstrations and automatically generated annotations from the Gemini models. Whenever SIMA 2 picks up a new skill or movement in a fresh environment, that experience is recorded and fed back into the training process. DeepMind says this reduces the amount of human-labelled data required and allows the agent to refine its abilities as it explores new scenarios.
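The feedback loop described above can be pictured as a training pool that mixes human demonstrations with automatically annotated experience. The sketch below is illustrative only: `TrainingPool`, `Example` and the `annotate` callback are hypothetical names, and the callback merely stands in for the Gemini-generated annotations.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Example:
    trajectory: str  # placeholder for recorded gameplay
    label: str       # description of what the trajectory achieves
    source: str      # "human" or "auto"


@dataclass
class TrainingPool:
    examples: List[Example] = field(default_factory=list)

    def add_human_demo(self, trajectory: str, label: str) -> None:
        """Store a demonstration that a person recorded and labelled."""
        self.examples.append(Example(trajectory, label, "human"))

    def add_experience(self, trajectory: str, annotate: Callable[[str], str]) -> None:
        """Store the agent's own experience, labelled by an automatic annotator."""
        self.examples.append(Example(trajectory, annotate(trajectory), "auto"))

    def human_fraction(self) -> float:
        """Share of the pool that required human labelling."""
        if not self.examples:
            return 0.0
        human = sum(1 for e in self.examples if e.source == "human")
        return human / len(self.examples)


if __name__ == "__main__":
    pool = TrainingPool()
    pool.add_human_demo("demo-1", "chop a tree")
    # Hypothetical annotator; a real one would be a model call.
    for run in ("run-1", "run-2", "run-3"):
        pool.add_experience(run, lambda t: f"auto-label for {t}")
    print(pool.human_fraction())
```

As more self-generated experience is added, the human-labelled share of the pool shrinks, which is the effect DeepMind attributes to this approach.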
Despite the progress, the system is not without limits. DeepMind acknowledges that SIMA 2 still struggles with long-term memory, complex multi-step reasoning and extremely precise low-level control. These constraints make it unsuitable for direct integration with physical robotics at this stage.
However, DeepMind is clear about its long-term objective. The company sees three-dimensional game environments as a practical proving ground for AI agents that could eventually control real-world machines. By developing systems capable of understanding natural language, making plans and executing tasks in complex virtual spaces, DeepMind hopes to lay the groundwork for general-purpose robots that can operate in everyday physical settings.