Google has taken a step forward in AI development with its new Gemini Robotics and Gemini Robotics-ER models. Google DeepMind has introduced the two models, both built on Gemini 2.0 and designed to bring AI into real-world robotics. According to the blog post, these models add vision-language-action (VLA) capabilities and embodied reasoning (ER), enabling robots to perform complex physical tasks with greater adaptability, interactivity, and dexterity. Google DeepMind is partnering with Apptronik to integrate these advancements into humanoid robots.
Gemini Robotics: Vision-language-action model
Gemini Robotics extends the capabilities of Gemini 2.0 by adding physical actions as an output modality. The company says this allows robots to interact dynamically with their environment and adapt to changes in real time.
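To picture what "actions as an output modality" means in practice, the sketch below shows a generic closed-loop setup in which a VLA-style model is repeatedly queried with the current camera image and an instruction, and returns the next motor command. All of the class and function names here are invented for illustration; this is not Google's actual API.

```python
# Hypothetical sketch of a vision-language-action control loop.
# None of these names reflect Google's actual Gemini API; they stand in
# for whatever interface a real VLA model and robot would expose.
import random

class MockVLAModel:
    """Stand-in for a VLA model: maps (image, instruction) -> joint deltas."""
    def next_action(self, image, instruction: str) -> list[float]:
        # A real model would condition on the image and the instruction;
        # here we return small random deltas for a 7-joint arm.
        return [random.uniform(-0.01, 0.01) for _ in range(7)]

class MockRobot:
    """Stand-in for a robot arm with a camera."""
    def capture_image(self) -> bytes:
        return b"fake-image-bytes"
    def apply_joint_deltas(self, deltas: list[float]) -> None:
        print(f"moving joints by {deltas}")

def run_loop(model, robot, instruction: str, steps: int = 5) -> None:
    # Re-querying the model at every step is what lets such a system
    # adapt when the scene changes between observations.
    for _ in range(steps):
        image = robot.capture_image()
        action = model.next_action(image, instruction)
        robot.apply_joint_deltas(action)

run_loop(MockVLAModel(), MockRobot(), "pick up the banana")
```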
The model generalises across new environments, objects, and instructions, more than doubling performance on generalisation benchmarks compared to previous VLA models. It also understands and responds to natural language commands in multiple languages, adapting its actions to changing conditions.
Moreover, the model enables fine motor control, allowing robots to perform precise tasks such as origami folding or packing items into a bag.
Initially trained on the ALOHA 2 bi-arm robotic platform, Gemini Robotics is adaptable to various robot types, including Franka-based systems and the Apptronik Apollo humanoid robot.
Gemini Robotics-ER: Enhancing spatial reasoning
Gemini Robotics-ER, on the other hand, focuses on spatial reasoning, allowing roboticists to integrate it with low-level controllers for real-world applications. The model improves 2D and 3D object detection, state estimation, and spatial understanding for better robotic navigation and object interaction. It can autonomously generate control code, achieving 2x-3x higher success rates than previous models. It also leverages in-context learning, refining its responses based on human demonstrations.
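As a hedged sketch of how spatial-reasoning output might drive a low-level controller, the snippet below has a hypothetical ER-style model return a 3D point for a named object, which is then converted into a simple pick motion. These interfaces are invented for illustration and are not the actual Gemini Robotics-ER API.

```python
# Hypothetical sketch: 3D object detection output driving a low-level
# controller. All names are illustrative, not Google's actual API.
from dataclasses import dataclass

@dataclass
class Detection3D:
    label: str
    x: float  # metres in the robot base frame (assumed convention)
    y: float
    z: float

class MockERModel:
    """Stand-in for an embodied-reasoning model doing 3D object detection."""
    def locate(self, image, query: str) -> Detection3D:
        # A real model would infer this from the image; fixed values here.
        return Detection3D(label=query, x=0.42, y=-0.10, z=0.05)

class MockController:
    """Stand-in for a low-level controller exposing Cartesian moves."""
    def move_to(self, x: float, y: float, z: float) -> None:
        print(f"moving end-effector to ({x:.2f}, {y:.2f}, {z:.2f})")
    def close_gripper(self) -> None:
        print("gripper closed")

def pick(model, controller, image, target: str) -> None:
    det = model.locate(image, target)
    controller.move_to(det.x, det.y, det.z + 0.05)  # approach from above
    controller.move_to(det.x, det.y, det.z)
    controller.close_gripper()

pick(MockERModel(), MockController(), b"fake-image", "coffee mug")
```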
Safety and responsible AI
Google DeepMind has also addressed responsible use of AI with the two new models. The company has built safety measures into Gemini Robotics-ER, enabling it to assess whether an action is safe before execution. It is also releasing the ASIMOV dataset to evaluate the semantic safety of robotic actions.
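The pre-execution safety check described above can be pictured as a gate between planning and actuation. The sketch below is an invented illustration of that pattern, using a trivial keyword filter where, in the system DeepMind describes, the model itself would make the judgement.

```python
# Hypothetical sketch of gating robot actions on a semantic safety check.
# The keyword filter is a placeholder purely for illustration.
def is_semantically_safe(action_description: str) -> bool:
    unsafe_markers = ("toward a person", "hot stove", "knife blade first")
    return not any(m in action_description.lower() for m in unsafe_markers)

def execute_if_safe(action_description: str, execute) -> None:
    # The action is only handed to the executor after the safety check passes.
    if not is_semantically_safe(action_description):
        print(f"blocked: {action_description}")
        return
    execute(action_description)

execute_if_safe("place the mug on the table", lambda a: print(f"executing: {a}"))
execute_if_safe("push the mug toward a person", lambda a: print(f"executing: {a}"))
```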
Gemini Robotics-ER is being tested by trusted partners, including Agile Robots, Agility Robotics, Boston Dynamics, and Enchanted Tools. Google DeepMind aims to refine these models to advance AI-driven robotics applications.
