Google DeepMind Unveils On-Device Gemini Robotics Model for Local Task Execution

Google DeepMind has introduced Gemini Robotics On-Device, a vision-language-action (VLA) model that can run tasks locally on robots without an internet connection. The model builds on the company’s previous Gemini Robotics model, released in March, and is designed to control a robot’s movements. It is small and efficient enough to run directly on a robot, and developers can control and fine-tune it using natural language prompts.
According to Google DeepMind, the new model performs at a level close to the cloud-based Gemini Robotics model in benchmarks and outperforms other on-device models in general benchmarks. Carolina Parada, Head of Robotics at Google DeepMind, noted that while the hybrid model is still more powerful, the on-device model is surprisingly strong and can serve as a starter model or for applications with poor connectivity.

In a demonstration, robots running the local model were shown unzipping bags and folding clothes. The model, initially trained for ALOHA robots, was adapted to work on a bi-arm Franka FR3 robot and Apptronik’s Apollo humanoid robot. The bi-arm Franka FR3 successfully handled scenarios and objects it had not seen before, such as performing assembly on an industrial belt. Developers can train robots on new tasks in the MuJoCo physics simulator by showing the model 50 to 100 demonstrations of a task.
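DeepMind has not published the format it uses for these demonstrations, so the sketch below is only a rough illustration of what 50 to 100 demonstration episodes might look like as data: each pairs a natural-language instruction with a teleoperated trajectory of observation/action steps. All field names here are assumptions, not the actual schema.

```python
from dataclasses import dataclass, field

# Illustrative only: hypothetical structure for VLA demonstration data,
# not Google DeepMind's actual format.

@dataclass
class Step:
    observation: list[float]  # e.g. joint positions and camera features
    action: list[float]       # e.g. commanded joint targets

@dataclass
class Demonstration:
    instruction: str  # natural-language task prompt
    steps: list[Step] = field(default_factory=list)

# DeepMind suggests 50 to 100 demonstrations per new task.
demos = [
    Demonstration(
        instruction="fold the shirt on the table",
        steps=[Step(observation=[0.0, 0.12, -0.3], action=[0.05, 0.1, -0.25])],
    )
    for _ in range(50)
]
```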
Google DeepMind also released a software development kit, the Gemini Robotics SDK. The SDK provides the full lifecycle tooling needed to use Gemini Robotics models: accessing checkpoints, serving a model, evaluating it on the robot and in the MuJoCo simulator, uploading data, and fine-tuning it.
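The Gemini Robotics SDK itself is gated behind a trusted-tester program, but MuJoCo’s open-source Python bindings (`pip install mujoco`) show what the simulator half of such an evaluation loop looks like at its simplest. In this sketch, the `policy` function is a hypothetical stand-in for a served model checkpoint, implemented as a trivial PD controller rather than a real VLA:

```python
import numpy as np
import mujoco

# A one-joint arm driven by a single motor; stands in for a real robot model.
XML = """
<mujoco>
  <worldbody>
    <body name="arm" pos="0 0 0.5">
      <joint name="shoulder" type="hinge" axis="0 1 0"/>
      <geom type="capsule" fromto="0 0 0  0.3 0 0" size="0.03"/>
    </body>
  </worldbody>
  <actuator>
    <motor joint="shoulder" ctrlrange="-1 1"/>
  </actuator>
</mujoco>
"""

model = mujoco.MjModel.from_xml_string(XML)
data = mujoco.MjData(model)

def policy(qpos: np.ndarray, qvel: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for a fine-tuned VLA checkpoint.

    A real evaluation would query the served model with camera images and
    a language instruction; here a PD controller drives the joint to zero.
    """
    return -2.0 * qpos - 0.5 * qvel

for _ in range(500):
    data.ctrl[:] = np.clip(policy(data.qpos, data.qvel), -1.0, 1.0)
    mujoco.mj_step(model, data)  # advance physics by one timestep

print("final joint angle:", float(data.qpos[0]))
```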
The on-device Gemini Robotics model and its SDK will be available to a group of trusted testers while Google continues to work toward minimizing safety risks.

Other tech companies are also showing interest in robotics. Nvidia is building a platform to create foundation models for humanoids, with its CEO, Jensen Huang, noting that building foundation models for general humanoid robots is one of the most exciting problems to solve in AI today. Nvidia has been championing robotic innovation through initiatives like Isaac and Jetson, and last year joined the humanoid race with Project GR00T, a general-purpose foundation model for humanoid robots.
Hugging Face is developing open models and datasets for robotics and has released an open AI model for robotics called SmolVLA. The model is trained on community-shared datasets and outperforms much larger robotics models in both virtual and real-world environments. Hugging Face aims to democratize access to vision-language-action models and accelerate research toward generalist robotic agents. The firm also launched LeRobot, a collection of robotics-focused models, datasets, and tools, and recently acquired the robotics startup Pollen Robotics, unveiling several inexpensive robotics systems for purchase.
