The gap between giving a robot a command and having it execute that command with human-like intuition has long been a hurdle in robotics. For years, “asking properly” meant writing rigid lines of code. While the industry has moved toward more flexible interfaces, there has remained a persistent struggle: the more complex the task, the harder it is to ensure the robot understands the nuance of the request.
That dynamic is shifting through a new partnership between Boston Dynamics and Google DeepMind. Boston Dynamics has announced that its quadruped robot, Spot, is now equipped with Gemini Robotics-ER 1.6. This high-level embodied reasoning model is designed to bring a higher degree of usability and intelligence to complex, real-world tasks, effectively teaching Spot to reason through its environment.
While early demonstrations show Spot performing household chores, the primary commercial application of this integration is industrial inspection. By imbuing the robot with embodied AI, Boston Dynamics aims to move beyond simple pre-programmed paths and toward a system that can autonomously identify hazards and interpret complex environmental data without constant human intervention.
According to Marco da Silva, Vice President and General Manager of Spot at Boston Dynamics, these advances mark a critical step toward robots that can operate in the physical world with greater autonomy. He notes that capabilities such as instrument reading and more reliable task reasoning will allow Spot to notice, understand, and react to real-world challenges autonomously.
Bridging the Gap Between Reasoning and Action
In the field of robotics, “reasoning” and “understanding” are often used loosely. However, for these systems to be commercially viable, they must align with human expectations. Carolina Parada, Head of Robotics at Google DeepMind, explains that the benchmark for “understanding” is whether the system answers or reacts in a way a human would. Without this alignment, a robot might technically complete a task but do so in a way that is impractical or unsafe.
A prime example of this disconnect is seen when Spot is told to “recycle any cans in the living room.” While the robot can identify and move the cans, it may grip them sideways. A human knows that a can with leftover liquid should be held upright to avoid a spill, but robots currently lack that inherent “world knowledge.”

To address this, Gemini Robotics-ER 1.6 takes a safety-first approach. Parada points to the use of the ASIMOV benchmark, which consists of numerous natural-language examples of actions a robot should avoid. For instance, the model can reason that a cup of water should not be placed on the edge of a table where it could easily fall. While current versions of Spot do not yet use these semantic safety models for all manipulation tasks, the goal is to integrate this reasoning into future iterations to ensure safer object handling.
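As a rough illustration of the idea, the sketch below screens a proposed action against a small list of natural-language "avoid" examples before it is executed. Everything here is hypothetical: the function names, the keyword matching, and the examples are toy stand-ins for the semantic reasoning the model performs internally.

```python
# A minimal, hypothetical sketch of semantic safety gating: a proposed action
# is screened against natural-language "avoid" examples (in the spirit of the
# ASIMOV benchmark) before it is sent to the robot. All names are illustrative.
from dataclasses import dataclass

@dataclass
class ProposedAction:
    description: str  # e.g. "place the cup of water on the table edge"

# Toy stand-in for the benchmark's natural-language examples of unsafe actions.
AVOID_EXAMPLES = [
    "placing an open liquid container near an edge where it could fall",
    "gripping a container of liquid sideways so it could spill",
]

def violates_constraint(action: ProposedAction, constraint: str) -> bool:
    """Crude keyword overlap as a placeholder for the model's semantic reasoning."""
    keywords = {"edge", "sideways", "spill", "fall"}
    hits = keywords & set(action.description.lower().split())
    return bool(hits) and any(k in constraint for k in hits)

def safety_gate(action: ProposedAction) -> bool:
    """Approve the action only if no avoid-example is judged to apply."""
    return not any(violates_constraint(action, c) for c in AVOID_EXAMPLES)

if __name__ == "__main__":
    risky = ProposedAction("place the cup of water on the table edge")
    print("approved" if safety_gate(risky) else "rejected")  # -> rejected
```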
The Challenge of Physical Data and Vision-Only Models
One of the most significant hurdles in developing embodied AI is the lack of physical data. Most AI models are trained on internet data, which is overwhelmingly visual. While there are millions of images and videos showing how to pick up a pen, there is very little digital data regarding the “feel” or tactile pressure required to do so.
Currently, Gemini Robotics-ER 1.6 is a vision-only model. Because the model cannot yet incorporate tactile feedback into its reasoning, version 1.6 introduces "success detection," a feature that uses multiple camera angles to determine whether Spot has successfully grasped an object. Spot itself carries physical force and touch sensors, but the AI model does not yet integrate that data because the necessary large-scale tactile datasets do not exist on the web.
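The sketch below illustrates, under stated assumptions, what vision-only success detection might look like: each camera view produces a confidence that the object is held, and the verdicts are fused against a threshold. The function names, scores, and fusion rule are illustrative only; the article describes the behavior, not this implementation.

```python
# A hedged sketch of vision-only "success detection": each camera view yields
# a confidence that the object is in the gripper, and the per-view verdicts
# are fused to decide whether the grasp succeeded. All values are made up.
from statistics import mean

def grasp_confidence(view_id: str) -> float:
    """Placeholder for a per-view vision model scoring 'object held' in [0, 1]."""
    fake_scores = {"front": 0.91, "left_wrist": 0.84, "right_wrist": 0.40}
    return fake_scores.get(view_id, 0.0)

def grasp_succeeded(view_ids: list[str], threshold: float = 0.7) -> bool:
    """Fuse multi-camera evidence: succeed if the mean confidence clears a threshold."""
    scores = [grasp_confidence(v) for v in view_ids]
    return mean(scores) >= threshold

print(grasp_succeeded(["front", "left_wrist", "right_wrist"]))  # ~0.72 -> True
```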
To bridge this gap, Boston Dynamics is implementing a data-sharing requirement. Customers using these new inspection capabilities with Spot will be required to share their operational data with the company, providing the real-world physical feedback necessary to train future, more tactile AI models.
Commercial Viability in Industrial Inspection
Unlike many AI-driven robotics projects that remain in the research phase, Boston Dynamics has already deployed several thousand Spot robots commercially. The focus of the Gemini integration is to enhance the robot’s utility in industrial facilities, where it can autonomously search for dangerous debris or spills and read complex gauges and sight glasses.
The integration allows Spot to utilize vision-language-action models when it encounters an environment it doesn’t immediately understand. This reduces the need for humans to “babysit” the robot during a patrol. However, the company acknowledges the risk of AI “hallucinations”—where the AI perceives something that isn’t there.

To mitigate this, da Silva explains that new DeepMind capabilities are rolled out via beta programs to a small set of customers, and the company advertises a feature only once it is confident in its reliability. In industrial settings, the "threshold of usefulness" is key: according to da Silva, a reliability rate north of 80 percent is necessary to prevent operators from ignoring the robot's alerts, which would be akin to the robot "crying wolf."
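A quick way to see why that number matters is to treat it as alert precision. The minimal sketch below, with made-up counts and a hypothetical gate function, checks whether the fraction of alerts that flagged a real issue clears the roughly 80 percent bar da Silva describes.

```python
# A minimal sketch of the "crying wolf" threshold: if the precision of the
# robot's alerts drops below roughly 80 percent, operators learn to ignore
# them. Counts and the gate function are illustrative only.
def alert_precision(true_alerts: int, false_alerts: int) -> float:
    """Fraction of raised alerts that corresponded to a real issue."""
    total = true_alerts + false_alerts
    return true_alerts / total if total else 0.0

def alerts_are_actionable(true_alerts: int, false_alerts: int,
                          threshold: float = 0.80) -> bool:
    return alert_precision(true_alerts, false_alerts) >= threshold

print(alerts_are_actionable(true_alerts=42, false_alerts=8))   # 0.84 -> True
print(alerts_are_actionable(true_alerts=30, false_alerts=20))  # 0.60 -> False
```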
This real-world experience with Spot serves as a scalable laboratory. The lessons learned from Gemini Robotics-ER 1.6’s performance in the field will eventually be applied to other platforms, including the humanoid robot Atlas. While Atlas may not become a primary inspection tool, the reasoning capabilities developed through Spot’s commercial deployment will pave the way for more reliable robots capable of complex tasks, from clearing soda cans to managing laundry.
Key Takeaways: Spot and Gemini Robotics-ER 1.6
- Enhanced Reasoning: Spot now uses Google DeepMind’s Gemini Robotics-ER 1.6 to better understand and execute complex tasks autonomously.
- Industrial Focus: The primary application is industrial inspection, including reading gauges and detecting spills or debris.
- Vision-Centric: The current model relies on vision and “success detection” via multiple camera angles due to a lack of internet-scale tactile data.
- Safety Benchmarking: The ASIMOV benchmark is used to teach the AI what not to do, such as placing liquids on the edge of tables.
- Data Loop: Commercial users will share data with Boston Dynamics to help train future models with physical and tactile information.
As Boston Dynamics continues to refine the integration of embodied AI, the next phase will involve moving these reasoning capabilities from beta programs to wider commercial availability. Further updates on the deployment of Gemini Robotics-ER 1.6 and its impact on industrial safety are expected as more customer data is integrated into the model.