RT-2 A Breakthrough For Robot Control
Written by Mike James   
Wednesday, 16 August 2023

RT-2 is the new version of what Google calls its vision-language-action (VLA) model and at last we can start to see the potential of general-purpose robots. While everyone is hyping AI chatbots, this is where the real revolution is happening.

Google DeepMind has announced Robotic Transformer 2 (RT-2), a "first-of-its-kind" vision-language-action (VLA) model that uses data scraped from the Internet to enable better robotic control through plain language commands. The ultimate goal is to create general-purpose robots that can navigate human environments.


The key idea is that many things can be cast into a form of language and then handled by the Large Language Models (LLMs) that are driving the current success of AI. In this case robot actions are treated as a language: the model outputs language tokens, the tokens are translated into actions, and this is how the robot performs a task. The second component is the Vision-Language Model (VLM), trained by presenting images to an image encoder together with a language model trained on text describing those images. VLMs can be used to caption and describe images - once trained, you present them with an image and they generate text. In this case the generated text also includes the robot's action instructions.
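
To make the "actions as language" idea concrete, here is a minimal sketch of how a string of output tokens could be turned back into a robot command. The eight-number action layout and the 256-bin discretization follow the description in the RT-2 paper, but the value ranges, names and helper function below are illustrative assumptions, not DeepMind's actual code:

# A minimal, illustrative sketch of the "actions as language tokens" idea.
# The bin count (256) and the eight-number action layout follow the RT-2
# paper's description; the ranges and names here are assumptions.

NUM_BINS = 256  # each continuous action dimension is discretized into 256 bins

# (low, high) range for each action dimension - illustrative values only
ACTION_RANGES = {
    "terminate": (0.0, 1.0),    # episode-termination flag
    "dx":        (-0.1, 0.1),   # end-effector position deltas (metres)
    "dy":        (-0.1, 0.1),
    "dz":        (-0.1, 0.1),
    "droll":     (-0.5, 0.5),   # rotation deltas (radians)
    "dpitch":    (-0.5, 0.5),
    "dyaw":      (-0.5, 0.5),
    "gripper":   (0.0, 1.0),    # gripper extension
}

def detokenize_action(token_string: str) -> dict[str, float]:
    """Turn a model output such as '1 128 91 241 5 101 127 217'
    back into a continuous robot action by undoing the binning."""
    bins = [int(tok) for tok in token_string.split()]
    if len(bins) != len(ACTION_RANGES):
        raise ValueError(f"expected {len(ACTION_RANGES)} action tokens, got {len(bins)}")
    action = {}
    for (name, (low, high)), b in zip(ACTION_RANGES.items(), bins):
        # map bin index 0..255 back to the centre of its interval in [low, high]
        action[name] = low + (b + 0.5) * (high - low) / NUM_BINS
    return action

if __name__ == "__main__":
    # The VLA model emits these integers as ordinary text tokens,
    # exactly as it would emit words when captioning an image.
    print(detokenize_action("1 128 91 241 5 101 127 217"))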

RT-2 builds on RT-1 and comes in two versions - one based on PaLM-E and one based on PaLI-X - and the two have slightly different strengths and weaknesses.


RT-2’s ability to transfer information to actions shows promise for robots to more rapidly adapt to novel situations and environments. In testing RT-2 models in more than 6,000 robotic trials, the team found that RT-2 functioned as well as the previous model, RT-1, on tasks in its training data, or “seen” tasks. And it almost doubled its performance on novel, unseen scenarios to 62% from RT-1’s 32%.


In other words, with RT-2, robots are able to learn more like we do - transferring learned concepts to new situations. This ability to generalize is very important. Unlike GPT-3 and other language models, RT-2 cannot afford to hallucinate and return untruthful results - just think what a hallucinated action would mean for a physical robot. This is why it is very likely that the next big breakthrough in AI will come from "embodied" AI. That is, putting AI agents in control of a physical body ensures that reality rules and hallucinations are quickly beaten out of the system.

Not only does RT-2 show how advances in AI are cascading rapidly into robotics, it holds enormous promise for more general-purpose robots. While there is still work to be done to enable helpful robots in human-centered environments, RT-2 shows us an exciting future for robotics that is just within grasp. You can't help but wonder what the result would be of using this sort of system with Spot or Atlas, not to mention self-driving cars. All of our current line-up of top robots started out long before this sort of approach was possible.

More Information

What is RT-2: Speaking robot: Our new AI model translates vision and language into robotic actions

RT-2: New model translates vision and language into action

Paper: RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control

Related Articles

Google's Large Language Model Takes Control


Last Updated ( Wednesday, 16 August 2023 )