RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control
Paper • Jul 28, 2023 • arXiv • Anthony Brohan, Noah Brown, Justice Carbajal, Yevgen Chebotar, Xi Chen, Krzysztof Choromanski, Tianli Ding, Danny Driess, Avinava Dubey, Chelsea Finn, Keerthana Gopalakrishnan, Karol Hausman, Alex Irpan, et al. (Google DeepMind)
We introduce Robotic Transformer 2 (RT-2), a novel Vision-Language-Action (VLA) model that learns from both vast web datasets and specialized robotics data. We show that high-capacity vision-language...