Quick answer
AI Summary: Introduces Gato, a single transformer-based generalist agent capable of performing over 600 diverse tasks across multiple modalities, including playing games and controlling robots.
Inspired by progress in large-scale language modeling, we apply a similar approach towards building a single generalist agent beyond the realm of text outputs. The agent, which we refer to as Gato, works as a multi-modal, multi-task, multi-embodiment generalist policy. The same network with the same weights can play Atari, caption images, chat, stack blocks with a real robot arm and much more, deciding based on its context whether to output text, joint torques, button presses, or other tokens. In this report we describe the model, the data, and the training procedure, and evaluate Gato's capabilities across over 600 distinct tasks.