
Gemini: A Family of Highly Capable Multimodal Models

Gemini Team·
Rohan Anil·
Sebastian Borgeaud·
Jean-Baptiste Alayrac·
Jiahui Yu·
Radu Soricut·
Johan Schalkwyk·
Andrew M. Dai·
Anja Hauth·
Katie Millican·
David Silver·
Melvin Johnson·
Ioannis Antonoglou·
Julian Schrittwieser·
Amelia Glaese·
Jilin Chen·
Emily Pitler·
Timothy Lillicrap·
Angeliki Lazaridou·
Orhan Firat·
James Molloy·
Michael Isard·
Paul R. Barham·
Tom Hennigan·
Benjamin Lee·
Fabio Viola·
Malcolm Reynolds·
Yuanzhong Xu·
Ryan Doherty·
Eli Collins·
Clemens Meyer·
Eliza Rutherford·
Erica Moreira·
Kareem Ayoub·
Megha Goel·
Jack Krawczyk·
Cosmo Du·
Ed Chi·
Heng-Tze Cheng·
Eric Ni·
Purvi Shah·
Patrick Kane·
Betty Chan·
Manaal Faruqui·
Aliaksei Severyn·
Hanzhao Lin·
YaGuang Li·
Yong Cheng·
Abe Ittycheriah·
Mahdis Mahdieh·
Mia Chen·
Pei Sun·
Dustin Tran·
Sumit Bagri·
Balaji Lakshminarayanan·
Jeremiah Liu·
Andras Orban·
Fabian Güra·
Hao Zhou·
Xinying Song·
Aurelien Boffy·
Harish Ganapathy·
Steven Zheng·
HyunJeong Choe·
Ágoston Weisz·
Tao Zhu·
Yifeng Lu·
Siddharth Gopal·
Jarrod Kahn·
Maciej Kula·
Jeff Pitman·
Rushin Shah·
Emanuel Taropa·
Majd Al Merey·
Martin Baeuml·
Zhifeng Chen·
Laurent El Shafey·
Yujing Zhang·
Olcan Sercinoglu·
George Tucker·
Enrique Piqueras·
Maxim Krikun·
Iain Barr·
Nikolay Savinov·
Ivo Danihelka·
Becca Roelofs·
Anaïs White·
Anders Andreassen·
Tamara von Glehn·
Lakshman Yagati·
Mehran Kazemi·
Lucas Gonzalez·
Misha Khalman·
Jakub Sygnowski·
Alexandre Frechette·
Charlotte Smith·
Laura Culp·
Lev Proleev·
Yi Luan·
Xi Chen·
James Lottes·
Nathan Schucher·
Federico Lebron·
Alban Rrustemi·
Natalie Clay·
Phil Crone·
Tomas Kocisky·
Jeffrey Zhao·
Bartek Perz·
Dian Yu·
Heidi Howard·
Adam Bloniarz·
Jack W. Rae·
Han Lu·
Laurent Sifre·
Marcello Maggioni·
Fred Alcober·
Dan Garrette·
Megan Barnes·
Shantanu Thakoor·
Jacob Austin·
Gabriel Barth-Maron·
William Wong·
Rishabh Joshi·
Rahma Chaabouni·
Deeni Fatiha·
Arun Ahuja·
Gaurav Singh Tomar·
Evan Senter·
Martin Chadwick·
Ilya Kornakov·
Nithya Attaluri·
Iñaki Iturrate·
Ruibo Liu·
Yunxuan Li·
Sarah Cogan·
Jeremy

ABSTRACT

This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.
