
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Gemini Team·
Petko Georgiev·
Ving Ian Lei·
Ryan Burnell·
Libin Bai·
Anmol Gulati·
Garrett Tanzer·
Damien Vincent·
Zhufeng Pan·
Shibo Wang·
Soroosh Mariooryad·
Yifan Ding·
Xinyang Geng·
Fred Alcober·
Roy Frostig·
Mark Omernick·
Lexi Walker·
Cosmin Paduraru·
Christina Sorokin·
Andrea Tacchetti·
Colin Gaffney·
Samira Daruki·
Olcan Sercinoglu·
Zach Gleicher·
Juliette Love·
Paul Voigtlaender·
Rohan Jain·
Gabriela Surita·
Kareem Mohamed·
Rory Blevins·
Junwhan Ahn·
Tao Zhu·
Kornraphop Kawintiranon·
Orhan Firat·
Yiming Gu·
Yujing Zhang·
Matthew Rahtz·
Manaal Faruqui·
Natalie Clay·
Justin Gilmer·
JD Co-Reyes·
Ivo Penchev·
Rui Zhu·
Nobuyuki Morioka·
Kevin Hui·
Krishna Haridasan·
Victor Campos·
Mahdis Mahdieh·
Mandy Guo·
Samer Hassan·
Kevin Kilgour·
Arpi Vezer·
Heng-Tze Cheng·
Raoul de Liedekerke·
Siddharth Goyal·
Paul Barham·
DJ Strouse·
Seb Noury·
Jonas Adler·
Mukund Sundararajan·
Sharad Vikram·
Dmitry Lepikhin·
Michela Paganini·
Xavier Garcia·
Fan Yang·
Dasha Valter·
Maja Trebacz·
Kiran Vodrahalli·
Chulayuth Asawaroengchai·
Roman Ring·
Norbert Kalb·
Livio Baldini Soares·
Siddhartha Brahma·
David Steiner·
Tianhe Yu·
Fabian Mentzer·
Antoine He·
Lucas Gonzalez·
Bibo Xu·
Raphael Lopez Kaufman·
Laurent El Shafey·
Junhyuk Oh·
Tom Hennigan·
George van den Driessche·
Seth Odoom·
Mario Lucic·
Becca Roelofs·
Sid Lall·
Amit Marathe·
Betty Chan·
Santiago Ontanon·
Luheng He·
Denis Teplyashin·
Jonathan Lai·
Phil Crone·
Bogdan Damoc·
Lewis Ho·
Sebastian Riedel·
Karel Lenc·
Chih-Kuan Yeh·
Aakanksha Chowdhery·
Yang Xu·
Mehran Kazemi·
Ehsan Amid·
Anastasia Petrushkina·
Kevin Swersky·
Ali Khodaei·
Gowoon Chen·
Chris Larkin·
Mario Pinto·
Geng Yan·
Adria Puigdomenech Badia·
Piyush Patil·
Steven Hansen·
Dave Orr·
Sebastien M. R. Arnold·
Jordan Grimstad·
Andrew Dai·
Sholto Douglas·
Rishika Sinha·
Vikas Yadav·
Xi Chen·
Elena Gribovskaya·
Jacob Austin·
Jeffrey Zhao·
Kaushal Patel·
Paul Komarek·
Sophia Austin·
Sebastian Borgeaud·
Linda Friso·
Abhimanyu Goyal·
Ben Caine·
Kris Cao·
Da-Woon Chung·
Matt

ABSTRACT

In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; and (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state of the art in long-document QA, long-video QA, and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks, achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier: when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a level similar to that of a person who learned from the same content.
