Quick answer
AI Summary: A monumental 118-page technical report detailing the training, dataset curation, and extensive evaluation of Gopher, a 280-billion-parameter language model that advanced the understanding of the limits of scaling.
Language modelling provides a step towards intelligent communication systems by harnessing large datasets and expressive models. We present an analysis of Transformer-based language model architectures and training datasets across a range of scales, up to a 280-billion-parameter model called Gopher. Gopher is evaluated on 152 diverse tasks and achieves state-of-the-art performance on the majority of them. More importantly, we provide a holistic analysis of the training process, exploring the limits of scale on reading comprehension, fact-checking, and toxic-output mitigation. This extensive 118-page report details our findings on architectural design choices, dataset curation (MassiveText), and the emergent capabilities of models at the 200B+ parameter scale.