

Improving language models by retrieving from trillions of tokens

Sebastian Borgeaud, Arthur Mensch, Jordan Hoffmann, Trevor Cai, Eliza Rutherford, Katie Millican, George van den Driessche, Jean-Baptiste Lespiau, Bogdan Damoc, Aidan Clark, Diego de Las Casas, Aurelia Guy, Jacob Menick, Roman Ring, Tom Hennigan, Saffron Huang, Loren Maggiore, Chris Jones, Albin Cassirer, Andy Brock, Michela Paganini, Geoffrey Irving, Oriol Vinyals, Simon Osindero, Karen Simonyan, Jack W. Rae, Erich Elsen, Laurent Sifre

ABSTRACT

We enhance auto-regressive language models by conditioning on document chunks retrieved from a large corpus, based on local similarity with preceding tokens. With a 2 trillion token database, our Retrieval-Enhanced Transformer (RETRO) obtains performance comparable to GPT-3 and Jurassic-1 on the Pile, despite using 25x fewer parameters. After fine-tuning, RETRO performance translates to downstream knowledge-intensive tasks such as question answering. RETRO combines a frozen BERT retriever, a differentiable encoder, and a chunked cross-attention mechanism to predict tokens based on an order of magnitude more data than what is typically consumed during training.
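
The last sentence of the abstract names the components that make RETRO work; the sketch below illustrates the two retrieval-specific ideas, per-chunk nearest-neighbour lookup and chunked cross-attention, in plain PyTorch. Everything here is an illustrative assumption rather than the paper's code: the tensor sizes, weight matrices, and function names are invented, exact cosine search stands in for the paper's approximate SCaNN index over frozen BERT embeddings, and the one-chunk causal shift and the Transformer encoder over retrieved neighbours are omitted for brevity.

```python
import torch
import torch.nn.functional as F

# Illustrative sizes, not the paper's configuration (except CHUNK_LEN:
# the paper splits sequences into chunks of 64 tokens).
CHUNK_LEN = 64   # tokens per chunk
K = 2            # neighbours retrieved per chunk (the paper uses up to 10)
D = 128          # model width (invented for this sketch)

def retrieve_neighbours(chunk_keys, db_keys, db_values, k=K):
    """Nearest-neighbour lookup over a pre-embedded chunk database.

    chunk_keys: (n_chunks, D) frozen-BERT-style embeddings of input chunks.
    db_keys:    (n_db, D) embeddings of database chunks, computed offline.
    db_values:  (n_db, L_ret, D) representations of each database chunk
                plus its continuation.
    Returns (n_chunks, k, L_ret, D).
    """
    # Exact top-k by cosine similarity; the paper uses an approximate
    # SCaNN index to make this feasible over ~2T tokens.
    sims = F.normalize(chunk_keys, dim=-1) @ F.normalize(db_keys, dim=-1).T
    idx = sims.topk(k, dim=-1).indices            # (n_chunks, k)
    return db_values[idx]                         # (n_chunks, k, L_ret, D)

def chunked_cross_attention(hidden, retrieved, wq, wk, wv):
    """Each chunk of the sequence attends only to its own retrieved
    neighbours, keeping the attention cost linear in sequence length."""
    n_chunks = hidden.shape[0] // CHUNK_LEN
    h = hidden.view(n_chunks, CHUNK_LEN, D)
    # Flatten the k neighbours of each chunk into one key/value set.
    r = retrieved.reshape(n_chunks, -1, D)
    q, k_, v = h @ wq, r @ wk, r @ wv
    attn = torch.softmax(q @ k_.transpose(-1, -2) / D ** 0.5, dim=-1)
    return (h + attn @ v).view(-1, D)             # residual connection

# Toy usage: a 2-chunk "sequence" retrieving from a 1000-chunk database.
hidden = torch.randn(2 * CHUNK_LEN, D)
db_keys = torch.randn(1000, D)
db_values = torch.randn(1000, 2 * CHUNK_LEN, D)   # chunk + continuation
chunk_keys = torch.randn(2, D)                    # stand-in for BERT embeddings
wq = wk = wv = torch.randn(D, D) / D ** 0.5
neigh = retrieve_neighbours(chunk_keys, db_keys, db_values)
out = chunked_cross_attention(hidden, neigh, wq, wk, wv)
print(out.shape)  # torch.Size([128, 128])
```

The design point the sketch preserves is that each chunk attends only to the neighbours retrieved for it, so cross-attention cost grows with the number of chunks rather than with the size of the retrieval database; this is what lets retrieval scale to trillions of tokens.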
