AI Summary: Introduces RETRO, a highly efficient language model architecture that utilizes a trillion-token retrieval database to achieve state-of-the-art performance with 25x fewer parameters than massive dense models.
We enhance auto-regressive language models by conditioning on document chunks retrieved from a large corpus, based on local similarity with preceding tokens. With a 2 trillion token database, our Retrieval-Enhanced Transformer (RETRO) obtains performance comparable to GPT-3 and Jurassic-1 on the Pile, despite using 25x fewer parameters. After fine-tuning, RETRO performance translates to downstream knowledge-intensive tasks such as question answering. RETRO combines a frozen BERT retriever, a differentiable encoder, and a chunked cross-attention mechanism to predict tokens based on an order of magnitude more data than what is typically consumed during training.
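The core retrieval step described above — splitting text into chunks and fetching nearest-neighbour chunks from a database by embedding similarity — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the real system embeds 64-token chunks with a frozen BERT model and searches a 2-trillion-token database with approximate nearest neighbours, whereas here a toy deterministic embedding and brute-force dot-product search stand in for both. All function names and parameters are illustrative.

```python
import numpy as np

def embed(chunk_tokens, dim=64):
    # Toy stand-in for the frozen BERT chunk embedder: hashes the chunk
    # to a seed and draws a fixed unit vector. Identical chunks map to
    # identical embeddings, which is all this sketch needs.
    seed = abs(hash(tuple(chunk_tokens))) % (2**32)
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

def build_database(corpus_tokens, chunk_size=4):
    # Split the corpus into contiguous chunks and precompute their keys.
    # (RETRO uses 64-token chunks; 4 keeps the example small.)
    chunks = [corpus_tokens[i:i + chunk_size]
              for i in range(0, len(corpus_tokens) - chunk_size + 1, chunk_size)]
    keys = np.stack([embed(c) for c in chunks])
    return chunks, keys

def retrieve_neighbours(input_tokens, chunks, keys, chunk_size=4, k=2):
    # Each input chunk retrieves its k most similar database chunks.
    # In RETRO these neighbours feed the encoder and chunked
    # cross-attention; here we just return them.
    out = []
    for i in range(0, len(input_tokens) - chunk_size + 1, chunk_size):
        query = embed(input_tokens[i:i + chunk_size])
        sims = keys @ query                  # cosine similarity (unit vectors)
        top = np.argsort(-sims)[:k]          # brute force; real systems use ANN
        out.append([chunks[j] for j in top])
    return out
```

Note that to keep generation causal, the actual model conditions each chunk on neighbours retrieved for the *preceding* chunk; this sketch only shows the lookup itself.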