
Quick answer

This paper introduces GPT-3, a 175-billion-parameter language model with strong few-shot learning abilities, showing that sufficient scale lets a model perform many tasks without task-specific fine-tuning.


Language Models are Few-Shot Learners

Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei

ABSTRACT

Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-training on a large corpus of text followed by fine-tuning on a specific task. While typically task-agnostic in architecture, this method still requires task-specific fine-tuning datasets of thousands or tens of thousands of examples. Here we show that scaling up language models greatly improves task-agnostic, few-shot performance: we train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting. For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model.
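To make the few-shot setup concrete, the sketch below assembles a prompt from a task description, K labeled demonstrations, and a new query; the model only ever sees this text, and no weights are updated. The `generate` function is a hypothetical stand-in for any autoregressive language model's text-completion interface, not an API from the paper; the translation demonstrations echo the paper's English-to-French example.

```python
# Minimal sketch of few-shot prompting: the task is specified purely
# via text, with no gradient updates or fine-tuning.

def build_few_shot_prompt(task_description, demonstrations, query):
    """Assemble a prompt from a task description, K labeled
    demonstrations (the "shots"), and the new input to complete."""
    lines = [task_description, ""]
    for source, target in demonstrations:
        lines.append(f"Input: {source}")
        lines.append(f"Output: {target}")
        lines.append("")
    lines.append(f"Input: {query}")
    lines.append("Output:")  # the model continues from here
    return "\n".join(lines)

# Example with K = 2 demonstrations (after Figure 2.1 in the paper).
demos = [
    ("sea otter", "loutre de mer"),
    ("cheese", "fromage"),
]
prompt = build_few_shot_prompt(
    "Translate English to French.", demos, "peppermint"
)
# `generate` is an assumed stand-in for an autoregressive LM's
# sampling interface; the few-shot prediction is just its completion.
# completion = generate(prompt)
print(prompt)
```

Zero-shot and one-shot evaluation are the same construction with zero or one demonstration; only the prompt changes, never the model's parameters.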

Review Snapshot

4.6 out of 5 stars (5 ratings)
5 star: 60% · 4 star: 40% · 3 star: 0% · 2 star: 0% · 1 star: 0%

Recommendation: 100% of reviewers recommend this content.
