Quick answer

A 1.5 billion parameter model, GPT-2, that demonstrated state-of-the-art zero-shot task performance simply by learning to predict the next word on a massive, diverse dataset of internet text.
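
Here, "predicting the next word" refers to the standard autoregressive factorization the paper builds on: the probability of a text x = (s_1, ..., s_n) is decomposed into a product of conditional next-symbol probabilities,

p(x) = \prod_{i=1}^{n} p(s_i | s_1, \ldots, s_{i-1})

and the model is trained to maximize this likelihood on WebText. Because a task can itself be specified in text, sampling a continuation from this same distribution is what the paper means by performing tasks zero-shot.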

Claim

Language Models are Unsupervised Multitask Learners

Alec Radford · Jeffrey Wu · Rewon Child · David Luan · Dario Amodei · Ilya Sutskever

ABSTRACT

Natural language processing tasks, such as question answering, machine translation, reading comprehension, and summarization, are typically approached with supervised learning on task-specific datasets. We demonstrate that language models begin to learn these tasks without any explicit supervision when trained on a new dataset of millions of webpages called WebText. Our largest model, GPT-2, is a 1.5B parameter Transformer that achieves state of the art results on 7 out of 8 tested language modeling datasets in a zero-shot setting but still underfits WebText. These findings suggest a promising path towards building language processing systems which learn to perform tasks from their naturally occurring demonstrations.
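
As an illustration of this zero-shot mechanism, here is a minimal sketch using the open-source Hugging Face transformers library rather than the authors' original code. The task is specified purely as text, via the "TL;DR:" summarization cue described in the paper; the checkpoint name, sample article, and exact decoding call are assumptions for the sketch, though the paper does report generating 100 tokens with top-k sampling, k = 2, for its summarization experiments.

# A sketch, not the authors' code: zero-shot summarization induced by a text cue.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# "gpt2" is the small 124M checkpoint; the paper's 1.5B model is released as "gpt2-xl".
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

article = (
    "A new study finds that regular exercise improves memory in older adults. "
    "Researchers followed 120 participants over six months and observed "
    "significant gains in recall among those who walked daily."
)
prompt = article + "\nTL;DR:"  # the task is induced by the cue; there is no summarization head

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=100,        # the paper generates 100 tokens for summaries
    do_sample=True,
    top_k=2,                   # top-k sampling with k = 2, as in the paper
    pad_token_id=tokenizer.eos_token_id,
)
# Decode only the continuation, i.e., the model's "summary" of the article.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))

The same pattern covers the paper's other tasks: translation is cued with "english sentence = french sentence" pairs, and reading comprehension with a document, a question, and a final "A:".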

Review Snapshot

4.6 out of 5 average from 5 ratings

5 star: 60%
4 star: 40%
3 star: 0%
2 star: 0%
1 star: 0%

Recommendation

100% recommend this content.

Author Inquiries

Public questions about this content are collected here. Attendemia routes each question to the author, and readers can vote on the most important ones; a response is not guaranteed.