Quick answer
GPT-2 is a 1.5 billion parameter model that demonstrated state-of-the-art zero-shot task performance simply by being trained to predict the next word on a massive, diverse dataset of internet text.
Natural language processing tasks, such as question answering, machine translation, reading comprehension, and summarization, are typically approached with supervised learning on task-specific datasets. We demonstrate that language models begin to learn these tasks without any explicit supervision when trained on a new dataset of millions of webpages called WebText. Our largest model, GPT-2, is a 1.5B parameter Transformer that achieves state of the art results on 7 out of 8 tested language modeling datasets in a zero-shot setting but still underfits WebText. These findings suggest a promising path towards building language processing systems which learn to perform tasks from their naturally occurring demonstrations.
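The core idea is that a sufficiently capable next-word predictor can be steered toward a task purely by the way its input is phrased; the paper itself uses a "TL;DR:" suffix to elicit summarization. Below is a minimal sketch of that zero-shot prompting pattern, assuming the Hugging Face `transformers` library and the publicly released 124M-parameter "gpt2" checkpoint; the article text and prompt wording are illustrative, not taken from the paper's evaluation.

```python
# Minimal sketch: zero-shot summarization by next-token prediction with a GPT-2 checkpoint.
# Assumes the Hugging Face `transformers` library; prompt content is a made-up example.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")      # 124M-parameter public checkpoint
model = AutoModelForCausalLM.from_pretrained("gpt2")

# The task is specified only in natural language; the model just continues the text.
prompt = (
    "Article: The city council voted on Tuesday to expand the bike-lane network "
    "after months of public debate.\n"
    "TL;DR:"
)
inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(
    **inputs,
    max_new_tokens=30,
    do_sample=False,                      # greedy decoding for a deterministic sketch
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no dedicated pad token
)
# Print only the continuation, i.e. the model's "summary" of the prompt article.
print(tokenizer.decode(
    output_ids[0][inputs["input_ids"].shape[1]:],
    skip_special_tokens=True,
))
```

The same pattern extends to the other tasks discussed in the paper (question answering, translation, reading comprehension) by changing only the prompt, with no task-specific fine-tuning.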