AI Summary: Introduces GPT-3, a 175-billion-parameter language model that demonstrated remarkable few-shot learning capabilities, showing that massive scale enables models to perform tasks without task-specific fine-tuning.
Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-training on a large corpus of text followed by fine-tuning on a specific task. While typically task-agnostic in architecture, this method still requires task-specific fine-tuning datasets of thousands or tens of thousands of examples. We train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting. For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model.
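To make the few-shot setting concrete, here is a minimal sketch (not from the paper's codebase; the helper function and `=>` formatting follow the paper's illustrative English-to-French example) of how a task and its demonstrations are specified purely as text, with no gradient updates:

```python
# Minimal sketch of few-shot prompting: K demonstrations are concatenated
# before an unanswered query, and the model is expected to continue the
# text. No fine-tuning or gradient updates are involved.

demonstrations = [
    ("sea otter", "loutre de mer"),
    ("cheese", "fromage"),
    ("peppermint", "menthe poivrée"),
]

def build_few_shot_prompt(examples, query, task="Translate English to French:"):
    """Format a task description, K demonstrations, and one query as a single string."""
    lines = [task]
    for source, target in examples:
        lines.append(f"{source} => {target}")
    lines.append(f"{query} =>")  # the model completes the translation from here
    return "\n".join(lines)

print(build_few_shot_prompt(demonstrations, "plush giraffe"))
# Translate English to French:
# sea otter => loutre de mer
# cheese => fromage
# peppermint => menthe poivrée
# plush giraffe =>
```

The resulting string is fed to the model as ordinary input; "few-shot" here means conditioning on in-context examples at inference time, not training on them.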