Scaling Laws for Neural Language Models

Jared Kaplan · Sam McCandlish · Tom Henighan · Tom B. Brown · Benjamin Chess · Rewon Child · Scott Gray · Alec Radford · Jeffrey Wu · Dario Amodei

ABSTRACT

We study empirical scaling laws for language model performance on the cross-entropy loss. The loss scales as a power-law with model size, dataset size, and the amount of compute used for training, with some trends spanning more than seven orders of magnitude. Other architectural details such as network width or depth have minimal effects within a wide range. These relationships allow us to determine the optimal allocation of a fixed compute budget, demonstrating that larger models are significantly more sample-efficient and should be trained on a relatively modest amount of data and stopped significantly before convergence.
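For readers who want the concrete form of these trends, the paper's headline result is three empirical power-law fits for the test loss L (its Eqs. 1.1-1.3), each holding when the other two factors are not the bottleneck. The constants below are the approximate fitted values reported in the paper, not exact quantities:

% Approximate power-law fits for test loss, as reported in the paper.
% N = non-embedding parameters, D = dataset size in tokens,
% C_min = minimum compute to reach a given loss, in PF-days.
\begin{align}
  L(N) &= \left(\frac{N_c}{N}\right)^{\alpha_N},
    & \alpha_N &\approx 0.076, \; N_c \approx 8.8 \times 10^{13} \\
  L(D) &= \left(\frac{D_c}{D}\right)^{\alpha_D},
    & \alpha_D &\approx 0.095, \; D_c \approx 5.4 \times 10^{13} \\
  L(C_{\min}) &= \left(\frac{C_c^{\min}}{C_{\min}}\right)^{\alpha_C^{\min}},
    & \alpha_C^{\min} &\approx 0.050, \; C_c^{\min} \approx 3.1 \times 10^{8}
\end{align}

The exponents are small, so each order-of-magnitude increase in N, D, or C buys a modest but predictable drop in loss; it is this predictability that lets the paper solve for the compute-optimal allocation among model size, data, and training steps.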

