Quick answer

AI Summary: Extends the concept of AI 'scaling laws' beyond language, showing empirically that the performance of autoregressive Transformers improves predictably as a function of compute across multiple data modalities, including images, video, and math.

Scaling Laws for Autoregressive Generative Modeling

Tom Henighan · Jared Kaplan · Mor Katz · Mark Chen · Christopher Hesse · Jacob Jackson · Heewoo Jun · Tom B. Brown · Prafulla Dhariwal · Scott Gray · Chris Hallacy · Benjamin Mann · Alec Radford · Aditya Ramesh · Nick Ryder · Daniel M. Ziegler · John Schulman · Dario Amodei · Sam McCandlish

ABSTRACT

Building on previous work establishing scaling laws for language models, we investigate whether similar power-law scaling relationships hold in other data modalities. We train autoregressive Transformer models on a diverse set of domains, including images, video, multimodal image-text data, and mathematical equations. We find that cross-entropy loss improves smoothly as a power-law function of model size and compute budget in every modality tested. Furthermore, the optimal aspect ratio (depth vs. width) of the Transformer remains roughly constant regardless of the data type. These findings suggest a profound universality in how autoregressive models learn, providing a unified predictive framework for scaling generative AI.
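To make the abstract's power-law claim concrete: a common parameterization in the scaling-laws literature writes the loss as a power law plus an irreducible constant, L(x) = L_inf + (x0/x)^alpha, where x is model size or compute. The sketch below fits that form to loss-vs-compute points. It is an illustration only, not the authors' code; the data values, initial guesses, and names (scaling_law, compute, loss) are hypothetical placeholders.

```python
# A minimal sketch (not the authors' code) of fitting a power-law-plus-constant
# scaling form, L(x) = L_inf + (x0 / x)**alpha, to loss-vs-compute points.
# All data values below are synthetic placeholders, not results from the paper.
import numpy as np
from scipy.optimize import curve_fit

def scaling_law(x, l_inf, x0, alpha):
    """Reducible power-law loss on top of an irreducible floor L_inf."""
    return l_inf + (x0 / x) ** alpha

# Hypothetical (compute, loss) observations, e.g. PF-days vs. nats/token.
compute = np.array([1e-3, 1e-2, 1e-1, 1e0, 1e1, 1e2])
loss = np.array([4.10, 3.52, 3.15, 2.92, 2.78, 2.70])

# Fit with all three parameters constrained to be non-negative; p0 is a
# rough starting guess to keep the optimizer in a sensible region.
params, _ = curve_fit(scaling_law, compute, loss,
                      p0=[2.5, 1e-3, 0.2], bounds=(0.0, np.inf))
l_inf, x0, alpha = params
print(f"L_inf = {l_inf:.3f}, x0 = {x0:.3g}, alpha = {alpha:.3f}")

# Extrapolate one decade beyond the observed compute range.
print(f"Predicted L(1e3) = {scaling_law(1e3, *params):.3f}")
```

The floor term L_inf matters: measured loss flattens toward an irreducible value, so a pure power law fits only the reducible part, and the constant has to be estimated jointly with x0 and alpha.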

Review Snapshot

4.6 ★★★★★ (5 ratings)

5 star: 60% · 4 star: 40% · 3 star: 0% · 2 star: 0% · 1 star: 0%

Recommendation: 100% recommend this content.

Author Inquiries

Public questions about this content. Attendemia will route your question to the authors, and readers can vote on the questions they find most important; a response is not guaranteed.