Quick answer
AI Summary: Extends AI 'scaling laws' by showing that the performance of autoregressive Transformers improves predictably as a function of compute across multiple data modalities, including images, video, and math.
Building on previous work establishing scaling laws for language models, we investigate whether similar power-law relationships hold in other data modalities. We train autoregressive Transformer models on a diverse set of domains, including images, video, multimodal image-text data, and mathematical problem solving. We find that cross-entropy loss improves smoothly as a power-law function of model size and compute budget in every modality tested. We also find that the optimal aspect ratio (depth versus width) of the Transformer remains roughly constant regardless of the data type. These findings suggest a deep universality in how autoregressive models learn, providing a unified predictive framework for scaling generative AI.
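As a rough illustration of what "improves as a power-law function of compute" means, such scaling laws are commonly written in a form like the one below (the specific notation here is an assumption for exposition, not a quotation of the paper's fitted equations):

L(C) = L_\infty + \left(\frac{C_0}{C}\right)^{\alpha_C}

where L is the cross-entropy loss, C is the training compute, L_\infty is an irreducible loss floor set by the data distribution, and the reducible term shrinks as a power law with exponent \alpha_C as compute grows.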