


Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets

Alethea Power · Yuri Burda · Harri Edwards · Igor Babuschkin · Vedant Misra

ABSTRACT

We demonstrate a striking phenomenon in the training dynamics of neural networks on small algorithmic datasets: networks that initially severely overfit the training data can, after continued training for long periods, suddenly undergo a phase transition and generalize perfectly to the validation set. We call this phenomenon 'grokking'. We study this behavior across various binary operations (e.g., modular arithmetic, polynomial evaluation) and analyze how dataset size and weight decay interact with this phase transition. Our findings challenge standard intuitions about early stopping and overfitting, suggesting that neural networks may discover generalizing representations long after memorization occurs.
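To make the experimental setup concrete, here is a minimal sketch of the kind of run in which grokking appears: train a small network on modular addition with AdamW and strong weight decay, and keep logging validation accuracy long after training accuracy saturates. This is an illustration under stated assumptions, not the paper's implementation: the paper uses a small transformer, while this sketch substitutes a hypothetical embedding-plus-MLP classifier (`TokenMLP`), and every hyperparameter (modulus, train fraction, learning rate, weight decay, step count) is an illustrative choice rather than the paper's exact value.

```python
import torch
import torch.nn as nn

# Illustrative hyperparameters (assumptions, not the paper's exact values).
P = 97            # prime modulus for the task (a + b) mod P
TRAIN_FRAC = 0.5  # fraction of all P*P equations used for training
STEPS = 100_000   # grokking can require far more steps than train convergence

# Enumerate the full dataset of equations (a, b) -> (a + b) mod P.
pairs = torch.cartesian_prod(torch.arange(P), torch.arange(P))
labels = (pairs[:, 0] + pairs[:, 1]) % P

# Random train/validation split over the finite set of equations.
perm = torch.randperm(len(pairs))
n_train = int(TRAIN_FRAC * len(pairs))
train_idx, val_idx = perm[:n_train], perm[n_train:]

class TokenMLP(nn.Module):
    """Embed the two operand tokens, concatenate, and classify the result.

    A simplified stand-in for the paper's small transformer.
    """
    def __init__(self, p, d=128):
        super().__init__()
        self.embed = nn.Embedding(p, d)
        self.net = nn.Sequential(
            nn.Linear(2 * d, 256), nn.ReLU(), nn.Linear(256, p)
        )

    def forward(self, ab):
        e = self.embed(ab)             # (batch, 2, d)
        return self.net(e.flatten(1))  # (batch, p) logits

model = TokenMLP(P)
# Weight decay is the regularizer the paper links to the grokking transition.
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)
loss_fn = nn.CrossEntropyLoss()

for step in range(STEPS):
    batch = train_idx[torch.randint(len(train_idx), (512,))]
    loss = loss_fn(model(pairs[batch]), labels[batch])
    opt.zero_grad()
    loss.backward()
    opt.step()

    if step % 1000 == 0:
        with torch.no_grad():
            preds = model(pairs[val_idx]).argmax(-1)
            val_acc = (preds == labels[val_idx]).float().mean()
        # Expected signature: train loss collapses early while val accuracy
        # sits near chance, then jumps much later (the grokking transition).
        print(f"step {step:6d}  train_loss {loss.item():.4f}  val_acc {val_acc:.3f}")
```

Under these assumptions, the signature to look for is training accuracy saturating early while validation accuracy lingers near chance for many thousands of steps before jumping to near-perfect; in the paper's experiments, removing weight decay or shrinking the training fraction pushes that transition later or prevents it within the training budget.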

Review Snapshot

Average rating: 4.6 out of 5 (5 ratings)
5 star: 60%
4 star: 40%
3 star: 0%
2 star: 0%
1 star: 0%

Recommendation: 100% of reviewers recommend this content.
