AI Summary: Documents 'grokking,' a surprising phenomenon where neural networks suddenly learn to generalize perfectly on algorithmic tasks long after they have completely overfitted the training data.
We demonstrate a striking phenomenon in the training dynamics of neural networks on small algorithmic datasets: networks that initially severely overfit the training data can, after continued training for long periods, suddenly undergo a phase transition and perfectly generalize to the validation set. We call this phenomenon 'grokking'. We study this behavior across various binary operations (e.g., modular arithmetic, polynomial evaluation) and analyze how dataset size and weight decay interact with this phase transition. Our findings challenge standard intuitions about early stopping and overfitting, suggesting that neural networks may discover generalizing representations long after memorization occurs.
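To make the experimental setup concrete: in this kind of study, the "dataset" is the complete table of a binary operation over a small domain, and a fixed fraction of the table's equations is held out for validation, so generalization means filling in the unseen cells. The sketch below builds such a dataset for modular addition; the function name, modulus, and training fraction are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def make_mod_add_dataset(p=97, train_frac=0.3, seed=0):
    """Build all (a, b) -> (a + b) mod p equations and split them.

    The full operation table has p*p entries; a random subset is the
    training set and the rest is the validation set. Defaults here are
    illustrative, not taken from the paper.
    """
    pairs = np.array([(a, b) for a in range(p) for b in range(p)])
    labels = (pairs[:, 0] + pairs[:, 1]) % p
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(pairs))
    n_train = int(train_frac * len(pairs))
    train_idx, val_idx = idx[:n_train], idx[n_train:]
    return (pairs[train_idx], labels[train_idx]), (pairs[val_idx], labels[val_idx])

(train_x, train_y), (val_x, val_y) = make_mod_add_dataset()
print(len(train_x), len(val_x))  # splits of the 97*97 = 9409 equations
```

A network trained only on the first split can memorize it quickly; the surprising observation is that validation accuracy on the held-out cells can jump to near-perfect much later in training.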