
Generative Pretraining from Pixels

Mark Chen · Alec Radford · Rewon Child · Jeffrey Wu · Heewoo Jun · David Luan · Ilya Sutskever

ABSTRACT

Inspired by the success of unsupervised representation learning in natural language processing with models like GPT-2, we examine whether similar models can learn useful representations for images. We train a sequence Transformer to autoregressively predict pixels, without incorporating any prior knowledge of the 2D spatial structure of images. Despite operating on low-resolution sequences of pixels, our model, Image GPT (iGPT), discovers highly robust semantic representations. When linear probes or fine-tuning are applied to these learned representations, iGPT achieves state-of-the-art performance on low-data classification benchmarks and generates highly coherent, novel image completions.
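The core recipe described in the abstract can be sketched in a few lines: an image is flattened into a 1-D pixel sequence in raster order, discarding all 2-D structure, and the model is trained to predict each pixel from the pixels before it. The helper names below are illustrative, not from the paper, and the toy "image" stands in for the low-resolution inputs iGPT actually uses.

```python
import numpy as np

def flatten_raster(image):
    """Flatten an H x W image into a 1-D pixel sequence in raster order,
    discarding the 2-D spatial structure (as iGPT does)."""
    return image.reshape(-1)

def autoregressive_pairs(seq):
    """Build (context, target) pairs for next-pixel prediction:
    at step t the model must predict seq[t] given seq[:t]."""
    return [(seq[:t], seq[t]) for t in range(1, len(seq))]

# Toy 2x3 "image" with pixel intensities 0..5.
img = np.arange(6).reshape(2, 3)
seq = flatten_raster(img)          # raster order: [0, 1, 2, 3, 4, 5]
pairs = autoregressive_pairs(seq)  # 5 next-pixel prediction steps
```

A standard decoder-only Transformer is then trained on these sequences with the usual next-token cross-entropy loss; nothing about the architecture or objective is changed for the image domain, which is the paper's central point.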
