Inspired by the success of unsupervised representation learning in natural language processing with models like GPT-2, we examine whether similar models can learn useful representations for images. We train a sequence Transformer to autoregressively predict pixels, without incorporating any prior knowledge of the 2D spatial structure of images. Despite operating on low-resolution sequences of pixels, our model, Image GPT (iGPT), discovers highly robust semantic representations. When linear probes or fine-tuning are applied to these learned representations, iGPT achieves state-of-the-art performance on low-data classification benchmarks and generates highly coherent, novel image completions.
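The setup described above can be sketched in a few lines: an image is flattened into a 1D raster-order pixel sequence, and the sequence is shifted by one position so that each step predicts the next pixel. This is a minimal illustration of the sequence construction only (the Transformer itself is omitted), and the `start_token` begin-of-image symbol is a hypothetical choice, not taken from the paper.

```python
def image_to_sequence(image):
    """Flatten a 2D grid of pixel values into a 1D raster-order sequence,
    imposing no 2D spatial prior -- the model sees only a flat sequence."""
    return [px for row in image for px in row]

def autoregressive_pairs(seq, start_token=256):
    """Shift the sequence by one so position t is trained to predict pixel t.
    `start_token` is a hypothetical begin-of-image symbol chosen outside the
    0-255 pixel vocabulary (an assumption for this sketch)."""
    inputs = [start_token] + seq[:-1]
    targets = seq
    return inputs, targets

image = [[0, 17], [255, 128]]      # tiny 2x2 grayscale image
seq = image_to_sequence(image)     # -> [0, 17, 255, 128]
inputs, targets = autoregressive_pairs(seq)
# inputs:  [256, 0, 17, 255]  (what the model conditions on at each step)
# targets: [0, 17, 255, 128]  (the next pixel it must predict)
```

A sequence Transformer trained with a next-token loss on such pairs is, at this level of abstraction, identical to a language model trained on text: only the vocabulary (pixel intensities instead of word tokens) differs.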