Flamingo: a Visual Language Model for Few-Shot Learning

Jean-Baptiste Alayrac · Jeff Donahue · Pauline Luc · Antoine Miech · Iain Barr · Karen Simonyan

ABSTRACT

Building models that can be rapidly adapted to novel tasks using only a handful of annotated examples is an open challenge for multimodal machine learning research. We introduce Flamingo, a family of Visual Language Models (VLM) with this capability. Flamingo models incorporate architectural innovations to: (i) bridge powerful pretrained vision-only and language-only models, (ii) handle sequences of arbitrarily interleaved visual and textual data, and (iii) seamlessly ingest images or videos as inputs. Thanks to their flexibility, Flamingo models can be trained on large-scale multimodal web corpora containing arbitrarily interleaved text and images, which is key to endowing them with in-context few-shot learning capabilities. We demonstrate that a single Flamingo model achieves a new state of the art in few-shot learning on a wide array of open-ended vision and language tasks.
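The first architectural innovation, bridging frozen pretrained vision-only and language-only models, is realised in the paper with gated cross-attention layers interleaved between the frozen language-model blocks, so text tokens can attend to visual features while the pretrained LM weights stay untouched. Below is a minimal PyTorch sketch of that idea; the module names, dimensions, and the omission of the Perceiver Resampler are simplifications for illustration, not the authors' implementation.

```python
# Minimal, illustrative PyTorch sketch of a Flamingo-style gated cross-attention
# adapter. Names and hyperparameters are hypothetical; the real model also uses
# a Perceiver Resampler and much larger frozen vision/language backbones.
import torch
import torch.nn as nn


class GatedCrossAttentionBlock(nn.Module):
    """Inserted between frozen language-model layers: text tokens attend to
    visual tokens, and tanh gates (initialised at zero) let the frozen LM
    behave exactly as pretrained at the start of training and learn to use
    the visual input gradually."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.ffw = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )
        # Zero-initialised gates: the block is an identity map before training.
        self.attn_gate = nn.Parameter(torch.zeros(1))
        self.ffw_gate = nn.Parameter(torch.zeros(1))

    def forward(self, text_tokens: torch.Tensor, visual_tokens: torch.Tensor) -> torch.Tensor:
        # Cross-attention: queries come from text, keys/values from vision.
        attn_out, _ = self.cross_attn(text_tokens, visual_tokens, visual_tokens)
        x = text_tokens + torch.tanh(self.attn_gate) * attn_out
        x = x + torch.tanh(self.ffw_gate) * self.ffw(x)
        return x


# Usage sketch: a batch of 2 sequences with 16 text tokens and 64 visual tokens.
block = GatedCrossAttentionBlock(dim=512)
text = torch.randn(2, 16, 512)
vision = torch.randn(2, 64, 512)
out = block(text, vision)
print(out.shape)  # torch.Size([2, 16, 512])
```

Because the gates start at zero, inserting these blocks does not perturb the pretrained language model's behaviour; the interleaved image/text training then decides how much visual information each layer admits.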
