AI Summary: Presents the Perceiver, a highly flexible transformer architecture that uses a latent bottleneck to process massive, multi-modal, and arbitrary data inputs without domain-specific engineering.
Biological systems perceive the world by simultaneously processing high-dimensional inputs from modalities as diverse as vision, audition, and touch. We introduce the Perceiver, an architecture that builds upon Transformers to handle arbitrary configurations of different modalities using a single network. The Perceiver uses an asymmetric attention mechanism to iteratively distill high-dimensional inputs (e.g., millions of pixels or audio samples) into a tight, fixed-size latent bottleneck, sidestepping the quadratic scaling of standard self-attention. This allows the model to achieve state-of-the-art results on ImageNet and AudioSet without relying on modality-specific priors such as convolutions.
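The core idea can be illustrated with a minimal NumPy sketch of a single cross-attention step (an illustrative toy, not the paper's implementation): queries come from a small, fixed-size latent array, while keys and values come from the large input array, so the attention cost scales as O(N·M) in the input size M rather than O(M²). All names and dimensions below are assumptions chosen for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(latents, inputs, d_k, seed=0):
    """One Perceiver-style cross-attention step (toy sketch).

    latents: (N, D_l) small, fixed-size latent array (queries).
    inputs:  (M, D_in) large flattened input array, e.g. pixels (keys/values).
    Returns a fixed-size (N, d_k) array regardless of how large M is.
    """
    rng = np.random.default_rng(seed)  # random projections stand in for learned weights
    W_q = rng.normal(size=(latents.shape[1], d_k)) / np.sqrt(latents.shape[1])
    W_k = rng.normal(size=(inputs.shape[1], d_k)) / np.sqrt(inputs.shape[1])
    W_v = rng.normal(size=(inputs.shape[1], d_k)) / np.sqrt(inputs.shape[1])

    Q = latents @ W_q   # (N, d_k) — queries from the latent bottleneck
    K = inputs @ W_k    # (M, d_k) — keys from the raw input
    V = inputs @ W_v    # (M, d_k) — values from the raw input

    # Attention matrix is (N, M): linear in input size M, not quadratic.
    attn = softmax(Q @ K.T / np.sqrt(d_k))
    return attn @ V     # (N, d_k) fixed-size summary of the input
```

Because the output size is fixed by N, the same block can consume images, audio samples, or point clouds once they are flattened into an (M, D_in) array; in the full model this cross-attention alternates with self-attention over the latents only.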