AI Summary: Presents the Perceiver, a highly flexible transformer architecture that uses a latent bottleneck to process massive, multi-modal, and arbitrary data inputs without domain-specific engineering.
Biological systems perceive the world by simultaneously processing high-dimensional inputs from modalities as diverse as vision, audition, and touch. We introduce the Perceiver, an architecture that builds upon Transformers to handle arbitrary configurations of different modalities using a single network. The Perceiver uses an asymmetric attention mechanism to iteratively distill high-dimensional inputs (e.g., millions of pixels or audio samples) into a tight, fixed-size latent bottleneck, sidestepping the quadratic scaling of standard self-attention. This allows the model to achieve state-of-the-art results on ImageNet and AudioSet without relying on modality-specific priors such as convolutions.
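The core idea can be illustrated with a minimal NumPy sketch of a single cross-attention step (an illustrative toy, not the paper's implementation): queries come from a small, fixed-size latent array, while keys and values come from the large input array, so the attention cost scales as O(N·M) in the input size M rather than O(M²). All names and dimensions below are assumptions chosen for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(latents, inputs, d_k, seed=0):
    """One Perceiver-style cross-attention step (toy sketch).

    latents: (N, D_l) small, fixed-size latent array (queries).
    inputs:  (M, D_in) large flattened input array, e.g. pixels (keys/values).
    Returns a fixed-size (N, d_k) array regardless of how large M is.
    """
    rng = np.random.default_rng(seed)  # random projections stand in for learned weights
    W_q = rng.normal(size=(latents.shape[1], d_k)) / np.sqrt(latents.shape[1])
    W_k = rng.normal(size=(inputs.shape[1], d_k)) / np.sqrt(inputs.shape[1])
    W_v = rng.normal(size=(inputs.shape[1], d_k)) / np.sqrt(inputs.shape[1])

    Q = latents @ W_q   # (N, d_k) — queries from the latent bottleneck
    K = inputs @ W_k    # (M, d_k) — keys from the raw input
    V = inputs @ W_v    # (M, d_k) — values from the raw input

    # Attention matrix is (N, M): linear in input size M, not quadratic.
    attn = softmax(Q @ K.T / np.sqrt(d_k))
    return attn @ V     # (N, d_k) fixed-size summary of the input
```

Because the output size is fixed by N, the same block can consume images, audio samples, or point clouds once they are flattened into an (M, D_in) array; in the full model this cross-attention alternates with self-attention over the latents only.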