Topic: Multimodal AI

Short answer

This page lists the most relevant public items for Multimodal AI, ranked by trend activity and review signal. Use the weekly view for fast-moving changes, the monthly view for more stable patterns, and the all-time view for evergreen picks.


  1. Scaling Laws for Autoregressive Generative Modeling

    Paper · Oct 28, 2020 · arXiv · Tom Henighan, Jared Kaplan, Mor Katz, Mark Chen, Christopher Hesse, Jacob Jackson, Heewoo Jun, Tom B. Brown, Prafulla Dhariwal, Scott Gray, Chris Hallacy, Benjamin Mann, Alec Radford, Aditya Ramesh, Nick Ryder, Daniel M. Ziegler, John Schulman, Dario Amodei, Sam McCandlish

    Building upon previous work establishing scaling laws for language models, we investigate whether similar power-law scaling relationships hold across other data modalities. We train autoregressive ...
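The power-law relationships this paper studies are straightforward to work with numerically: a power law is a straight line in log-log space, so its exponent can be recovered with a linear fit. A minimal sketch, using made-up (compute, loss) points constructed to lie exactly on a hypothetical power law:

```python
import numpy as np

# Hypothetical power law L(C) = a * C**(-b); the data points below are
# synthetic, chosen for illustration only.
compute = np.array([1e15, 1e16, 1e17, 1e18])
loss = 4.0 * compute ** -0.05

# In log-log space: log L = log a - b * log C, so a degree-1 fit
# recovers the exponent and prefactor.
slope, intercept = np.polyfit(np.log(compute), np.log(loss), 1)
b, a = -slope, np.exp(intercept)
print(b, a)  # recovers b ≈ 0.05 and a ≈ 4.0
```

The same log-log fit is how scaling exponents are typically estimated from empirical training runs, though real loss curves also carry an irreducible-loss offset that this toy sketch omits.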

  2. Multimodal Neurons in Artificial Neural Networks

    Paper · Mar 4, 2021 · Distill · Gabriel Goh, Nick Cammarata, Chelsea Voss, Shan Carter, Michael Petrov, Ludwig Schubert, Alec Radford, Chris Olah

    We investigate the internal representations of the CLIP model and discover the presence of 'multimodal neurons'. These neurons fire not only for specific visual features (like a spider) but also fo...

  3. Learning Transferable Visual Models From Natural Language Supervision

    Paper · Feb 26, 2021 · arXiv · Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever

    State-of-the-art computer vision systems are trained to predict a fixed set of predetermined object categories, restricting their generality. We demonstrate that the simple pre-training task of pre...
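CLIP's pre-training objective is contrastive: image and caption embeddings are placed in a shared space so that each image's highest cosine similarity is with its own caption. A minimal sketch with toy embeddings (the vectors here are synthetic stand-ins, not real model outputs):

```python
import numpy as np

rng = np.random.default_rng(0)

def normalize(x):
    # Project rows onto the unit sphere so dot products are cosine similarities.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Toy setup: 3 image/text pairs in a shared 8-dim space. Text vectors are
# built as noisy copies of the image vectors so matched pairs align.
img = normalize(rng.normal(size=(3, 8)))
txt = normalize(img + 0.1 * rng.normal(size=(3, 8)))

sim = img @ txt.T          # 3x3 cosine-similarity matrix
pred = sim.argmax(axis=1)  # caption retrieved for each image
print(pred)                # the diagonal wins: each image finds its own caption
```

In the actual model, this similarity matrix (scaled by a learned temperature) feeds a symmetric cross-entropy loss over rows and columns; the sketch above shows only the retrieval step that the learned alignment enables.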

  4. Perceiver: General Perception with Iterative Attention

    Paper · Mar 4, 2021 · arXiv · Andrew Jaegle, Felix Gimeno, Andrew Brock, Oriol Vinyals, Andrew Zisserman, Joao Carreira

    Biological systems perceive the world by simultaneously processing high-dimensional inputs from modalities as diverse as vision, audition, and touch. We introduce the Perceiver, an architecture tha...
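The Perceiver's central trick is to have a small, fixed-size latent array cross-attend to an arbitrarily large input array, so compute scales linearly with input length instead of quadratically. A minimal sketch of that cross-attention step (shapes and values are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def cross_attention(latents, inputs):
    # latents: (N, d) queries; inputs: (M, d) keys/values, with M >> N.
    scores = latents @ inputs.T / np.sqrt(latents.shape[-1])
    # Numerically stable softmax over the input axis.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ inputs  # (N, d): updated latents

inputs = rng.normal(size=(1024, 16))  # e.g. flattened pixels or audio samples
latents = rng.normal(size=(8, 16))    # fixed-size latent bottleneck
out = cross_attention(latents, inputs)
print(out.shape)  # (8, 16): output size is independent of input length
```

Because the expensive attention is always queries-from-latents against keys-from-inputs, the same module can ingest vision, audio, or point clouds without modality-specific architecture; the full model interleaves these cross-attends with self-attention over the latents.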

  5. Flamingo: a Visual Language Model for Few-Shot Learning

    Paper · Apr 28, 2022 · arXiv · Jean-Baptiste Alayrac, Jeff Donahue, Pauline Luc, Antoine Miech, Iain Barr, Karen Simonyan

    Building models that can be rapidly adapted to novel tasks using only a handful of annotated examples is an open challenge for multimodal machine learning research. We introduce Flamingo, a family ...

Related Topics

cs.CV (4) · company:openai-research (3) · cs.LG (2) · lab:deep-mind-ai (2) · CLIP (2)