Topic: Transformers

Short answer

This page shows the most relevant public items for Transformers, ranked by trend activity and review signal. Use the weekly view for fast-moving changes, the monthly view for more stable patterns, and the all-time view for evergreen picks.


  1. Attention Is All You Need

    Paper · Jun 12, 2017 · arXiv · Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin

    The dominant sequence transduction models are based on complex recurrent or convolutional neural networks that include an encoder and a decoder. The best performing models also connect the encoder ...
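
    The core operation here is scaled dot-product attention, softmax(Q Kᵀ / √d_k) V. The single-head NumPy sketch below is a minimal illustration; the shapes, names, and toy inputs are assumptions for demonstration, not the paper's code.

    ```python
    import numpy as np

    def scaled_dot_product_attention(Q, K, V):
        """Single-head attention: softmax(Q @ K.T / sqrt(d_k)) @ V."""
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)                  # (n_q, n_k) similarities
        scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
        weights = np.exp(scores)
        weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
        return weights @ V                               # weighted sum of values

    rng = np.random.default_rng(0)
    n, d = 5, 8                                          # toy length and head dim
    Q, K, V = (rng.normal(size=(n, d)) for _ in range(3))
    print(scaled_dot_product_attention(Q, K, V).shape)   # (5, 8)
    ```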

  2. Generating Long Sequences with Sparse Transformers

    Paper · Apr 23, 2019 · arXiv · Rewon Child, Scott Gray, Alec Radford, Ilya Sutskever

    Transformers are powerful sequence models, but their self-attention mechanism scales quadratically with sequence length, making them computationally prohibitive for long inputs like high-resolution...
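
    The quadratic bottleneck and the paper's fix can be seen in a toy attention mask. The sketch below contrasts dense causal attention, with O(n²) entries, against a strided sparse pattern in the spirit of the paper's factorized attention; the exact mask construction is an illustrative assumption, not the authors' implementation.

    ```python
    import numpy as np

    n, stride = 16, 4      # toy sequence length; stride chosen near sqrt(n)

    # Dense causal attention: position i attends to all j <= i -> O(n^2) work.
    dense = np.tril(np.ones((n, n), dtype=bool))

    i = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    # Factorized pattern (illustrative): a local window of width `stride`
    # plus every stride-th earlier "summary" position -> O(n * sqrt(n)) work.
    local = (i >= j) & (i - j < stride)
    strided = (i >= j) & (j % stride == stride - 1)
    sparse = local | strided

    print("dense entries: ", int(dense.sum()))   # 136 = n(n+1)/2
    print("sparse entries:", int(sparse.sum()))  # grows ~ n*sqrt(n), not n^2
    ```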

  3. Improving Language Understanding by Generative Pre-Training

    Paper · Jun 11, 2018 · OpenAI · Alec Radford, Karthik Narasimhan, Tim Salimans, Ilya Sutskever

    Natural language understanding comprises a wide range of diverse tasks such as textual entailment, question answering, semantic similarity assessment, and document classification. Although large un...
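
    The two-stage recipe the paper introduces is generative pre-training (maximize Σ_i log p(x_i | x_<i) with a Transformer decoder) followed by discriminative fine-tuning. The NumPy sketch below shows only the next-token objective on toy data; the random "logits" stand in for model outputs and are an assumption for illustration.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    vocab, seq_len = 50, 10
    tokens = rng.integers(vocab, size=seq_len)        # toy token ids

    # Stand-in for Transformer-decoder outputs at each position.
    logits = rng.normal(size=(seq_len, vocab))
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))

    # Language-modeling loss: position i predicts token i+1.
    nll = -log_probs[np.arange(seq_len - 1), tokens[1:]].mean()
    print(f"toy next-token NLL: {nll:.3f}")           # ~log(vocab) for random logits
    ```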

  4. RT-1: Robotics Transformer for Real-World Control at Scale

    Paper · Dec 13, 2022 · arXiv · Anthony Brohan, Noah Brown, Justice Carbajal, Yevgen Chebotar, Joseph Dabis, Chelsea Finn, Keerthana Gopalakrishnan, Karol Hausman, Alex Herzog, Jasmine Hsu, Julian Ibarz, Brian Ichter, Alex Irpan, Tomas Jackson, et al. (Google DeepMind)

    By transferring knowledge from large, diverse, task-agnostic datasets, modern machine learning models can solve specific downstream tasks either zero-shot or with small task-specific datasets. We i...

  5. Perceiver: General Perception with Iterative Attention

    Paper · Mar 4, 2021 · arXiv · Andrew Jaegle, Felix Gimeno, Andrew Brock, Oriol Vinyals, Andrew Zisserman, Joao Carreira

    Biological systems perceive the world by simultaneously processing high-dimensional inputs from modalities as diverse as vision, audition, and touch. We introduce the Perceiver, an architecture tha...
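
    The Perceiver's key idea is to have a small learned latent array cross-attend to the full input array, so compute scales linearly in input size rather than quadratically. The sketch below shows one such cross-attention step; learned projections, iteration, and the latent self-attention tower are omitted, and all shapes are illustrative assumptions.

    ```python
    import numpy as np

    def cross_attention(latents, inputs):
        """Latents (m, d) attend over inputs (n, d); cost O(m*n), linear in n."""
        d = latents.shape[-1]
        scores = latents @ inputs.T / np.sqrt(d)     # (m, n) with m << n
        scores -= scores.max(axis=-1, keepdims=True)
        w = np.exp(scores)
        w /= w.sum(axis=-1, keepdims=True)
        return w @ inputs                            # (m, d): updated latents

    rng = np.random.default_rng(0)
    m, n, d = 8, 10_000, 16        # few latents vs. a large input byte array
    latents = rng.normal(size=(m, d))
    inputs = rng.normal(size=(n, d))
    print(cross_attention(latents, inputs).shape)    # (8, 16): set by latents
    ```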

  6. Generative Pretraining from Pixels

    Paper · Jun 17, 2020 · OpenAI · Mark Chen, Alec Radford, Rewon Child, Jeffrey Wu, Heewoo Jun, David Luan, Ilya Sutskever

    Inspired by the success of unsupervised representation learning in natural language processing with models like GPT-2, we examine whether similar models can learn useful representations for images....
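
    The setup the paper explores is to treat an image as a 1-D sequence of discrete pixel values and train the same next-token objective used for text. The sketch below shows only the data side of that idea on a toy grayscale image; the raster-scan ordering here is an illustrative assumption, and the paper additionally quantizes colors to a small palette.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    image = rng.integers(256, size=(8, 8))          # toy 8x8 grayscale "image"

    sequence = image.reshape(-1)                    # raster-scan flatten: 64 tokens
    inputs, targets = sequence[:-1], sequence[1:]   # position i predicts pixel i+1
    print(len(inputs), "prediction steps, vocab size 256")
    ```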

Related Topics

company:openai-research (3) · cs.LG (2) · NLP (2) · lab:deep-mind-ai (2) · cs.CV (2)