Topic: cs.CV


Short answer

This page shows the most relevant public items for cs.CV, ranked by trend activity and review signal. Use the weekly view for fast-moving changes, the monthly view for more stable patterns, and the all-time view for evergreen picks.



  1. Zoom In: An Introduction to Circuits

Paper · Mar 10, 2020 · Distill · Chris Olah, Nick Cammarata, Ludwig Schubert, Gabriel Goh, Michael Petrov, Shan Carter

    Neural networks are generally regarded as opaque black boxes. However, if we zoom in and carefully examine the weights and activations of convolutional neural networks, we find highly interpretable...

  2. Diffusion Models Beat GANs on Image Synthesis

Paper · May 11, 2021 · arXiv · Prafulla Dhariwal, Alex Nichol

    We show that diffusion models can achieve image sample quality superior to the current state-of-the-art generative models. We achieve this on unconditional image synthesis by finding a better archi...

  3. Multimodal Neurons in Artificial Neural Networks

Paper · Mar 4, 2021 · Distill · Gabriel Goh, Nick Cammarata, Chelsea Voss, Shan Carter, Michael Petrov, Ludwig Schubert, Alec Radford, Chris Olah

    We investigate the internal representations of the CLIP model and discover the presence of 'multimodal neurons'. These neurons fire not only for specific visual features (like a spider) but also fo...

  4. Improving Image Generation with Better Captions

Paper · Oct 19, 2023 · OpenAI · James Betker, Gabriel Goh, Li Jing, Tim Brooks, Jianfeng Wang, Linjie Li, Long Ouyang, Juntang Zhuang, Joyce Lee, Yufei Guo, Wesam Manassra, Prafulla Dhariwal, Casey Chu, Yunxing Jiao, Aditya Ramesh

    Current text-to-image models often struggle to faithfully follow detailed or complex prompts, frequently ignoring specific attributes or object relationships. We propose that this issue stems from ...

  5. Shap-E: Generating Conditional 3D Implicit Functions

Paper · May 3, 2023 · arXiv · Heewoo Jun, Alex Nichol

    We present Shap-E, a conditional generative model for 3D assets. Unlike recent work on 3D generative models which produce a single output representation, Shap-E directly generates the parameters of...

  6. Sora: Video generation models as world simulators

Paper · Feb 15, 2024 · OpenAI Technical Report · Tim Brooks, Bill Peebles, Connor Holmes, Will DePue, Yufei Guo, Li Jing, David Schnurr, Joe Taylor, Troy Luhman, Eric Luhman, Clarence Ng, Ricky Wang, Aditya Ramesh

    We explore the large-scale training of generative models on video data. Specifically, we train text-conditional diffusion models jointly on videos and images of highly variable durations, resolutio...

  7. Improved Denoising Diffusion Probabilistic Models

Paper · Feb 18, 2021 · arXiv · Alex Nichol, Prafulla Dhariwal

    Denoising diffusion probabilistic models (DDPMs) have recently demonstrated high-quality image generation, but they suffer from notoriously slow sampling times and sub-optimal log-likelihoods. We p...

  8. Zero-Shot Text-to-Image Generation

Paper · Feb 24, 2021 · arXiv · Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, Ilya Sutskever

    Text-to-image generation has traditionally focused on finding better modeling assumptions for training on a fixed dataset. We describe a simple approach for this task based on a transformer that au...

  9. Hierarchical Text-Conditional Image Generation with CLIP Latents

Paper · Apr 13, 2022 · arXiv · Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, Mark Chen

    Contrastive models like CLIP have been shown to learn robust representations of images that capture both semantics and style. To leverage these representations for image generation, we propose a tw...

  10. Learning Transferable Visual Models From Natural Language Supervision

Paper · Feb 26, 2021 · arXiv · Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever

    State-of-the-art computer vision systems are trained to predict a fixed set of predetermined object categories, restricting their generality. We demonstrate that the simple pre-training task of pre...

  11. Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset

Paper · May 22, 2017 · arXiv · Joao Carreira, Andrew Zisserman

    Video action recognition is a crucial challenge in computer vision, but progress has been hindered by the lack of large-scale, comprehensive datasets comparable to ImageNet. We introduce the Kineti...

  12. Matching Networks for One Shot Learning

Paper · Jun 13, 2016 · arXiv · Oriol Vinyals, Charles Blundell, Timothy Lillicrap, Koray Kavukcuoglu, Daan Wierstra

    Deep learning algorithms typically require vast amounts of data to achieve high performance, contrasting sharply with human ability to learn new concepts from a single example. We introduce Matchin...

  13. Large Scale GAN Training for High Fidelity Natural Image Synthesis

Paper · Sep 28, 2018 · arXiv · Andrew Brock, Jeff Donahue, Karen Simonyan

    Despite recent progress in generative image modeling, successfully generating high-resolution, diverse samples from complex datasets such as ImageNet remains an elusive goal. To this end, we train ...

  14. Perceiver: General Perception with Iterative Attention

Paper · Mar 4, 2021 · arXiv · Andrew Jaegle, Felix Gimeno, Andrew Brock, Oriol Vinyals, Andrew Zisserman, Joao Carreira

    Biological systems perceive the world by simultaneously processing high-dimensional inputs from modalities as diverse as vision, audition, and touch. We introduce the Perceiver, an architecture tha...

  15. Flamingo: a Visual Language Model for Few-Shot Learning

Paper · Apr 28, 2022 · arXiv · Jean-Baptiste Alayrac, Jeff Donahue, Pauline Luc, Antoine Miech, Iain Barr, Karen Simonyan

    Building models that can be rapidly adapted to novel tasks using only a handful of annotated examples is an open challenge for multimodal machine learning research. We introduce Flamingo, a family ...

  16. Generative Pretraining from Pixels

Paper · Jun 17, 2020 · OpenAI · Mark Chen, Alec Radford, Rewon Child, Jeffrey Wu, Heewoo Jun, David Luan, Ilya Sutskever

    Inspired by the success of unsupervised representation learning in natural language processing with models like GPT-2, we examine whether similar models can learn useful representations for images....



FAQ

What does this cs.CV page rank?

It ranks public content for cs.CV using recent discussion, review, and engagement signals so you can triage faster. This guidance is specific to the cs.CV topic page on Attendemia.

How should I use weekly vs monthly vs all-time?

Use weekly for fast-moving updates, monthly for stable trend confirmation, and all-time for evergreen references.

How can I discover organizations active in cs.CV?

Use the linked entities section to jump to labs, companies, and experts connected to this topic and explore their timelines.

Can I follow this topic for updates?

Yes. Use the follow button on this page to subscribe and track new high-signal activity.