Topic: cs.CV


Short answer

This page shows the most relevant public items for cs.CV, ranked by trend activity and review signal. Use the weekly view for fast-moving changes, the monthly view for more stable patterns, and the all-time view for evergreen picks.



  1. Zoom In: An Introduction to Circuits

Paper · Mar 10, 2020 · Distill · Chris Olah, Nick Cammarata, Ludwig Schubert, Gabriel Goh, Michael Petrov, Shan Carter

    Neural networks are generally regarded as opaque black boxes. However, if we zoom in and carefully examine the weights and activations of convolutional neural networks, we find highly interpretable...

  2. Diffusion Models Beat GANs on Image Synthesis

Paper · May 11, 2021 · arXiv · Prafulla Dhariwal, Alex Nichol

    We show that diffusion models can achieve image sample quality superior to the current state-of-the-art generative models. We achieve this on unconditional image synthesis by finding a better archi...

  3. Multimodal Neurons in Artificial Neural Networks

Paper · Mar 4, 2021 · Distill · Gabriel Goh, Nick Cammarata, Chelsea Voss, Shan Carter, Michael Petrov, Ludwig Schubert, Alec Radford, Chris Olah

    We investigate the internal representations of the CLIP model and discover the presence of 'multimodal neurons'. These neurons fire not only for specific visual features (like a spider) but also fo...

  4. Improving Image Generation with Better Captions

Paper · Oct 19, 2023 · OpenAI · James Betker, Gabriel Goh, Li Jing, Tim Brooks, Jianfeng Wang, Linjie Li, Long Ouyang, Juntang Zhuang, Joyce Lee, Yufei Guo, Wesam Manassra, Prafulla Dhariwal, Casey Chu, Yunxing Jiao, Aditya Ramesh

    Current text-to-image models often struggle to faithfully follow detailed or complex prompts, frequently ignoring specific attributes or object relationships. We propose that this issue stems from ...

  5. Shap-E: Generating Conditional 3D Implicit Functions

Paper · May 3, 2023 · arXiv · Heewoo Jun, Alex Nichol

    We present Shap-E, a conditional generative model for 3D assets. Unlike recent work on 3D generative models which produce a single output representation, Shap-E directly generates the parameters of...

  6. Sora: Video generation models as world simulators

Paper · Feb 15, 2024 · OpenAI Technical Report · Tim Brooks, Bill Peebles, Connor Holmes, Will DePue, Yufei Guo, Li Jing, David Schnurr, Joe Taylor, Troy Luhman, Eric Luhman, Clarence Ng, Ricky Wang, Aditya Ramesh

    We explore the large-scale training of generative models on video data. Specifically, we train text-conditional diffusion models jointly on videos and images of highly variable durations, resolutio...

  7. Improved Denoising Diffusion Probabilistic Models

Paper · Feb 18, 2021 · arXiv · Alex Nichol, Prafulla Dhariwal

    Denoising diffusion probabilistic models (DDPMs) have recently demonstrated high-quality image generation, but they suffer from notoriously slow sampling times and sub-optimal log-likelihoods. We p...

  8. Zero-Shot Text-to-Image Generation

Paper · Feb 24, 2021 · arXiv · Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, Ilya Sutskever

    Text-to-image generation has traditionally focused on finding better modeling assumptions for training on a fixed dataset. We describe a simple approach for this task based on a transformer that au...

  9. Hierarchical Text-Conditional Image Generation with CLIP Latents

Paper · Apr 13, 2022 · arXiv · Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, Mark Chen

    Contrastive models like CLIP have been shown to learn robust representations of images that capture both semantics and style. To leverage these representations for image generation, we propose a tw...

  10. Learning Transferable Visual Models From Natural Language Supervision

Paper · Feb 26, 2021 · arXiv · Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever

    State-of-the-art computer vision systems are trained to predict a fixed set of predetermined object categories, restricting their generality. We demonstrate that the simple pre-training task of pre...

  11. Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset

Paper · May 22, 2017 · arXiv · Joao Carreira, Andrew Zisserman

    Video action recognition is a crucial challenge in computer vision, but progress has been hindered by the lack of large-scale, comprehensive datasets comparable to ImageNet. We introduce the Kineti...

  12. Matching Networks for One Shot Learning

Paper · Jun 13, 2016 · arXiv · Oriol Vinyals, Charles Blundell, Timothy Lillicrap, Koray Kavukcuoglu, Daan Wierstra

    Deep learning algorithms typically require vast amounts of data to achieve high performance, contrasting sharply with human ability to learn new concepts from a single example. We introduce Matchin...

  13. Large Scale GAN Training for High Fidelity Natural Image Synthesis

Paper · Sep 28, 2018 · arXiv · Andrew Brock, Jeff Donahue, Karen Simonyan

    Despite recent progress in generative image modeling, successfully generating high-resolution, diverse samples from complex datasets such as ImageNet remains an elusive goal. To this end, we train ...

  14. Perceiver: General Perception with Iterative Attention

Paper · Mar 4, 2021 · arXiv · Andrew Jaegle, Felix Gimeno, Andrew Brock, Oriol Vinyals, Andrew Zisserman, Joao Carreira

    Biological systems perceive the world by simultaneously processing high-dimensional inputs from modalities as diverse as vision, audition, and touch. We introduce the Perceiver, an architecture tha...

  15. Flamingo: a Visual Language Model for Few-Shot Learning

Paper · Apr 28, 2022 · arXiv · Jean-Baptiste Alayrac, Jeff Donahue, Pauline Luc, Antoine Miech, Iain Barr, Karen Simonyan

    Building models that can be rapidly adapted to novel tasks using only a handful of annotated examples is an open challenge for multimodal machine learning research. We introduce Flamingo, a family ...

  16. Generative Pretraining from Pixels

Paper · Jun 17, 2020 · OpenAI · Mark Chen, Alec Radford, Rewon Child, Jeffrey Wu, Heewoo Jun, David Luan, Ilya Sutskever

    Inspired by the success of unsupervised representation learning in natural language processing with models like GPT-2, we examine whether similar models can learn useful representations for images....



FAQ

What does this cs.CV page rank?

It ranks public content for cs.CV using recent discussion, review, and engagement signals so you can triage faster. This guidance is specific to the cs.CV topic page on Attendemia.

How should I use weekly vs monthly vs all-time?

Use weekly for fast-moving updates, monthly for stable trend confirmation, and all-time for evergreen references.

How can I discover organizations active in cs.CV?

Use the linked entities section to jump to labs, companies, and experts connected to this topic and explore their timelines.

Can I follow this topic for updates?

Yes. Use the follow button on this page to subscribe and track new high-signal activity.