Topic: company:openai-research

Short answer

This page shows the most relevant public items for company:openai-research, ranked by trend activity and review signal. Use weekly for fast changes, monthly for more stable patterns, and all-time for evergreen picks.

Process for Adapting Language Models to Society (PALMS) with Values-Targeted Datasets
Paper • Jun 10, 2021 • arXiv • Irene Solaiman, Christy Dennison
As language models grow in capability and scale, they increasingly generate outputs that reflect the biases, toxicity, and harmful stereotypes present in their internet-scraped training data. We in...
Scaling Laws for Reward Model Overoptimization
Paper • Oct 19, 2022 • arXiv • Leo Gao, John Schulman, Jacob Hilton
When optimizing a policy against a learned reward model, the policy eventually exploits errors in the reward model, leading to a decline in the true underlying objective. This phenomenon, known as ...
TruthfulQA: Measuring How Models Mimic Human Falsehoods
Paper • Sep 8, 2021 • arXiv • Stephanie Lin, Jacob Hilton, Owain Evans
We propose TruthfulQA, a benchmark to measure whether a language model is truthful in generating answers to questions. The benchmark comprises 817 questions that span 38 categories, including healt...
Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World
Paper • Mar 20, 2017 • arXiv • Josh Tobin, Rachel Fong, Alex Ray, Jonas Schneider, Wojciech Zaremba, Pieter Abbeel
Bridging the 'reality gap' between simulated environments and the physical world is a major challenge in robotics. We introduce domain randomization, a simple yet powerful technique for training ne...
Zoom In: An Introduction to Circuits
Paper • Mar 10, 2020 • Distill • Chris Olah, Nick Cammarata, Ludwig Schubert, Gabriel Goh, Michael Petrov, Shan Carter
Neural networks are generally regarded as opaque black boxes. However, if we zoom in and carefully examine the weights and activations of convolutional neural networks, we find highly interpretable...
Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments
Paper • Jun 7, 2017 • arXiv • Ryan Lowe, Yi Wu, Aviv Tamar, Jean Harb, Pieter Abbeel, Igor Mordatch
We explore deep reinforcement learning methods for multi-agent domains. We begin by analyzing the difficulty of traditional algorithms in the multi-agent case: Q-learning is challenged by an inhere...
Hindsight Experience Replay
Paper • Jul 5, 2017 • arXiv • Marcin Andrychowicz, Filip Wolski, Alex Ray, Jonas Schneider, Rachel Fong, Peter Welinder, Bob McGrew, Josh Tobin, Pieter Abbeel, Wojciech Zaremba
Dealing with sparse rewards is one of the biggest challenges in Reinforcement Learning (RL). We present a novel technique called Hindsight Experience Replay (HER) which allows sample-efficient lear...
Concrete Problems in AI Safety
Paper • Jun 21, 2016 • arXiv • Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, Dan Mané
Rapid progress in machine learning and artificial intelligence (AI) has brought increasing attention to the potential impacts of AI technologies on society. In this paper, we discuss one such poten...
Diffusion Models Beat GANs on Image Synthesis
Paper • May 11, 2021 • arXiv • Prafulla Dhariwal, Alex Nichol
We show that diffusion models can achieve image sample quality superior to the current state-of-the-art generative models. We achieve this on unconditional image synthesis by finding a better archi...
Scaling Laws for Autoregressive Generative Modeling
Paper • Oct 28, 2020 • arXiv • Tom Henighan, Jared Kaplan, Mor Katz, Mark Chen, Christopher Hesse, Jacob Jackson, Heewoo Jun, Tom B. Brown, Prafulla Dhariwal, Scott Gray, Chris Hallacy, Benjamin Mann, Alec Radford, Aditya Ramesh, Nick Ryder, Daniel M. Ziegler, John Schulman, Dario Amodei, Sam McCandlish
Building upon previous work establishing scaling laws for language models, we investigate whether similar power-law scaling relationships hold across other data modalities. We train autoregressive ...
Asymmetric self-play for automatic goal discovery in robotic manipulation
Paper • Jan 14, 2021 • arXiv • OpenAI Robotics Team
Training robots to solve a wide variety of manipulation tasks typically requires massive amounts of human-engineered reward functions and goal specifications. We introduce a method for automatic go...
Multimodal Neurons in Artificial Neural Networks
Paper • Mar 4, 2021 • Distill • Gabriel Goh, Nick Cammarata, Chelsea Voss, Shan Carter, Michael Petrov, Ludwig Schubert, Alec Radford, Chris Olah
We investigate the internal representations of the CLIP model and discover the presence of 'multimodal neurons'. These neurons fire not only for specific visual features (like a spider) but also fo...
Language models can explain neurons in language models
Paper • May 9, 2023 • OpenAI • Steven Bills, Nick Cammarata, Dan Mossing, Henk Tillman, Leo Gao, Gabriel Goh, Ilya Sutskever, Jan Leike, Jeff Wu, William Saunders
Understanding the internal mechanisms of massive language models is a critical bottleneck for AI safety and alignment. Given the billions of parameters in modern models, manual human inspection of ...
Let's Verify Step by Step
Paper • May 31, 2023 • arXiv • Hunter Lightman, Vineet Kosaraju, Yura Burda, Harri Edwards, Bowen Baker, Teddy Lee, Jan Leike, John Schulman, Ilya Sutskever, Karl Cobbe
Large language models often struggle with multi-step logical reasoning, frequently hallucinating incorrect steps that invalidate the final answer. To improve reasoning capabilities, we compare two ...
Generating Long Sequences with Sparse Attention
Paper • Apr 23, 2019 • arXiv • Rewon Child, Scott Gray, Alec Radford, Ilya Sutskever
Transformers are powerful sequence models, but their self-attention mechanism scales quadratically with sequence length, making them computationally prohibitive for long inputs like high-resolution...
Emergent Tool Use From Multi-Agent Autocurricula
Paper • Sep 17, 2019 • arXiv • Bowen Baker, Ingmar Kanitscheider, Todor Markov, Yi Wu, Glenn Powell, Bob McGrew, Igor Mordatch
We demonstrate that simple multi-agent competition can drive the emergence of highly complex, intelligent behaviors without explicit human design. We train agents using reinforcement learning to pl...
Deep reinforcement learning from human preferences
Paper • Jun 12, 2017 • arXiv • Paul F Christiano, Jan Leike, Tom Brown, Miljan Martic, Shane Legg, Dario Amodei
For many complex real-world tasks, defining a mathematical reward function is difficult, leading to misaligned AI behavior when optimized. We explore a method for solving reinforcement learning tas...
Improving Image Generation with Better Captions
Paper • Oct 19, 2023 • OpenAI • James Betker, Gabriel Goh, Li Jing, Tim Brooks, Jianfeng Wang, Linjie Li, Long Ouyang, Juntang Zhuang, Joyce Lee, Yufei Guo, Wesam Manassra, Prafulla Dhariwal, Casey Chu, Yunxing Jiao, Aditya Ramesh
Current text-to-image models often struggle to faithfully follow detailed or complex prompts, frequently ignoring specific attributes or object relationships. We propose that this issue stems from ...
Shap-E: Generating Conditional 3D Implicit Functions
Paper • May 3, 2023 • arXiv • Heewoo Jun, Alex Nichol
We present Shap-E, a conditional generative model for 3D assets. Unlike recent work on 3D generative models which produce a single output representation, Shap-E directly generates the parameters of...
Video PreTraining (VPT): Learning to Act by Watching Unlabeled Video
Paper • Jun 23, 2022 • arXiv • Bowen Baker, Ilge Akkaya, Peter Zhokhov, Joost Huizinga, Jie Tang, Adrien Ecoffet, Brandon Houghton, Raul Sampedro, Jeff Clune
Training agents to perform complex, long-horizon tasks typically requires massive amounts of heavily annotated data or prohibitive amounts of reinforcement learning trial-and-error. We introduce Vi...
