Topic: Alignment

Short answer

This page lists the most relevant public items for Alignment, ranked by trend activity and review signal. Use the weekly view to track fast-moving changes, the monthly view for more stable patterns, and the all-time view for evergreen picks.



  1. Concrete Problems in AI Safety

    Paper · Jun 21, 2016 · arXiv · Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, Dan Mané

    Rapid progress in machine learning and artificial intelligence (AI) has brought increasing attention to the potential impacts of AI technologies on society. In this paper, we discuss one such poten...

  2. Language models can explain neurons in language models

    Paper · May 9, 2023 · OpenAI · Steven Bills, Nick Cammarata, Dan Mossing, Henk Tillman, Leo Gao, Gabriel Goh, Ilya Sutskever, Jan Leike, Jeff Wu, William Saunders

    Understanding the internal mechanisms of massive language models is a critical bottleneck for AI safety and alignment. Given the billions of parameters in modern models, manual human inspection of ...

  3. Training language models to follow instructions with human feedback

    Paper · Mar 4, 2022 · arXiv · Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan Lowe

    Making language models bigger does not inherently make them better at following a user's intent. For example, large language models can generate outputs that are untruthful, toxic, or simply not he...

Related Topics

company:openai-research (3) · cs.AI (3) · Mechanistic Interpretability (1) · Reinforcement Learning (1) · AI Safety (1)