← Home

Quick answer

AI Summary: A foundational AI safety document that formalizes five concrete, empirical research problems for preventing unintended and harmful behavior in advanced reinforcement learning systems.

Claim

Concrete Problems in AI Safety

Dario Amodei·
Chris Olah·
Jacob Steinhardt·
Paul Christiano·
John Schulman·
Dan Mané

ABSTRACT

Rapid progress in machine learning and artificial intelligence (AI) has brought increasing attention to the potential impacts of AI technologies on society. In this paper, we discuss one such potential impact: the problem of accidents in machine learning systems, defined as unintended and harmful behavior that may emerge from poor design of real-world AI systems. We present a list of five practical research problems related to accident risk in machine learning systems: avoiding negative side effects, avoiding reward hacking, scalable oversight, safe exploration, and robustness to distributional shift. We ground these abstract concepts in concrete, empirical machine learning frameworks.

Review Snapshot

Explore ratings

4.6
★★★★★
5 ratings
5 star
60%
4 star
40%
3 star
0%
2 star
0%
1 star
0%

Recommendation

100%

recommend this content.

Review this content

Share your opinion to help other learners triage faster.

Write a review

Invite a reviewer

Invite someone by email to share an invited review for Concrete Problems in AI Safety.

Author Inquiries

Public questions about this content. Attendemia will route your question to the author. Vote on the most important ones. No guarantee of response.
Post an inquiry
Sort by: Most helpful