Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision
Paper • Dec 14, 2023 • arXiv • Collin Burns, Pavel Izmailov, Jan Hendrik Kirchner, Bowen Baker, Leo Gao, Leopold Aschenbrenner, Yining Chen, Adrien Ecoffet, Manas Joglekar, Jan Leike, Ilya Sutskever, Jeff Wu
As AI models become increasingly capable, we will eventually face the challenge of superalignment: how can humans supervise AI systems that are much smarter than them? To study this empirically tod...