Topic: Process Reward Models

Short answer

This page shows the most relevant public items for Process Reward Models, ranked by trend activity and review signal. Use the weekly view for fast-moving changes, the monthly view for more stable patterns, and the all-time view for evergreen picks.



  1. Let's Verify Step by Step

    Paper · May 31, 2023 · arXiv · Hunter Lightman, Vineet Kosaraju, Yura Burda, Harri Edwards, Bowen Baker, Teddy Lee, Jan Leike, John Schulman, Ilya Sutskever, Karl Cobbe

    Large language models often struggle with multi-step logical reasoning, frequently hallucinating incorrect steps that invalidate the final answer. To improve reasoning capabilities, we compare two ...

Related Topics

AI Alignment (1), cs.LG (1), company:openai-research (1), Reasoning (1)