
Quick answer

AI Summary: Presents Sparrow, a dialogue agent trained with rule-targeted RLHF and augmented with internet search, improving the helpfulness, correctness, and harmlessness of dialogue models relative to prompted baselines.


Improving alignment of dialogue agents via targeted human judgements

Amelia Glaese, Nat McAleese, Maja Trebacz, John Aslanides, Vlad Firoiu, Geoffrey Irving

ABSTRACT

We present Sparrow, an information-seeking dialogue agent trained to be more helpful, correct, and harmless than prompted language model baselines. We train our model using reinforcement learning from human feedback (RLHF), where human participants judge model responses against a targeted set of rules. Crucially, Sparrow is augmented with the ability to search the internet, and human raters evaluate the accuracy of the model's claims against the evidence it retrieves. We show that these targeted judgements and evidence-supported responses improve Sparrow's safety and correctness relative to our baselines.
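The per-comparison RLHF signal described in the abstract can be sketched with the standard Bradley-Terry pairwise formulation: a reward model is trained so the response a rater preferred scores higher than the one they rejected. This is a minimal illustrative sketch, not the paper's implementation; the function names and the plain-float interface are assumptions.

```python
import math

def preference_loss(r_preferred: float, r_rejected: float) -> float:
    """Bradley-Terry style loss for one human comparison.

    r_preferred / r_rejected are the reward model's scalar scores for the
    response the rater chose and the one they rejected. The loss is
    -log(sigmoid(r_preferred - r_rejected)): it shrinks toward 0 as the
    preferred response is scored increasingly higher, and grows when the
    model ranks the pair the wrong way round.
    """
    margin = r_preferred - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

In a setup like Sparrow's, a learned reward of this kind (combined with penalties from a separate rule-violation classifier) would then serve as the training signal for the RL policy.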

Review Snapshot

4.4 ★★★★ (5 ratings)
5 star: 40%
4 star: 60%
3 star: 0%
2 star: 0%
1 star: 0%

Recommendation: 100% recommend this content.
