Robust Speech Recognition via Large-Scale Weak Supervision

Alec Radford · Jong Wook Kim · Tao Xu · Greg Brockman · Christine McLeavey · Ilya Sutskever

ABSTRACT

We study the capabilities of speech processing systems trained simply to predict large amounts of transcripts of audio on the internet. When scaled to 680,000 hours of multilingual and multitask supervision, the resulting models generalize well to standard benchmarks and are often competitive with prior fully supervised results but in a zero-shot transfer setting without the need for any fine-tuning. The models exhibit remarkable robustness to accents, background noise, and technical language. Moreover, the architecture allows for seamless transcription in multiple languages, as well as translation from those languages into English. We release our models and inference code to serve as a foundation for further work on robust speech recognition.
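The abstract notes that the models and inference code are released; they ship as the `openai-whisper` Python package. The sketch below shows the two tasks the abstract describes, transcription in the source language and zero-shot translation into English, using `load_model` and `transcribe` from that package. The file name `audio.mp3` and the `"base"` model size are illustrative assumptions, not values from the paper.

```python
def transcribe_and_translate(audio_path: str, model_name: str = "base"):
    """Transcribe an audio file, then translate its speech into English.

    Assumes the released `openai-whisper` package is installed
    (`pip install openai-whisper`) and `audio_path` points to a real file.
    """
    import whisper  # imported lazily so the module loads without the package

    model = whisper.load_model(model_name)  # downloads weights on first use
    transcription = model.transcribe(audio_path)  # text in the spoken language
    translation = model.transcribe(audio_path, task="translate")  # English text
    return transcription["text"], translation["text"]


if __name__ == "__main__":
    text, english = transcribe_and_translate("audio.mp3")  # hypothetical file
    print(text)
    print(english)
```

Both tasks run through the same `transcribe` call; only the `task` argument changes, which reflects the multitask design the abstract describes.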
