Robust Speech Recognition via Large-Scale Weak Supervision
We study the capabilities of speech processing systems trained simply to predict large amounts of transcripts of audio on the internet. When scaled to 680,000 hours of multilingual and multitask supervision, the resulting models generalize well to standard benchmarks and are often competitive with prior fully supervised results, even in a zero-shot transfer setting without the need for any fine-tuning. The models exhibit remarkable robustness to accents, background noise, and technical language. Moreover, the architecture allows for seamless transcription in multiple languages, as well as translation from those languages into English. We release our models and inference code to serve as a foundation for further work on robust speech recognition.