Quick answer
AI Summary: Presents Jukebox, an autoregressive transformer utilizing a multi-scale VQ-VAE to generate highly realistic, minute-long songs with coherent musical structure and synthesized vocals.
We introduce Jukebox, a generative model that produces high-fidelity, highly diverse music with singing in the raw audio domain. We model music as a sequence of discrete tokens by using a multi-scale VQ-VAE to compress the extremely high-dimensional raw audio. We then train autoregressive Transformers to generate audio conditioned on artist, genre, and lyrics. The system generates minute-long musical compositions with coherent musical structure, instrumentation, and recognizable vocals. This work demonstrates that massive scale and hierarchical compression can bridge the gap between long-term musical structure and high-frequency audio details.
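To make the "sequence of discrete tokens" idea concrete, here is a minimal sketch of the vector-quantization step that a VQ-VAE relies on. It is a toy illustration, not Jukebox's implementation: it quantizes raw audio frames directly against a hand-written codebook, whereas a real VQ-VAE learns an encoder, a decoder, and the codebook jointly. The names `frame_signal`, `nearest_code`, `encode`, `decode`, and the `hop` parameter are all hypothetical.

```python
from typing import List, Sequence


def frame_signal(samples: Sequence[float], hop: int) -> List[List[float]]:
    """Split a 1-D signal into non-overlapping frames of length `hop`.

    Trailing samples that do not fill a whole frame are dropped.
    """
    n_frames = len(samples) // hop
    return [list(samples[i * hop:(i + 1) * hop]) for i in range(n_frames)]


def nearest_code(frame: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Return the index of the codebook vector closest to `frame` (squared L2)."""
    best_idx, best_dist = 0, float("inf")
    for idx, code in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(frame, code))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def encode(samples: Sequence[float], codebook: Sequence[Sequence[float]], hop: int) -> List[int]:
    """Compress audio into one discrete token per frame of `hop` samples."""
    return [nearest_code(frame, codebook) for frame in frame_signal(samples, hop)]


def decode(tokens: Sequence[int], codebook: Sequence[Sequence[float]]) -> List[float]:
    """Reconstruct an approximate signal by concatenating codebook vectors."""
    out: List[float] = []
    for token in tokens:
        out.extend(codebook[token])
    return out


# Toy codebook with three entries (an assumption for illustration only).
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, -1.0]]
samples = [0.9, 1.1, -0.8, -1.2, 0.1, -0.1, 0.4]

tokens = encode(samples, codebook, hop=2)
reconstruction = decode(tokens, codebook)
```

A larger `hop` yields fewer tokens per second of audio, which makes long-range structure easier for a sequence model to capture but loses fine detail. Applying quantization at several compression rates is one way to read the "multi-scale" trade-off the abstract describes.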