Sora: Video generation models as world simulators

Tim Brooks · Bill Peebles · Connor Holmes · Will DePue · Yufei Guo · Li Jing · David Schnurr · Joe Taylor · Troy Luhman · Eric Luhman · Clarence Ng · Ricky Wang · Aditya Ramesh

ABSTRACT

We explore the large-scale training of generative models on video data. Specifically, we train text-conditional diffusion models jointly on videos and images of highly variable durations, resolutions, and aspect ratios. We leverage a transformer architecture operating on spacetime patches of video and image latent codes. Our largest model, Sora, is capable of generating a full minute of high-fidelity video. Our findings suggest that scaling video generation models is a promising path towards building general-purpose simulators of the physical world: generated videos exhibit 3D consistency, object permanence, and coherent temporal dynamics.
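The core architectural idea in the abstract is tokenizing video latents into "spacetime patches" so a transformer can process them as a sequence. As a rough illustration only, the sketch below shows one plausible way to chunk a latent video tensor into flattened spacetime patches; the tensor layout, patch sizes, and function name are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def spacetime_patchify(latent, pt=2, ph=2, pw=2):
    """Split a video latent of shape (T, H, W, C) into flattened
    spacetime patches, each covering pt frames x ph x pw latent pixels.

    This mirrors, at a high level, how a diffusion transformer could
    tokenize video latents; all sizes here are illustrative.
    """
    T, H, W, C = latent.shape
    assert T % pt == 0 and H % ph == 0 and W % pw == 0
    # Carve the latent into a (time, height, width) grid of patches.
    x = latent.reshape(T // pt, pt, H // ph, ph, W // pw, pw, C)
    # Bring the grid axes to the front, patch contents to the back.
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)      # (nT, nH, nW, pt, ph, pw, C)
    # Flatten each patch into a single token vector.
    tokens = x.reshape(-1, pt * ph * pw * C)  # one row per spacetime patch
    return tokens

# A 16-frame, 32x32 latent with 4 channels yields 2048 tokens of dim 32.
latent = np.random.randn(16, 32, 32, 4)
tokens = spacetime_patchify(latent)
print(tokens.shape)  # (2048, 32)
```

Because patches are indexed over time as well as space, the same tokenizer handles both images (a single-frame "video") and clips of arbitrary duration, resolution, and aspect ratio, which is what lets the model train jointly on both.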
