Quick answer

AI Summary: Introduces Sora, a text-conditional diffusion transformer that generates up to a minute of high-fidelity video and shows signs of 3D consistency, object permanence, and coherent temporal dynamics.

Paper • 2024-02-15

Sora: Video generation models as world simulators

Authors

Tim Brooks · Bill Peebles · Connor Holmes · Will DePue · Yufei Guo · Li Jing · David Schnurr · Joe Taylor · Troy Luhman · Eric Luhman · Clarence Ng · Ricky Wang · Aditya Ramesh

ABSTRACT

We explore the large-scale training of generative models on video data. Specifically, we train text-conditional diffusion models jointly on videos and images of highly variable durations, resolutions, and aspect ratios. We leverage a transformer architecture operating on spacetime patches of video and image latent codes. Our largest model, Sora, is capable of generating a full minute of high-fidelity video. Our findings suggest that scaling video generation models is a promising path towards building general-purpose simulators of the physical world, capturing complex 3D consistency, object permanence, and temporal dynamics.
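
The abstract's "spacetime patches" idea can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a single-channel latent grid indexed as [frame][row][column], treats an image as a one-frame video, and uses hypothetical patch sizes. The names `spacetime_patches` and `make_latent` are made up for this example.

```python
from typing import List

Latent = List[List[List[float]]]  # [frame][row][col], one channel for brevity


def spacetime_patches(video: Latent, pt: int, ph: int, pw: int) -> List[List[float]]:
    """Split a [T][H][W] latent grid into flattened (pt x ph x pw) patches.

    Each patch becomes one token; the token count depends on the clip's
    duration and resolution, so inputs of different shapes simply yield
    sequences of different lengths.
    """
    T, H, W = len(video), len(video[0]), len(video[0][0])
    if T % pt or H % ph or W % pw:
        raise ValueError("dimensions must be divisible by the patch size")
    tokens = []
    for t0 in range(0, T, pt):          # step through time
        for y0 in range(0, H, ph):      # step through rows
            for x0 in range(0, W, pw):  # step through columns
                tokens.append([
                    video[t][y][x]
                    for t in range(t0, t0 + pt)
                    for y in range(y0, y0 + ph)
                    for x in range(x0, x0 + pw)
                ])
    return tokens


def make_latent(T: int, H: int, W: int) -> Latent:
    """Hypothetical stand-in for an encoder output: each cell holds its flat index."""
    return [[[float(t * H * W + y * W + x) for x in range(W)]
             for y in range(H)] for t in range(T)]


clip = make_latent(4, 4, 6)    # 4 latent frames, 4x6 spatial grid
image = make_latent(1, 4, 4)   # an image treated as a single-frame video

clip_tokens = spacetime_patches(clip, pt=2, ph=2, pw=2)
image_tokens = spacetime_patches(image, pt=1, ph=2, pw=2)

print(len(clip_tokens), len(clip_tokens[0]))    # number of tokens, values per token
print(len(image_tokens), len(image_tokens[0]))
```

Because the token sequence length follows from the input's shape, the same transformer can in principle consume videos and images of differing duration, resolution, and aspect ratio without resizing them to a fixed shape.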

#generative-ai/month/202402 #video-generation #generative-ai #sora company:openai-research #cs-cv #generative-ai/year/2024 #generative-ai/from/openai-research

Review Snapshot

Rating: 4.6 out of 5 (5 ratings)

5 star: 60%, 4 star: 40% (no 3, 2, or 1 star ratings)

Recommendation: 100% recommend this content.
