Quick answer

AI Summary: Introduces the large-scale Kinetics human action video dataset (400 classes) and the influential I3D architecture, which inflates ImageNet-pretrained 2D ConvNets into 3D. Together they became a standard benchmark and baseline for human action recognition in video.

Paper · 2017-05-22


Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset

Authors


Joao Carreira, Andrew Zisserman

ABSTRACT

Video action recognition is a crucial challenge in computer vision, but progress has been hindered by the lack of large-scale datasets comparable to ImageNet. We introduce the Kinetics Human Action Video dataset, containing 400 human action classes with over 400 video clips per class. We also propose a new Two-Stream Inflated 3D ConvNet (I3D), which leverages ImageNet pre-training by 'inflating' the filters and pooling kernels of a 2D image-classification ConvNet into 3D spatio-temporal ones. I3D performs strongly on Kinetics and, after pre-training on Kinetics, substantially improves on the previous state of the art on the smaller HMDB51 and UCF101 benchmarks, establishing a new foundation for video understanding.
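The 'inflation' idea can be illustrated with a small NumPy sketch. It assumes the bootstrapping scheme described in the paper: a 2D kernel is repeated N times along a new temporal axis and divided by N, so that a "boring" video made of a single repeated frame produces the same responses as the original 2D filter applied to that frame. The helper names (`inflate_2d_kernel`, `conv2d_valid`, `conv3d_valid`) are hypothetical, and the convolutions are naive stride-1, unpadded loops chosen for clarity. This is not the authors' code: the real I3D also inflates pooling kernels and is built on an Inception-v1 backbone with batch normalization.

```python
import numpy as np


def inflate_2d_kernel(w2d: np.ndarray, t: int) -> np.ndarray:
    """Hypothetical helper: turn a 2D kernel (C_out, C_in, kH, kW) into a
    3D kernel (C_out, C_in, t, kH, kW) by repeating it t times along a new
    temporal axis and dividing by t (the rescaling step)."""
    w3d = np.repeat(w2d[:, :, np.newaxis, :, :], t, axis=2)
    return w3d / t


def conv2d_valid(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Naive unpadded, stride-1 2D cross-correlation.
    x: (C_in, H, W), w: (C_out, C_in, kH, kW) -> (C_out, H-kH+1, W-kW+1)."""
    c_out, _, kh, kw = w.shape
    _, h, wd = x.shape
    out = np.zeros((c_out, h - kh + 1, wd - kw + 1))
    for i in range(h - kh + 1):
        for j in range(wd - kw + 1):
            patch = x[:, i:i + kh, j:j + kw]
            # Contract over (C_in, kH, kW), leaving one value per output channel.
            out[:, i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out


def conv3d_valid(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Naive unpadded, stride-1 3D cross-correlation.
    x: (C_in, T, H, W), w: (C_out, C_in, kT, kH, kW)."""
    c_out, _, kt, kh, kw = w.shape
    _, t, h, wd = x.shape
    out = np.zeros((c_out, t - kt + 1, h - kh + 1, wd - kw + 1))
    for s in range(t - kt + 1):
        for i in range(h - kh + 1):
            for j in range(wd - kw + 1):
                patch = x[:, s:s + kt, i:i + kh, j:j + kw]
                # Contract over (C_in, kT, kH, kW).
                out[:, s, i, j] = np.tensordot(
                    w, patch, axes=([1, 2, 3, 4], [0, 1, 2, 3]))
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w2d = rng.standard_normal((4, 3, 3, 3))    # stand-in for a pretrained 2D filter bank
    frame = rng.standard_normal((3, 8, 8))     # one RGB "image"
    video = np.repeat(frame[:, np.newaxis], 5, axis=1)  # boring video: 5 identical frames

    y2d = conv2d_valid(frame, w2d)
    y3d = conv3d_valid(video, inflate_2d_kernel(w2d, 3))

    # Every temporal output slice should match the 2D response on the frame.
    for s in range(y3d.shape[1]):
        assert np.allclose(y3d[:, s], y2d)
    print("inflated 3D filter reproduces the 2D response on a boring video")
```

Unpadded convolution keeps the check exact: each temporal window sees only real frames, so the t copies of w/t sum back to w. With zero padding in time, the first and last output steps would differ from the 2D result.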

lab:deep-mind-ai #cs-cv #i3d #video-processing #computer-vision
