Quick answer

AI Summary: Introduces the large-scale Kinetics video dataset and the Two-Stream Inflated 3D ConvNet (I3D) architecture, which set a new benchmark for human action recognition in video.

Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset

Joao Carreira · Andrew Zisserman

ABSTRACT

Video action recognition is a central challenge in computer vision, but progress has been hindered by the lack of large-scale datasets comparable to ImageNet. We introduce the Kinetics Human Action Video dataset, containing 400 human action classes with over 400 video clips per class. We also propose a new Two-Stream Inflated 3D ConvNet (I3D) architecture, which reuses 2D pre-training from ImageNet by 'inflating' the 2D filters into 3D spatio-temporal convolutions. I3D achieves state-of-the-art performance on Kinetics and, after pre-training on Kinetics, on the smaller HMDB51 and UCF101 benchmarks, establishing a new foundation for video understanding.
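
The 'inflation' step can be illustrated with a minimal NumPy sketch. It assumes the commonly described scheme of repeating each 2D filter along a new time axis and dividing by the temporal length, so that a video made of one repeated frame produces the same response as the 2D filter does on that frame. The function name `inflate_2d_filter`, the weight layout, and the shapes are illustrative assumptions, not the authors' code.

```python
import numpy as np


def inflate_2d_filter(w2d: np.ndarray, time_dim: int) -> np.ndarray:
    """Inflate 2D conv weights (out, in, kh, kw) to 3D (out, in, T, kh, kw).

    Hypothetical helper: copies the 2D filter T times along a new time
    axis and rescales by 1/T so the summed response is preserved.
    """
    if time_dim < 1:
        raise ValueError("time_dim must be a positive integer")
    w3d = np.repeat(w2d[:, :, np.newaxis, :, :], time_dim, axis=2)
    return w3d / time_dim


# Sanity check on a single filter position (shapes are assumptions).
rng = np.random.default_rng(0)
w2d = rng.standard_normal((4, 3, 7, 7))   # 4 filters over 3 input channels
patch = rng.standard_normal((3, 7, 7))    # one image patch

w3d = inflate_2d_filter(w2d, time_dim=7)

# A "boring" video: the same patch repeated at every time step.
boring_video = np.repeat(patch[:, np.newaxis, :, :], 7, axis=1)

resp_2d = np.einsum("oihw,ihw->o", w2d, patch)
resp_3d = np.einsum("oithw,ithw->o", w3d, boring_video)
print(np.allclose(resp_2d, resp_3d))
```

The 1/T rescaling is what lets the inflated network start from activations matching the pre-trained 2D model on static input, before fine-tuning on real video.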
