Quick answer
AI Summary: Introduces the massive Kinetics video dataset and the highly influential I3D convolutional architecture, setting the modern benchmark for human action recognition in video AI.
Video action recognition is a central challenge in computer vision, but progress has been hindered by the lack of large-scale video datasets comparable to ImageNet. We introduce the Kinetics Human Action Video dataset, containing 400 human action classes with over 400 video clips per class. We also propose the Two-Stream Inflated 3D ConvNet (I3D) architecture, which reuses 2D ImageNet pre-training by 'inflating' 2D filters into 3D spatio-temporal convolutions. I3D achieves state-of-the-art performance on Kinetics and, after pre-training on Kinetics, on the smaller HMDB51 and UCF101 benchmarks, establishing a new foundation for video understanding.
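The 'inflating' step can be illustrated with a minimal NumPy sketch. It assumes the common reading of inflation: a 2D filter is repeated along a new time axis and rescaled by the temporal depth, so that a clip made of identical frames produces the same response as the 2D filter on a single frame. The function name `inflate_2d_kernel` and the toy shapes are illustrative choices, not taken from the I3D code.

```python
import numpy as np

def inflate_2d_kernel(kernel_2d, time_depth):
    """Inflate a (height, width) filter to (time_depth, height, width).

    The 2D weights are copied time_depth times along a new leading
    time axis, then divided by time_depth so the summed response on a
    temporally constant input matches the original 2D response.
    """
    kernel_3d = np.repeat(kernel_2d[np.newaxis, :, :], time_depth, axis=0)
    return kernel_3d / time_depth

rng = np.random.default_rng(0)

# Hypothetical 3x3 filter standing in for a pre-trained 2D weight.
kernel_2d = rng.standard_normal((3, 3))
kernel_3d = inflate_2d_kernel(kernel_2d, time_depth=3)

# A static clip: the same 3x3 image patch repeated over 3 frames.
patch = rng.standard_normal((3, 3))
static_clip = np.repeat(patch[np.newaxis, :, :], 3, axis=0)

# Filter response at a single position (sum of elementwise products).
response_2d = float(np.sum(kernel_2d * patch))
response_3d = float(np.sum(kernel_3d * static_clip))
```

On the static clip, `response_3d` agrees with `response_2d` up to floating-point error, which is why an inflated network can start from the behaviour of the 2D model before it is fine-tuned on video.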