← Home

Quick answer

A short clip of video may contain progression of multiple events and an interesting story line. A human need to capture both the event in every shot and associate them together to understand the story behind it.

Claim

Shot2Story: A New Benchmark for Comprehensive Understanding of Multi-shot Videos

Authors
Mingfei Han·
Linjie Yang·
Xiaojun Chang·
Lina Yao·
Heng Wang

ABSTRACT

A short clip of video may contain progression of multiple events and an interesting story line. A human need to capture both the event in every shot and associate them together to understand the story behind it. In this work, we present a new multi-shot video understanding benchmark Shot2Story with detailed shot-level captions, comprehensive video summaries and question-answering pairs. To facilitate better semantic understanding of videos, we provide captions for both visual signals and human narrations. We design several distinct tasks including single-shot video captioning, multi-shot video summarization, and multi-shot video question answering. Preliminary experiments show some challenges to generate a long and comprehensive video summary for multi-shot videos. Nevertheless, the generated imperfect summaries can already achieve competitive performance on existing video understanding tasks such as video question-answering, promoting an under-explored setting of video understanding with detailed summaries.

Review Snapshot

Explore ratings

0.0
★★★★★
0 ratings
5 star
0%
4 star
0%
3 star
0%
2 star
0%
1 star
0%

Recommendation

0%

recommend this content.

Review this content

Share your opinion to help other learners triage faster.

Write a review

Invite a reviewer

Invite someone by email to share an invited review for Shot2Story: A New Benchmark for Comprehensive Understanding of Multi-shot Videos.

Author Inquiries

Public questions about this content. Attendemia will route your question to the author. Vote on the most important ones. No guarantee of response.
Post an inquiry
Sort by: Most helpful
Shot2Story: A New Benchmark for Comprehensive Understanding of Multi-shot Videos | Attendemia