Hi! Welcome to PTVD

A Large-Scale Multimodal Plot-Oriented Story Understanding Dataset


What is PTVD?

About PTVD

Art forms such as movies and television (TV) dramas reflect the real world and have recently attracted much attention from the multimodal learning community. However, existing corpora in this domain share three limitations: (i) they are annotated in a scene-oriented fashion and thus ignore the coherence within plots; (ii) their text lacks empathy and seldom mentions situational context; (iii) their video clips are too short to cover long-form relationships. To address these fundamental issues, we constructed PTVD, the first plot-oriented multimodal dataset in the cinema domain, using 1,106 TV drama episodes and 24,875 informative plot-focused sentences written by professionals, with the help of 449 human annotators.

What does PTVD have?

Dataset Composition

What does the data look like?

Data Example
Vision

Description: PTVD captures one frame from each video every second and provides CLIP-based features for the frames.
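The one-second sampling interval can be illustrated with a minimal sketch. The function name, the frame count, and the frame rate below are hypothetical; the page does not describe the extraction tooling, so this only shows which frame indices a one-second interval would select.

```python
from typing import List


def one_second_frame_indices(num_frames: int, fps: float) -> List[int]:
    """Return the index of the frame nearest to each whole second of video.

    Hypothetical helper for illustration only; not the dataset's own code.
    """
    if num_frames <= 0 or fps <= 0:
        return []
    indices = []
    second = 0
    while True:
        idx = round(second * fps)  # frame closest to this timestamp
        if idx >= num_frames:      # past the end of the video
            break
        indices.append(idx)
        second += 1
    return indices


# A 100-frame clip at 25 frames per second lasts four seconds.
sampled = one_second_frame_indices(num_frames=100, fps=25.0)
print(sampled)
```

Each selected frame would then be passed to a visual encoder to obtain one feature vector per second of video.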

Text

Description: PTVD includes three types of text data: subtitles, plot descriptions, and bullet-screen comments. The subtitles are extracted from the videos, the plot text is crawled from the web, and the bullet-screen comments are obtained from the Tencent Video website.

Metadata

Description: PTVD contains a large amount of metadata from the Tencent Video website, e.g., categories, tags, popularity, and actors.

For features based on the new visual encoder, please contact the authors via email or a GitHub issue.

What can PTVD do?

Tasks
Classification

Assigning multiple genre tags to a complete plot clip based on parallel multimodal data.

Retrieval

Matching data across modalities (e.g., text-image, text-video) based on a complete plot.
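As a rough illustration of cross-modal matching, the sketch below ranks candidate video embeddings against a text embedding. The page does not say which similarity measure is used, so cosine similarity is an assumption, and the toy vectors and function names are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(query: Sequence[float],
                    candidates: Sequence[Sequence[float]]) -> List[int]:
    """Return candidate indices sorted from most to least similar."""
    scores = [cosine(query, c) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)


# Toy embeddings (made up): one plot-text vector and three video vectors.
text_embedding = [1.0, 0.0]
video_embeddings = [[0.0, 1.0], [0.9, 0.1], [0.5, 0.5]]
ranking = rank_candidates(text_embedding, video_embeddings)
print(ranking)
```

In a real setup the vectors would come from trained text and vision encoders, and the ranking would be scored with a retrieval metric such as recall at k.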

Generation

Generating a high-quality textual plot description for a given plot video clip.

Others

Many other tasks can be built on PTVD, e.g., plot reordering and plot discrimination.

How to use PTVD?

Baseline
Benchmark
As shown in the figure, our algorithm, which is intended to serve as a baseline for future work, first adopts the most popular text encoder, BERT, for text input, and a very promising vision encoder, ViT, for image and video input. The vanilla METER does not support video input, so we add a signal converter that uniformly samples four frames from each clip. The encoded features are then fed to two separate co-attention fusion modules, which produce embeddings for image-text and video-text multimodal learning, respectively. Refer to the details and code.
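The four-frame uniform sampling step can be sketched as follows. The exact sampling scheme is not given here, so this version assumes the clip is split into equal segments and the middle frame of each segment is taken; the function name is hypothetical.

```python
from typing import List


def uniform_sample_indices(num_frames: int, num_samples: int = 4) -> List[int]:
    """Pick the middle frame of each of `num_samples` equal segments.

    Assumed scheme for illustration; the baseline's converter may differ.
    """
    if num_frames <= 0 or num_samples <= 0:
        return []
    segment = num_frames / num_samples
    return [
        min(num_frames - 1, int(segment * (i + 0.5)))  # clamp to the last frame
        for i in range(num_samples)
    ]


# A 100-frame clip yields one frame from the middle of each quarter.
clip_indices = uniform_sample_indices(100)
print(clip_indices)
```

Clips shorter than four frames repeat indices under this scheme, which keeps the input to the vision encoder at a fixed length.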

Reference

Bibtex

@article{li2023ptvd,
        title={PTVD: A Large-Scale Plot-Oriented Multimodal Dataset Based on Television Dramas},
        author={Chen Li and Xutan Peng and Teng Wang and Yixiao Ge and Mengyang Liu and Xuyuan Xu and Yexin Wang and Ying Shan},
        eprint={2306.14644},
        year={2023}
}

How can you communicate?

Contact Us