A Large-Scale Multimodal Plot-Oriented Story Understanding Dataset
What is PTVD?
Art forms such as movies and television (TV) dramas reflect the real world and have recently attracted much attention from the multimodal learning community. However, existing corpora in this domain share three limitations: i. annotated in a scene-oriented fashion, they ignore the coherence within plots; ii. their text lacks empathy and seldom mentions situational context; iii. their video clips fail to cover long-range relationships due to their short duration. To address these fundamental issues, we constructed PTVD, the first plot-oriented multimodal dataset in the cinema domain, from 1,106 TV drama episodes and 24,875 informative, plot-focused sentences written by professionals, with the help of 449 human annotators.
What does the data look like?
For features based on the new visual encoder, please contact the authors via email or a GitHub issue.
What can PTVD do?
Assigning multiple genre tags to a complete plot clip based on parallel multimodal data.
Matching cross-modal data (e.g., text-image, text-video) at the level of a complete plot.
Generating a high-quality textual plot description for a given plot video clip.
Many other tasks can be built on PTVD, e.g., plot reordering, plot discrimination, etc.
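The first task above is multi-label classification: each plot clip can carry several genre tags at once, so a model scores every genre independently rather than picking a single class. A minimal sketch of that decision step is shown below; the genre list and the logit values are purely illustrative, not the actual PTVD label set.

```python
import numpy as np

# Hypothetical genre vocabulary for illustration only.
GENRES = ["romance", "crime", "fantasy", "history"]

def predict_genres(logits, threshold=0.5):
    """Multi-label genre tagging: apply an independent sigmoid per genre
    and keep every genre whose score clears the threshold."""
    scores = 1.0 / (1.0 + np.exp(-np.asarray(logits, dtype=float)))
    return [g for g, s in zip(GENRES, scores) if s >= threshold]

# Example: logits a model might produce from fused multimodal features.
print(predict_genres([2.1, -1.3, 0.4, -0.2]))  # → ['romance', 'fantasy']
```

Because the sigmoids are independent, a clip can legitimately receive zero, one, or several tags, which matches the multi-genre nature of TV drama plots.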
How to use PTVD?
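As a starting point, a loader for plot-level annotations might look like the sketch below. The directory layout and the `plot`/`clip` field names are assumptions for illustration; check the released files for the actual schema.

```python
import json
from pathlib import Path

def load_plot_annotations(root):
    """Yield (plot_text, clip_path) pairs from a PTVD-style layout,
    assuming one JSON file per episode containing a list of plot records.
    Field names ("plot", "clip") are hypothetical."""
    for ann_file in sorted(Path(root).glob("*.json")):
        with open(ann_file, encoding="utf-8") as f:
            for record in json.load(f):
                yield record["plot"], record["clip"]

# Usage: iterate over all plot/clip pairs under an annotation directory.
# for text, clip in load_plot_annotations("annotations/"):
#     print(text, clip)
```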
Reference
@article{li2023ptvd,
  title={PTVD: A Large-Scale Plot-Oriented Multimodal Dataset Based on Television Dramas},
  author={Li, Chen and Peng, Xutan and Wang, Teng and Ge, Yixiao and Liu, Mengyang and Xu, Xuyuan and Wang, Yexin and Shan, Ying},
  journal={arXiv preprint arXiv:2306.14644},
  year={2023}
}
How can you contact us?