
Multimodal Abstractive Summarization for Open-Domain Videos

Publication at Faculty of Mathematics and Physics |
2018

Abstract

Multimodal and abstractive summarization of open-domain videos requires summarizing the contents of an entire video in a few short sentences, while fusing information from multiple modalities, in our case video and audio (or text). Unlike traditional news summarization, the goal is not merely to "compress" textual information, but to provide a fluent textual summary of information that has been collected and fused from different source modalities.
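One common way to fuse modalities in a sequence-to-sequence decoder is two-level (hierarchical) attention: first attend over each modality's encoder states separately, then attend over the resulting per-modality context vectors. The abstract does not spell out the model's exact equations, so the following is only a minimal NumPy sketch of that general idea; the dot-product scoring, dimensions, and function names are assumptions for illustration.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D score vector
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(query, states):
    """Dot-product attention: weight the rows of `states` by their
    similarity to `query` and return the weighted-sum context vector."""
    scores = states @ query        # (T,)
    weights = softmax(scores)      # (T,)
    return weights @ states        # (d,)

def hierarchical_attention(query, modality_states):
    """Two-level attention: build one context vector per modality,
    then attend over those contexts to fuse the modalities."""
    contexts = np.stack([attend(query, s) for s in modality_states])  # (M, d)
    return attend(query, contexts)  # fused context, shape (d,)

# toy example: a decoder query plus video and transcript encoder states
rng = np.random.default_rng(0)
d = 4
query = rng.normal(size=d)
video_states = rng.normal(size=(6, d))   # 6 video encoder steps
text_states = rng.normal(size=(9, d))    # 9 transcript encoder steps
fused = hierarchical_attention(query, [video_states, text_states])
print(fused.shape)
```

The decoder would consume `fused` at every output step, so each modality can dominate at different points in the summary.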

In this paper, we introduce the task of abstractive summarization for open-domain videos, show how a sequence-to-sequence model with hierarchical attention can integrate information from different modalities into a coherent output, and present pilot experiments on the How2 corpus of instructional videos. We also propose a new evaluation metric for this task, Content F1, which measures the semantic adequacy of the summaries rather than their fluency, the latter already being covered by metrics such as ROUGE and BLEU.
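The abstract does not give the exact definition of Content F1, but the intuition is an F1 score over the content words shared between a generated summary and its reference, ignoring function words that fluency metrics reward. The sketch below is a hypothetical bag-of-content-words variant: the stopword list, tokenization, and exact-match counting are all illustrative assumptions, not the paper's formula.

```python
from collections import Counter

# toy stopword list; a real implementation would use a full function-word set
STOPWORDS = {"the", "a", "an", "to", "of", "and", "in", "is", "it", "you"}

def content_words(text):
    """Lowercase, whitespace-tokenize, and drop stopwords."""
    return [w for w in text.lower().split() if w not in STOPWORDS]

def content_f1(hypothesis, reference):
    """F1 over overlapping content-word counts between two summaries."""
    hyp = Counter(content_words(hypothesis))
    ref = Counter(content_words(reference))
    overlap = sum((hyp & ref).values())  # clipped count of shared words
    if overlap == 0:
        return 0.0
    precision = overlap / sum(hyp.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

score = content_f1("cut the onion into small pieces",
                   "chop the onion into fine pieces")
print(score)  # 0.6: 3 shared content words out of 5 on each side
```

Because stopwords are stripped before matching, a fluent but content-free summary scores near zero here, while ROUGE- or BLEU-style n-gram overlap would still give it partial credit.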