
A W2VV++ Case Study with Automated and Interactive Text-to-Video Retrieval

Publication at Faculty of Mathematics and Physics | 2020

Abstract

As reported by respected evaluation campaigns focusing on both automated and interactive video search approaches, deep learning has started to dominate the video retrieval area. However, the results are still not satisfactory for many types of search tasks that require high recall.

To address this challenging problem, we present two orthogonal task-based performance studies centered around the state-of-the-art W2VV++ query representation learning model for video retrieval. First, an ablation study investigates which components of the model are effective in two types of benchmark tasks focusing on high recall.

Second, interactive search scenarios from the Video Browser Showdown are analyzed for two winning prototype systems that implement a selected variant of the model and provide additional querying and visualization components. The analysis of the collected logs demonstrates that even with a state-of-the-art text-based video retrieval model, it is still beneficial to integrate users into the search process for task types where high recall is essential.
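Models of the W2VV++ family embed a text query and each video into a common vector space, so that retrieval reduces to nearest-neighbour ranking by cosine similarity. The following minimal sketch illustrates only that final ranking step, assuming the query and video embeddings have already been produced by some model; all names here are illustrative, not part of the published system:

```python
import numpy as np

def rank_videos(query_vec, video_vecs):
    """Rank videos by cosine similarity to an embedded text query.

    query_vec:  1-D array, the query embedding (hypothetical example input).
    video_vecs: 2-D array, one row per video embedding.
    Returns video indices ordered from best to worst match.
    """
    # L2-normalize so that the dot product equals cosine similarity.
    q = query_vec / np.linalg.norm(query_vec)
    v = video_vecs / np.linalg.norm(video_vecs, axis=1, keepdims=True)
    sims = v @ q
    # Sort by descending similarity; best-matching video first.
    return np.argsort(-sims)
```

In a high-recall task, a user would typically inspect not just the top hit but a longer prefix of this ranked list, which is where the interactive querying and visualization components discussed above come into play.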