Video Search with CLIP and Interactive Text Query Reformulation

Publication at Faculty of Mathematics and Physics | 2023

Abstract

Deep learning-based models such as CLIP now allow simple design of cross-modal video search systems that can solve many tasks considered highly challenging only a few years ago. In this paper, we analyze a CLIP-based search approach that focuses on situations where users cannot find suitable text queries to describe the video segments they are searching for.
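To illustrate the cross-modal retrieval principle behind such systems, the sketch below ranks sampled video frames against a free-text query in the joint CLIP embedding space. It is a minimal illustration assuming the Hugging Face transformers checkpoint openai/clip-vit-base-patch32 and hypothetical frame file names; the abstract does not specify the model, frame sampling, or indexing actually used.

import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Assumed checkpoint and frame paths; the paper does not specify these.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

frame_paths = ["frame_0001.jpg", "frame_0002.jpg", "frame_0003.jpg"]  # sampled video frames
frames = [Image.open(p) for p in frame_paths]
query = "a person riding a bicycle along a river"

with torch.no_grad():
    # Encode frames and the text query into the shared embedding space.
    image_inputs = processor(images=frames, return_tensors="pt")
    text_inputs = processor(text=[query], return_tensors="pt", padding=True)
    image_emb = model.get_image_features(**image_inputs)
    text_emb = model.get_text_features(**text_inputs)

# Cosine similarity between query and frame embeddings gives the ranking.
image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)
text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
scores = (image_emb @ text_emb.T).squeeze(1)
for rank, idx in enumerate(scores.argsort(descending=True).tolist(), start=1):
    print(rank, frame_paths[idx], float(scores[idx]))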

The approach relies on class suggestions computed for the displayed intermediate result sets, helping users discover missing words and ideas for describing the target video frames. A preliminary study supports the approach and shows the potential of the method.
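One way to realize such a suggestion step is to score a fixed vocabulary of class prompts against the frames in the current result set and offer the top-scoring classes as candidate query words. The vocabulary, prompt template, and averaging below are illustrative assumptions, not the exact procedure used in the system.

import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Hypothetical class vocabulary and top frames of the intermediate result set.
class_names = ["bicycle", "river", "bridge", "sunset", "crowd", "market"]
prompts = [f"a photo of a {name}" for name in class_names]
top_frames = [Image.open(p) for p in ["result_01.jpg", "result_02.jpg"]]

with torch.no_grad():
    text_inputs = processor(text=prompts, return_tensors="pt", padding=True)
    image_inputs = processor(images=top_frames, return_tensors="pt")
    class_emb = model.get_text_features(**text_inputs)
    frame_emb = model.get_image_features(**image_inputs)

class_emb = class_emb / class_emb.norm(dim=-1, keepdim=True)
frame_emb = frame_emb / frame_emb.norm(dim=-1, keepdim=True)

# Average per-frame similarities so classes shared across the result set rank high.
scores = (frame_emb @ class_emb.T).mean(dim=0)
suggestions = [class_names[i] for i in scores.argsort(descending=True)[:3].tolist()]
print("Suggested query words:", suggestions)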

Based on these results, we extend an established known-item search system for the Video Browser Showdown, where more challenging visual known-item search tasks are planned.