
Less Is More: Similarity Models for Content-Based Video Retrieval

Publication at Matematicko-fyzikální fakulta |
2023

Abstract

The concept of object-to-object similarity plays a crucial role in interactive content-based video retrieval tools. Similarity (or distance) models are core components of several retrieval concepts, e.g., Query by Example or relevance feedback. In these scenarios, the common approach is to apply a feature extractor that transforms the object into a vector of features, i.e., positions it in an induced latent space. The similarity is then based on a distance metric in this space. Historically, feature extractors were mostly based on color histograms or hand-crafted descriptors such as SIFT, but nowadays state-of-the-art tools mostly rely on deep learning (DL) approaches.
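To illustrate this pipeline (this sketch is not part of the publication itself), the following minimal Python example assumes a PyTorch/torchvision setup: a small pretrained CNN (ResNet-18, here a hypothetical stand-in for one of the smaller DL extractors) maps an image such as a video keyframe to a feature vector, and cosine similarity serves as one common choice of metric in the induced space. File names and the model choice are illustrative assumptions.

```python
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# Illustrative feature extractor: a small pretrained CNN with the classifier
# head removed, so the model outputs a 512-dimensional feature vector.
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
model.fc = torch.nn.Identity()
model.eval()

preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def embed(path: str) -> torch.Tensor:
    # Position an image (e.g. a video keyframe) in the induced latent space.
    img = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        return model(img).squeeze(0)

def similarity(a: torch.Tensor, b: torch.Tensor) -> float:
    # Cosine similarity as one possible model-based similarity measure.
    return torch.nn.functional.cosine_similarity(a, b, dim=0).item()

# Query-by-Example sketch: rank database keyframes by similarity to a query
# frame ("query.jpg" and the database paths are hypothetical placeholders).
# q = embed("query.jpg")
# ranked = sorted(database_paths, key=lambda p: -similarity(q, embed(p)))
```

In a real retrieval tool the database embeddings would be precomputed and indexed; the point here is only the two-step structure the abstract describes: feature extraction into a latent space, then a distance (or similarity) computation within it.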

However, so far there has been no systematic study of how suitable individual feature extractors are in the video retrieval domain, or, in other words, to what extent human-perceived and model-based similarities are concordant.

To fill this gap, we conducted a user study with over 4,000 similarity judgements comparing more than 20 variants of feature extractors. The results corroborate the dominance of deep learning approaches but, surprisingly, favor smaller and simpler DL models over larger ones.