
On Fusion of Learned and Designed Features for Video Data Analytics

Publication at Matematicko-fyzikální fakulta | 2021

Abstract

Video cameras have become widely used for indoor and outdoor surveillance. Covering more and more public space in cities, the cameras serve various purposes ranging from security and traffic monitoring to urban-life studies and marketing.

However, with the increasing number of deployed cameras and recorded streams, manual video monitoring and analysis become too laborious. The goal is to obtain effective and efficient artificial intelligence models that process the video data automatically and produce the desired features for data analytics.

To this end, we propose a framework for real-time video feature extraction that fuses learned and hand-designed analytical models and is applicable in real-life situations. Nowadays, state-of-the-art models for various computer vision tasks are built with deep learning.

However, the exhaustive gathering of labeled training data and the computational complexity of the resulting models can often render them impractical. We need to consider the benefits and limitations of each technique and find a synergy between deep learning and analytical models.

Deep learning methods are better suited to simpler tasks on large volumes of dense data, while analytical modeling can be sufficient for processing sparse data with complex structures. Our framework follows these principles by taking advantage of multiple levels of abstraction.
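
As a minimal sketch of this division of labor, the following Python example assumes a hypothetical two-level pipeline: a learned detector reduces dense pixel data to sparse, structured detections, and a hand-designed rule (here a virtual-line counter) derives an analytic feature from them. The Detection type, the detector stub, and the counting rule are illustrative assumptions, not components of the proposed framework.

    # Illustrative sketch only: learned stage for dense data, designed stage for sparse data.
    from dataclasses import dataclass
    from typing import List, Sequence

    @dataclass
    class Detection:
        frame: int   # frame index within the video stream
        label: str   # object class predicted by the learned model
        x: float     # horizontal center of the bounding box, normalized to [0, 1]
        y: float     # vertical center of the bounding box, normalized to [0, 1]

    def learned_stage(frames: Sequence[object]) -> List[Detection]:
        """Stand-in for a deep model (e.g. an object detector) that turns
        dense pixel data into sparse, structured detections."""
        # A real pipeline would run a trained network here; fixed values keep
        # the sketch self-contained and runnable.
        return [
            Detection(frame=i, label="person", x=0.2 + 0.25 * i, y=0.5)
            for i in range(len(frames))
        ]

    def designed_stage(detections: List[Detection], line_x: float = 0.5) -> int:
        """Hand-designed analytical feature: count left-to-right crossings of a
        virtual line, a typical rule-based feature in urban-life analytics."""
        crossings = 0
        prev_x = None
        for det in sorted(detections, key=lambda d: d.frame):
            if prev_x is not None and prev_x < line_x <= det.x:
                crossings += 1
            prev_x = det.x
        return crossings

    if __name__ == "__main__":
        frames = [object(), object(), object()]   # stand-in for decoded frames
        detections = learned_stage(frames)        # dense data -> sparse structure
        print("virtual-line crossings:", designed_stage(detections))

The split mirrors the principle above: the learned stage absorbs the dense, data-hungry part of the task, while the designed stage requires no training data and remains cheap enough for real-time use.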

In a use case, we show how the framework can be configured for advanced video analysis of urban life.