
Learning communication patterns for malware discovery in HTTPs data

Publication at Faculty of Mathematics and Physics | 2018

Abstract

Encrypted communication on the Internet using the HTTPS protocol poses a challenge for network intrusion detection systems. While encryption significantly helps to preserve users' privacy, it also limits a detection system's ability to understand the traffic and effectively identify malicious activities.

In this work, we propose a method for modeling and representing encrypted communication based on web communication logs. The idea is to introduce communication snapshots of individual users' activity that model the contextual information of the encrypted requests.

This context helps to compensate for the information hidden by the encryption. We then propose statistical descriptors of the communication snapshots that can be consumed by various machine learning algorithms for either supervised or unsupervised analysis of the data.
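To make the idea of snapshots and descriptors more concrete, here is a minimal sketch in Python. The abstract does not define how a snapshot is delimited or which statistics are used, so everything below is an assumption made for illustration: the log record layout, the fixed 300-second window, the function names `build_snapshots` and `describe`, and the four statistics are all hypothetical and are not the authors' method.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical log record layout (an assumption, not from the paper):
# (user, timestamp_seconds, hostname, bytes_up, bytes_down)
WINDOW = 300  # assumed snapshot length in seconds


def build_snapshots(records, window=WINDOW):
    """Group each user's requests into fixed-length time windows."""
    snapshots = defaultdict(list)
    for user, ts, host, up, down in records:
        snapshots[(user, int(ts // window))].append((ts, host, up, down))
    return dict(snapshots)


def describe(snapshot):
    """Compute a few illustrative statistics for one snapshot."""
    hosts = {host for _, host, _, _ in snapshot}
    ups = [up for _, _, up, _ in snapshot]
    downs = [down for _, _, _, down in snapshot]
    return [
        float(len(snapshot)),  # number of requests in the window
        float(len(hosts)),     # number of distinct hostnames contacted
        float(mean(ups)),      # mean bytes sent per request
        float(mean(downs)),    # mean bytes received per request
    ]


# Made-up example records, only to exercise the functions.
records = [
    ("alice", 10, "example.com", 500, 2000),
    ("alice", 40, "cdn.example.com", 300, 8000),
    ("alice", 400, "example.org", 100, 100),
    ("bob", 20, "example.net", 200, 1000),
]
snapshots = build_snapshots(records)
features = {key: describe(snap) for key, snap in snapshots.items()}
```

Each value in `features` is a fixed-length numeric vector, which is the kind of input that both supervised classifiers and unsupervised methods such as clustering typically accept.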

In the experimental evaluation, we show that the presented approach scales to a large corpus of network traffic logs, because the computation of the descriptors can be effectively implemented on a Hadoop cluster.
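One plausible reason such descriptors suit a Hadoop cluster is that simple statistics can be computed from partial aggregates that merge independently, which matches the map and reduce pattern. The sketch below simulates that pattern locally in plain Python. It is an assumption about the general approach, not the authors' implementation: the record layout, the window length, and the names `map_record`, `reduce_partials`, and `run` are hypothetical, and no Hadoop API is used.

```python
from collections import defaultdict

WINDOW = 300  # assumed snapshot length in seconds (hypothetical)


def map_record(record):
    """Map step: emit a snapshot key and a partial aggregate for one record."""
    user, ts, bytes_down = record
    return (user, int(ts // WINDOW)), (1, bytes_down)


def reduce_partials(a, b):
    """Reduce step: merge two partial aggregates of (count, total bytes)."""
    return (a[0] + b[0], a[1] + b[1])


def run(records):
    """Simulate the shuffle and reduce phases on a single machine."""
    grouped = defaultdict(list)
    for record in records:
        key, value = map_record(record)
        grouped[key].append(value)

    result = {}
    for key, values in grouped.items():
        total = values[0]
        for value in values[1:]:
            total = reduce_partials(total, value)
        count, total_bytes = total
        result[key] = {
            "requests": count,
            "mean_bytes_down": total_bytes / count,
        }
    return result


# Made-up records of the form (user, timestamp_seconds, bytes_down).
records = [("alice", 10, 2000), ("alice", 40, 8000), ("bob", 20, 1000)]
summary = run(records)
```

Because `reduce_partials` is associative and commutative, the partial aggregates can be merged in any order and on any node, which is what allows this kind of computation to be distributed.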