A machine learning model for clickbait analysis

Publication

Abstract

The output of this functional prototype is a trained machine learning model for clickbait analysis and recognition. The training data was obtained by translating the freely available Kaggle Clickbait Dataset, a large annotated dataset of headlines from several US news sites.

The dataset was translated from English using DeepL. The resulting dataset contains 32,000 article headlines, balanced across two categories (normal, clickbait). 80% of the data was used for training and the remaining 20% for model evaluation.
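The 80/20 split described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `train_eval_split`, the fixed seed, the plain shuffled split, and the placeholder headlines are all assumptions, since the abstract does not say how the split was drawn.

```python
import random


def train_eval_split(examples, eval_fraction=0.2, seed=0):
    """Shuffle examples and split them into (train, evaluation) lists."""
    shuffled = list(examples)
    # A seeded generator keeps the split reproducible (assumed, not from the source).
    random.Random(seed).shuffle(shuffled)
    n_eval = round(len(shuffled) * eval_fraction)
    return shuffled[n_eval:], shuffled[:n_eval]


# Placeholder data standing in for the 32,000 translated headlines:
# label 0 = normal, label 1 = clickbait, half of each.
examples = [(f"headline {i}", i % 2) for i in range(32000)]

train, evaluation = train_eval_split(examples)
print(len(train), len(evaluation))  # 25600 6400
```

With 32,000 headlines this yields 25,600 training and 6,400 evaluation examples.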

The model achieves an accuracy of 98.33% on the evaluation set.

Input: a sentence from an article or its title (short text), e.g., "Top 10 best Christmas recipes!!!"

Output: a value from 0 to 1 that represents how confident the model is that the sentence is clickbait.

A threshold of 0.5 is applied to the output value x: x < 0.5 is classified as a standard sentence, and x >= 0.5 as clickbait.
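The thresholding rule and the accuracy metric can be sketched as follows. The function names `classify` and `accuracy` are hypothetical, and the scores and labels are made up purely for illustration; they are not outputs of the published model.

```python
def classify(score, threshold=0.5):
    """Map a model confidence in [0, 1] to a category label."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    # Scores at or above the threshold count as clickbait.
    return "clickbait" if score >= threshold else "normal"


def accuracy(scores, labels, threshold=0.5):
    """Fraction of scores whose thresholded label matches the true label."""
    if not scores or len(scores) != len(labels):
        raise ValueError("scores and labels must be non-empty and equal in length")
    correct = sum(classify(s, threshold) == y for s, y in zip(scores, labels))
    return correct / len(scores)


# Made-up confidence scores and true labels, for illustration only.
toy_scores = [0.97, 0.50, 0.49, 0.03]
toy_labels = ["clickbait", "clickbait", "normal", "clickbait"]

print([classify(s) for s in toy_scores])
print(accuracy(toy_scores, toy_labels))  # 0.75
```

Note that a score of exactly 0.5 falls on the clickbait side of the boundary.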

Keywords

artificial intelligence, natural language processing, journalism, news, clickbait