Fact Search and Analysis Tool

Abstract

Fact Search and Analysis Tool (Fact Search for short) analyzes and compares various semantic and keyword-based document retrieval methods. It is designed for news databases, although it can easily be adapted to similar textual data.

The current implementation is built on the Czech News Agency (ČTK) archive of news articles from 2000 to 2019. We implement classic keyword search based on TF-IDF [1] as well as state-of-the-art Transformer-based neural networks [2, 3] for semantic search.
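The keyword side of the comparison can be illustrated with a minimal TF-IDF ranker. This is a sketch under stated assumptions, not the Fact Search implementation: it uses naive whitespace tokenization (Czech text would need proper tokenization and ideally lemmatization), a smoothed IDF variant, and cosine similarity, and the function names `build_index` and `search` are hypothetical. Embedding-based semantic retrieval follows the same ranking pattern but replaces the sparse TF-IDF vectors with dense vectors from a Transformer encoder, compared by inner product or cosine similarity.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive lowercase whitespace tokenizer (assumption); Czech text would
    # need real tokenization and lemmatization.
    return text.lower().split()


def build_index(docs: list[str]) -> tuple[list[dict[str, float]], dict[str, float]]:
    """Return one sparse TF-IDF vector per document plus the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF, log((1 + N) / (1 + df)) + 1, keeps every weight positive.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Term frequency normalized by document length, scaled by IDF.
        vectors.append({t: (c / len(toks)) * idf[t] for t, c in tf.items()})
    return vectors, idf


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query: str, docs: list[str], k: int = 3) -> list[tuple[int, float]]:
    """Rank documents by cosine similarity to the query; return (index, score) pairs."""
    vectors, idf = build_index(docs)
    # Query terms unseen in the collection carry no weight.
    q_vec = {t: c * idf[t] for t, c in Counter(tokenize(query)).items() if t in idf}
    scored = ((cosine(q_vec, v), i) for i, v in enumerate(vectors))
    ranked = sorted(scored, key=lambda x: (-x[0], x[1]))
    return [(i, s) for s, i in ranked[:k]]


docs = [
    "government approves new budget",
    "football team wins the cup",
    "parliament rejects budget proposal",
]
print(search("budget vote", docs, k=3))
```

Because "vote" does not occur in this toy collection, only "budget" contributes, so the two budget articles rank above the sports article, which scores 0.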

The neural models are trained with fact-checking, i.e., the support or refutation of textual claims, in mind. The application can also be used for related question-answering tasks.

The current models are trained on a Czech version of the FEVER [4] Wikipedia fact-checking dataset, developed by the CTU team. Follow-up models will be trained on an annotated fact-checking dataset built directly on top of the ČTK data, which is currently being collected (the annotation application is closely related to Fact Search).

From the user's perspective, Fact Search allows real-time document search in extensive textual databases while simultaneously comparing multiple search methods. Along with the retrieved documents, it provides statistics on the search procedures as well as a statistical description of the document distributions.
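The abstract does not specify which statistics are reported. As a hypothetical illustration, two methods can be compared by the Jaccard overlap of their top-k results, and the distribution of retrieved documents can be summarized by basic descriptive statistics over some document attribute (here, assumed publication years). The names `overlap_at_k` and `describe` and the document IDs are illustrative assumptions.

```python
from statistics import mean, median, pstdev


def overlap_at_k(ranking_a: list[str], ranking_b: list[str], k: int) -> float:
    """Jaccard overlap of the top-k document IDs returned by two search methods."""
    top_a, top_b = set(ranking_a[:k]), set(ranking_b[:k])
    union = top_a | top_b
    # Two empty result lists are treated as identical.
    return len(top_a & top_b) / len(union) if union else 1.0


def describe(values: list[float]) -> dict[str, float]:
    """Basic summary statistics of a non-empty list of numbers."""
    if not values:
        raise ValueError("describe() needs at least one value")
    return {
        "n": len(values),
        "mean": mean(values),
        "median": median(values),
        "stdev": pstdev(values),  # population standard deviation
        "min": min(values),
        "max": max(values),
    }


# Hypothetical result lists from a keyword method and a semantic method.
tfidf_hits = ["d1", "d2", "d3", "d4"]
dense_hits = ["d3", "d1", "d5", "d6"]
print(overlap_at_k(tfidf_hits, dense_hits, k=3))  # 2 shared of 4 distinct IDs
print(describe([2005, 2010, 2010, 2019]))          # e.g. publication years of hits
```

A low overlap indicates that the keyword and semantic methods surface largely different documents for the same query.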

As an additional output, it also provides prediction explanations at the word or sentence level, which help assess retrieval model quality. More importantly, they help users focus on the relevant parts of the retrieved text.
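The abstract does not say how the explanations are computed. One simple, model-agnostic technique that yields word-level attributions is occlusion (leave-one-out): remove each word in turn and record how much the relevance score drops. The sketch assumes a generic scoring function `score(query, document)`; `toy_score` is a hypothetical stand-in for a real retrieval model, and the same loop works at the sentence level if the document is split into sentences instead of words.

```python
from typing import Callable


def occlusion_importance(query: str, document: str,
                         score: Callable[[str, str], float]) -> list[tuple[str, float]]:
    """Leave-one-out attribution: importance = score drop when a word is removed."""
    words = document.split()
    base = score(query, document)
    importances = []
    for i, word in enumerate(words):
        reduced = " ".join(words[:i] + words[i + 1:])
        importances.append((word, base - score(query, reduced)))
    return importances


def toy_score(query: str, document: str) -> float:
    # Hypothetical stand-in for a retrieval model: share of query words found in the document.
    q, d = set(query.lower().split()), set(document.lower().split())
    return len(q & d) / len(q) if q else 0.0


for word, weight in occlusion_importance("budget approved",
                                         "government approved the budget", toy_score):
    print(f"{word:>12} {weight:+.2f}")
```

Words whose removal lowers the score most (here "approved" and "budget") are the ones an interface would highlight.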

The application further contains an initial version of a classifier module that gives confidence levels for claim veracity with respect to the news database.

References

[1] Htut, Phu Mon, Samuel R. Bowman, and Kyunghyun Cho. "Training a ranking function for open-domain question answering." arXiv preprint arXiv:1804.04264 (2018).
[2] Chang, Wei-Cheng, et al. "Pre-training tasks for embedding-based large-scale retrieval." arXiv preprint arXiv:2002.03932 (2020).
[3] Reimers, Nils, and Iryna Gurevych. "Sentence-BERT: Sentence embeddings using siamese BERT-networks." arXiv preprint arXiv:1908.10084 (2019).
[4] Thorne, James, et al. "FEVER: a large-scale dataset for fact extraction and verification." arXiv preprint arXiv:1803.05355 (2018).

Keywords

Natural language processing, journalism, fact search, fact verification, claim veracity