
Demonstration application of automatic news text generation and news content analysis

Publication

Abstract

The demonstration application provides two thematically distinct sets of functionality: tools for automatic text summarization and tools for automatic fact checking. The tools have a web browser interface and are therefore accessible from any environment.

They could therefore also be easily deployed for initial verification by the CTK project application partner. The first set of tools in the application is for automatic text summarization.

Automatic summarization is the process of condensing a text document, or several input documents, into a summary that captures the main points. Our tool is implemented as an assistant for journalists that:
- selects the parts of the input articles relevant for the summary;
- identifies significant named entities in the text, such as names, titles, years, etc. (see the sketch below);
- allows continuous retraining of the system: the journalist can re-prioritize the selected information, mark other text as relevant, or discard text from an already made selection.
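The entity-identification step can be illustrated with a minimal sketch built on the Hugging Face Transformers library, which the system uses; the pipeline's default English NER model and the example sentence are stand-ins, not the components of the actual assistant.

    # Hedged sketch of the entity-identification step; the default pipeline
    # model is only a placeholder for the assistant's real NER component.
    from transformers import pipeline

    ner = pipeline("token-classification", aggregation_strategy="simple")

    sentence = "Milos Zeman visited the Republic of Korea as president."
    for entity in ner(sentence):
        print(entity["entity_group"], entity["word"], round(float(entity["score"]), 3))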

Beyond this, the system can be retrained continuously as the journalist works with it. The application clusters the information in the input and outputs the main pieces of information detected in the input article or articles.

The user can choose detection hyperparameters, such as the method for selecting centroid thresholds. The user can further reorder, replace, discard, or add sentences and information.
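The centroid-based selection can be sketched as follows; TF-IDF vectors, cosine similarity, and the threshold value are assumptions made for illustration, since the abstract does not specify the actual scoring method.

    # Illustrative centroid-threshold selection; the vectorizer, similarity
    # measure and threshold are assumptions, not the system's actual setup.
    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    def select_sentences(sentences, threshold=0.2):
        """Keep sentences whose similarity to the document centroid exceeds the threshold."""
        vectors = TfidfVectorizer().fit_transform(sentences)
        centroid = np.asarray(vectors.mean(axis=0))
        scores = cosine_similarity(vectors, centroid).ravel()
        return [(float(s), t) for s, t in zip(scores, sentences) if s >= threshold]

    sentences = [
        "The government approved the new budget on Tuesday.",
        "The budget increases spending on education by ten percent.",
        "The weather in Prague was sunny.",
    ]
    for score, sentence in select_sentences(sentences):
        print(f"{score:.2f}  {sentence}")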

The system preserves links to the original news items, so the context can be retrieved when needed. The main output for the user is material for writing summaries, i.e. the information selected from the underlying articles.

The automatic text summarization tools are implemented as a backend, i.e. a server part written in Python using the torch and tensorflow libraries, and a frontend, i.e. a user part written in Python and JavaScript.
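The abstract does not name the web framework of the server part, so the following endpoint sketch assumes Flask purely for illustration; summarize() is a placeholder standing in for the torch/tensorflow models.

    # Illustrative backend endpoint only; Flask and the route name are
    # assumptions, and summarize() stands in for the real neural models.
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    def summarize(articles):
        # Placeholder: return the first sentence of each non-empty article.
        return [a.split(".")[0] + "." for a in articles if a.strip()]

    @app.route("/summarize", methods=["POST"])
    def summarize_endpoint():
        articles = request.get_json().get("articles", [])
        return jsonify({"selected": summarize(articles)})

    if __name__ == "__main__":
        app.run(port=8000)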

The second part of the application consists of the automatic fact checking tools. These are primarily two collaborating tools: Fact Search for semantic information retrieval, and Fact Check for assessing the plausibility of claims, which builds on the Fact Search tool.

Classifying a statement alone is not sufficient without an explanation (it would be a black box); what matters more is the search for evidence, i.e. supporting or refuting information, which we perform using Fact Search. The input is a statement (claim), a short text of typically one sentence, e.g. "Miloš Zeman visited the Republic of Korea as president." The output is a classification (confirmed / refuted / not enough information) together with a list of the documents (and their parts) needed for the classification, i.e. the evidence. The claim is checked against a given document database, in our case the CTK news archive.
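The claim-versus-evidence classification step can be sketched with a generic natural language inference model from the Hugging Face hub; the model name, the evidence sentence, and the mapping of NLI labels (entailment ~ confirmed, contradiction ~ refuted, neutral ~ not enough information) are illustrative assumptions, not the project's actual components.

    # Hedged sketch of claim verification against a single evidence passage;
    # the model and label mapping are assumptions for illustration only.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    model_name = "joeddav/xlm-roberta-large-xnli"  # placeholder NLI model
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)

    claim = "Milos Zeman visited the Republic of Korea as president."
    evidence = "President Milos Zeman arrived in Seoul for a three-day state visit."

    inputs = tokenizer(evidence, claim, return_tensors="pt", truncation=True)
    with torch.no_grad():
        probs = model(**inputs).logits.softmax(dim=-1).squeeze()

    # entailment ~ confirmed, contradiction ~ refuted, neutral ~ not enough info
    for i, p in enumerate(probs.tolist()):
        print(f"{model.config.id2label[i]}: {p:.3f}")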

After a claim is entered and the time window optionally restricted, the Fact Search tool displays the results retrieved by two selected methods together with their detailed statistics. In addition, the sentences or words that had the greatest influence on the selection of the relevant passages are highlighted in the retrieved text blocks.
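The semantic retrieval step can be illustrated with sentence embeddings and cosine similarity; the sentence-transformers library, the model name, and the toy archive are assumptions made for the sketch, not the methods actually used by Fact Search.

    # Illustrative dense-retrieval sketch; library, model and documents are
    # placeholders, not the Fact Search tool's actual retrieval methods.
    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

    claim = "Milos Zeman visited the Republic of Korea as president."
    archive = [
        "President Milos Zeman arrived in Seoul for a three-day state visit.",
        "The Czech national bank raised interest rates on Thursday.",
        "South Korea and the Czech Republic discussed energy cooperation.",
    ]

    claim_emb = model.encode(claim, convert_to_tensor=True)
    doc_embs = model.encode(archive, convert_to_tensor=True)
    scores = util.cos_sim(claim_emb, doc_embs)[0]

    # Rank archive passages by semantic similarity to the claim.
    for idx in scores.argsort(descending=True):
        print(f"{scores[idx].item():.3f}  {archive[int(idx)]}")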

The Fact Check interface uses the Fact Search tool in the background, assesses the credibility of the claim, and displays the information that supports or refutes it. The fact checking tools are implemented in the Python 3 programming language.

The user interface is built using the Dash tool. The neural models are implemented using the Hugging Face Transformers, PyTorch and TensorFlow libraries.
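A minimal Dash layout with a single claim input might look like the sketch below; the component ids, the callback, and the placeholder verdict are illustrative and do not reproduce the actual interface.

    # Minimal Dash 2.x sketch of a claim-checking interface; ids, layout and
    # the placeholder callback are illustrative, not the real application.
    from dash import Dash, Input, Output, State, dcc, html

    app = Dash(__name__)
    app.layout = html.Div([
        dcc.Input(id="claim", type="text", placeholder="Enter a claim..."),
        html.Button("Check", id="check"),
        html.Div(id="verdict"),
    ])

    @app.callback(Output("verdict", "children"),
                  Input("check", "n_clicks"),
                  State("claim", "value"),
                  prevent_initial_call=True)
    def check_claim(n_clicks, claim):
        # The real callback would query the Fact Search / Fact Check backend.
        return f"Would verify: {claim!r}"

    if __name__ == "__main__":
        app.run(debug=True)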

For fast evaluation of the neural networks we use an Nvidia Tesla V100 accelerator.