Natural language inference data annotation software

Abstract

Special annotation software was developed to create unique datasets suitable for training models to solve the Natural Language Inference (NLI) problem. At the same time, a custom dataset was produced with it and made available to the research community. The dataset contains 3,097 annotated textual claims, supplemented by 1,247 paragraphs extracted from 665 articles in the CTK archive.

The system works by selecting relevant sentences within documents at the paragraph level. Claims are generated from randomly selected articles, and, with some exceptions, the annotator is not allowed to use their own knowledge. They are restricted to the knowledge framework, which consists of the source article and other relevant texts, namely the abstracts of articles referenced from the source article.
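To make the knowledge-framework restriction concrete, here is a minimal Python sketch. The `Article` record, its fields, and the helpers `pick_source_article` and `build_knowledge_framework` are hypothetical illustrations, not the tool's actual data model. The sketch only assumes that each archive article has paragraphs, an abstract, and a list of IDs of the articles it references.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Article:
    # Hypothetical archive record; field names are illustrative assumptions.
    article_id: str
    abstract: str
    paragraphs: list[str]
    references: list[str] = field(default_factory=list)  # IDs of referenced articles


def pick_source_article(archive: dict[str, Article], rng: random.Random) -> Article:
    """Randomly choose the article from which claims will be written."""
    return archive[rng.choice(sorted(archive))]


def build_knowledge_framework(source: Article, archive: dict[str, Article]) -> list[str]:
    """Return the only texts the annotator may rely on: the source article's
    paragraphs plus the abstracts of the articles it references."""
    texts = list(source.paragraphs)
    for ref_id in source.references:
        ref = archive.get(ref_id)
        if ref is not None:  # skip references that are missing from the archive
            texts.append(ref.abstract)
    return texts


archive = {
    "a1": Article("a1", "Abstract A1", ["P1", "P2"], references=["a2", "a9"]),
    "a2": Article("a2", "Abstract A2", ["Q1"]),
}
print(build_knowledge_framework(archive["a1"], archive))  # ['P1', 'P2', 'Abstract A2']
```

Only abstracts (not full texts) of referenced articles enter the framework, mirroring the description above; how the real software handles references missing from the archive is not stated, so silently skipping them is an assumption.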

The system implements the entire annotation process: creating initial claims, working with the source paragraphs and claim variations, working with the knowledge framework, annotating the veracity of other annotators' claims based on the evidence, and handling any additional claims.
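The workflow stages, and the rule that annotators judge other people's claims rather than their own, can be sketched as a small Python model. This is a hypothetical illustration, not the tool's implementation: the stage names paraphrase the abstract, their order is assumed, and the FEVER-style label set (`SUPPORTS`, `REFUTES`, `NOT ENOUGH INFO`) is an assumption, since the abstract does not name the labels used.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    # Hypothetical stage names paraphrasing the abstract; the order is assumed.
    INITIAL_CLAIMS = 1        # write claims from the source paragraphs
    CLAIM_VARIATIONS = 2      # derive variations of the initial claims
    KNOWLEDGE_FRAMEWORK = 3   # gather evidence from the knowledge framework
    VERACITY_ANNOTATION = 4   # label other annotators' claims
    ADDITIONAL_CLAIMS = 5     # any further claims


# Assumed FEVER-style label set; the abstract does not specify one.
LABELS = {"SUPPORTS", "REFUTES", "NOT ENOUGH INFO"}


@dataclass
class Claim:
    text: str
    author: str
    source_paragraph: str


@dataclass
class Annotation:
    claim: Claim
    annotator: str
    label: str
    evidence: list[str] = field(default_factory=list)


def annotate(claim: Claim, annotator: str, label: str, evidence: list[str]) -> Annotation:
    """Record a veracity label for a claim written by someone else."""
    if annotator == claim.author:
        raise ValueError("annotators label other annotators' claims, not their own")
    if label not in LABELS:
        raise ValueError(f"unknown label: {label!r}")
    return Annotation(claim, annotator, label, list(evidence))


claim = Claim("Example claim text.", author="alice", source_paragraph="P1")
result = annotate(claim, annotator="bob", label="SUPPORTS", evidence=["P1"])
print(result.label)  # SUPPORTS
```

Enforcing the "others' claims" rule at the point where a label is recorded keeps authors from validating their own claims; any further constraints the real system applies are not described in the abstract.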