Ptakopět data: the dataset for experiments on outbound translation

Publication

Abstract

This dataset was used for the Ptakopět experiment on outbound machine translation. It consists of screenshots of web forms with user queries entered.

The queries are also available in text form. The dataset comprises two language versions: English and Czech.

The English version has been fully post-processed (screenshots cropped, queries within the screenshots highlighted, the dataset split by quality, etc.), whereas the Czech version is raw, as it was collected by the annotators.