This dataset was used in the Ptakopět experiment on outbound machine translation. It consists of screenshots of web forms with user queries entered.
The queries are also available in text form. The dataset comprises two language versions: English and Czech.
Whereas the English version has been fully post-processed (screenshots cropped, queries within the screenshots highlighted, the dataset split by quality, etc.), the Czech version is raw, exactly as it was collected by the annotators.