
A Pilot Eye-Tracking Study of WMT-Style Ranking Evaluation

Publication at Faculty of Mathematics and Physics | 2016

Abstract

The shared translation task of the Workshop on Statistical Machine Translation (WMT) is one of the key annual events of the field. Machine translation systems participating in the WMT translation task are evaluated manually, by relative ranking of five candidate translations of a given sentence.
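To make the setup concrete: each five-way ranking is conventionally unfolded into the ten pairwise comparisons it implies, which is how WMT judgements are typically interpreted and scored. The sketch below illustrates this unfolding (the function name `ranking_to_pairwise` and the system ids are ours for illustration, not taken from the paper); ties between candidates are allowed.

```python
from itertools import combinations

def ranking_to_pairwise(ranks):
    """Unfold one WMT-style relative ranking into pairwise judgements.

    `ranks` maps a system id to its rank (1 = best; ties allowed).
    Each of the C(5, 2) = 10 system pairs yields a win, a loss, or a tie.
    """
    pairwise = []
    for a, b in combinations(sorted(ranks), 2):
        if ranks[a] < ranks[b]:
            pairwise.append((a, b, "win"))   # a ranked better than b
        elif ranks[a] > ranks[b]:
            pairwise.append((a, b, "loss"))  # b ranked better than a
        else:
            pairwise.append((a, b, "tie"))   # a and b ranked equal
    return pairwise

# One annotation: five candidates, ranked 1 (best) to 5 (worst), one tie
print(ranking_to_pairwise({"sysA": 1, "sysB": 2, "sysC": 2, "sysD": 4, "sysE": 5}))
```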

This style of evaluation has been used since 2007, with some discussion of how to interpret the collected judgements but virtually no insight into what the annotators are actually doing. The scoring task is cognitively demanding, and many scoring strategies are possible, each influencing the reliability of the final judgements.
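The reliability of such judgements is usually quantified as inter-annotator agreement over the unfolded pairwise comparisons, for instance with Cohen's kappa as reported in the WMT overview papers. A minimal sketch, assuming two annotators labelled the same sequence of system pairs (the helper `kappa` is illustrative, not from the paper):

```python
def kappa(judgements_a, judgements_b):
    """Cohen's kappa over two annotators' pairwise judgements.

    Each argument is a list of labels ("win"/"loss"/"tie") for the same
    sequence of system pairs; kappa corrects raw agreement for chance.
    """
    assert len(judgements_a) == len(judgements_b)
    n = len(judgements_a)
    # Observed agreement: fraction of pairs labelled identically
    p_obs = sum(x == y for x, y in zip(judgements_a, judgements_b)) / n
    # Expected agreement by chance, from each annotator's label distribution
    labels = ("win", "loss", "tie")
    p_exp = sum(
        (judgements_a.count(l) / n) * (judgements_b.count(l) / n)
        for l in labels
    )
    return (p_obs - p_exp) / (1 - p_exp)

print(kappa(["win", "tie", "loss", "win"], ["win", "tie", "win", "win"]))
```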

In this paper, we describe our first steps towards explaining the scoring task: we run the scoring under an eye-tracker and monitor what the annotators do. At the current stage, our results are more of a proof of concept, testing the feasibility of eye tracking for the analysis of such a complex MT evaluation setup.