This paper presents the results of the premier shared task organized alongside the Conference on Machine Translation (WMT) 2018. Participants were asked to build machine translation systems for any of 7 language pairs in both directions, to be evaluated on a test set of news stories.
The main metric for this task is human judgment of translation quality. This year, we also opened up the task to additional test suites to probe specific aspects of translation.