Ten Years of WMT Evaluation Campaigns: Lessons Learnt

Publication at Faculty of Mathematics and Physics | 2016

Abstract

The WMT evaluation campaign (http://www.statmt.org/wmt16) has been run annually since 2006. It is a collection of shared tasks related to machine translation, in which researchers compare their techniques against those of others in the field.

The longest running task in the campaign is the translation task, where participants translate a common test set with their MT systems. In addition to the translation task, we have also included shared tasks on evaluation: both on automatic metrics (since 2008), which compare the reference to the MT system output, and on quality estimation (since 2012), where system output is evaluated without a reference.
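As an illustration only (not part of the paper), a reference-based metric in the spirit of the metrics task can be sketched as a clipped n-gram overlap between the system output and the reference; the function name and the toy scoring below are assumptions, not the shared-task implementation.

```python
from collections import Counter

def ngram_precision(hypothesis: str, reference: str, n: int = 2) -> float:
    """Clipped n-gram precision of a system output against one reference.

    A toy stand-in for reference-based metrics such as BLEU; real metrics
    add brevity penalties, multiple references, and smoothing.
    """
    hyp_tokens = hypothesis.split()
    ref_tokens = reference.split()
    hyp_ngrams = Counter(tuple(hyp_tokens[i:i + n]) for i in range(len(hyp_tokens) - n + 1))
    ref_ngrams = Counter(tuple(ref_tokens[i:i + n]) for i in range(len(ref_tokens) - n + 1))
    if not hyp_ngrams:
        return 0.0
    # Count each hypothesis n-gram at most as often as it occurs in the reference.
    overlap = sum(min(count, ref_ngrams[gram]) for gram, count in hyp_ngrams.items())
    return overlap / sum(hyp_ngrams.values())

print(ngram_precision("the cat sat on the mat", "the cat is on the mat"))  # 0.6
```

A quality-estimation system, by contrast, would have to score the hypothesis without access to the reference at all, e.g. from features of the source sentence and the output alone.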

An important component of WMT has always been the manual evaluation, in which human annotators produce the official ranking of the systems in each translation task. This reflects the belief of the WMT organizers that human judgement should be the ultimate arbiter of MT quality.
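As a simplified sketch only (the official WMT ranking procedure has varied across years and used more elaborate models), pairwise human preferences can be turned into a system ranking, for instance by ordering systems by their win ratio; the data format below is illustrative, not the campaign's actual pipeline.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple

def rank_by_win_ratio(pairwise_judgements: Iterable[Tuple[str, str]]) -> List[Tuple[str, float]]:
    """Order systems by the fraction of pairwise comparisons they won.

    Each judgement is a (winner, loser) pair from one human annotator.
    This only illustrates the idea of turning judgements into a ranking.
    """
    wins = defaultdict(int)
    total = defaultdict(int)
    for winner, loser in pairwise_judgements:
        wins[winner] += 1
        total[winner] += 1
        total[loser] += 1
    return sorted(((s, wins[s] / total[s]) for s in total), key=lambda x: x[1], reverse=True)

judgements = [("sysA", "sysB"), ("sysA", "sysC"), ("sysB", "sysC"), ("sysC", "sysA")]
print(rank_by_win_ratio(judgements))  # [('sysA', 0.67), ('sysB', 0.5), ('sysC', 0.33)] approximately
```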

Over the years, we have experimented with different methods of improving the manual evaluation.