We present TerrorCat, a submission to the WMT’12 metrics shared task. TerrorCat uses the frequencies of automatically obtained translation error categories as the basis for pairwise comparison of translation hypotheses, which in turn is used to generate a score for every translation.
The metric shows high overall correlation with human judgements on the system level and more modest results on the level of individual sentences.
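
To make the pipeline concrete, the following minimal sketch illustrates the general idea of turning error-category frequencies into pairwise preferences and then into per-translation scores. It is an illustration under stated assumptions, not the TerrorCat implementation: the category names, the `WEIGHTS` table, the weighted-penalty comparison in `prefers`, and the win-fraction scoring in `score_hypotheses` are all hypothetical stand-ins for components that this summary does not specify.

```python
from itertools import combinations

# Hypothetical error categories and penalty weights (assumptions for
# illustration only; the text above does not specify either).
WEIGHTS = {"lexical": 1.0, "order": 0.5, "inflection": 0.5}


def prefers(freq_a, freq_b, weights=WEIGHTS):
    """Return True if hypothesis A is judged better than hypothesis B.

    Stand-in comparator: A wins if its weighted error frequency is lower.
    A learned pairwise model could be substituted here.
    """
    penalty_a = sum(w * freq_a.get(cat, 0.0) for cat, w in weights.items())
    penalty_b = sum(w * freq_b.get(cat, 0.0) for cat, w in weights.items())
    return penalty_a < penalty_b


def score_hypotheses(freqs):
    """Map each hypothesis name to the fraction of pairwise comparisons it wins.

    `freqs` maps a hypothesis name to a dict of error-category frequencies.
    """
    wins = {name: 0 for name in freqs}
    for a, b in combinations(freqs, 2):
        if prefers(freqs[a], freqs[b]):
            wins[a] += 1
        elif prefers(freqs[b], freqs[a]):
            wins[b] += 1
        # Ties award no win to either side.
    n_opponents = max(len(freqs) - 1, 1)  # avoid division by zero
    return {name: wins[name] / n_opponents for name in freqs}


if __name__ == "__main__":
    # Invented toy frequencies for three hypothetical systems.
    example = {
        "sysA": {"lexical": 0.1, "order": 0.0},
        "sysB": {"lexical": 0.3, "order": 0.1},
        "sysC": {"lexical": 0.2},
    }
    print(score_hypotheses(example))
```

In this sketch a translation's score is simply the share of competitors it beats, so scores are only comparable among hypotheses for the same source sentence; other aggregation schemes are equally plausible.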