Building parallel corpora through social network gaming

Publication at Faculty of Mathematics and Physics, 2012

Abstract

Building training data is labor-intensive and presents a major obstacle to the advancement of Natural Language Processing (NLP) systems. A prime use of NLP technologies has been the construction of machine translation systems.

The most common form of machine translation system is the phrase-based system, which requires extensive training data. Building this training data is both expensive and error-prone.

Emerging technologies, such as social networks and serious games, offer a unique opportunity to change how we construct training data. Serious games, or games with a purpose, have already been built for tasks such as sentence segmentation, image labeling, and coreference resolution.

These games work on three levels: they entertain the players, they reinforce information the players might be learning, and they provide data to researchers. Most of these systems, while well intended and well developed, have lacked participation.

We present a set of linguistically based games that aim to construct pa