Parallel corpus InterCorp seven years later

Publication at Faculty of Arts | 2011

Abstract

The paper presents the architecture and the current state of the parallel corpus InterCorp, including an outline of its recent development and a comparison with other parallel corpora. This is followed by an overview of the data collection procedure that covers text selection criteria, data format, conversion, alignment, lemmatization and tagging.

Among the project's tools, we focus on the on-line alignment editor InterText and the parallel search engine interface Park. Finally, we discuss the challenges and prospects of the project.