Graph-based Dependency Parser Building for Myanmar Language

Publication

Abstract

Examining the relationships between words in a sentence to determine its grammatical structure is known as dependency parsing (DP). Based on this, a sentence is broken down into several components.

The process is based on the concept that the linguistic units of a sentence are connected to one another by direct relationships. These relationships are called dependencies.
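As an illustration only, the sketch below shows one common way to encode such dependencies: each word stores the position of its governing word (its head) and a relation label. The English sentence, the `Token` type, and the helper functions are hypothetical and are not taken from the myPOS corpus; the labels `nsubj`, `obj`, and `root` are standard UD relations.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    index: int   # 1-based position of the word in the sentence
    form: str    # surface form of the word
    head: int    # index of the governing word; 0 marks the root
    deprel: str  # dependency relation label


# Hypothetical example sentence: "She reads books"
sentence = [
    Token(1, "She", 2, "nsubj"),
    Token(2, "reads", 0, "root"),
    Token(3, "books", 2, "obj"),
]


def dependents(tokens: List[Token], head_index: int) -> List[str]:
    """Return the forms of all words directly governed by head_index."""
    return [t.form for t in tokens if t.head == head_index]


def root(tokens: List[Token]) -> Token:
    """Return the single word whose head is the artificial root (0)."""
    return next(t for t in tokens if t.head == 0)
```

Here `dependents(sentence, 2)` collects the words attached to "reads", and `root(sentence)` returns the word that governs the whole sentence.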

Dependency parsing is one of the key steps in natural language processing (NLP) for several text mining approaches. In recent years, Universal Dependencies (UD) has emerged as the dominant formalism for dependency parsing.

Various UD corpora and dependency parsers are publicly accessible for resource-rich languages. However, no such resources for dependency parsing are publicly available for Myanmar, a low-resource language.

Thus, we manually extended the existing small Myanmar UD corpus (i.e., the myPOS UD corpus) into the myPOS version 3.0 UD corpus, in order to publish the extended Myanmar UD corpus as a publicly available resource. To evaluate the effect of the extended UD corpus versus the original UD corpus, we used two graph-based neural dependency parsing models, namely jPTDP (joint POS tagging and dependency parsing) and UniParse (universal graph-based parsing), and measured performance in terms of unlabeled and labeled attachment scores (UAS and LAS).
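For readers unfamiliar with the two metrics, the following is a minimal sketch of how they are commonly defined: UAS is the fraction of words assigned the correct head, and LAS is the fraction assigned both the correct head and the correct relation label. The function name and the toy data are hypothetical, and the sketch does not reproduce the evaluation script used in this work (for example, it applies no special treatment to punctuation).

```python
from typing import List, Tuple

Arc = Tuple[int, str]  # (head index, dependency relation label)


def attachment_scores(gold: List[Arc], predicted: List[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) for token-aligned gold and predicted arcs."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same tokens")
    if not gold:
        raise ValueError("cannot score an empty sequence")
    total = len(gold)
    # UAS: only the head has to match.
    uas_hits = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    # LAS: both the head and the label have to match.
    las_hits = sum(g == p for g, p in zip(gold, predicted))
    return uas_hits / total, las_hits / total


# Hypothetical three-word sentence: the parser finds every head,
# but mislabels the last relation.
gold = [(2, "nsubj"), (0, "root"), (2, "obj")]
pred = [(2, "nsubj"), (0, "root"), (2, "nmod")]
uas, las = attachment_scores(gold, pred)
```

In this toy case all three heads are correct, so UAS is 1.0, while only two of the three labeled arcs match, so LAS is 2/3. LAS can never exceed UAS, since every labeled match is also a head match.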

We compared the accuracies of graph-based neural models based on the original and extended UD corpora. The experimental results showed that, compared to the original myPOS UD corpus, the extended myPOS version 3.0 UD corpus enhanced the accuracy of dependency parsing models.

Keywords

universal dependency, dependency parsing, neural network, Myanmar language