Computational Models of Morphological Learning

Publikace

Abstrakt

A computational learner needs three things: Data to learn from, a class of representations to acquire, and a way to get from one to the other. Language acquisition is a very particular learning setting that can be defined in terms of the input (the child’s early linguistic experience) and the output (a grammar capable of generating a language very similar to the input).

The input is infamously impoverished. As it relates to morphology, the vast majority of potential forms are never attested in the input, and those that are attested follow an extremely skewed frequency distribution.

Learners nevertheless manage to acquire most details of their native morphologies after only a few years of input. That said, acquisition is not instantaneous nor is it error-free.

Children do make mistakes, and they do so in predictable ways which provide insights into their grammars and learning processes. The most elucidating computational model of morphology learning from the perspective of a linguist is one that learns morphology like a child does, that is, on child-like input and along a child-like developmental path.

This article focuses on clarifying those aspects of morphology acquisition that should go into such an elucidating a computational model. Section 1 describes the input with a focus on child-directed speech corpora and input sparsity.

Section 2 discusses representations with focuses on productivity, developmental paths, and formal learnability. Section 3 surveys the range of learning tasks that guide research in computational linguistics and NLP with special focus on how they relate to the acquisition setting.

The conclusion in Section 4 presents a summary of morphology acquisition as a learning problem with Table 4 highlighting the key takeaways of this article.

Klíčová slova

morphology computational linguistics child language acquisition corpus linguistics child development natural language processing learning theory