Hidden in the Layers: Interpretation of Neural Networks for Natural Language Processing

Publication at Faculty of Mathematics and Physics | 2020

Abstract

In this book, we explore neural-network architectures and models used for Natural Language Processing (NLP). We analyze their internal representations (word embeddings, hidden states, attention mechanisms, and contextual embeddings) and review what properties these representations have and what kinds of linguistically interpretable features emerge in them.
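
As a concrete illustration of the representations discussed above, the following minimal sketch (assuming the Hugging Face `transformers` library and the `bert-base-uncased` checkpoint, neither of which is prescribed by the book) shows how per-layer hidden states and attention weights can be extracted from a pretrained model for inspection:

```python
# A minimal sketch, assuming the Hugging Face `transformers` library
# and the `bert-base-uncased` checkpoint.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained(
    "bert-base-uncased",
    output_hidden_states=True,   # return hidden states of every layer
    output_attentions=True,      # return per-layer, per-head attention weights
)

inputs = tokenizer("The cat sat on the mat.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# hidden_states[0] holds the input embeddings; hidden_states[-1] holds
# the contextual embeddings of the final layer.
hidden_states = outputs.hidden_states   # tuple of (batch, seq_len, dim) tensors
attentions = outputs.attentions         # tuple of (batch, heads, seq, seq) tensors
print(len(hidden_states), hidden_states[-1].shape)  # 13 layers for BERT-base
print(len(attentions), attentions[0].shape)
```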

We use our own experimental results, as well as results published by other research teams, to present an overview of models and representations and their linguistic properties. We begin by explaining the basic concepts of deep learning and its usage in NLP, and discuss the details of the most prominent neural architectures and models.

Then we outline the concept of interpretability and different views on it, and introduce basic supervised and unsupervised methods used for interpreting trained neural-network models. The next part is devoted to static word embeddings.
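
A typical supervised interpretation method is a probing classifier: a simple model trained to predict a linguistic label from frozen network representations, where high accuracy suggests the property is encoded in them. The sketch below uses synthetic stand-in data (the arrays `X` and `y` are hypothetical, not from the book's experiments):

```python
# A minimal probing-classifier sketch with synthetic stand-in data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 768))    # stand-in for frozen hidden-state vectors
y = rng.integers(0, 5, size=1000)   # stand-in for linguistic labels (e.g. POS tags)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
# On random data the probe scores near chance; on real representations,
# accuracy above chance indicates the label is linearly decodable.
print("probe accuracy:", probe.score(X_test, y_test))
```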

We show various methods for embedding-space visualization, compo
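
One common family of such visualization methods projects high-dimensional word vectors to 2-D with a dimensionality-reduction technique. The sketch below uses PCA on random stand-in vectors (the word list and vectors are hypothetical; in practice they would come from a trained embedding model):

```python
# A minimal embedding-space visualization sketch: PCA projection to 2-D.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

words = ["king", "queen", "man", "woman", "paris", "france"]
rng = np.random.default_rng(1)
vectors = rng.normal(size=(len(words), 300))  # stand-in for word embeddings

points = PCA(n_components=2).fit_transform(vectors)
plt.scatter(points[:, 0], points[:, 1])
for word, (x, y) in zip(words, points):
    plt.annotate(word, (x, y))
plt.title("Word embeddings projected to 2-D with PCA")
plt.show()
```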