
Searching for Reasons of Transformers’ Success: Memorization vs Generalization

Publication at Faculty of Mathematics and Physics | 2023

Abstract

The Transformer architecture has, since its conception, led to numerous breakthrough advances in natural language processing. We are interested in whether its success stems primarily from its capacity to learn generic language rules, or whether the architecture instead leverages memorized constructs without understanding their structure.

We conduct a series of experiments in which we modify the training dataset to prevent the model from memorizing word bigrams that occur in the test data. We find that while such a restricted model performs worse than its unrestricted counterpart, the results do not indicate that the Transformer's success is due solely to its memorization capacity.
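A minimal sketch of one way such a bigram restriction could be implemented; the function names and the exact filtering criterion (dropping any training sentence that shares a bigram with the test data) are illustrative assumptions, not the authors' actual procedure:

from typing import Iterable, List, Set, Tuple


def bigrams(tokens: List[str]) -> Set[Tuple[str, str]]:
    # All adjacent word pairs in a tokenized sentence.
    return set(zip(tokens, tokens[1:]))


def filter_training_data(
    train_sentences: Iterable[List[str]],
    test_sentences: Iterable[List[str]],
) -> List[List[str]]:
    # Drop every training sentence that shares a bigram with the test data,
    # so the model cannot simply memorize those word pairs during training.
    forbidden: Set[Tuple[str, str]] = set()
    for sent in test_sentences:
        forbidden |= bigrams(sent)
    return [
        sent for sent in train_sentences
        if bigrams(sent).isdisjoint(forbidden)
    ]


if __name__ == "__main__":
    train = [["the", "cat", "sat"], ["a", "dog", "ran"]]
    test = [["the", "cat", "ran"]]
    # Keeps only the second sentence, since the first contains the test bigram ("the", "cat").
    print(filter_training_data(train, test))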

In a small qualitative analysis, we demonstrate that a human translator lacking the necessary terminological knowledge would likely struggle in a similar way.