Background Words live social lives: the types of words speakers use can index them as belonging to a specific demographic group (Schwartz et al., 2013), which can affect how we perceive, produce and process those words (Hay, 2018). But do all words carry the same semantic meaning for all people? If they do not, is there some underlying systematic structure that clusters together the words that differ, or indeed the words that have stable representations, across different groups of people? We present here the first large-scale investigation that aims to provide exploratory insights into these questions, focusing on differences between populations of Czech speakers and specifically on the socio-demographic variables of binary gender and age.
Methods Our data are taken from the SocioLex dataset (Preininger et al., submitted), an ongoing project that aims to quantify how socio-semantic variables are associated with Czech words. Participants were asked to rate words (using a standard norming procedure, see Warriner et al., 2013) according to how they associate each word's meaning with 5 different dimensions: gender (feminine/masculine), location (rural/urban), political alignment (liberal/conservative), valence (negative/positive) and age (0-6, 7-17, 18-30, 31-50, 51-65, 66-80, and 81+ years). In total, 1,448 participants contributed to the ratings of 2,700 Czech words. We calculated the mean rating for each of the words, on each of the 5 dimensions, for each of the following groups of participants: young adults (females and males aged 18-30), old adults (females and males aged over 60), young females (females aged 18-30) and young males (males aged 18-30). See Fig. 1 for a detailed visualisation of the participants' socio-demographic profiles.
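The group-wise averaging described above can be illustrated with a minimal sketch. It is not the SocioLex pipeline. The long-format row layout, the numeric coding of the age dimension (1-7 for the seven age bands), the example ratings, and the helper names `participant_groups` and `group_means` are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical long-format ratings (not real SocioLex data):
# (word, dimension, participant_gender, participant_age, rating)
ratings = [
    ("deprese", "age", "female", 24, 3.0),
    ("deprese", "age", "male", 27, 4.0),
    ("deprese", "age", "female", 66, 5.0),
    ("deprese", "age", "male", 71, 6.0),
    ("deprese", "valence", "female", 22, 1.0),
    ("deprese", "valence", "male", 63, 2.0),
]

def participant_groups(gender, age):
    """Return every analysis group a participant falls into (groups overlap)."""
    groups = []
    if 18 <= age <= 30:
        groups.append("young adults")
        if gender == "female":
            groups.append("young females")
        elif gender == "male":
            groups.append("young males")
    elif age > 60:
        groups.append("old adults")
    return groups

def group_means(rows):
    """Mean rating for each (word, dimension, group) combination."""
    buckets = defaultdict(list)
    for word, dimension, gender, age, rating in rows:
        for group in participant_groups(gender, age):
            buckets[(word, dimension, group)].append(rating)
    return {key: mean(values) for key, values in buckets.items()}

means = group_means(ratings)
print(means[("deprese", "age", "young adults")])
print(means[("deprese", "age", "old adults")])
```

Because the young female and young male groups are subsets of the young adult group, a single rating contributes to more than one mean. The sketch handles this by letting each participant belong to several groups.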
Results We used simple linear regression models to examine the relationships between the dimensions, with a fixed effect of either participant age (young/old) or participant gender (female/male). Visualisations of these results can be found in Fig. 2. Next, we assessed the magnitude of the difference between the groups' ratings for each word by calculating the difference in mean ratings on each of the 5 dimensions. This provided us with an estimate of which words had a stable representation (no difference) and which had dynamic representations (large difference). Using these values, we found specific clusters of semantically related words. For example, old participants attribute words related to mental states (e.g. deprese [depression]) to older age groups, whilst young participants attribute the same cluster of words to younger age groups.
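The difference scores described above can be sketched as follows. The mean values, the extra example words, and the 0.5 cut-off separating "stable" from "dynamic" words are hypothetical. The abstract does not report a threshold, so the cut-off here is only an illustrative assumption, as are the helper names `rating_differences` and `label_stability`.

```python
# Hypothetical group means on the age dimension (coded 1-7); not real data.
old_means = {"deprese": 5.5, "radost": 5.0, "okno": 4.0}
young_means = {"deprese": 3.0, "radost": 3.2, "okno": 4.1}

def rating_differences(group_a, group_b):
    """Signed difference (a minus b) for every word rated by both groups."""
    shared = group_a.keys() & group_b.keys()
    return {word: group_a[word] - group_b[word] for word in shared}

def label_stability(differences, threshold=0.5):
    """Label a word 'stable' if |difference| < threshold, else 'dynamic'.

    The threshold is an illustrative assumption, not a value from the study.
    """
    return {
        word: "stable" if abs(diff) < threshold else "dynamic"
        for word, diff in differences.items()
    }

diffs = rating_differences(old_means, young_means)
labels = label_stability(diffs)

# List words from the largest to the smallest absolute group difference.
for word in sorted(diffs, key=lambda w: abs(diffs[w]), reverse=True):
    print(f"{word}: {diffs[word]:+.2f} ({labels[word]})")
```

A positive difference here means that old participants place the word in an older age band than young participants do. Ranking by absolute difference is one simple way to surface candidate words for the semantic clusters mentioned above.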
Discussion This is the first exploration of how socio-demographically different groups of participants represent the socio-semantics of a large set of Czech words, revealing patterns of both stability and variation. Given the dramatic shifts that have taken place in Czech society over the last few decades, we aim to better understand how socio-semantic representations may have changed as a result.