1 Introduction
Quantifying the meaning of concepts has been a long-standing goal of the cognitive sciences, and a growing number of studies provide normative ratings across a range of semantic dimensions. However, there have been relatively few investigations into how socially meaningful dimensions of meaning are represented, for instance how we associate gender with different concepts [1,2]. Such socio-semantic information has been demonstrated to be important for language processing, production and perception [3], and thus could be important for theories of learning, memory and cognition. We present here the first large-scale study that quantifies the socio-semantics of Czech, by providing normative ratings across five dimensions of meaning (age, gender, location, political alignment, valence) for a large set of words.
Furthermore, we use this data to explore how these socio-semantic representations might vary across groups of people, specifically by comparing the ratings from female and male participants. The aim of this analysis is to identify whether certain concepts exhibit variation or stability across the two groups, in order to gain a better understanding of which types of concepts are likely to have a shared socio-semantic representation, and which might not.
2 Methods
Our data are taken from the SocioLex dataset [4], an ongoing project in which participants are asked to rate how they associate each word's meaning with five dimensions, each on a 7-point scale: gender (feminine/masculine), location (rural/urban), political alignment (liberal/conservative), valence (negative/positive) and age (0-6, 7-17, 18-30, 31-50, 51-65, 66-80, and 81+ years). All dimensions other than age had a neutral option at the midpoint. Ratings for 2,700 words (1,603 nouns, 766 adjectives, 331 verbs) were collected for all dimensions. In total, 1,156 participants were recruited from a university-wide student database at Charles University and via Prolific, with 848 identifying as female (mean age = 21.6 years, SD = 2.0) and 308 as male (mean age = 22.3 years, SD = 2.8).
We calculated separate mean ratings for females and males for each of the concepts, ranging from -3 (very masculine/rural/liberal/negative) to 3 (very feminine/urban/conservative/positive). For the age dimension, however, the ratings were treated as categorical values. To compute an aggregated value comparable to the mean, we calculated the proportion of participants who selected each of the age categories, resulting in a 7-dimensional dataset. We then applied Principal Components Analysis to that data, which provided us with a principal component that can be interpreted as a numerical estimate for old-young. See https://osf.io/e47u8 for all data and analyses.
3 Analysis
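The age-dimension aggregation described in Section 2 (category proportions followed by PCA) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code (which is available at the OSF link): the toy words, the response encoding as category indices 0-6, the helper names, and the choice to orient the component so that higher scores mean "older" are all hypothetical.

```python
import numpy as np

# The seven age categories from the rating task.
AGE_BINS = ["0-6", "7-17", "18-30", "31-50", "51-65", "66-80", "81+"]

def age_proportions(choices, n_bins=len(AGE_BINS)):
    """Proportion of participants who selected each age category.

    `choices` holds one category index (0..6) per participant (assumed encoding).
    """
    counts = np.bincount(np.asarray(choices), minlength=n_bins)
    return counts / counts.sum()

def first_principal_component(X):
    """Return (scores, loadings) for the first principal component of X."""
    Xc = X - X.mean(axis=0)                      # centre each category column
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = vt[0]
    # The sign of a principal component is arbitrary. Assumption: orient it so
    # that older categories get larger loadings, i.e. higher score = older.
    if loadings @ np.arange(X.shape[1]) < 0:
        loadings = -loadings
    return Xc @ loadings, loadings

# Hypothetical toy responses, not SocioLex data.
toy = {
    "toy_word_young": [0, 0, 1, 1, 0, 1],
    "toy_word_adult": [2, 3, 3, 2, 3, 4],
    "toy_word_old":   [5, 6, 6, 5, 6, 6],
}
X = np.vstack([age_proportions(c) for c in toy.values()])  # words x 7 categories
scores, loadings = first_principal_component(X)
age_score = dict(zip(toy, scores))
```

In this sketch each word gets a single number on the first component, which plays the role that the mean rating plays for the other four dimensions. Whether the first component is interpretable as old-young has to be checked against its loadings on real data.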
We ran linear regression models of the relationships between the dimensions' mean word ratings, with a fixed effect of participant gender (female/male) interacting with the ratings from the other (predictor) dimensions (see Figure 1). Results suggest that the relationships between dimensions are broadly similar for the two gender groups, but not identical. For instance, concepts associated with liberal stances tend to receive more positive ratings, and concepts associated with conservative stances more negative ratings, from females than from males.
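A regression with a gender-by-predictor interaction of this kind can be sketched with ordinary least squares. The example below is an illustration only: the data are synthetic, the slopes (-0.2 and -0.5) are invented, a single predictor (political alignment) is used for an outcome (valence), and the same predictor values are reused for both groups as a simplification. The function name is hypothetical.

```python
import numpy as np

def fit_interaction_ols(y, x, is_female):
    """OLS fit of y ~ x * group.

    Returns coefficients in the order
    [intercept, slope for males, female offset, female-minus-male slope].
    """
    x = np.asarray(x, dtype=float)
    g = np.asarray(is_female, dtype=float)       # 0 = male, 1 = female
    design = np.column_stack([np.ones_like(x), x, g, x * g])
    coef, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    return coef

# Synthetic per-word mean ratings (assumed values, not SocioLex data).
rng = np.random.default_rng(0)
n_words = 200
political = rng.uniform(-3, 3, size=n_words)     # -3 liberal ... 3 conservative
valence_male = -0.2 * political + rng.normal(0, 0.1, n_words)
valence_female = -0.5 * political + rng.normal(0, 0.1, n_words)

# Stack both groups into one long table: one row per word per gender group.
y = np.concatenate([valence_male, valence_female])
x = np.concatenate([political, political])
g = np.concatenate([np.zeros(n_words), np.ones(n_words)])

intercept, slope_male, female_offset, slope_diff = fit_interaction_ols(y, x, g)
```

Here `slope_diff` is the interaction coefficient: a value near zero would indicate that the two groups relate the dimensions in the same way, while a non-zero value indicates that the relationship differs between groups.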
Next, we assessed the magnitude of the differences between the female and male ratings for each of the concepts. This was achieved by calculating the difference in ratings for each of the dimensions, providing us with an estimate of which concepts had a stable representation (no difference) and which had varied representations (large difference); see Table 1. For example, in the dimension of valence, the concept minisukně [mini-skirt] was rated as more positive by males, whereas feminismus [feminism] was rated as more positive by females.
4 Discussion
The comparisons have shown that conceptual representations exhibit both variation and stability across gender groups for different dimensions of socio-semantic information. However, for a deeper understanding of these dynamics, we will aim to explore whether certain super-ordinate semantic categories may be predictive of this variation/stability. This will be achieved by obtaining semantic category tags for each of the words from new participants, e.g. golf → sport, and exploring whether these categories cluster together words with large or small differences between the gender groups. Furthermore, we are also interested in exploring whether other socio-demographic groups show patterns of variation and stability in their conceptual representations, for instance by investigating socio-semantic ratings from participants who vary in age.