Respondents' self-reports are often employed in international surveys (e.g. PISA, TIMSS) and are frequently used to compare different groups of respondents (based on country, socioeconomic status etc.).
However, there is a serious concern about the comparability of such data, which may be hindered by bias. Bias occurs when score differences on an indicator of a construct do not correspond to differences in the underlying trait or ability (van de Vijver & Tanzer, 2004).
One of the potential sources of scale score distortion is socially desirable responding (Kam, Risavy, & Perunovic, 2015). Socially desirable responding (SDR) is defined as a tendency for some people to self-enhance when describing themselves (Paulhus, Harms, Bruce, & Lysy, 2003).
The overclaiming technique is a novel approach with the potential to improve the cross-cultural comparability of respondents' self-reports of knowledge in diverse fields. Several studies have documented notable differences in reporting behavior between respondents from different countries using different methods.
For example, Chen, Lee, and Stevenson (1995) found differences in response styles between North Americans and East Asians. U.S. students were more likely to use extreme scale points while East Asian students were more likely to use midpoints.
Buckley (2009) analyzed response styles using the PISA 2006 dataset and computed acquiescence, disacquiescence, extreme response styles, and noncontingent responding for 57 countries. Using the anchoring vignette method, He, Buchholz, and Klieme (2017) and Vonkova, Zamarro, DeBerg, and Hitt (2015) showed substantial heterogeneity across PISA 2012 countries in students' reporting behavior when rating teachers' classroom management.
In some countries, students had higher standards for judging teacher behavior, and such countries therefore improved their relative position in the ranking of teachers' classroom management skills after adjusting for heterogeneity in reporting behavior. In contrast, students in other countries had lower standards, and such countries therefore worsened their relative position after adjustment.
He and van de Vijver (2016) in their analysis of PISA 2012 data focused on the motivation-achievement paradox in the case of Chinese students. They argue that the cultural influence of modesty and self-criticism is imprinted on the scale use preferences as measured by both response styles and overclaiming.
In this paper, we focus on the analysis of respondents' reporting behavior using the overclaiming technique (described in more detail in the Method section below). The technique has been applied to the area of students' familiarity with mathematical concepts and was part of the PISA 2012 survey.
Our analysis covers 64 participating countries/regions. We identify similar patterns of responding in geographically and culturally close countries/regions.
We also validate the overclaiming scores using external variables: PISA math test scores, GDP, and public expenditure on education. The main research questions are: (1) What are the responding patterns, as identified by the overclaiming technique, in different countries and world regions? (2) What is the external validity of overclaiming scores in cross-country comparison?

Method

Socially desirable responding is one of the potential sources of scale score distortion (Kam et al., 2015).
Several approaches have been proposed to measure SDR, such as social desirability scales (e.g. the Balanced Inventory of Desirable Responding), various intrapsychic measures, and criterion discrepancy measures.
However, serious concerns have been raised about their validity and utility, for example, the difficulty of distinguishing valid personality content in response patterns from responses influenced by desirable responding (Paulhus, 2011; Paulhus et al., 2003). The overclaiming technique is a promising approach with the potential to overcome the disadvantages of these previous methods.
The overclaiming technique asks respondents to rate their familiarity with a set of items from a particular field of knowledge (e.g. astronomy, history, literature). Some of the items (usually about 20%), however, do not actually exist (foils).
By using signal detection analysis, the technique allows us to measure respondents' knowledge exaggeration (the overall tendency to report familiarity with both existent and nonexistent items) and accuracy (the ability to discriminate between existent and nonexistent items; Paulhus et al., 2003). In this paper, we use the questions on familiarity with mathematical concepts from the PISA 2012 student questionnaire.
The dataset includes observations of 275 904 students in 64 countries and economies. The question about familiarity with concepts reads: "Thinking about mathematical concepts: how familiar are you with the following terms?" The list of concepts then follows in this order: exponential function, divisor, quadratic function, proper number, linear equation, vectors, complex number, rational number, radicals, subjunctive scaling, polygon, declarative fraction, congruent figure, cosine, arithmetic mean, and probability.
The 5-point rating scale for each concept was: 1) never heard of it, 2) heard of it once or twice, 3) heard of it a few times, 4) heard of it often, 5) know it well, understand the concept. The list of concepts included 13 existing mathematical concepts and 3 foils (proper number, subjunctive scaling, declarative fraction).
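The exaggeration and accuracy indices can be illustrated with a short sketch. This is not the operational PISA scoring (which derives signal detection indices following Paulhus et al., 2003); the function name, the example data, and the cut-off of treating ratings of 3 or higher as a familiarity claim are assumptions made purely for illustration.

```python
REAL_ITEMS = 13   # existing mathematical concepts
FOIL_ITEMS = 3    # non-existing concepts (foils)

def oct_scores(ratings):
    """Return (accuracy, exaggeration) for one student's 16 ratings on the
    1-5 scale, ordered with the 13 real concepts first and the 3 foils last."""
    claimed = [r >= 3 for r in ratings]  # assumed cut-off: 3-5 counts as a claim
    hit_rate = sum(claimed[:REAL_ITEMS]) / REAL_ITEMS   # share of real items claimed
    fa_rate = sum(claimed[REAL_ITEMS:]) / FOIL_ITEMS    # share of foils claimed
    accuracy = hit_rate - fa_rate             # discrimination between real and foil
    exaggeration = (hit_rate + fa_rate) / 2   # overall tendency to claim familiarity
    return accuracy, exaggeration

# A student who claims to know everything shows maximal exaggeration but
# zero accuracy; a student who claims all real items and no foils is
# maximally accurate with moderate exaggeration.
print(oct_scores([5] * 16))            # (0.0, 1.0)
print(oct_scores([5] * 13 + [1] * 3))  # (1.0, 0.5)
```

The key point of the design is visible here: indiscriminate claiming inflates exaggeration without raising accuracy, because the foils penalize every false familiarity claim.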
The foils were created by combining a grammatical term (proper, subjunctive, declarative) with a mathematical term (number, scaling, fraction; OECD, 2014).

Expected Outcomes

In total, 61.7% of all students reported higher familiarity with existing concepts than with non-existing ones, but only 1% of all students achieved the highest possible accuracy, i.e. familiarity with all existing concepts and no familiarity with any non-existing concept, which is essentially the "correct" solution.
Interestingly, in contrast to the low percentage of students achieving the highest possible accuracy (1%), 19.5% of all respondents reached the highest possible exaggeration, i.e. they reported knowing all the concepts. We also found considerable differences in response patterns among PISA 2012 participating countries.
Based on their response patterns, we categorized the countries/economies into groups with: a) high accuracy and high exaggeration (e.g. Macao and Turkey), b) low accuracy and high exaggeration (e.g. Indonesia and Albania), c) low accuracy and low exaggeration (e.g. Luxembourg and Sweden), and d) high accuracy and low exaggeration (e.g. Korea and Finland). There also seem to be consistent response patterns within particular world regions.
For example, East Asia (e.g. Chinese Taipei, Japan, Korea) is a consistent region where students tend to be accurate and not to exaggerate.
We investigated the unadjusted familiarity score (familiarity with existing concepts only) and the OCT-adjusted familiarity score (familiarity with non-existing concepts subtracted from familiarity with existing concepts), together with their relationships to external variables: math achievement, GDP, and public expenditure per student (PEPS). The unadjusted familiarity score correlates negatively with all the external variables (-0.04 with the math score, -0.22 with GDP, -0.39 with PEPS), contrary to what would reasonably be expected.
The adjusted score, however, correlates positively with all the external variables (0.68 with math score, 0.10 with GDP, 0.22 with PEPS) indicating the validity of the adjusted familiarity score.