Testing Heterogeneity in Inter-Rater Reliability

Publication at Faculty of Education, Faculty of Arts, 2020

Abstract

Estimating inter-rater reliability (IRR) is important for assessing and improving the quality of ratings. In some cases, the IRR may differ between groups because of the groups' characteristics.

Second-order generalized estimating equations (GEE2) and linear mixed-effects models (LME) have already been used to test heterogeneity in IRR. Generalized additive models (GAM) are another method capable of estimating the variance components needed for IRR.

This paper presents a simulation study evaluating the performance of these methods in estimating variance components and in testing heterogeneity in IRR. We consider a wide range of sample sizes and various scenarios leading to heterogeneous IRR.

The results show that while the LME and GAM models perform similarly and yield reliable estimates, the GEE2 models may lead to incorrect results.
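
To illustrate what heterogeneous IRR means in terms of variance components, the following is a minimal sketch and not the paper's simulation design. It assumes IRR is defined as the share of total rating variance attributable to the rated subjects, and it estimates that share with a simple one-way ANOVA estimator rather than with the LME, GAM, or GEE2 models the paper compares. The function names, group sizes, and standard deviations are hypothetical choices made for the example.

```python
import random
from statistics import mean


def simulate_ratings(n_subjects, n_raters, sd_subject, sd_error, seed):
    """Simulate ratings as subject effect + independent rating error.

    Hypothetical data-generating model for illustration only.
    """
    rng = random.Random(seed)
    ratings = []
    for _ in range(n_subjects):
        subject_effect = rng.gauss(0.0, sd_subject)
        ratings.append(
            [subject_effect + rng.gauss(0.0, sd_error) for _ in range(n_raters)]
        )
    return ratings


def icc_oneway(ratings):
    """Estimate IRR as var_subject / (var_subject + var_error).

    Uses the one-way ANOVA method-of-moments estimator, a simple
    stand-in for the model-based estimators discussed in the abstract.
    """
    n = len(ratings)
    k = len(ratings[0])
    grand_mean = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    # Between-subject and within-subject mean squares.
    msb = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    msw = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))
    var_subject = max((msb - msw) / k, 0.0)  # truncate negative estimates at 0
    return var_subject / (var_subject + msw)


# Two groups sharing the same subject variance but differing in error
# variance, so the true IRR differs: 1/(1+1) = 0.5 versus 1/(1+0.25) = 0.8.
group_a = simulate_ratings(1000, 3, sd_subject=1.0, sd_error=1.0, seed=1)
group_b = simulate_ratings(1000, 3, sd_subject=1.0, sd_error=0.5, seed=2)

icc_a = icc_oneway(group_a)
icc_b = icc_oneway(group_b)
print(f"estimated IRR, group A: {icc_a:.3f}")
print(f"estimated IRR, group B: {icc_b:.3f}")
```

Here the difference in IRR comes only from the error variance; a difference in subject variance between groups would produce heterogeneous IRR as well, which is why estimating the individual variance components matters.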