Better to be in agreement than in bad company: A critical analysis of many kappa-like tests
Scopus citations
6
Publication type
article
Publication date
2023
Publisher
SPRINGER
Citation
BEHAVIOR RESEARCH METHODS, v.55, n.7, p.3326-3347, 2023
Abstract
We assessed several agreement coefficients for 2x2 contingency tables, which are common in research because of dichotomization. Beyond studying specific estimators, we developed a general method for evaluating any candidate estimator of agreement. The method is implemented in open-source R code and is available to researchers. We tested it by verifying the performance of several traditional estimators over all possible configurations of tables with sizes ranging from 1 to 68 (a total of 1,028,789 tables). Cohen's kappa showed handicapped behavior similar to that of Pearson's r, Yule's Q, and Yule's Y. Scott's pi and Shankar and Bangdiwala's B seem to assess situations of disagreement better than situations of agreement between raters. Krippendorff's alpha emulates, without any advantage, Scott's pi in cases with nominal variables and two raters. Dice's F1 and McNemar's chi-squared incompletely assess the information in the contingency table, showing the poorest performance of all. We concluded that Cohen's kappa is a measure of association and that McNemar's chi-squared assesses neither association nor agreement; the only two authentic agreement estimators are Holley and Guilford's G and Gwet's AC1. These two estimators also showed the best performance over the range of table sizes and should be considered the first choices for agreement measurement in 2x2 contingency tables. All procedures and data were implemented in R and are available for download from Harvard Dataverse: https://doi.org/10.7910/DVN/HMYTCK.
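As a rough illustration of the quantities the abstract refers to, the following Python sketch computes Cohen's kappa, Holley and Guilford's G, and Gwet's AC1 for a single 2x2 table, and counts how many 2x2 tables exist for total sizes 1 to 68. It is not the authors' R implementation (that is in the Dataverse repository linked above). The cell labels `a`, `b`, `c`, `d` (with `a` and `d` as the agreement cells), the function names, and the use of the usual textbook formulas are assumptions made for this example.

```python
import math


def agreement_coefficients(a, b, c, d):
    """Return kappa, G and AC1 for a 2x2 table (hypothetical helper).

    Assumed layout: a and d count cases where both raters agree,
    b and c count cases where they disagree.
    """
    n = a + b + c + d
    if n <= 0:
        raise ValueError("the table must contain at least one observation")

    po = (a + d) / n  # observed proportion of agreement

    # Cohen's kappa: chance agreement from the product of the marginals.
    pe_kappa = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2
    kappa = (po - pe_kappa) / (1 - pe_kappa) if pe_kappa < 1 else math.nan

    # Holley and Guilford's G: agreements minus disagreements, over n.
    g = (a + d - b - c) / n

    # Gwet's AC1: chance agreement from the mean positive marginal.
    pi_pos = ((a + b) + (a + c)) / (2 * n)
    pe_ac1 = 2 * pi_pos * (1 - pi_pos)  # never exceeds 0.5
    ac1 = (po - pe_ac1) / (1 - pe_ac1)

    return {"kappa": kappa, "G": g, "AC1": ac1}


def count_tables(n):
    """Number of 2x2 tables of non-negative integers whose cells sum to n."""
    return math.comb(n + 3, 3)


# Total number of tables for sizes 1..68, the range quoted in the abstract.
total_tables = sum(count_tables(n) for n in range(1, 69))

if __name__ == "__main__":
    print(total_tables)
    print(agreement_coefficients(40, 5, 5, 50))  # balanced marginals
    print(agreement_coefficients(90, 5, 5, 0))   # highly skewed marginals
```

The count uses the stars-and-bars formula for distributing `n` observations over four cells, and it reproduces the 1,028,789 tables reported in the abstract. The two example tables have the same observed agreement (0.9) but very different marginals, which gives a quick feel for how differently the three coefficients can react to the same raw agreement.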
Keywords
Agreement coefficient, Contingency table, Categorical data analysis, Inter-rater reliability
References
- [Anonymous], 2010, Handbook of Inter-Rater Reliability. Advanced Analytics
- Banerjee M, 1999, CAN J STAT, V27, P3, DOI 10.2307/3315487
- BENNETT EM, 1954, PUBLIC OPIN QUART, V18, P303
- COHEN J, 1968, PSYCHOL BULL, V70, P213, DOI 10.1037/h0026256
- COHEN J, 1960, EDUC PSYCHOL MEAS, V20, P37, DOI 10.1177/001316446002000104
- CONGER AJ, 1980, PSYCHOL BULL, V88, P322, DOI 10.1037/0033-2909.88.2.322
- Cramer H., 1946, Mathematical Methods of Statistics
- Deng, 2013, COMMUNICATION YB, V36, P419, DOI 10.1080/23808985.2013.11679142
- DICE LR, 1945, ECOLOGY, V26, P297, DOI 10.2307/1932409
- EFRON B, 1979, ANN STAT, V7, P1, DOI 10.1214/aos/1176344552
- FEINGOLD M, 1992, EDUC PSYCHOL MEAS, V52, P57, DOI 10.1177/001316449205200105
- Feng GCC, 2016, METHODOLOGY-EUR, V12, P145, DOI 10.1027/1614-2241/a000120
- Feng GC, 2015, METHODOLOGY-EUR, V11, P13, DOI 10.1027/1614-2241/a000086
- GREEN SB, 1981, EDUC PSYCHOL MEAS, V41, P1069, DOI 10.1177/001316448104100415
- Gwet K.L., 2011, KRIPPENDORFFS ALPHA
- Gwet KL, 2008, BRIT J MATH STAT PSY, V61, P29, DOI 10.1348/000711006X126600
- HOLLEY JW, 1964, EDUC PSYCHOL MEAS, V24, P749, DOI 10.1177/001316446402400402
- Hripcsak G, 2005, J AM MED INFORM ASSN, V12, P296, DOI 10.1197/jamia.M1733
- HUBERT L, 1977, BRIT J MATH STAT PSY, V30, P98, DOI 10.1111/j.2044-8317.1977.tb00728.x
- Hughes J, 2021, R J, V13, P413
- Hyndman RJ, 1996, AM STAT, V50, P120, DOI 10.2307/2684423
- JANSON S, 1982, APPL PSYCH MEAS, V6, P111, DOI 10.1177/014662168200600111
- King NB, 2012, BMJ-BRIT MED J, V345, DOI 10.1136/bmj.e5774
- Kirkwood BR., 2003, Essential Medical Statistics
- Konstantinidis M, 2022, SYMMETRY-BASEL, V14, DOI 10.3390/sym14020262
- KRIPPEND.K, 1970, EDUC PSYCHOL MEAS, V30, P61, DOI 10.1177/001316447003000105
- Krippendorff K., 2011, Computing krippendorff's alpha-reliability
- Krippendorff K, 2016, METHODOLOGY-EUR, V12, P139, DOI 10.1027/1614-2241/a000119
- Kuppens S, 2011, SOC WORK RES, V35, P185, DOI 10.1093/swr/35.3.185
- LIENERT GA, 1972, EDUC PSYCHOL MEAS, V32, P281, DOI 10.1177/001316447203200205
- Lu YQ, 2017, COMMUN STAT-THEOR M, V46, P10010, DOI 10.1080/03610926.2016.1228962
- Lu YQ, 2010, COMMUN STAT-THEOR M, V39, P3525, DOI 10.1080/03610920903289218
- Ludbrook J, 2011, ANZ J SURG, V81, P923, DOI 10.1111/j.1445-2197.2011.05906.x
- Manning Christopher D., 2008, Introduction to information retrieval, DOI 10.1017/CBO9780511809071
- MATTHEWS BW, 1975, BIOCHIM BIOPHYS ACTA, V405, P442, DOI 10.1016/0005-2795(75)90109-9
- McNemar Q, 1947, PSYCHOMETRIKA, V12, P153, DOI 10.1007/BF02295996
- R Core Team, 2019, R: A language and environment for statistical computing
- Scott WA, 1955, PUBLIC OPIN QUART, V19, P321, DOI 10.1086/266577
- Shankar V, 2014, BMC MED RES METHODOL, V14, DOI 10.1186/1471-2288-14-100
- SHREINER SC, 1980, INT J ADDICT, V15, P915, DOI 10.3109/10826088009040066
- Siegel S, 2003, Non-parametric Statistics for the Behavioural Sciences, V2nd
- Sim J, 2005, PHYS THER, V85, P257
- SLEIGH A, 1982, T ROY SOC TROP MED H, V76, P403, DOI 10.1016/0035-9203(82)90201-2
- Wikipedia, 2021, KRIPP ALPH
- Wongpakaran N, 2013, BMC MED RES METHODOL, V13, DOI 10.1186/1471-2288-13-61
- Xie Z., 2017, LIFE INT J HLTH LIFE, V3, P1, DOI 10.20319/LIJHLS.2017.32.115
- Yule GU, 1912, J R STAT SOC, V75, P579, DOI 10.2307/2340126