Better to be in agreement than in bad company: A critical analysis of many kappa-like tests

Scopus citations
6
Document type
article
Publication date
2023
Publisher
SPRINGER
Citation
BEHAVIOR RESEARCH METHODS, v.55, n.7, p.3326-3347, 2023
Abstract
We assessed several agreement coefficients applied to 2x2 contingency tables, which are common in research because of dichotomization. We not only studied specific estimators but also developed a general method for evaluating any estimator proposed as an agreement measure. This method was implemented in open-source R code and is available to researchers. We tested the method by verifying the performance of several traditional estimators over all possible configurations of tables with sizes ranging from 1 to 68 (a total of 1,028,789 tables). Cohen's kappa showed handicapped behavior similar to Pearson's r, Yule's Q, and Yule's Y. Scott's pi and Shankar and Bangdiwala's B seem to assess situations of disagreement better than situations of agreement between raters. Krippendorff's alpha emulates Scott's pi, without any advantage, in cases with nominal variables and two raters. Dice's F1 and McNemar's chi-squared incompletely assess the information in the contingency table and showed the poorest performance of all. We concluded that Cohen's kappa is a measure of association and that McNemar's chi-squared assesses neither association nor agreement; the only two authentic agreement estimators are Holley and Guilford's G and Gwet's AC1. These two estimators also showed the best performance over the range of table sizes and should be considered the first choices for measuring agreement in 2x2 contingency tables. All procedures and data were implemented in R and are available for download from Harvard Dataverse: https://doi.org/10.7910/DVN/HMYTCK.
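The setup described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' R code, which is available from the Dataverse link above. It assumes a 2x2 table with cells a and d holding the agreements and b and c the disagreements, and uses the standard textbook definitions of Cohen's kappa, Holley and Guilford's G, and Gwet's AC1. The function names (`cohen_kappa`, `holley_guilford_g`, `gwet_ac1`, `tables_of_size`, `count_tables`) are hypothetical, chosen only for this illustration. The sketch also checks the table count quoted in the abstract: a table of size n has C(n+3, 3) possible configurations, and summing over n = 1 to 68 gives 1,028,789.

```python
from math import comb, nan


def cohen_kappa(a, b, c, d):
    """Cohen's kappa for a 2x2 table [[a, b], [c, d]]; a and d are agreements."""
    n = a + b + c + d
    po = (a + d) / n  # observed agreement
    # Chance agreement from the product of each rater's marginal totals.
    pe = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2
    return nan if pe == 1 else (po - pe) / (1 - pe)


def holley_guilford_g(a, b, c, d):
    """Holley and Guilford's G: agreements minus disagreements, over n."""
    n = a + b + c + d
    return (a + d - b - c) / n


def gwet_ac1(a, b, c, d):
    """Gwet's AC1 for two raters and two categories."""
    n = a + b + c + d
    po = (a + d) / n
    # Mean proportion of ratings placed in the first category by the two raters.
    pi1 = ((a + b) + (a + c)) / (2 * n)
    pe = 2 * pi1 * (1 - pi1)  # never exceeds 0.5, so 1 - pe is never zero
    return (po - pe) / (1 - pe)


def tables_of_size(n):
    """Yield every 2x2 table (a, b, c, d) of non-negative counts summing to n."""
    for a in range(n + 1):
        for b in range(n - a + 1):
            for c in range(n - a - b + 1):
                yield a, b, c, n - a - b - c


def count_tables(max_n):
    """Number of 2x2 tables with total size from 1 to max_n."""
    return sum(comb(n + 3, 3) for n in range(1, max_n + 1))


if __name__ == "__main__":
    # An unbalanced table with 90% observed agreement.
    table = (90, 5, 5, 0)
    print("kappa:", cohen_kappa(*table))
    print("G:    ", holley_guilford_g(*table))
    print("AC1:  ", gwet_ac1(*table))
    print("tables with sizes 1..68:", count_tables(68))
```

For the unbalanced table in the usage example, the raters agree on 90 of 100 cases, yet kappa comes out slightly negative while G and AC1 stay high. This is one well-known way in which kappa departs from raw agreement; the paper's own analysis rests on the full enumeration, not on a single table like this one.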
Keywords
Agreement coefficient, Contingency table, Categorical data analysis, Inter-rater reliability
References
  1. [Anonymous], 2010, Handbook of Inter-Rater Reliability. Advanced Analytics
  2. Banerjee M, 1999, CAN J STAT, V27, P3, DOI 10.2307/3315487
  3. BENNETT EM, 1954, PUBLIC OPIN QUART, V18, P303
  4. COHEN J, 1968, PSYCHOL BULL, V70, P213, DOI 10.1037/h0026256
  5. COHEN J, 1960, EDUC PSYCHOL MEAS, V20, P37, DOI 10.1177/001316446002000104
  6. CONGER AJ, 1980, PSYCHOL BULL, V88, P322, DOI 10.1037/0033-2909.88.2.322
  7. Cramer H., 1946, Mathematical Methods of Statistics
  8. Deng, 2013, COMMUNICATION YB, V36, P419, DOI 10.1080/23808985.2013.11679142
  9. DICE LR, 1945, ECOLOGY, V26, P297, DOI 10.2307/1932409
  10. EFRON B, 1979, ANN STAT, V7, P1, DOI 10.1214/aos/1176344552
  11. FEINGOLD M, 1992, EDUC PSYCHOL MEAS, V52, P57, DOI 10.1177/001316449205200105
  12. Feng GCC, 2016, METHODOLOGY-EUR, V12, P145, DOI 10.1027/1614-2241/a000120
  13. Feng GC, 2015, METHODOLOGY-EUR, V11, P13, DOI 10.1027/1614-2241/a000086
  14. GREEN SB, 1981, EDUC PSYCHOL MEAS, V41, P1069, DOI 10.1177/001316448104100415
  15. Gwet K.L., 2011, KRIPPENDORFFS ALPHA
  16. Gwet KL, 2008, BRIT J MATH STAT PSY, V61, P29, DOI 10.1348/000711006X126600
  17. HOLLEY JW, 1964, EDUC PSYCHOL MEAS, V24, P749, DOI 10.1177/001316446402400402
  18. Hripcsak G, 2005, J AM MED INFORM ASSN, V12, P296, DOI 10.1197/jamia.M1733
  19. HUBERT L, 1977, BRIT J MATH STAT PSY, V30, P98, DOI 10.1111/j.2044-8317.1977.tb00728.x
  20. Hughes J, 2021, R J, V13, P413
  21. Hyndman RJ, 1996, AM STAT, V50, P120, DOI 10.2307/2684423
  22. JANSON S, 1982, APPL PSYCH MEAS, V6, P111, DOI 10.1177/014662168200600111
  23. King NB, 2012, BMJ-BRIT MED J, V345, DOI 10.1136/bmj.e5774
  24. Kirkwood BR., 2003, Essential Medical Statistics
  25. Konstantinidis M, 2022, SYMMETRY-BASEL, V14, DOI 10.3390/sym14020262
  26. KRIPPEND.K, 1970, EDUC PSYCHOL MEAS, V30, P61, DOI 10.1177/001316447003000105
  27. Krippendorff K., 2011, Computing krippendorff's alpha-reliability
  28. Krippendorff K, 2016, METHODOLOGY-EUR, V12, P139, DOI 10.1027/1614-2241/a000119
  29. Kuppens S, 2011, SOC WORK RES, V35, P185, DOI 10.1093/swr/35.3.185
  30. LIENERT GA, 1972, EDUC PSYCHOL MEAS, V32, P281, DOI 10.1177/001316447203200205
  31. Lu YQ, 2017, COMMUN STAT-THEOR M, V46, P10010, DOI 10.1080/03610926.2016.1228962
  32. Lu YQ, 2010, COMMUN STAT-THEOR M, V39, P3525, DOI 10.1080/03610920903289218
  33. Ludbrook J, 2011, ANZ J SURG, V81, P923, DOI 10.1111/j.1445-2197.2011.05906.x
  34. Manning Christopher D., 2008, Introduction to information retrieval, DOI 10.1017/CBO9780511809071
  35. MATTHEWS BW, 1975, BIOCHIM BIOPHYS ACTA, V405, P442, DOI 10.1016/0005-2795(75)90109-9
  36. McNemar Q, 1947, PSYCHOMETRIKA, V12, P153, DOI 10.1007/BF02295996
  37. R Core Team, 2019, R: A language and environment for statistical computing
  38. Scott WA, 1955, PUBLIC OPIN QUART, V19, P321, DOI 10.1086/266577
  39. Shankar V, 2014, BMC MED RES METHODOL, V14, DOI 10.1186/1471-2288-14-100
  40. SHREINER SC, 1980, INT J ADDICT, V15, P915, DOI 10.3109/10826088009040066
  41. Siegel S, 2003, Non-parametric Statistics for the Behavioural Sciences, V2nd
  42. Sim J, 2005, PHYS THER, V85, P257
  43. SLEIGH A, 1982, T ROY SOC TROP MED H, V76, P403, DOI 10.1016/0035-9203(82)90201-2
  44. Wikipedia, 2021, KRIPP ALPH
  45. Wongpakaran N, 2013, BMC MED RES METHODOL, V13, DOI 10.1186/1471-2288-13-61
  46. Xie Z., 2017, LIFE INT J HLTH LIFE, V3, P1, DOI 10.20319/LIJHLS.2017.32.115
  47. Yule GU, 1912, J R STAT SOC, V75, P579, DOI 10.2307/2340126