Better to be in agreement than in bad company: A critical analysis of many kappa-like tests

SILVEIRA, Paulo Sergio Panse; SIQUEIRA, Jose Oliveira

Files

art_SILVEIRA_Better_to_be_in_agreement_than_in_bad_2023.PDF (1.73 MB)

Citations in Scopus

6

Publication type

article

Publication date

2023

Publisher

SPRINGER

Authors

SILVEIRA, Paulo Sergio Panse

SIQUEIRA, Jose Oliveira

Citation

BEHAVIOR RESEARCH METHODS, v.55, n.7, p.3326-3347, 2023

Abstract

We assessed several agreement coefficients applied to 2x2 contingency tables, which are common in research because of dichotomization. We not only studied specific estimators but also developed a general method for evaluating any estimator proposed as an agreement measure. This method was implemented in open-source R code and is available to researchers. We tested it by verifying the performance of several traditional estimators over all possible table configurations with sizes ranging from 1 to 68 (a total of 1,028,789 tables). Cohen's kappa showed handicapped behavior similar to that of Pearson's r, Yule's Q, and Yule's Y. Scott's pi and Shankar and Bangdiwala's B seem to assess situations of disagreement better than situations of agreement between raters. Krippendorff's alpha emulates, without any advantage, Scott's pi in cases with nominal variables and two raters. Dice's F1 and McNemar's chi-squared incompletely assess the information in the contingency table, showing the poorest performance of all. We concluded that Cohen's kappa is a measure of association and that McNemar's chi-squared assesses neither association nor agreement; the only two authentic agreement estimators are Holley and Guilford's G and Gwet's AC1. These two estimators also showed the best performance over the range of table sizes and should be considered the first choices for agreement measurement in 2x2 contingency tables. All procedures and data were implemented in R and are available for download from Harvard Dataverse: https://doi.org/10.7910/DVN/HMYTCK.
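
As an illustration of the setup described in the abstract, the following is a minimal Python sketch, not the authors' R implementation. It counts the 2x2 tables with totals from 1 to 68 and computes three of the named coefficients for a single table. The function names, the cell layout (a, d as agreements; b, c as disagreements), and the example table are assumptions made here for illustration; the formulas are the commonly cited textbook definitions of Cohen's kappa, Holley and Guilford's G, and Gwet's AC1 for two raters and two categories.

```python
from math import comb


def count_tables(n_max):
    """Count 2x2 tables with non-negative integer cells and total n in 1..n_max.

    For a fixed total n there are C(n + 3, 3) ways to split n across 4 cells.
    """
    return sum(comb(n + 3, 3) for n in range(1, n_max + 1))


def agreement_coefficients(a, b, c, d):
    """Return kappa, G and AC1 for one 2x2 table (hypothetical helper).

    Assumed layout: a = both raters positive, d = both negative,
    b and c = the two kinds of disagreement.
    """
    n = a + b + c + d
    po = (a + d) / n          # observed proportion of agreement
    p1 = (a + b) / n          # rater 1: proportion rated positive
    p2 = (a + c) / n          # rater 2: proportion rated positive

    # Cohen's kappa: chance agreement from the product of the marginals.
    pe_kappa = p1 * p2 + (1 - p1) * (1 - p2)
    kappa = (po - pe_kappa) / (1 - pe_kappa) if pe_kappa != 1 else float("nan")

    # Holley and Guilford's G: agreements minus disagreements, over n.
    g = 2 * po - 1

    # Gwet's AC1 (two categories): chance agreement 2 * pi * (1 - pi),
    # where pi is the mean of the two positive marginals.
    pi = (p1 + p2) / 2
    pe_ac1 = 2 * pi * (1 - pi)
    ac1 = (po - pe_ac1) / (1 - pe_ac1)

    return {"kappa": kappa, "G": g, "AC1": ac1}


if __name__ == "__main__":
    print(count_tables(68))
    # Illustrative table with 90% observed agreement and skewed marginals.
    print(agreement_coefficients(90, 5, 5, 0))
```

With the illustrative table (90, 5, 5, 0), kappa comes out slightly negative while G and AC1 stay high despite 90% observed agreement. This is the kind of divergence between kappa and the other two that the abstract points to.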

Keywords

Agreement coefficient, Contingency table, Categorical data analysis, Inter-rater reliability

URI

https://observatorio.fm.usp.br/handle/OPI/57530

Collections

Scientific Journal Articles and Materials - FM/MPT
Scientific Journal Articles and Materials - LIM/01
