Table 2 Results of the experiment

From: Measuring the bias of incorrect application of feature selection when using cross-validation in radiomics

| Dataset | AUC-ROC | ΔAUC-ROC | P (ΔAUC-ROC) | AUC-F1 | ΔAUC-F1 | P (ΔAUC-F1) | Accuracy | ΔAccuracy | P (ΔAccuracy) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Carvalho2018 (Scheme A) | 0.687 | 0.041 | 0.33 | 0.733 | 0.011 | 0.791 | 0.634 | −0.004 | 0.913 |
| Carvalho2018 (Scheme B) | 0.646 | | | 0.722 | | | 0.637 | | |
| Hosny2018A (Scheme A) | 0.765 | 0.13 | < 0.001 | 0.781 | 0.135 | 0.001 | 0.689 | 0.075 | 0.035 |
| Hosny2018A (Scheme B) | 0.636 | | | 0.647 | | | 0.614 | | |
| Hosny2018B (Scheme A) | 0.855 | 0.13 | < 0.001 | 0.716 | 0.293 | < 0.001 | 0.791 | 0.09 | 0.001 |
| Hosny2018B (Scheme B) | 0.725 | | | 0.422 | | | 0.701 | | |
| Hosny2018C (Scheme A) | 0.77 | 0.149 | 0.005 | 0.87 | 0.043 | 0.212 | 0.792 | 0.093 | 0.019 |
| Hosny2018C (Scheme B) | 0.621 | | | 0.827 | | | 0.699 | | |
| Ramella2018 (Scheme A) | 0.872 | 0.061 | 0.147 | 0.893 | 0.051 | 0.21 | 0.846 | 0.11 | 0.024 |
| Ramella2018 (Scheme B) | 0.811 | | | 0.842 | | | 0.736 | | |
| Toivonen2019 (Scheme A) | 1 | 0.146 | 0.002 | 1 | 0.038 | 0.015 | 0.98 | 0.17 | < 0.001 |
| Toivonen2019 (Scheme B) | 0.854 | | | 0.962 | | | 0.81 | | |
| Keek2020 (Scheme A) | 0.765 | 0.086 | 0.005 | 0.714 | 0.14 | 0.001 | 0.725 | 0.07 | 0.018 |
| Keek2020 (Scheme B) | 0.678 | | | 0.575 | | | 0.656 | | |
| Li2020 (Scheme A) | 0.972 | 0.107 | 0.018 | 0.984 | 0.067 | 0.057 | 0.922 | 0.157 | 0.006 |
| Li2020 (Scheme B) | 0.865 | | | 0.917 | | | 0.765 | | |
| Park2020 (Scheme A) | 0.698 | 0.067 | 0.006 | 0.394 | 0.061 | 0.036 | 0.763 | 0.005 | 0.602 |
| Park2020 (Scheme B) | 0.631 | | | 0.333 | | | 0.758 | | |
| Song2020 (Scheme A) | 0.985 | 0.02 | 0.002 | 0.984 | 0.022 | 0.007 | 0.942 | 0.012 | 0.334 |
| Song2020 (Scheme B) | 0.965 | | | 0.962 | | | 0.931 | | |
  1. AUC-ROC, AUC-F1 and accuracy of the correct and incorrect models for each dataset, together with the differences between them and the significance of those differences. The p-values were computed using a bootstrap test with the null hypothesis that the difference is zero. Significant p-values are marked in bold.
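
The footnote names a bootstrap test of the null hypothesis that the difference is zero but does not spell out the procedure. The following is a minimal sketch of one common variant, a paired bootstrap over test samples with the resampled differences re-centred at zero, and it is not necessarily the procedure used for the table. The function names, the choice of AUC-ROC as the metric, the number of resamples and the toy data are all assumptions made for illustration.

```python
import random


def auc_roc(labels, scores):
    """AUC-ROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_difference(labels, scores_a, scores_b, n_boot=2000, seed=0):
    """Return (observed difference, two-sided bootstrap p-value).

    Assumption: both score lists refer to the same samples, so each
    resample draws the same indices for both models (paired bootstrap).
    """
    rng = random.Random(seed)
    n = len(labels)
    observed = auc_roc(labels, scores_a) - auc_roc(labels, scores_b)

    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        if len(set(y)) < 2:
            continue  # AUC is undefined when a resample contains one class only
        a = [scores_a[i] for i in idx]
        b = [scores_b[i] for i in idx]
        diffs.append(auc_roc(y, a) - auc_roc(y, b))

    # Shift the bootstrap distribution so it is centred at zero (the null),
    # then count resamples at least as extreme as the observed difference.
    extreme = sum(abs(d - observed) >= abs(observed) for d in diffs)
    p_value = (extreme + 1) / (n_boot + 1)
    return observed, p_value


if __name__ == "__main__":
    # Hypothetical toy data, not taken from any dataset in the table.
    toy_rng = random.Random(1)
    y_true = [0] * 30 + [1] * 30
    model_a = [y + toy_rng.gauss(0, 0.6) for y in y_true]
    model_b = [y + toy_rng.gauss(0, 1.2) for y in y_true]
    delta, p = bootstrap_auc_difference(y_true, model_a, model_b, n_boot=500)
    print(f"delta AUC-ROC = {delta:.3f}, p = {p:.3f}")
```

The same resampling loop could be reused for AUC-F1 or accuracy by swapping the metric function. The add-one correction in the p-value keeps it strictly above zero, which is why very small values are usually reported as an inequality such as < 0.001.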