
Table 2 Results of the experiment

From: Measuring the bias of incorrect application of feature selection when using cross-validation in radiomics

 

Dataset (scheme)        | AUC-ROC | ΔAUC-ROC | P      | AUC-F1 | ΔAUC-F1 | P      | Accuracy | ΔAccuracy | P
------------------------|---------|----------|--------|--------|---------|--------|----------|-----------|-------
Carvalho2018 (Scheme A) | 0.687   | 0.041    | 0.33   | 0.733  | 0.011   | 0.791  | 0.634    | −0.004    | 0.913
Carvalho2018 (Scheme B) | 0.646   |          |        | 0.722  |         |        | 0.637    |           |
Hosny2018A (Scheme A)   | 0.765   | 0.13     | <0.001 | 0.781  | 0.135   | 0.001  | 0.689    | 0.075     | 0.035
Hosny2018A (Scheme B)   | 0.636   |          |        | 0.647  |         |        | 0.614    |           |
Hosny2018B (Scheme A)   | 0.855   | 0.13     | <0.001 | 0.716  | 0.293   | <0.001 | 0.791    | 0.09      | 0.001
Hosny2018B (Scheme B)   | 0.725   |          |        | 0.422  |         |        | 0.701    |           |
Hosny2018C (Scheme A)   | 0.77    | 0.149    | 0.005  | 0.87   | 0.043   | 0.212  | 0.792    | 0.093     | 0.019
Hosny2018C (Scheme B)   | 0.621   |          |        | 0.827  |         |        | 0.699    |           |
Ramella2018 (Scheme A)  | 0.872   | 0.061    | 0.147  | 0.893  | 0.051   | 0.21   | 0.846    | 0.11      | 0.024
Ramella2018 (Scheme B)  | 0.811   |          |        | 0.842  |         |        | 0.736    |           |
Toivonen2019 (Scheme A) | 1       | 0.146    | 0.002  | 1      | 0.038   | 0.015  | 0.98     | 0.17      | <0.001
Toivonen2019 (Scheme B) | 0.854   |          |        | 0.962  |         |        | 0.81     |           |
Keek2020 (Scheme A)     | 0.765   | 0.086    | 0.005  | 0.714  | 0.14    | 0.001  | 0.725    | 0.07      | 0.018
Keek2020 (Scheme B)     | 0.678   |          |        | 0.575  |         |        | 0.656    |           |
Li2020 (Scheme A)       | 0.972   | 0.107    | 0.018  | 0.984  | 0.067   | 0.057  | 0.922    | 0.157     | 0.006
Li2020 (Scheme B)       | 0.865   |          |        | 0.917  |         |        | 0.765    |           |
Park2020 (Scheme A)     | 0.698   | 0.067    | 0.006  | 0.394  | 0.061   | 0.036  | 0.763    | 0.005     | 0.602
Park2020 (Scheme B)     | 0.631   |          |        | 0.333  |         |        | 0.758    |           |
Song2020 (Scheme A)     | 0.985   | 0.02     | 0.002  | 0.984  | 0.022   | 0.007  | 0.942    | 0.012     | 0.334
Song2020 (Scheme B)     | 0.965   |          |        | 0.962  |         |        | 0.931    |           |

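AUC-ROC, AUC-F1 and accuracy of the correct and incorrect models for each dataset, together with their differences and the significance of those differences. Each difference (Δ) is the Scheme A value minus the Scheme B value and, like its p-value, applies to both rows of a dataset. The p-values were computed using a bootstrap test with the null hypothesis that the difference is zero. Significant p-values are marked in bold.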
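The caption states only that the p-values come from a bootstrap test of the null hypothesis that the difference is zero. The Python sketch below shows one common way to run such a test for ΔAUC-ROC. It assumes a paired bootstrap over cases (each resample keeps a case's label together with both models' scores) and a p-value read from the bootstrap distribution shifted to approximate the null. The function names `auc_roc` and `paired_bootstrap_test`, and the parameters `n_boot` and `seed`, are hypothetical, and the paper's exact resampling scheme may differ.

```python
import random


def auc_roc(labels, scores):
    """AUC-ROC via the Mann-Whitney statistic (ties count as 0.5).

    labels: sequence of 0/1 class labels; scores: higher means "more positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def paired_bootstrap_test(labels, scores_a, scores_b, n_boot=2000, seed=0):
    """Return (delta, p) for delta = AUC(A) - AUC(B) under H0: delta = 0.

    Assumption (hypothetical helper, not the paper's code): cases are
    resampled with replacement, keeping each case's label and both models'
    scores together, i.e. a paired bootstrap.
    """
    rng = random.Random(seed)
    n = len(labels)
    delta = auc_roc(labels, scores_a) - auc_roc(labels, scores_b)
    boot = []
    while len(boot) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        if 0 < sum(y) < n:  # with 0/1 labels: skip single-class resamples
            a = [scores_a[i] for i in idx]
            b = [scores_b[i] for i in idx]
            boot.append(auc_roc(y, a) - auc_roc(y, b))
    # Shift the replicates by the observed delta so they approximate the null
    # distribution (centred near zero), then count replicates at least as
    # extreme as the observed difference.
    extreme = sum(1 for d in boot if abs(d - delta) >= abs(delta))
    p = (extreme + 1) / (n_boot + 1)
    return delta, p
```

With `n_boot=200`, two perfectly opposed score vectors give Δ = 1.0 and the smallest attainable p-value, 1/201. The +1 in numerator and denominator keeps the estimate from being exactly zero. The pairwise AUC loop is O(n_pos·n_neg), which is adequate for radiomics-sized cohorts. A rank-based computation would scale better.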