
Table 1 Overview of the datasets

From: Measuring the bias of incorrect application of feature selection when using cross-validation in radiomics

| Dataset | N | d | Dimensionality (#Samples/#Features) | Outcome balance (%) | Modality | Tumor type | DOI |
|---|---|---|---|---|---|---|---|
| Carvalho2018 [30] | 262 | 117 | 2.22 | 59 | FDG-PET | NSCLC | https://doi.org/10.1371/journal.pone.0192859 |
| Hosny2018A (HarvardRT) [31] | 293 | 1004 | 0.29 | 54 | CT | NSCLC | https://doi.org/10.1371/journal.pmed.1002711 |
| Hosny2018B (Maastro) [31] | 211 | 1004 | 0.21 | 28 | CT | NSCLC | https://doi.org/10.1371/journal.pmed.1002711 |
| Hosny2018C (Moffitt) [31] | 183 | 1004 | 0.18 | 73 | CT | NSCLC | https://doi.org/10.1371/journal.pmed.1002711 |
| Ramella2018 [32] | 91 | 242 | 0.37 | 55 | CT | NSCLC | https://doi.org/10.1371/journal.pone.0207455 |
| Toivonen2019 [33] | 100 | 7105 | 0.01 | 60 | MRI | Prostate Cancer | https://doi.org/10.1371/journal.pone.0217702 |
| Keek2020 [34] | 273 | 1322 | 0.21 | 40 | CT | HNSCC | https://doi.org/10.1371/journal.pone.0232639 |
| Li2020 [35] | 51 | 396 | 0.13 | 63 | MRI | Glioma | https://doi.org/10.1371/journal.pone.0227703 |
| Park2020 [36] | 768 | 940 | 0.82 | 24 | US | Thyroid Cancer | https://doi.org/10.1371/journal.pone.0227315 |
| Song2020 [37] | 260 | 264 | 0.98 | 49 | MRI | Prostate Cancer | https://doi.org/10.1371/journal.pone.0237587 |

Overview of all radiomics datasets used. Only publicly available datasets were included to allow for easy reproducibility. N denotes the sample size and d the number of features (i.e., the dimension of the data); dimensionality is the ratio N/d. Outcome balance gives the proportion of samples with an event (positive outcome), in percent. DOI identifies the publication corresponding to each dataset.
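The two derived quantities in the caption can be recomputed directly from the table. The following is a minimal Python sketch, assuming only the N, d and outcome-balance values listed above; the helper names `dimensionality` and `approx_events` are hypothetical and not taken from the paper. Event counts are approximate because the published percentages are rounded, and recomputed N/d ratios may differ from the printed column in the second decimal.

```python
# N (samples), d (features) and outcome balance (%) copied from Table 1.
DATASETS = {
    "Carvalho2018": (262, 117, 59),
    "Hosny2018A (HarvardRT)": (293, 1004, 54),
    "Hosny2018B (Maastro)": (211, 1004, 28),
    "Hosny2018C (Moffitt)": (183, 1004, 73),
    "Ramella2018": (91, 242, 55),
    "Toivonen2019": (100, 7105, 60),
    "Keek2020": (273, 1322, 40),
    "Li2020": (51, 396, 63),
    "Park2020": (768, 940, 24),
    "Song2020": (260, 264, 49),
}


def dimensionality(n_samples: int, n_features: int) -> float:
    """Ratio N/d; a value below 1 means there are more features than samples."""
    if n_features <= 0:
        raise ValueError("n_features must be positive")
    return n_samples / n_features


def approx_events(n_samples: int, balance_pct: float) -> int:
    """Approximate number of events, reconstructed from a rounded percentage."""
    return round(n_samples * balance_pct / 100)


if __name__ == "__main__":
    for name, (n, d, bal) in DATASETS.items():
        ratio = dimensionality(n, d)
        regime = "d > N" if ratio < 1 else "N >= d"
        print(f"{name:<24} N/d = {ratio:.3f} ({regime}), ~{approx_events(n, bal)} events")
```

Under these values, every dataset except Carvalho2018 has more features than samples, which is the high-dimensional setting in which feature selection has to be placed inside the cross-validation loop.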