Table 1 Overview of the datasets

From: Measuring the bias of incorrect application of feature selection when using cross-validation in radiomics

| Dataset | N | d | Dimensionality (#Samples/#Features) | Outcome balance (%) | Modality | Tumor type | DOI |
|---|---|---|---|---|---|---|---|
| Carvalho2018 [30] | 262 | 117 | 2.22 | 59 | FDG-PET | NSCLC | https://doi.org/10.1371/journal.pone.0192859 |
| Hosny2018A (HarvardRT) [31] | 293 | 1004 | 0.29 | 54 | CT | NSCLC | https://doi.org/10.1371/journal.pmed.1002711 |
| Hosny2018B (Maastro) [31] | 211 | 1004 | 0.21 | 28 | CT | NSCLC | https://doi.org/10.1371/journal.pmed.1002711 |
| Hosny2018C (Moffitt) [31] | 183 | 1004 | 0.18 | 73 | CT | NSCLC | https://doi.org/10.1371/journal.pmed.1002711 |
| Ramella2018 [32] | 91 | 242 | 0.37 | 55 | CT | NSCLC | https://doi.org/10.1371/journal.pone.0207455 |
| Toivonen2019 [33] | 100 | 7105 | 0.01 | 60 | MRI | Prostate Cancer | https://doi.org/10.1371/journal.pone.0217702 |
| Keek2020 [34] | 273 | 1322 | 0.21 | 40 | CT | HNSCC | https://doi.org/10.1371/journal.pone.0232639 |
| Li2020 [35] | 51 | 396 | 0.13 | 63 | MRI | Glioma | https://doi.org/10.1371/journal.pone.0227703 |
| Park2020 [36] | 768 | 940 | 0.82 | 24 | US | Thyroid Cancer | https://doi.org/10.1371/journal.pone.0227315 |
| Song2020 [37] | 260 | 264 | 0.98 | 49 | MR | Prostate Cancer | https://doi.org/10.1371/journal.pone.0237587 |
Overview of all radiomics datasets used. Only publicly available datasets were included to allow for easy reproducibility. N denotes the sample size and d the number of features (the dimension of the data). Dimensionality is the ratio of samples to features. Outcome balance is the percentage of events in the outcome. DOI denotes the identifier of the publication corresponding to the dataset.
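
The Dimensionality column in Table 1 can be reproduced from N and d. The following is a minimal sketch, assuming the column is simply N divided by d, rounded to two decimals. The function name `dimensionality` is hypothetical and only four rows are included for illustration. Other rows may differ in the last digit, depending on how the authors rounded.

```python
def dimensionality(n_samples: int, n_features: int) -> float:
    """Samples-to-features ratio, rounded to two decimals (assumed rounding)."""
    return round(n_samples / n_features, 2)


# (N, d) pairs copied from four rows of Table 1
datasets = {
    "Hosny2018A": (293, 1004),
    "Toivonen2019": (100, 7105),
    "Li2020": (51, 396),
    "Park2020": (768, 940),
}

# Compute the ratio for every listed dataset
ratios = {name: dimensionality(n, d) for name, (n, d) in datasets.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}")
```

A ratio below 1 means the dataset has more features than samples, which is the case for every dataset in the table except Carvalho2018.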