Deep learning and radiomic feature-based blending ensemble classifier for malignancy risk prediction in cystic renal lesions

Insights into Imaging

Table 2 Performance of the four models and the Bosniak 2019 classification in external validation datasets

Model	AUC (95% CI)	ACC (95% CI)	Sensitivity (95% CI)	Specificity (95% CI)	p value (DeLong test)
Training cohort (fivefold cross-validation)
Blending ensemble	0.946 (0.912–0.980)	0.899 (0.898–0.900)	0.893 (0.812–0.974)	0.903 (0.846–0.960)	p < 0.001
Decision tree	0.862 (0.800–0.924)	0.843 (0.841–0.844)	0.750 (0.637–0.863)	0.893 (0.834–0.953)	p = 0.770
LightGBM	0.950 (0.917–0.982)	0.893 (0.892–0.894)	0.946 (0.887–1.000)	0.864 (0.798–0.930)	p < 0.001
XGBoost	0.938 (0.899–0.977)	0.906 (0.905–0.907)	0.893 (0.812–0.974)	0.913 (0.858–0.967)	p = 0.010
Bosniak 2019 classification	0.870 (0.823–0.918)	0.843 (0.841–0.844)	0.964 (0.916–1.000)	0.777 (0.696–0.857)	Reference
Test cohort
Blending ensemble	0.934 (0.873–0.995)	0.905 (0.902–0.907)	0.900 (0.714–1.000)	0.906 (0.827–0.984)	p < 0.001
Decision tree	0.814 (0.681–0.947)	0.794 (0.789–0.799)	0.800 (0.552–1.000)	0.792 (0.683–0.902)	p = 0.681
LightGBM	0.898 (0.810–0.986)	0.905 (0.902–0.907)	0.800 (0.552–1.000)	0.925 (0.853–0.996)	p = 0.039
XGBoost	0.862 (0.731–0.994)	0.841 (0.837–0.845)	0.900 (0.714–1.000)	0.830 (0.729–0.931)	p = 0.294
Bosniak 2019 classification	0.783 (0.716–0.850)	0.635 (0.628–0.642)	1.000 (1.000–1.000)	0.566 (0.433–0.699)	Reference

AUC area under the receiver operating characteristic curve, ACC accuracy score, Reference comparator in the DeLong test, 95% CI 95% confidence interval
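
As a reading aid, the sensitivity and specificity intervals in the test cohort can be approximated with a normal-approximation (Wald) interval clipped to [0, 1]. The sketch below is an illustration, not the authors' stated method: the counts (10 malignant and 53 benign lesions, with 9 and 48 classified correctly) are assumptions inferred from the reported proportions, and the helper name `wald_ci` is hypothetical.

```python
import math


def wald_ci(successes, total, z=1.96):
    """Proportion with a normal-approximation (Wald) 95% interval, clipped to [0, 1]."""
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


# Assumed counts (not stated in the table): 9 of 10 malignant lesions and
# 48 of 53 benign lesions classified correctly by the blending ensemble.
sensitivity = wald_ci(9, 10)
specificity = wald_ci(48, 53)

for name, (p, lower, upper) in [("Sensitivity", sensitivity),
                                ("Specificity", specificity)]:
    print(f"{name}: {p:.3f} ({lower:.3f} to {upper:.3f})")
```

Under these assumed counts, the intervals come out at roughly 0.714 to 1.000 for sensitivity and 0.827 to 0.984 for specificity, which agrees with the blending ensemble row of the test cohort. The upper bound of 1.000 results from clipping, which suggests why several sensitivity intervals in the table end exactly at 1.000.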