Table 2 The performance of four models and Bosniak-2019 classification in external validation datasets

From: Deep learning and radiomic feature-based blending ensemble classifier for malignancy risk prediction in cystic renal lesions

| Model | AUC (95% CI) | Acc (95% CI) | Sensitivity | Specificity | p value in DeLong test |
|---|---|---|---|---|---|
| **Train cohort fivefold cross-validation** | | | | | |
| Blending ensemble | 0.946 (0.912–0.980) | 0.899 (0.898–0.900) | 0.893 (0.812–0.974) | 0.903 (0.846–0.960) | p < 0.001 |
| Decision tree | 0.862 (0.800–0.924) | 0.843 (0.841–0.844) | 0.750 (0.637–0.863) | 0.893 (0.834–0.953) | p = 0.770 |
| LightGBM | 0.950 (0.917–0.982) | 0.893 (0.892–0.894) | 0.946 (0.887–1.000) | 0.864 (0.798–0.930) | p < 0.001 |
| XGBoost | 0.938 (0.899–0.977) | 0.906 (0.905–0.907) | 0.893 (0.812–0.974) | 0.913 (0.858–0.967) | p = 0.010 |
| Bosniak 2019 classification | 0.870 (0.823–0.918) | 0.843 (0.841–0.844) | 0.964 (0.916–1.000) | 0.777 (0.696–0.857) | Reference |
| **Test cohort** | | | | | |
| Blending ensemble | 0.934 (0.873–0.995) | 0.905 (0.902–0.907) | 0.900 (0.714–1.000) | 0.906 (0.827–0.984) | p < 0.001 |
| Decision tree | 0.814 (0.681–0.947) | 0.794 (0.789–0.799) | 0.800 (0.552–1.000) | 0.792 (0.683–0.902) | p = 0.681 |
| LightGBM | 0.898 (0.810–0.986) | 0.905 (0.902–0.907) | 0.800 (0.552–1.000) | 0.925 (0.853–0.996) | p = 0.039 |
| XGBoost | 0.862 (0.731–0.994) | 0.841 (0.837–0.845) | 0.900 (0.714–1.000) | 0.830 (0.729–0.931) | p = 0.294 |
| Bosniak 2019 classification | 0.783 (0.716–0.850) | 0.635 (0.628–0.642) | 1.000 (1.000–1.000) | 0.566 (0.433–0.699) | Reference |

AUC: area under the receiver operating characteristic curve; Acc: accuracy score; Reference: the reference model in the DeLong test; 95% CI: 95% confidence interval.
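
As an illustration of how point estimates and intervals of the kind reported for sensitivity and specificity can be computed, the sketch below applies a normal-approximation (Wald) interval, clipped to [0, 1], to a confusion matrix. The counts (9 true positives, 1 false negative, 48 true negatives, 5 false positives) are an assumption: they are not given in the table and were chosen only because they match the test-cohort point estimates for the blending ensemble. The function name `wald_ci` is hypothetical, and the table does not say which interval method the authors used.

```python
import math

def wald_ci(successes, total, z=1.96):
    """Normal-approximation (Wald) interval for a proportion, clipped to [0, 1]."""
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)

# Assumed counts (not stated in the table): 10 malignant and 53 benign lesions.
tp, fn = 9, 1    # malignant lesions classified correctly / incorrectly
tn, fp = 48, 5   # benign lesions classified correctly / incorrectly

sens = wald_ci(tp, tp + fn)                 # sensitivity = TP / (TP + FN)
spec = wald_ci(tn, tn + fp)                 # specificity = TN / (TN + FP)
acc = (tp + tn) / (tp + fn + tn + fp)       # accuracy = correct / all

print("sensitivity: %.3f (%.3f-%.3f)" % sens)
print("specificity: %.3f (%.3f-%.3f)" % spec)
print("accuracy:    %.3f" % acc)
```

Under these assumed counts the sketch prints 0.900 (0.714-1.000) for sensitivity, 0.906 (0.827-0.984) for specificity, and 0.905 for accuracy, which agrees with the blending-ensemble row of the test cohort. The accuracy intervals in the table are much narrower and are not reproduced by this formula.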