Modality | AUC | Accuracy | Sensitivity | Specificity | PPV | NPV | Kappa value | F1 score |
---|---|---|---|---|---|---|---|---|
a. Performance metrics of the models and US specialists on the primary internal test set A. | | | | | | | | |
 Clinical | 0.70 (0.59–0.80) | 0.63 (0.52–0.73) | 0.62 (0.49–0.74) | 0.64 (0.51–0.77) | 0.63 | 0.63 | 0.26 | 0.62 |
 BMUS | 0.82 (0.74–0.90) | 0.77 (0.67–0.85) | 0.81 (0.70–0.91) | 0.72 (0.60–0.85) | 0.75 | 0.79 | 0.53 | 0.78 |
 CDFI | 0.77 (0.67–0.86) | 0.70 (0.60–0.79) | 0.85 (0.74–0.94) | 0.55 (0.40–0.68) | 0.66 | 0.79 | 0.40 | 0.74 |
 Ensemble | 0.86 (0.78–0.94) | 0.79 (0.69–0.86) | 0.83 (0.72–0.94) | 0.74 (0.62–0.87) | 0.78 | 0.82 | 0.57 | 0.79 |
 Expert 1 | N/A | 0.63 (0.52–0.73) | 0.62 (0.46–0.75) | 0.64 (0.48–0.77) | 0.63 | 0.63 | 0.26 | 0.62 |
 Expert 2 | N/A | 0.55 (0.42–0.67) | 0.43 (0.29–0.58) | 0.68 (0.53–0.80) | 0.57 | 0.54 | 0.15 | 0.49 |
 Expert 3 | N/A | 0.49 (0.39–0.60) | 0.47 (0.32–0.62) | 0.51 (0.36–0.66) | 0.49 | 0.49 | 0.11 | 0.48 |
b. Performance metrics of the models and US specialists on the secondary external test set B. | | | | | | | | |
 Clinical | 0.62 (0.51–0.72) | 0.60 (0.49–0.70) | 0.66 (0.52–0.80) | 0.58 (0.42–0.71) | 0.60 | 0.63 | 0.24 | 0.63 |
 BMUS | 0.71 (0.61–0.82) | 0.66 (0.54–0.75) | 0.73 (0.57–0.85) | 0.60 (0.44–0.74) | 0.64 | 0.69 | 0.33 | 0.68 |
 CDFI | 0.72 (0.62–0.83) | 0.67 (0.57–0.77) | 0.77 (0.64–0.89) | 0.58 (0.42–0.71) | 0.64 | 0.72 | 0.39 | 0.70 |
 Ensemble | 0.77 (0.68–0.87) | 0.72 (0.61–0.81) | 0.75 (0.61–0.86) | 0.69 (0.56–0.82) | 0.70 | 0.74 | 0.44 | 0.72 |
 Expert 1 | N/A | 0.66 (0.54–0.75) | 0.67 (0.51–0.80) | 0.66 (0.50–0.79) | 0.67 | 0.66 | 0.33 | 0.67 |
 Expert 2 | N/A | 0.58 (0.47–0.70) | 0.62 (0.47–0.76) | 0.55 (0.39–0.69) | 0.58 | 0.59 | 0.17 | 0.60 |
 Expert 3 | N/A | 0.52 (0.41–0.63) | 0.44 (0.30–0.60) | 0.59 (0.43–0.73) | 0.53 | 0.51 | 0.03 | 0.48 |
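
For readers who want to see how the threshold-based columns relate to one another, the following is a minimal sketch of the standard definitions computed from a binary confusion matrix. The counts used are hypothetical and are not taken from the study. The names `Confusion`, `binary_metrics`, and `f1_from` are illustrative. It is also an assumption that the Kappa column is Cohen's kappa between predictions and the reference standard, since the table does not state which kappa is reported.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts of a binary confusion matrix (hypothetical container)."""
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives


def f1_from(ppv: float, sensitivity: float) -> float:
    """F1 score as the harmonic mean of PPV (precision) and sensitivity (recall)."""
    return 2 * ppv * sensitivity / (ppv + sensitivity)


def binary_metrics(c: Confusion) -> dict:
    """Standard threshold-based metrics from a confusion matrix."""
    n = c.tp + c.fp + c.tn + c.fn
    accuracy = (c.tp + c.tn) / n
    sensitivity = c.tp / (c.tp + c.fn)
    specificity = c.tn / (c.tn + c.fp)
    ppv = c.tp / (c.tp + c.fp)
    npv = c.tn / (c.tn + c.fn)

    # Cohen's kappa (assumed definition): observed agreement corrected
    # for the agreement expected by chance from the marginal totals.
    p_pos = ((c.tp + c.fp) / n) * ((c.tp + c.fn) / n)
    p_neg = ((c.tn + c.fn) / n) * ((c.tn + c.fp) / n)
    p_chance = p_pos + p_neg
    kappa = (accuracy - p_chance) / (1 - p_chance)

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "kappa": kappa,
        "f1": f1_from(ppv, sensitivity),
    }


# Hypothetical counts for illustration only (not study data).
example = binary_metrics(Confusion(tp=40, fp=10, tn=40, fn=10))
```

As a consistency check on the table itself, `f1_from(0.75, 0.81)` applied to the BMUS row of panel a (PPV 0.75, sensitivity 0.81) rounds to 0.78, which matches the reported F1 score. AUC is not covered here because it requires the underlying continuous scores rather than a single confusion matrix.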