Deep learning based on ultrasound images assists breast lesion diagnosis in China: a multicenter diagnostic study

Insights into Imaging

Table 2 Performance metrics for the DL model in the test sets

Test set	AUC (95% CI)	Sensitivity, % (95% CI)	Specificity, % (95% CI)	PPV, % (95% CI)	NPV, % (95% CI)	ACC, % (95% CI)	F1	MCC
Internal test set	0.908 (0.879–0.933)	83.23 (76.55–88.65)	83.61 (78.97–87.58)	72.83 (67.33–77.71)	90.43 (86.96–93.04)	83.48 (79.79–86.73)	0.777	0.650
External test sets	0.913 (0.881–0.939)	88.84 (83.72–92.79)	83.77 (77.76–88.70)	85.51 (81.00–89.10)	87.43 (82.48–91.13)	86.40 (82.63–89.61)	0.871	0.728
External test set A	0.908 (0.859–0.945)	88.00 (79.98–93.64)	85.57 (76.97–91.88)	86.28 (79.39–91.12)	87.37 (80.17–92.21)	86.80 (81.26–91.19)	0.871	0.736
External test set B	0.918 (0.871–0.952)	89.62 (82.19–94.71)	81.92 (72.63–89.10)	84.82 (78.34–89.62)	87.50 (79.87–92.51)	86.00 (80.41–90.49)	0.872	0.719

DL deep learning, AUC area under the receiver operating characteristic curve, PPV positive predictive value, NPV negative predictive value, ACC accuracy, F1 F1 score, MCC Matthews correlation coefficient, CI confidence interval
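The F1 column can be cross-checked against the PPV and sensitivity columns, because the F1 score is the harmonic mean of precision (PPV) and recall (sensitivity). The following is a minimal sketch of that check, not the authors' code: the function and variable names are hypothetical, the input values are copied from Table 2, and the tolerance of 0.001 is an assumption chosen to absorb rounding in the reported percentages.

```python
# Point estimates copied from Table 2: (PPV %, sensitivity %, reported F1).
# The dictionary and function names are illustrative, not from the study.
REPORTED = {
    "Internal test set": (72.83, 83.23, 0.777),
    "External test sets": (85.51, 88.84, 0.871),
    "External test set A": (86.28, 88.00, 0.871),
    "External test set B": (84.82, 89.62, 0.872),
}


def f1_from_ppv_sensitivity(ppv_pct, sens_pct):
    """Return F1 as a fraction: the harmonic mean of PPV and sensitivity."""
    precision = ppv_pct / 100.0
    recall = sens_pct / 100.0
    return 2 * precision * recall / (precision + recall)


def check_table(rows=REPORTED, tol=0.001):
    """Map each test set to (recomputed F1, reported F1, agreement flag).

    tol is an assumed tolerance for rounding in the published values.
    """
    results = {}
    for name, (ppv, sens, reported_f1) in rows.items():
        recomputed = f1_from_ppv_sensitivity(ppv, sens)
        results[name] = (recomputed, reported_f1, abs(recomputed - reported_f1) < tol)
    return results


if __name__ == "__main__":
    for name, (recomputed, reported, ok) in check_table().items():
        status = "consistent" if ok else "mismatch"
        print(f"{name}: recomputed F1 = {recomputed:.3f}, reported = {reported:.3f} ({status})")
```

MCC cannot be recomputed the same way from the percentages alone, since it also depends on the underlying counts of malignant and benign lesions, which Table 2 does not list.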