Deep learning based on ultrasound images assists breast lesion diagnosis in China: a multicenter diagnostic study

Insights into Imaging

Table 3 Performance metrics for the DL model versus the prospective BI-RADS assessment and the five radiologists in the comparison set

	AUC (95% CI)	p value	Sensitivity, % (95% CI)	p value	Specificity, % (95% CI)	p value	PPV, % (95% CI)	p value	NPV, % (95% CI)	p value	ACC, % (95% CI)	p value
DL	0.924 (0.879–0.957)		89.77 (81.47–95.22)		82.30 (74.00–88.84)		79.80 (72.51–85.54)		91.18 (84.69–95.08)		85.57 (79.94–90.12)
Pro	0.969 (0.934–0.988)	0.0058*	98.86 (93.83–99.97)	0.0078*	53.10 (43.48–62.55)	< 0.0001*	62.14 (57.40–66.67)	0.0036*	98.36 (89.45–99.77)	0.0652	73.13 (66.45–79.13)	0.0005*
R1	0.935 (0.892–0.965)	0.5629	95.46 (88.77–98.75)	0.2266	74.34 (65.27–82.09)	0.0784	74.34 (67.84–79.91)	0.3478	95.46 (88.90–98.22)	0.2454	83.58 (77.72–88.42)	0.5966
R2	0.901 (0.851–0.939)	0.2112	97.73 (92.03–99.72)	0.0391*	52.21 (42.61–61.70)	< 0.0001*	61.43 (56.71–65.94)	0.0025*	96.72 (88.11–99.16)	0.1734	72.14 (65.40–78.22)	0.0002*
R3	0.852 (0.795–0.898)	0.0021*	100 (95.90–100)	0.0039*	21.24 (14.11–29.93)	< 0.0001*	49.72 (47.33–52.11)	< 0.0001*	100	0.1325	55.72 (48.57–62.71)	< 0.0001*
R4	0.795 (0.733–0.849)	< 0.0001*	93.18 (85.75–97.46)	0.5488	46.90 (37.45–56.52)	< 0.0001*	57.75 (53.25–62.12)	0.0004*	89.83 (79.93–95.15)	0.7778	67.16 (60.21–73.61)	< 0.0001*
R5	0.778 (0.714–0.834)	< 0.0001*	97.73 (92.03–99.72)	0.0391*	17.70 (11.16–26.00)	< 0.0001*	48.05 (45.77–50.33)	< 0.0001*	90.91 (70.60–97.66)	0.9682	52.74 (45.59–59.80)	< 0.0001*

p value: comparison of diagnostic performance with the DL model
DL deep learning, BI-RADS Breast Imaging Reporting and Data System, AUC area under the receiver operating characteristic curve, PPV positive predictive value, NPV negative predictive value, ACC accuracy, CI confidence interval, Pro prospective BI-RADS assessment, R radiologist
*p value indicates a statistically significant difference
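
The sensitivity, specificity, PPV, NPV and ACC columns all follow from a 2×2 table of predictions against the reference diagnosis. The Python sketch below is a minimal illustration, not the authors' analysis code. The function name `diagnostic_metrics` is hypothetical, and the counts (79 true positives, 9 false negatives, 93 true negatives, 20 false positives) are an assumption: they are not reported in the table but were back-calculated from the percentages in the DL row, which they reproduce to two decimal places.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return threshold-based diagnostic metrics as percentages.

    tp/fn: malignant lesions classified as malignant/benign.
    tn/fp: benign lesions classified as benign/malignant.
    """
    total = tp + fn + tn + fp
    return {
        "sensitivity": 100 * tp / (tp + fn),  # true-positive rate
        "specificity": 100 * tn / (tn + fp),  # true-negative rate
        "ppv": 100 * tp / (tp + fp),          # positive predictive value
        "npv": 100 * tn / (tn + fn),          # negative predictive value
        "acc": 100 * (tp + tn) / total,       # overall accuracy
    }


# Assumed counts, inferred from the DL row rather than reported in the table.
dl = diagnostic_metrics(tp=79, fn=9, tn=93, fp=20)
for name, value in dl.items():
    print(f"{name}: {value:.2f}")
```

The confidence intervals and p values in the table depend on statistical methods that this section does not state, so the sketch does not attempt to reproduce them.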