
Table 3 Performance metrics for the DL model versus the prospective BI-RADS assessment and the five radiologists in the comparison set

From: Deep learning based on ultrasound images assists breast lesion diagnosis in China: a multicenter diagnostic study

|  | AUC (95% CI) | p value | Sensitivity, % (95% CI) | p value | Specificity, % (95% CI) | p value | PPV, % (95% CI) | p value | NPV, % (95% CI) | p value | ACC, % (95% CI) | p value |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DL | 0.924 (0.879–0.957) |  | 89.77 (81.47–95.22) |  | 82.30 (74.00–88.84) |  | 79.80 (72.51–85.54) |  | 91.18 (84.69–95.08) |  | 85.57 (79.94–90.12) |  |
| Pro | 0.969 (0.934–0.988) | 0.0058* | 98.86 (93.83–99.97) | 0.0078* | 53.10 (43.48–62.55) | < 0.0001* | 62.14 (57.40–66.67) | 0.0036* | 98.36 (89.45–99.77) | 0.0652 | 73.13 (66.45–79.13) | 0.0005* |
| R1 | 0.935 (0.892–0.965) | 0.5629 | 95.46 (88.77–98.75) | 0.2266 | 74.34 (65.27–82.09) | 0.0784 | 74.34 (67.84–79.91) | 0.3478 | 95.46 (88.90–98.22) | 0.2454 | 83.58 (77.72–88.42) | 0.5966 |
| R2 | 0.901 (0.851–0.939) | 0.2112 | 97.73 (92.03–99.72) | 0.0391* | 52.21 (42.61–61.70) | < 0.0001* | 61.43 (56.71–65.94) | 0.0025* | 96.72 (88.11–99.16) | 0.1734 | 72.14 (65.40–78.22) | 0.0002* |
| R3 | 0.852 (0.795–0.898) | 0.0021* | 100 (95.90–100) | 0.0039* | 21.24 (14.11–29.93) | < 0.0001* | 49.72 (47.33–52.11) | < 0.0001* | 100 | 0.1325 | 55.72 (48.57–62.71) | < 0.0001* |
| R4 | 0.795 (0.733–0.849) | < 0.0001* | 93.18 (85.75–97.46) | 0.5488 | 46.90 (37.45–56.52) | < 0.0001* | 57.75 (53.25–62.12) | 0.0004* | 89.83 (79.93–95.15) | 0.7778 | 67.16 (60.21–73.61) | < 0.0001* |
| R5 | 0.778 (0.714–0.834) | < 0.0001* | 97.73 (92.03–99.72) | 0.0391* | 17.70 (11.16–26.00) | < 0.0001* | 48.05 (45.77–50.33) | < 0.0001* | 90.91 (70.60–97.66) | 0.9682 | 52.74 (45.59–59.80) | < 0.0001* |
  1. p value: comparison of diagnostic performance with the DL model
  2. DL, deep learning; BI-RADS, Breast Imaging Reporting and Data System; AUC, area under the receiver operating characteristic curve; PPV, positive predictive value; NPV, negative predictive value; ACC, accuracy; CI, confidence interval; Pro, prospective BI-RADS assessment; R, radiologist
  3. *p value indicates a statistically significant difference
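
The sensitivity, specificity, PPV, NPV, and ACC columns are all ratios taken from a 2×2 confusion matrix, so a short sketch may help when reading them. The table does not report the underlying counts. The counts used below (79 true positives, 9 false negatives, 93 true negatives, 20 false positives) are an assumption, inferred only because they reproduce the percentages reported for the DL row. The function name `diagnostic_metrics` is hypothetical, and the confidence intervals and p values are not recomputed here.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return the five count-based metrics as percentages."""
    total = tp + fn + tn + fp
    return {
        "sensitivity": 100 * tp / (tp + fn),  # malignant lesions correctly flagged
        "specificity": 100 * tn / (tn + fp),  # benign lesions correctly cleared
        "ppv": 100 * tp / (tp + fp),          # positive calls that are truly malignant
        "npv": 100 * tn / (tn + fn),          # negative calls that are truly benign
        "acc": 100 * (tp + tn) / total,       # all correct calls
    }


# Assumed counts (not stated in the table), chosen to match the DL row.
dl = diagnostic_metrics(tp=79, fn=9, tn=93, fp=20)

for name, value in dl.items():
    print(f"{name}: {value:.2f}")
```

Under these assumed counts the five values round to the DL row of the table. The same function could be applied to any reader's row if the corresponding counts were available.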