Deep learning predicts cervical lymph node metastasis in clinically node-negative papillary thyroid carcinoma

Insights into Imaging

Table 2 Performance of the four models and three radiologists on the internal and external test sets

Modality	AUC	Accuracy	Sensitivity	Specificity	PPV	NPV	Kappa value	F1 score
a. Performance metrics of the models and US specialists on the primary internal test set A.
Clinical	0.70 (0.59–0.80)	0.63 (0.52–0.73)	0.62 (0.49–0.74)	0.64 (0.51–0.77)	0.63	0.63	0.26	0.62
BMUS	0.82 (0.74–0.90)	0.77 (0.67–0.85)	0.81 (0.70–0.91)	0.72 (0.60–0.85)	0.75	0.79	0.53	0.78
CDFI	0.77 (0.67–0.86)	0.70 (0.60–0.79)	0.85 (0.74–0.94)	0.55 (0.40–0.68)	0.66	0.79	0.40	0.74
Ensemble	0.86 (0.78–0.94)	0.79 (0.69–0.86)	0.83 (0.72–0.94)	0.74 (0.62–0.87)	0.78	0.82	0.57	0.79
Expert 1	N/A	0.63 (0.52–0.73)	0.62 (0.46–0.75)	0.64 (0.48–0.77)	0.63	0.63	0.26	0.62
Expert 2	N/A	0.55 (0.42–0.67)	0.43 (0.29–0.58)	0.68 (0.53–0.80)	0.57	0.54	0.15	0.49
Expert 3	N/A	0.49 (0.39–0.60)	0.47 (0.32–0.62)	0.51 (0.36–0.66)	0.49	0.49	0.11	0.48
b. Performance metrics of the models and US specialists on the secondary external test set B.
Clinical	0.62 (0.51–0.72)	0.60 (0.49–0.70)	0.66 (0.52–0.80)	0.58 (0.42–0.71)	0.60	0.63	0.24	0.63
BMUS	0.71 (0.61–0.82)	0.66 (0.54–0.75)	0.73 (0.57–0.85)	0.60 (0.44–0.74)	0.64	0.69	0.33	0.68
CDFI	0.72 (0.62–0.83)	0.67 (0.57–0.77)	0.77 (0.64–0.89)	0.58 (0.42–0.71)	0.64	0.72	0.39	0.70
Ensemble	0.77 (0.68–0.87)	0.72 (0.61–0.81)	0.75 (0.61–0.86)	0.69 (0.56–0.82)	0.70	0.74	0.44	0.72
Expert 1	N/A	0.66 (0.54–0.75)	0.67 (0.51–0.80)	0.66 (0.50–0.79)	0.67	0.66	0.33	0.67
Expert 2	N/A	0.58 (0.47–0.70)	0.62 (0.47–0.76)	0.55 (0.39–0.69)	0.58	0.59	0.17	0.60
Expert 3	N/A	0.52 (0.41–0.63)	0.44 (0.30–0.60)	0.59 (0.43–0.73)	0.53	0.51	0.03	0.48
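
The threshold-dependent columns above (accuracy, sensitivity, specificity, PPV, NPV, kappa, F1) can all be derived from a single 2×2 confusion matrix. The following is a minimal sketch of that arithmetic, not the authors' code. The names `ConfusionCounts` and `binary_metrics` and the example counts are hypothetical and are not taken from the study. The sketch also assumes that the kappa column is Cohen's kappa between predicted and reference labels, which the table does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Cells of a 2x2 confusion matrix (hypothetical helper type)."""
    tp: int  # metastasis present, predicted positive
    fp: int  # metastasis absent, predicted positive
    tn: int  # metastasis absent, predicted negative
    fn: int  # metastasis present, predicted negative


def binary_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Compute the threshold-dependent metrics reported in the table.

    Assumes every denominator is non-zero, i.e. both classes occur
    and both labels are predicted at least once.
    """
    n = c.tp + c.fp + c.tn + c.fn
    accuracy = (c.tp + c.tn) / n
    sensitivity = c.tp / (c.tp + c.fn)   # true-positive rate (recall)
    specificity = c.tn / (c.tn + c.fp)   # true-negative rate
    ppv = c.tp / (c.tp + c.fp)           # positive predictive value
    npv = c.tn / (c.tn + c.fn)           # negative predictive value
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)

    # Cohen's kappa: observed agreement corrected for the agreement
    # expected by chance from the row and column marginals.
    p_chance_pos = ((c.tp + c.fp) / n) * ((c.tp + c.fn) / n)
    p_chance_neg = ((c.tn + c.fn) / n) * ((c.tn + c.fp) / n)
    p_chance = p_chance_pos + p_chance_neg
    kappa = (accuracy - p_chance) / (1 - p_chance)

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "kappa": kappa,
        "f1": f1,
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only; not data from the study.
    example = ConfusionCounts(tp=40, fp=15, tn=35, fn=10)
    for name, value in binary_metrics(example).items():
        print(f"{name}: {value:.2f}")
```

AUC is not included because it is computed from the continuous model scores rather than from a single thresholded confusion matrix, which is consistent with the N/A entries for the experts, who provide only binary calls.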