
Table 2 Performance of the four models and three radiologists according to the test sets

From: Deep learning predicts cervical lymph node metastasis in clinically node-negative papillary thyroid carcinoma

a. Performance metrics of the models and US specialists on the primary internal test set A.

| Modality | AUC | Accuracy | Sensitivity | Specificity | PPV | NPV | Kappa value | F1 score |
|---|---|---|---|---|---|---|---|---|
| Clinical | 0.70 (0.59–0.80) | 0.63 (0.52–0.73) | 0.62 (0.49–0.74) | 0.64 (0.51–0.77) | 0.63 | 0.63 | 0.26 | 0.62 |
| BMUS | 0.82 (0.74–0.90) | 0.77 (0.67–0.85) | 0.81 (0.70–0.91) | 0.72 (0.60–0.85) | 0.75 | 0.79 | 0.53 | 0.78 |
| CDFI | 0.77 (0.67–0.86) | 0.70 (0.60–0.79) | 0.85 (0.74–0.94) | 0.55 (0.40–0.68) | 0.66 | 0.79 | 0.40 | 0.74 |
| Ensemble | 0.86 (0.78–0.94) | 0.79 (0.69–0.86) | 0.83 (0.72–0.94) | 0.74 (0.62–0.87) | 0.78 | 0.82 | 0.57 | 0.79 |
| Expert 1 | N/A | 0.63 (0.52–0.73) | 0.62 (0.46–0.75) | 0.64 (0.48–0.77) | 0.63 | 0.63 | 0.26 | 0.62 |
| Expert 2 | N/A | 0.55 (0.42–0.67) | 0.43 (0.29–0.58) | 0.68 (0.53–0.80) | 0.57 | 0.54 | 0.15 | 0.49 |
| Expert 3 | N/A | 0.49 (0.39–0.60) | 0.47 (0.32–0.62) | 0.51 (0.36–0.66) | 0.49 | 0.49 | 0.11 | 0.48 |

b. Performance metrics of the models and US specialists on the secondary external test set B.

| Modality | AUC | Accuracy | Sensitivity | Specificity | PPV | NPV | Kappa value | F1 score |
|---|---|---|---|---|---|---|---|---|
| Clinical | 0.62 (0.51–0.72) | 0.60 (0.49–0.70) | 0.66 (0.52–0.80) | 0.58 (0.42–0.71) | 0.60 | 0.63 | 0.24 | 0.63 |
| BMUS | 0.71 (0.61–0.82) | 0.66 (0.54–0.75) | 0.73 (0.57–0.85) | 0.60 (0.44–0.74) | 0.64 | 0.69 | 0.33 | 0.68 |
| CDFI | 0.72 (0.62–0.83) | 0.67 (0.57–0.77) | 0.77 (0.64–0.89) | 0.58 (0.42–0.71) | 0.64 | 0.72 | 0.39 | 0.70 |
| Ensemble | 0.77 (0.68–0.87) | 0.72 (0.61–0.81) | 0.75 (0.61–0.86) | 0.69 (0.56–0.82) | 0.70 | 0.74 | 0.44 | 0.72 |
| Expert 1 | N/A | 0.66 (0.54–0.75) | 0.67 (0.51–0.80) | 0.66 (0.50–0.79) | 0.67 | 0.66 | 0.33 | 0.67 |
| Expert 2 | N/A | 0.58 (0.47–0.70) | 0.62 (0.47–0.76) | 0.55 (0.39–0.69) | 0.58 | 0.59 | 0.17 | 0.60 |
| Expert 3 | N/A | 0.52 (0.41–0.63) | 0.44 (0.30–0.60) | 0.59 (0.43–0.73) | 0.53 | 0.51 | 0.03 | 0.48 |
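
For readers who want to relate the columns of Table 2 to one another, the following is a minimal sketch of how accuracy, sensitivity, specificity, PPV, NPV, Cohen's kappa, and the F1 score are conventionally derived from a binary confusion matrix. It is not the authors' code. The function names `binary_metrics` and `f1_from_ppv_sens` and the counts in the usage note are hypothetical and chosen only for illustration; AUC is omitted because it requires per-case scores rather than counts.

```python
def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard binary classification metrics from confusion-matrix counts.

    Hypothetical helper for illustration; assumes no denominator is zero.
    """
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    ppv = tp / (tp + fp)           # positive predictive value (precision)
    npv = tn / (tn + fn)           # negative predictive value
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_chance = ((tp + fp) / n) * ((tp + fn) / n) + ((tn + fn) / n) * ((tn + fp) / n)
    kappa = (accuracy - p_chance) / (1 - p_chance)

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "kappa": kappa,
        "f1": f1,
    }


def f1_from_ppv_sens(ppv: float, sensitivity: float) -> float:
    """F1 score as the harmonic mean of PPV and sensitivity."""
    return 2 * ppv * sensitivity / (ppv + sensitivity)
```

As a usage example, `binary_metrics(40, 10, 10, 40)` on made-up counts gives 0.8 for accuracy, sensitivity, specificity, PPV, NPV, and F1, with a kappa of about 0.6. Applied to the rounded BMUS values for test set A, `round(f1_from_ppv_sens(0.75, 0.81), 2)` gives 0.78, matching the reported F1 score. Because the table entries are already rounded, recomputed values for other rows can differ from the reported ones in the second decimal place.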