Table 2 Diagnostic performance of the deep learning model, clinical model, and integrated model in the training set, internal test set, and external test set

From: Deep learning for differentiation of osteolytic osteosarcoma and giant cell tumor around the knee joint on radiographs: a multicenter study

| Dataset | Model | AUC (95% CI) | Accuracy | Sensitivity | Specificity | p value |
| --- | --- | --- | --- | --- | --- | --- |
| Training set (n = 217) | DL model | 0.94 (0.90–0.97) | 91.2% (198/217) | 90.2% (92/102) | 92.2% (106/115) | < 0.001 |
| Internal test set (n = 62) | DL model | 0.97 (0.90–1.00) | 93.5% (58/62) | 90.9% (20/22) | 95.0% (38/40) | < 0.001 |
| Internal test set (n = 62) | Clinical model | 0.77 (0.65–0.87) | 82.3% (51/62) | 59.1% (13/22) | 95.0% (38/40) | < 0.001 |
| Internal test set (n = 62) | Integrated model | 0.94 (0.84–0.98) | 93.5% (58/62) | 86.4% (19/22) | 97.5% (39/40) | < 0.001 |
| External test set (n = 54) | DL model | 0.97 (0.88–1.00) | 92.6% (50/54) | 100% (12/12) | 90.5% (38/42) | < 0.001 |
| External test set (n = 54) | Clinical model | 0.64 (0.50–0.76) | 79.6% (43/54) | 41.7% (5/12) | 90.5% (38/42) | 0.17 |
| External test set (n = 54) | Integrated model | 0.88 (0.76–0.95) | 90.7% (49/54) | 66.7% (8/12) | 97.6% (41/42) | < 0.001 |
| Total test set (n = 116) | DL model | 0.97 (0.92–1.00) | 93.1% (108/116) | 94.1% (32/34) | 92.7% (76/82) | < 0.001 |
| Total test set (n = 116) | Clinical model | 0.72 (0.63–0.80) | 81.0% (94/116) | 52.9% (18/34) | 92.7% (76/82) | < 0.001 |
| Total test set (n = 116) | Integrated model | 0.91 (0.85–0.96) | 92.2% (107/116) | 79.4% (27/34) | 97.6% (80/82) | < 0.001 |

  1. The p value compares the AUC of each model against chance (AUC = 0.5).
  2. DL: deep learning. The integrated model is the DL model combined with the clinical model.
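
The accuracy, sensitivity, and specificity entries in Table 2 are simple ratios of the counts shown in parentheses, so they can be rechecked by hand. The following is a minimal Python sketch of that arithmetic, not the authors' code. The function name `diagnostic_metrics` and the labels `tp`, `fn`, `tn`, `fp` are illustrative assumptions; the table does not state which tumor type is treated as the positive class, so "positive" here simply means the class counted in the sensitivity column.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return accuracy, sensitivity, and specificity as percentages.

    tp/fn are the correctly/incorrectly classified cases of the positive class,
    tn/fp the correctly/incorrectly classified cases of the negative class.
    Values are rounded to one decimal place, as in Table 2.
    """
    total = tp + fn + tn + fp
    return {
        "accuracy": round(100 * (tp + tn) / total, 1),
        "sensitivity": round(100 * tp / (tp + fn), 1),
        "specificity": round(100 * tn / (tn + fp), 1),
    }


# External test set, DL model: sensitivity 12/12 and specificity 38/42,
# so fn = 12 - 12 = 0 and fp = 42 - 38 = 4.
external_dl = diagnostic_metrics(tp=12, fn=0, tn=38, fp=4)
print(external_dl)
# {'accuracy': 92.6, 'sensitivity': 100.0, 'specificity': 90.5}
```

The accuracy numerator is the sum of the sensitivity and specificity numerators (12 + 38 = 50 of 54), which offers a quick consistency check for any row of the table.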