Deep learning for differentiation of osteolytic osteosarcoma and giant cell tumor around the knee joint on radiographs: a multicenter study

Insights into Imaging

Table 2 Diagnostic performance of the deep learning model, clinical model, and integrated model in the training set, internal test set, and external test set

Dataset and model	AUC (95% CI)	Accuracy (n/N)	Sensitivity (n/N)	Specificity (n/N)	p value
Training set (n = 217)
DL model	0.94 (0.90–0.97)	91.2% (198/217)	90.2% (92/102)	92.2% (106/115)	< 0.001
Internal test set (n = 62)
DL model	0.97 (0.90–1.00)	93.5% (58/62)	90.9% (20/22)	95.0% (38/40)	< 0.001
Clinical model	0.77 (0.65–0.87)	82.3% (51/62)	59.1% (13/22)	95.0% (38/40)	< 0.001
Integrated model	0.94 (0.84–0.98)	93.5% (58/62)	86.4% (19/22)	97.5% (39/40)	< 0.001
External test set (n = 54)
DL model	0.97 (0.88–1.00)	92.6% (50/54)	100% (12/12)	90.5% (38/42)	< 0.001
Clinical model	0.64 (0.50–0.76)	79.6% (43/54)	41.7% (5/12)	90.5% (38/42)	0.17
Integrated model	0.88 (0.76–0.95)	90.7% (49/54)	66.7% (8/12)	97.6% (41/42)	< 0.001
Total test set (n = 116)
DL model	0.97 (0.92–1.00)	93.1% (108/116)	94.1% (32/34)	92.7% (76/82)	< 0.001
Clinical model	0.72 (0.63–0.80)	81.0% (94/116)	52.9% (18/34)	92.7% (76/82)	< 0.001
Integrated model	0.91 (0.85–0.96)	92.2% (107/116)	79.4% (27/34)	97.6% (80/82)	< 0.001

The p value compares the AUC of each model with chance (AUC = 0.5)
AUC, area under the receiver operating characteristic curve; CI, confidence interval; DL, deep learning. The integrated model combines the DL model with the clinical model
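
The percentages in Table 2 follow directly from the counts given in parentheses, so they can be rechecked with a few lines of arithmetic. The sketch below is illustrative only and is not the authors' analysis code: the dictionary `TOTAL_TEST` and the helper `summarize` are hypothetical names, the counts are transcribed from the total test set rows, and the "positive" class is simply whichever tumor type the sensitivity column counts.

```python
# Counts transcribed from Table 2, total test set (n = 116).
# Tuple layout (an assumption of this sketch):
#   (correct positives, all positives, correct negatives, all negatives)
# "Positive" means the class counted in the Sensitivity column.
TOTAL_TEST = {
    "DL model": (32, 34, 76, 82),
    "Clinical model": (18, 34, 76, 82),
    "Integrated model": (27, 34, 80, 82),
}


def summarize(tp, n_pos, tn, n_neg):
    """Return (accuracy, sensitivity, specificity) in percent, 1 decimal."""
    accuracy = 100 * (tp + tn) / (n_pos + n_neg)
    sensitivity = 100 * tp / n_pos
    specificity = 100 * tn / n_neg
    return round(accuracy, 1), round(sensitivity, 1), round(specificity, 1)


if __name__ == "__main__":
    for model, counts in TOTAL_TEST.items():
        acc, sens, spec = summarize(*counts)
        print(f"{model}: accuracy {acc}%, sensitivity {sens}%, specificity {spec}%")
```

Because accuracy is computed from the same counts as sensitivity and specificity, the numerator of each accuracy fraction must equal the sum of the sensitivity and specificity numerators (for example, 32 + 76 = 108 for the DL model), which gives a quick internal consistency check for every row.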