Artificial intelligence for radiological paediatric fracture assessment: a systematic review

Insights into Imaging

Table 5 Studies comparing artificial intelligence algorithms versus (or combined with) human reader, organised by publication date

Author, year	Human/AI	Accuracy (95% CI)	Sensitivity (95% CI)	Specificity (95% CI)	TP	FP	FN	TN
England [31]	AI	0.907 (0.843–0.951)	0.909 (0.788–1.000)	0.906 (0.844–0.958)	30	9	3	87
England [31]	PGY5 emergency medicine trainee (non-radiologist)	0.915 (0.852–0.957)	0.848 (0.681–0.949)	0.938 (0.869–0.977)	28	6	5	90
Choi [17]	AI (Geographical test set)	0.895 (0.817–0.942)	1.000 (0.852–1.000)	0.861 (0.759–0.931)	23	10	0	62
	Summated score of three radiologists (2–7 years' experience) from a different institution to the test dataset	0.975 (0.950–0.988)	0.957 (0.880–0.985)	0.981 (0.953–0.993)	66	4	3	212
	Lowest performing radiologist alone	NS; AUC 0.977 (0.924–0.997)	0.957 (0.781–0.999)	0.972 (0.903–0.997)	NS	NS	NS	NS
	Lowest performing radiologist with AI assistance	NS; AUC 0.993 (0.949–1.000)	1.000 (0.852–1.000)	0.972 (0.903–0.997)	NS	NS	NS	NS
Zhang [35]	AI (Test set—data undefined)	0.920	1.000	0.870	NS	NS	NS	NS
Zhang [35]	Human: paediatric musculoskeletal radiologist	0.89 (0.782–0.949)	1.000 (0.833–1.000)	0.833 (0.681–0.921)	19	6	0	30

95% confidence intervals are omitted where these are not provided in the publication
NS, not stated; CI, confidence interval; AUC, area under the curve; PPV, positive predictive value; NPV, negative predictive value; TP, true positive; FP, false positive; FN, false negative; TN, true negative; PGY, postgraduate year
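
The accuracy, sensitivity and specificity columns follow directly from the TP, FP, FN and TN counts using the standard definitions. The sketch below is a minimal illustration of that arithmetic, not the method used by the review or the cited studies. The class name `ConfusionCounts` and the variable names are hypothetical, and the counts are copied from the Choi [17] AI row and the Zhang [35] human reader row. Confidence intervals are not computed because the interval method is not stated in the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from a 2x2 diagnostic table (hypothetical helper for illustration)."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives

    def sensitivity(self) -> float:
        # Proportion of fracture-positive cases correctly identified.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Proportion of fracture-negative cases correctly identified.
        return self.tn / (self.tn + self.fp)

    def accuracy(self) -> float:
        # Proportion of all cases classified correctly.
        total = self.tp + self.fp + self.fn + self.tn
        return (self.tp + self.tn) / total


# Counts taken from the table rows (assumed to be in TP, FP, FN, TN order).
choi_ai = ConfusionCounts(tp=23, fp=10, fn=0, tn=62)
zhang_human = ConfusionCounts(tp=19, fp=6, fn=0, tn=30)

for label, counts in [("Choi AI", choi_ai), ("Zhang human", zhang_human)]:
    print(
        f"{label}: accuracy={counts.accuracy():.3f}, "
        f"sensitivity={counts.sensitivity():.3f}, "
        f"specificity={counts.specificity():.3f}"
    )
```

For the Choi [17] AI row this gives an accuracy of 0.895, a sensitivity of 1.000 and a specificity of 0.861, matching the tabulated point estimates. The same check can be applied to any row that reports all four counts; rows marked NS cannot be verified this way.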