
Table 5 Studies comparing artificial intelligence algorithms versus (or combined with) human reader, organised by publication date

From: Artificial intelligence for radiological paediatric fracture assessment: a systematic review

| Author, year | Human/AI | Accuracy (95% CI) | Sensitivity (95% CI) | Specificity (95% CI) | TP | FP | FN | TN |
|---|---|---|---|---|---|---|---|---|
| England [31] | AI | 0.907 (0.843–0.951) | 0.909 (0.788–1.000) | 0.906 (0.844–0.958) | 87 | 9 | 3 | 30 |
| England [31] | PGY5 emergency medicine trainee (non-radiologist) | 0.915 (0.852–0.957) | 0.848 (0.681–0.949) | 0.938 (0.869–0.977) | 90 | 6 | 5 | 28 |
| Choi [17] | AI (geographical test set) | 0.895 (0.817–0.942) | 1.000 (0.852–1.000) | 0.861 (0.759–0.931) | 23 | 10 | 0 | 62 |
| Choi [17] | Summated score of three radiologists (2–7 years' experience) from a different institution to the test dataset | 0.975 (0.950–0.988) | 0.957 (0.880–0.985) | 0.981 (0.953–0.993) | 66 | 4 | 3 | 212 |
| Choi [17] | Lowest performing radiologist alone | NS (AUC 0.977 (0.924–0.997)) | 0.957 (0.781–0.999) | 0.972 (0.903–0.997) | NS | NS | NS | NS |
| Choi [17] | Lowest performing radiologist with AI assistance | NS (AUC 0.993 (0.949–1.000)) | 1.000 (0.852–1.000) | 0.972 (0.903–0.997) | NS | NS | NS | NS |
| Zhang [35] | AI (test set: data undefined) | 0.920 | 1.000 | 0.870 | NS | NS | NS | NS |
| Zhang [35] | Human: paediatric musculoskeletal radiologist | 0.89 (0.782–0.949) | 1.000 (0.833–1.000) | 0.833 (0.681–0.921) | 19 | 6 | 0 | 30 |

Accuracy, sensitivity and specificity are reported as proportions.

  1. 95% confidence intervals are omitted where these are not provided in the publication
  2. NS: not stated; CI: confidence interval; AUC: area under the curve; PPV: positive predictive value; NPV: negative predictive value; TP: true positive; FP: false positive; FN: false negative; TN: true negative; PGY: postgraduate year
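
The accuracy, sensitivity and specificity columns can be recomputed from the TP, FP, FN and TN counts using the standard definitions. The sketch below is illustrative only: the function name `diagnostic_metrics` and the row labels are assumptions introduced here, not part of the review. It reproduces the reported figures for the Choi [17] and Zhang [35] rows that give counts. For England [31], the counts as printed do not reproduce the reported sensitivity and specificity, whereas exchanging the TP and TN values does. That is an observation from the arithmetic alone, not something stated in the source.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Return (accuracy, sensitivity, specificity) as proportions."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total      # correct calls over all cases
    sensitivity = tp / (tp + fn)      # true positive rate
    specificity = tn / (tn + fp)      # true negative rate
    return accuracy, sensitivity, specificity


# Counts copied from the table in (TP, FP, FN, TN) order.
# The last entry is a hypothetical reading with TP and TN exchanged.
rows = {
    "Choi [17], AI": (23, 10, 0, 62),
    "Choi [17], three radiologists": (66, 4, 3, 212),
    "Zhang [35], radiologist": (19, 6, 0, 30),
    "England [31], AI (as printed)": (87, 9, 3, 30),
    "England [31], AI (TP and TN exchanged)": (30, 9, 3, 87),
}

for label, counts in rows.items():
    acc, sens, spec = diagnostic_metrics(*counts)
    print(f"{label}: accuracy={acc:.3f}, "
          f"sensitivity={sens:.3f}, specificity={spec:.3f}")
```

Rows that report NS for the counts cannot be checked this way, and the confidence intervals depend on an interval method that the table does not specify.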