Reproducibility of artificial intelligence models in computed tomography of the head: a quantitative analysis

Insights into Imaging

Table 1 Balancing of training and test sets compared to real world epidemiology

Data set	Mean ± SD
Training sets	0.50 ± 0.31
Test sets	0.47 ± 0.30
Real world epidemiology	0.22 ± 0.28
Welch t test	t (p)
Training sets/test sets	0.45 (.66)
Training sets/real world epidemiology	3.78 (.0004***)
Test sets/real world epidemiology	3.32 (.002**)

We compared the disease prevalences of the training and test sets in the selected articles with the corresponding real world epidemiology, using Welch's two-sample t test.
*p < .05. **p < .01. ***p < .001; n = 30
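
The comparison described in the table note can be sketched from the summary statistics alone. The snippet below is a minimal illustration, not the authors' analysis: the helper name `welch_t` is hypothetical, and it assumes that n = 30 applies to each group, which the note does not state explicitly. Because the means and standard deviations in the table are rounded to two decimals, the recomputed t values need not match the reported ones exactly.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's unequal-variance t statistic and Welch-Satterthwaite
    degrees of freedom, computed from summary statistics."""
    v1 = sd1 ** 2 / n1  # squared standard error of group 1
    v2 = sd2 ** 2 / n2  # squared standard error of group 2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Assumption: 30 observations per group (the note only says "n = 30").
N = 30

# (mean, SD) of disease prevalence, as rounded in Table 1.
groups = {
    "training": (0.50, 0.31),
    "test": (0.47, 0.30),
    "real world": (0.22, 0.28),
}

comparisons = [
    ("training", "test"),
    ("training", "real world"),
    ("test", "real world"),
]

for a, b in comparisons:
    t, df = welch_t(*groups[a], N, *groups[b], N)
    print(f"{a} vs {b}: t = {t:.2f}, df = {df:.1f}")
```

The p values follow from the t distribution with the computed degrees of freedom. If SciPy is available, `scipy.stats.ttest_ind_from_stats` with `equal_var=False` returns both the statistic and the p value from the same inputs.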