An Analysis of Key Indicators of Reproducibility in Radiology

Background: Given the central role of radiology in patient care, it is important that radiological research is grounded in reproducible science. Whether radiologic research lacks reproducibility or transparency has not previously been examined. Purpose: The purpose of this study was to analyze published radiology literature for the presence or absence of key indicators of reproducibility. Methods: This cross-sectional, retrospective study was performed by searching the National Library of Medicine catalog to identify journals in the field of radiology. Journals that were not MEDLINE indexed or not published in English were excluded from the analysis. Publications from January 1, 2014 to December 31, 2018 were used to generate a random sample of 300 publications for analysis. A pilot-tested Google form was used to evaluate key indicators of reproducibility in the queried publications. Results: Our initial search returned 295,543 records, from which 300 were randomly selected for analysis. Of these 300 records, 294 met the inclusion criteria. Among the empirical publications, 5.6% contained a data availability statement (11/195, 95% CI: 3.0-8.3), 0.51% provided clearly documented raw data (1/195), 12.0% provided a materials availability statement (23/191, 8.4-15.7), none provided analysis scripts, 4.1% provided a preregistration statement (8/195, 1.9-6.3), 2.1% provided a protocol statement (4/195, 0.4-3.7), and 3.6% were preregistered (7/195, 1.5-5.7). Conclusion: Our findings demonstrate that key indicators of reproducibility are missing in the field of radiology. The ability to reproduce radiological studies may therefore be compromised, with potential clinical implications.


Introduction:
The field of radiology plays a significant role in the diagnosis, monitoring, and treatment of numerous disease processes. The importance of radiology to medicine is evident from the large annual expenditures on imaging, estimated at 10% of total healthcare costs in the United States.(1) Advancements in imaging modalities and diagnostic testing are predicated upon robust and trustworthy research. Yet, the field of radiology has been known for low-level-evidence study designs, with randomized trials, multicenter studies, and meta-analyses making up the smallest portion of publications (0.8% to 1.5%).(2) With the movement toward patient-centered, evidence-based care, efforts are needed to ensure the robustness and reproducibility of radiology research.
Reproducibility, defined as the ability to conduct an independent replication study and reach the same or similar conclusions as the study in question,(3,4) gained national attention after the majority of 1,500 surveyed scientists reported an inability to reproduce another scientist's experiment and half reported being unable to reproduce their own.(5) In radiology research, a lack of reproducibility has been partly attributed to imaging datasets that are too small to power significant findings, models lacking independent validation, and improper separation of training and validation data.(6) Such practices may go undetected by editors, peer reviewers, and readers and contribute to downstream effects, such as irreproducible results and perpetuated errors in subsequent studies.
Given the central role of radiology in patient care, reproducible radiology research is necessary.
In this study, we investigated radiology publications for key factors of reproducibility and transparency.
Findings may be used to evaluate the current climate of reproducible research practices in the field and contribute baseline data for future comparison studies.

Materials and Methods:
Our investigation was designed as a cross-sectional, meta-research study to evaluate specific indicators of reproducibility and transparency in radiology. The study methodology replicates the work of Hardwicke et al.,(7) with minor adjustments. Our analysis did not involve human subjects, and thus this investigation was not subject to institutional review board oversight.(8) Guidelines detailed by Murad and Wang were used for the reporting of our meta-research,(9) and the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines were applied as necessary.(10) We supplied all protocols, raw data, and pertinent materials on the Open Science Framework ( https://osf.io/n4yh5/).

Journal and Study Selection
One investigator (DT) used the subject term tag "Radiology[ST]" to search the National Library of Medicine (NLM) catalog on June 5, 2019. To meet the inclusion criteria, journals had to be MEDLINE indexed and written in the English language. DT extracted the electronic ISSN (or the linking ISSN if the electronic version was unavailable) for the included journals. PubMed, which contains the MEDLINE collection, was searched using this list of ISSNs on June 5, 2019 to identify publications. Publications from January 1, 2014 to December 31, 2018 were included. A random sample of 300 was selected for data extraction, with additional publications available as needed ( https://osf.io/4hq87/). To create a diverse spectrum of publications for our study, no restrictions were placed on study type. We divided the publications into two categories: (1) publications with empirical data (e.g., clinical trials, cohort studies, case series, case reports, case-control studies, secondary analyses, chart reviews, commentaries [with data analysis], and cross-sectional studies) and (2) publications without empirical data (e.g., editorials, commentaries [without reanalysis], simulations, news, reviews, and poems). For the purposes of this study, imaging protocols with no patients or intervention were considered nonempirical. Different study designs resulted in variation in the data collected from individual publications. We analyzed nonempirical studies for the following characteristics: funding source(s), conflict of interest declarations, open access, and journal impact factor. Case reports and case series are not typically expected to be reproducible with a prespecified protocol;(11) as a result, data were extracted from them in the same manner as from publications lacking empirical data. Meta-analyses and systematic reviews were not expected to contain additional materials, so the materials availability indicator was excluded from their analysis. For the purposes of our study, data were synonymous with raw data and considered unaltered data collected directly from an instrument. The data extraction form ( https://osf.io/3nfa5/ ) prompted investigators to identify the presence or absence of prespecified indicators of reproducibility. The Google form used in this study included additional options beyond those in the form used by Hardwicke et al.(7) Our form included additional study designs, such as case series, cohort, secondary analysis, meta-analysis/systematic review, chart review, and cross-sectional. Funding sources were more specific, including non-profit, public, hospital, university, and private/industry. Following data extraction, all three investigators convened and resolved any discrepancies by consensus. A third party was available for adjudication but was not needed.
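The random-sampling step described above can be sketched in code. This is a minimal illustration, not the authors' actual procedure (which is archived on OSF); the seed value, function name, and stand-in PMID list below are assumptions for the sake of the example.

```python
import random

def sample_publications(pmids, n=300, seed=2019):
    """Draw a reproducible random sample of n publication IDs.

    The seed and helper name are illustrative assumptions; the
    study's actual sampling procedure is archived on OSF.
    """
    rng = random.Random(seed)     # fixed seed makes the draw repeatable
    return rng.sample(pmids, n)   # sampling without replacement

# Toy pool standing in for the 53,328 PMIDs returned by the search.
all_pmids = [str(10_000_000 + i) for i in range(53_328)]
subset = sample_publications(all_pmids)
print(len(subset), len(set(subset)))  # 300 unique IDs
```

Fixing the seed is what makes such a sample auditable: anyone with the same ID list can regenerate the identical subset.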

Assessing Open Access
We systematically evaluated the accessibility of a full-text version of each publication. The Open Access Button ( https://openaccessbutton.org/ ) was used to search by publication title, DOI, and/or PubMed ID. If this search failed to locate an article, Google Scholar and PubMed were searched using the same parameters. If an investigator was still unable to locate a full-text publication, it was deemed inaccessible.

Evaluation of Replication and Citation in Research Synthesis
Web of Science was searched for all studies containing empirical data. For each publication located on Web of Science, we recorded: (1) the number of times it was used as part of a subsequent replication study and (2) the number of times it was cited in a systematic review or meta-analysis. Titles, abstracts, and full-text manuscripts available on Web of Science were used to determine whether a study was cited in a systematic review/meta-analysis or a replication study.

Statistical Analysis
Statistics for each category of our analysis were calculated using Microsoft Excel. Excel functions were used to provide quantitative analysis, with results characterized by frequencies, percentages, and 95% confidence intervals.
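As an illustration of the interval arithmetic, the sketch below computes a proportion with a Wald (normal-approximation) 95% confidence interval. The paper's Excel formulas are not published in-text, so the interval method here is an assumption; its bounds may differ slightly from the reported intervals.

```python
import math

def proportion_ci(successes, total, z=1.96):
    """Point estimate and Wald 95% CI for a proportion.

    Assumption: a normal-approximation interval; the Excel
    method actually used by the authors is not stated in-text.
    """
    p = successes / total
    se = math.sqrt(p * (1 - p) / total)            # standard error of p
    return p, max(0.0, p - z * se), min(1.0, p + z * se)

# Example: 11 of 195 empirical publications had a data availability statement.
p, lo, hi = proportion_ci(11, 195)
print(f"{p:.1%} (95% CI {lo:.1%}-{hi:.1%})")
```

For small counts such as these, a Wilson or exact interval is often preferred over the Wald form, which can misbehave near 0%.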

Results:
Journal and Publication Selection
The NLM catalog search identified 144 radiology journals, but only 64 met the inclusion criteria.
Our PubMed search of these journals identified 295,543 radiology publications, of which 53,328 were published within the study time frame. We randomly sampled 300, but 6 publications were inaccessible, even with institutional access. Of the 294 eligible publications, 215 contained empirical data and 79 did not (Figure 1). Publications without empirical data were excluded from certain analyses because they could not be assessed for reproducibility characteristics. Furthermore, 20 publications were identified as either case studies or case series; these research designs cannot be replicated and were excluded from the analysis of study characteristics. Study reproducibility characteristics were analyzed for the remaining 195 radiology publications (Table 1).

Sample Characteristics
From our sample of 294 radiology publications, the publishing journals had a median 5-year impact factor of 2.824 (interquartile range: 1.765-3.718). Among the 195 publications containing empirical data, none provided analysis scripts for reproducing statistical results (0/195, 0%), and none reported being a replication (0/195, 0%). Most publications were not cited in a systematic review or meta-analysis (194/211, 91.9%), whereas a few were cited in a single such publication (17/211, 8.1%) or in between one and five publications (8/211, 3.8%). None of the publications were cited in replication studies (0/211, 0%).

Discussion:
Our cross-sectional investigation found that key transparency and reproducibility-related factors were rare or entirely absent in our sample of publications in the field of radiology. No analyzed publication reported an analysis script, a minority provided access to materials, few were preregistered, and only one provided raw data. While concerning, our findings are similar to those in the social science and biomedical literature.(7,12,13) Here, we discuss two of the main findings from our study.
One factor that is important for reproducible research is the availability of all raw data. In radiology, clinical data and research data, when shared, are often stored and processed in online repositories. A second factor lacking in the radiology literature was access to detailed analysis scripts. In radiology research, many analytic decisions exist regarding data management, radiomic identification, biomarker identification with validation, and the sharing of necessary items (data, code, statistics, and protocol).(23)(24)(25)(26) Radiomic study findings cannot be accurately reproduced or validated without an exact description of the steps taken to measure and analyze the data points. A systematic review of 41 radiomic studies found that 16 failed to report detailed software information, two failed to provide image acquisition settings, and eight lacked detailed descriptions of preprocessing modifications. These three methodological areas are important in radiological imaging studies, as they can significantly alter results and thereby decrease the reproducibility of study findings.(27) A recent study by Carp et al. further tested possible variations in radiology image analysis by combining five preprocessing and five modeling decisions for data acquisition in functional magnetic resonance imaging. This modification in data collection created almost 7,000 unique analytical pathways with varying results.(28) For research findings to be reproducible, a detailed analysis script with explicit software information and methodological decision making is necessary; this could be accomplished through public repositories and "containers" that replicate the study calculations in real time and are applicable to other datasets.(29) Investigators should be encouraged to document their analysis code and scripts so that a detailed explanation can be published with the study.(30)(31)(32) Providing such analysis scripts could improve the reproducibility of study outcomes and serve as a guide for future radiology publications to follow.(26)
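Carp's observation rests on simple combinatorics: independent analytic choices multiply. The sketch below uses hypothetical option counts for five preprocessing and five modeling decisions (not Carp's actual settings) to show how a handful of defensible choices compounds into thousands of distinct pipelines.

```python
from math import prod

# Hypothetical numbers of defensible options per decision; the real
# counts in Carp's fMRI study differ, but the multiplicative growth
# is the point being illustrated.
preprocessing_options = {
    "slice_timing": 2, "motion_correction": 2, "smoothing_kernel": 3,
    "temporal_filter": 2, "spatial_normalization": 2,
}
modeling_options = {
    "hrf_basis": 3, "motion_regressors": 2, "autocorrelation_model": 2,
    "drift_model": 2, "error_variance": 2,
}

# Total pipelines = product of the option counts across all decisions.
n_pipelines = prod(preprocessing_options.values()) * prod(modeling_options.values())
print(n_pipelines)  # every added choice multiplies the pipeline count
```

Even these modest counts yield thousands of analytical pathways, which is why an exact, scripted record of each decision is needed for a result to be reproduced.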

Implications Moving Forward
Our sample indicates that there is room for improvement in reporting reproducibility-related factors in radiology research. Ninety percent of scientists agree that science is currently experiencing a "reproducibility crisis," with "more robust experimental designs and better statistics" seen as possible solutions.(5) To improve the reporting of experimental designs, we encourage authors to follow available reporting guidelines.(33)(34)(35) In radiology, reliability and agreement studies are prevalent, as interobserver agreement between radiologists is measured to identify the potential for errors in treatment or diagnostic imaging.(36) The Guidelines for Reporting Reliability and Agreement Studies (GRRAS) is a 15-item checklist intended to ensure that the findings of reliability and agreement studies can be accurately interpreted and reproduced.(37) Items such as sample selection, study design, and statistical analysis are often omitted by authors.(36,38,39) Researchers have had success with the GRRAS specifically, but reporting guidelines in general provide the framework for studies to be understood by readers, reproduced by researchers, used by doctors, and included in systematic reviews.(38,40,41)

Better Statistics
The current reproducibility crisis is, in part, tied to poor statistics. In psychology, a large-scale reproducibility study found that only 39 of 100 original psychology studies could be successfully replicated.(42) In response, the Association for Psychological Science has pioneered several innovations, such as statistical verification programs and statistical advisors, to provide expertise on manuscripts with sophisticated statistical or methodological techniques and to promote reproducibility within psychology.(43) Based on our findings, radiology may be experiencing similar transparency and reproducibility problems and should consider promoting improved statistical practices by involving a statistician in the review process. StatReviewer, an automated review of statistical tests and appropriate reporting, may aid peer reviewers who are not formally trained in detecting relevant statistical or detailed methodological errors.(44)

Strengths and Limitations
Regarding the strengths of this study, we randomly sampled a large selection of radiology journals. Our double data extraction methodology was performed in a fashion similar to that of systematic reviews, following the Cochrane Handbook.(48) Complete raw data and all relevant study materials are provided online to ensure transparency and reproducibility. Regarding limitations, our analysis included only 300 of the 53,328 returned publications in the radiology literature; thus, our results may not be generalizable to publications outside our search period or to other medical specialties. Because our study focused on the transparency and reproducibility of the published radiology literature, we relied solely on information reported within the publications. It therefore cannot be assumed that reproducibility-related factors were unavailable upon request from the authors; had we contacted the corresponding authors of the 300 analyzed publications, it is plausible we could have obtained more information.

Conclusion
With the potential lack of transparency and reproducibility practices in radiology, opportunities exist to improve radiology research. Our results indicate that important factors for reproducibility and transparency are frequently missing in radiology publications, leaving room for improvement. Methods to improve reproducibility and transparency are practical and applicable to many research designs. Replication studies validate previously published work by determining whether similar outcomes can be obtained.
† Excludes publications that have no empirical data.
‡ Excludes case studies and case series.
¶ Excludes meta-analyses and systematic reviews, for which materials may not be relevant, in addition to ‡.
Three investigators (BW, NV, and JN) underwent rigorous, in-person training led by DT on data extraction and study methodology to ensure reliability between investigators. Training included a review of the study objectives, study design, protocol, the Google form used for data extraction, and the process of extracting data. The extraction process was demonstrated using two example publications. All trained investigators then independently conducted a blinded, duplicate extraction of data from two further example publications. Once this mock data extraction was completed, the investigators (BW, NV, and JN) convened and resolved any discrepancies. The entire training session was recorded from the presenter's point of view and posted online for investigators to reference ( https://osf.io/tf7nw/).

Data Extraction
Once all required training was completed, data were extracted from the publications. Data extraction began on June 9, 2019 and was completed on June 20, 2019. One investigator (BW) performed data extraction on all 300 publications, with the other two investigators (NV and JN) extracting from 150 each.
Picture archiving and communication systems allow researchers to observe details of image acquisition, patient positioning, image depth, and the bit depth of an image; however, proprietary formats, large file sizes, and protected health information can complicate the sharing of radiology data.(14) Picture archiving and communication systems and software prototypes that combine the benefits of clinical and research designs have nonetheless been developed for storing, retrieving, and sharing functional imaging data.(15) Additionally, the rarity of data sharing may be due to an absence of specific recommendations from radiology journals: a survey of 18 general imaging journals found only one with policies requesting data at submission.(16) Improved data sharing in radiology would allow others to critically assess the trustworthiness of data analyses and the interpretation of results.(17) With more than 30 radiology and imaging journals listed as International Committee of Medical Journal Editors (ICMJE) members, the ICMJE could have a substantial influence by enforcing its data sharing policy and commending the dissemination of research results and datasets.(18)(19)(20)(21) A recent survey by the European Society of Radiology research committee found that 98% of respondents would be interested in sharing data, yet only 23 institutions (34%) had previously shared clinical trial data. From the data shared by these 23 institutions, at least 44 additional original works have been published.(22)
Study designs of sampled publications are presented in Table 2.
Most publications originated from the United States (209/294, 71.1%). Nearly half of the eligible publications required paywall access (149/300, 49.7%). Open access publications were found via the Open Access Button 26.7% of the time (80/300) or via Google Scholar or PubMed 23.7% of the time (71/300). More than half of the publications failed to provide a funding statement (176/294, 59.9%). Public funding accounted for 16% of analyzed publications (47/294). Authors reported having no conflicts of interest (COI) in the majority of publications (no COI: 156/294, 53.1% vs. COI: 38/294, 12.9%); no COI statement was provided 34.0% of the time (100/294). A data availability statement was reported in 11 publications (11/195, 5.6%), but only nine had accessible data (9/11, 81.8%). Complete raw data were located in 0.51% of empirical publications (1/195). A materials availability statement was included in 23 publications (23/191, 12.0%), but only 18 provided access to the materials used in the study (18/23, 78.3%). Few publications included a preregistration statement (8/195, 4.1%) or a protocol statement (4/195, 2.1%). Specifications of reproducibility-related characteristics are reported in Table 3.

Table 1: Reproducibility Indicators
The indicators measured for each article varied depending on its study type. More details about extraction and coding are available here: https://osf.io/x24n3/

Table 3: Reproducibility-related Characteristics of Included Publications
Reproducibility-related characteristics that were available are further specified in Table 4. Abbreviations: CI, confidence interval.
§ Data specifications for documentation and raw data availability are provided if data were made accessible.
‖ Preregistration specifications for contents are provided if the preregistration was made accessible.