Original Article · Open access

Image quality in whole-body MRI using the MY-RADS protocol in a prospective multi-centre multiple myeloma study

Abstract

Background

The Myeloma Response Assessment and Diagnosis System (MY-RADS) guidelines establish a standardised acquisition and analysis pipeline for whole-body MRI (WB-MRI) in patients with myeloma. This is the first study to assess image quality in a multi-centre prospective trial using MY-RADS.

Methods

The cohort consisted of 121 examinations acquired across ten sites with a range of prior WB-MRI experience, three scanner manufacturers and two field strengths. Image quality was evaluated qualitatively by a radiologist and quantitatively using a semi-automated pipeline to quantify common artefacts and image quality issues. The intra- and inter-rater repeatability of qualitative and quantitative scoring was also assessed.

Results

Qualitative radiological scoring found that the image quality was generally good, with 94% of examinations rated as good or excellent and only one examination rated as non-diagnostic. There was a significant correlation between radiological and quantitative scoring for most measures, and intra- and inter-rater repeatability were generally good.

When the quality of an overall examination was low, this was often due to low-quality diffusion-weighted imaging (DWI); signal to noise ratio (SNR), anterior thoracic signal loss and brain geometric distortion were found to be significant predictors of examination quality.

Conclusions

It is possible to successfully deliver a multi-centre WB-MRI study using the MY-RADS protocol involving scanners with a range of manufacturers, models and field strengths. Quantitative measures of image quality were developed and shown to be significantly correlated with radiological assessment. The SNR of DW images was identified as a significant factor affecting overall examination quality.

Trial registration

ClinicalTrials.gov, NCT03188172, Registered on 15 June 2017.

Critical relevance statement

Good overall image quality, assessed both qualitatively and quantitatively, can be achieved in a multi-centre whole-body MRI study using the MY-RADS guidelines.

Key points

• A prospective multi-centre WB-MRI study using MY-RADS can be successfully delivered.

• Quantitative image quality metrics were developed and correlated with radiological assessment.

• SNR in DWI was identified as a significant predictor of quality, enabling rapid quality assessment.

Graphical Abstract

Background

Whole-body magnetic resonance imaging (WB-MRI) is a technique for imaging focal bone marrow lesions with superior sensitivity to 18F-fluorodeoxyglucose (FDG) positron emission tomography/computed tomography (PET/CT) in patients with myeloma [1]. Contemporary WB-MRI is integral to international and national guidelines for patients with a suspected diagnosis of myeloma [2, 3]; however, it is not yet consistently available outside of centres with specialist expertise [2].

The need for standardised acquisition, interpretation and reporting of WB-MRI in myeloma led to the development of the Myeloma Response Assessment and Diagnosis System (MY-RADS) [4]. MY-RADS recommends key imaging parameters for WB diffusion weighted imaging (DWI), T1-weighted (T1w) Dixon imaging and T1 and T2-weighted (T2w) sagittal spine imaging but does not mandate a complete set of imaging protocol parameters (the MY-RADS acquisition recommendations are summarised in Supplementary Table 1). Imaging sites are therefore required to optimise acquisition for their particular hardware and software in order to achieve high quality imaging.

Quantitative measurements of apparent diffusion coefficient (ADC) and fat fraction from WB-MRI show promise for evaluating and predicting treatment response [5,6,7]. By establishing acquisition protocols at a range of sites, multi-centre imaging studies are a crucial step in the translation of quantitative MR imaging biomarkers (qMR IBs) from research to clinical practice [8].

The feasibility of multi-centre WB-MRI has been demonstrated in healthy volunteers [9, 10] and, across a small number of sites, in patients with lymphoma [11, 12] and patients with myeloma [13]. Larger multi-centre WB-MRI studies have utilised imaging hubs, with patients referred to specialist imaging sites for scanning [14, 15]. This study is the first to establish standardised WB-MRI protocols across sites that reflect the variation in scanners and experience found in clinical practice; it is therefore essential to evaluate the achievable image quality in this setting.

The purpose of this work was to evaluate the image quality achieved in a multi-centre WB-MRI study using the MY-RADS protocol across a range of scanner manufacturers and field strengths. Images were assessed qualitatively by radiological scoring and quantitatively using metrics developed to measure the presence and severity of image quality issues that frequently affect WB-MRI. The correlation between qualitative and quantitative metrics was evaluated, with a view towards developing tools for automated quality control (QC) of WB images in multi-centre studies.

Methods

OPTIMUM/MUKnine (ClinicalTrials.gov Identifier: NCT03188172 [16]) is a prospective phase II study to screen for high-risk multiple myeloma [17, 18] and evaluate a novel treatment strategy. A sub-study of MUKnine, IMAGIng Minimal residual disease in Myeloma (IMAGIMM), is investigating the potential of WB-MRI to monitor treatment response in patients with multiple myeloma.

Patients enrolled in this sub-study underwent WB-MRI scans at three timepoints: baseline/study enrolment, 3 months post-autologous stem cell transplantation (ASCT) and 18–21 months post-ASCT. This evaluation included images from 121 WB-MRI examinations (from 83 individual patients across all timepoints) acquired for the OPTIMUM/MUKnine trial IMAGIMM sub-study across ten UK sites. This comprises all imaging data uploaded to the trial’s central imaging repository by 20 May 2022.

The sites underwent a site qualification process [19] to establish a MY-RADS-compliant imaging protocol consisting of axial DWI, axial T1w Dixon imaging and sagittal T1w and T2w spine imaging on a local scanner. Hardware and software limitations and scan time constraints required some protocol modifications between sites (full details are included in a prior publication [19]). Volunteer or exemplar patient data from each site were reviewed by the lead site to confirm that sufficient data quality was achievable prior to patient enrolment. Twelve sites were set up for the study; however, only ten went on to acquire patient data.

The scanners used for acquisition included five models from three manufacturers: 1.5 T MAGNETOM Aera, 1.5 T MAGNETOM Avanto, 3 T MAGNETOM Skyra (all Siemens Healthcare, Erlangen, Germany), 3 T Discovery MR750w (GE Healthcare, Milwaukee, USA) and 1.5 T and 3 T Ingenia (Philips Healthcare, Best, Netherlands). There were 110 examinations conducted at 1.5 T and 11 examinations conducted at 3 T. All data were sent to a central imaging repository at the lead site for QC and analysis.

Quantitative metrics are a valuable method for monitoring objective image quality; however, they must be linked to clinically relevant quality assessments. The following were identified as image artefacts or quality issues that commonly affect the quality of WB-MRI or DWI [20,21,22]:

  1. Low signal to noise ratio (SNR)

  2. Anterior thoracic signal loss

  3. Susceptibility artefacts

  4. Poor fat suppression

  5. Ghosting

  6. Geometric distortion

  7. Eddy current distortion

  8. Fat/water swaps

Each examination was scored both qualitatively and quantitatively as follows:

Qualitative assessment

A radiologist with over 4 years of WB-MRI experience used a Likert scale, defined in Table 1, to rate the quality of the overall examination and each image series: DWI (focusing on images with b-values of 50 s mm−2 (b50) and 900 s mm−2 (b900), and ADC maps), Dixon (focusing on water and fat images) and spine imaging (T1w and T2w spine images were evaluated together and are referred to collectively as “spine imaging” in this work). The presence and severity of each of the eight artefacts/image quality issues described above were also evaluated.

Table 1 Likert scales used to score image quality and the presence/effect on diagnostic quality of each artefact/image quality issue

Susceptibility artefacts and fat/water swaps were scored for each artefact identified rather than for the whole examination. To capture regional variations, ghosting and geometric distortion were scored separately at the level of the pelvis and the brain. Differences in qualitative scores were evaluated for field strength (1.5 vs 3 T) and site using the Kruskal–Wallis H test.

Quantitative assessment

A semi-automated pipeline was developed in Matlab (R2019a, MathWorks, Natick, MA, USA) to calculate metrics related to each of the eight artefacts/image quality issues. Each quantitative metric is described in Table 2, with examples provided in Fig. 1. These metrics were developed in collaboration with a radiologist, with the intention that they should relate to clinically relevant features.

Table 2 For each image quality issue/artefact, the image series and measurement location are defined, along with the calculation of the quantitative metric
Fig. 1

Examples of the method for calculating the quantitative metrics. The metrics for each artefact/image quality issue are defined in Table 2. The size of the ROIs varied between patients in accordance with anatomical differences. Metrics: A—signal to noise ratio; B—anterior thoracic signal loss; C2—susceptibility artefact, length; D—fat suppression; E1—ghosting (brain); E2—ghosting (pelvis); F1—geometric distortion (brain); F2—geometric distortion (pelvis); G—eddy current distortion. Image series: b50—DWI with b-value = 50 s mm−2; b900—DWI with b-value = 900 s mm−2; W—Dixon water image

Three slice locations were identified for measurements:

  • Pelvis—at the widest point of the gluteal muscle on the axial cross-section

  • Thorax—at the widest point of the pectoral muscle on the axial cross-section

  • Brain—immediately superior to the orbits

For most metrics, measurements were made at one of these locations, chosen as the location most suitable for that measurement. For each metric, the same location was used for all examinations. The physicist was required to identify the station and slice numbers corresponding to these locations and to define the ROIs.

Measurements were made on the image series where the issue is likely to be most significant, e.g. SNR measurements were made on the b900 DW image as signal is inherently low. Some metrics were comparative, e.g. distortion on a b50 DW image is measured by comparing a contour to the equivalent contour in the water-only Dixon series.
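As an illustration of the ROI-based SNR measure described above, the sketch below divides the mean signal in a tissue ROI by the standard deviation of a background (air) ROI on the same b900 slice. The pipeline itself was implemented in Matlab; this Python version, with hypothetical pixel values and a hypothetical helper name, is a minimal sketch of the general approach rather than the study's implementation.

```python
from statistics import mean, stdev

def roi_snr(signal_pixels, noise_pixels):
    """SNR estimated as the mean signal in a tissue ROI divided by the
    standard deviation of a background (air) ROI on the same slice."""
    return mean(signal_pixels) / stdev(noise_pixels)

# Toy pixel samples from a b900 slice (hypothetical values)
signal = [210, 198, 205, 190, 202]
noise = [3, 5, 4, 2, 6]
print(round(roi_snr(signal, noise), 1))  # → 127.1
```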

Susceptibility artefacts can occur at any location and were therefore identified by the radiologist and measurements made wherever they occurred. No quantitative measure was developed for fat/water swaps as these are either present or not present. Examinations were grouped according to the qualitative score they received for each issue/artefact and one-way ANOVAs with Tukey post hoc tests were used to assess for group differences in quantitative scores.

Ordinal logistic regression was used to create a model of the relationship between all the quantitative metrics and the radiological score for DWI quality. The quantitative scores were prepared for this analysis as follows:

  • The natural logarithm was taken for any ratio metric (e.g. SNR or fat suppression) to linearise the response [23].

  • The reciprocal of ln(SNR) was taken so that a higher score corresponds to lower quality for all metrics.

  • Both susceptibility artefact metrics were aggregated across multiple artefacts to give total number of slices and total length as predictor variables.

  • All metrics were normalised onto an equivalent scale by calculating the mean and standard deviation across all examinations, then for each score subtracting the mean and dividing by the standard deviation.
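The preparation steps above can be sketched as follows for the SNR metric; the raw values are hypothetical and the original analysis was performed in Matlab, so this is an illustrative Python sketch only.

```python
import math
from statistics import mean, stdev

def zscore(values):
    """Normalise a list of metric values to zero mean and unit standard
    deviation across all examinations."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

# Hypothetical raw SNR values for four examinations
raw_snr = [45.0, 30.0, 12.0, 60.0]

# Step 1: natural log of the ratio metric; step 2: reciprocal so that a
# higher value corresponds to lower quality; step 3: z-score normalisation
inv_log_snr = [1.0 / math.log(v) for v in raw_snr]
predictor = zscore(inv_log_snr)
```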

Repeatability/reproducibility

Ten examinations, one from each site, were randomly selected for a sub-study to assess the repeatability of scoring. To examine intra-rater repeatability, the same radiologist repeated the qualitative scoring and the same physicist repeated the quantitative scoring. For inter-rater repeatability, a different radiologist (with 3 years of experience reporting WB-MRI) repeated the qualitative scoring and a different physicist repeated the quantitative scoring for the same subset of ten examinations.

Cohen’s weighted kappa, using the categories of agreement proposed by Landis and Koch [24], was used to assess the significance of intra- and inter-rater differences for the qualitative measures. The repeatability of quantitative scoring was assessed with Bland–Altman analysis and the intraclass correlation coefficient (ICC).

The difference between an “excellent” and “good” examination is unlikely to be as clinically significant as the difference between a “good” and “suboptimal” examination. The qualitative scores were therefore binarised into two categories, excellent/good and sub-optimal/non-diagnostic, and assessed in terms of percentage agreement.
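A minimal sketch of the binarised percentage-agreement calculation follows; the category labels and toy scores are illustrative, not study data.

```python
def binarise(label):
    """Collapse the four quality categories into the two clinically
    meaningful groups (True = excellent/good)."""
    return label in ("excellent", "good")

def percent_agreement(rater_a, rater_b):
    """Percentage of examinations on which the binarised scores agree."""
    matches = sum(binarise(a) == binarise(b) for a, b in zip(rater_a, rater_b))
    return 100.0 * matches / len(rater_a)

# Toy scores from two raters for five examinations
a = ["excellent", "good", "suboptimal", "good", "excellent"]
b = ["good", "excellent", "good", "suboptimal", "excellent"]
print(percent_agreement(a, b))  # → 60.0
```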

Results

Qualitative assessment

Qualitative scoring for image quality and artefact presence/severity is summarised in Table 3 and Fig. 2, with examples of each score provided in Fig. 3.

Table 3 The number of examinations receiving each image quality score for diffusion-weighted imaging (DWI), Dixon imaging, sagittal spine imaging and the overall examination. Note that Dixon imaging was not provided for one examination. To maintain consistency in the definition of the overall exam score, this examination was excluded from overall scoring, although its DWI and spine imaging were scored
Fig. 2

Summary of qualitative image scoring. A Representation of qualitative scores for both image quality and artefact presence/severity across all examinations. Each row represents a single examination, with examinations grouped according to site. Each column represents a different scoring metric. A black rectangle indicates that a score was not possible for that examination, e.g. Dixon imaging could not be scored because it was not provided, or brain distortion could not be scored as the first imaging station was not acquired due to patient kyphosis. B, C Image quality scores separated by field strength and site respectively. The dashed braces in A indicate groups for which a statistically significant difference in means was found, using a Mann–Whitney U test

Fig. 3

Examples of each artefact/image quality issue that received each score for presence/severity. Artefacts/quality issues are identified by the letters given in Table 2 and scores are indicated by the numbers in brackets (according to the Likert scale: 1 = not present/no artefact, 2 = minimal effect, 3 = moderate effect, 4 = severe effect). When a score is not shown for a particular artefact, this indicates that no examinations were given this score. Images are windowed by a radiologist to optimise reading for each series

94.2% of examinations received a score of either good or excellent for overall image quality, with 93.4%, 95.8% and 99.2% receiving good or excellent scores for DWI, Dixon and spine imaging, respectively. This reflects that DWI generally remains marginally more challenging to implement than the rest of the protocol, although 66.1% of DWI exams were rated as excellent with only two (1.7%) rated as non-diagnostic.

A Kruskal–Wallis H test determined that the qualitative scores at 1.5 T were significantly higher than those at 3 T for overall exams (χ2(1) = 24.6, p < 0.001), DWI (χ2(1) = 32.0, p < 0.001) and spine imaging (χ2(1) = 16.4, p < 0.001), with no statistically significant difference for Dixon imaging (χ2(1) = 0.6, p = 0.559).

A Kruskal–Wallis H test showed a statistically significant difference in mean score between at least two sites for the overall exams (χ2(9) = 57.5, p < 0.001), DWI (χ2(9) = 47.4, p < 0.001), Dixon (χ2(9) = 86.2, p < 0.001) and spine imaging (χ2(9) = 72.5, p < 0.001).

Repeatability/reproducibility—qualitative scores

Intra- and inter-rater repeatability is illustrated graphically in Fig. 4. For the intra-rater image scoring, the agreement was excellent for Dixon imaging, substantial for overall exams and DWI and moderate for spine imaging. For the artefact scoring, the agreement was moderate or higher for all metrics apart from susceptibility artefacts, brain ghosting and eddy current distortion.

Fig. 4

Intra- and inter-rater repeatability of qualitative scoring. Plots illustrating the intra-rater and inter-rater agreement for each image quality and artefact scoring across a subset of patients. Each line represents an individual patient so that a horizontal line indicates that the same score was given in both assessments. For each plot the Cohen’s weighted kappa coefficient is displayed with 95% confidence intervals

For the inter-rater image scoring, the agreement was substantial for DWI and moderate for overall exams, Dixon imaging and spine imaging. For the artefact scoring, the agreement was fair for all metrics except brain distortion, anterior signal loss, brain ghosting and pelvic ghosting, for which it was slight/poor.

When scores were binarised into excellent/good and sub-optimal/non-diagnostic categories, all scores had an intra-rater percentage agreement of between 80 and 100% and an inter-rater percentage agreement of between 70 and 100%.

Quantitative assessment

Figure 5 illustrates the quantitative scoring, with examinations grouped by their qualitative scores.

Fig. 5

Correlation between qualitative and quantitative scoring metrics. Boxplots illustrating the quantitative measures for each artefact/image quality issue, grouped according to qualitative score. An asterisk in the top-right of a plot indicates that a statistically significant group difference was found for that metric using a one-way ANOVA. Significant differences between individual groups, as determined using Tukey’s HSD test for multiple comparisons, are indicated by the dashed braces

A one-way ANOVA found a statistically significant group difference in quantitative score between at least two groups for the following metrics: SNR (F(3,117) = 3.50, p = 0.018), anterior thoracic signal loss (F(3,117) = 41.71, p < 0.001), susceptibility number of affected slices (F(2,73) = 112.14, p < 0.001), susceptibility length (F(2,73) = 59.06, p < 0.001), fat suppression (F(2,118) = 89.77, p < 0.001), pelvic ghosting (F(2,118) = 19.67, p < 0.001) and brain geometric distortion (F(2,108) = 19.20, p < 0.007). Tukey’s HSD test for multiple comparisons was used to compare scores between individual groups, as indicated in Fig. 5. There was no statistically significant group difference for brain ghosting (p = 0.156) or eddy current distortion (p = 0.108).

The results of the ordinal logistic regression model are summarised in Table 4. The normalised metrics for SNR, anterior signal loss and brain distortion were found to be statistically significant predictors of DWI image quality.

Table 4 The model was used to predict the radiological DWI scan quality using all ten quantitative metrics. Metrics that were found to be statistically significant predictors are indicated with an asterisk

The odds of an exam receiving a higher quality score were reduced by a factor of 0.62 (95% CI: 0.40–0.96), 0.49 (95% CI: 0.31–0.76) and 0.59 (95% CI: 0.37–0.92) for a unit increase in the normalised measures of 1/SNR, anterior signal loss and brain distortion, respectively.
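These odds ratios are the exponentiated regression coefficients with Wald confidence limits. As a sketch of that conversion, the coefficient and standard error below are hypothetical values back-calculated to reproduce the reported OR of 0.62 (95% CI: 0.40–0.96) for 1/ln(SNR); they are not figures from the study.

```python
import math

def odds_ratio_ci(beta, se, z=1.96):
    """Exponentiate a regression coefficient and its Wald interval to
    obtain an odds ratio with 95% confidence limits."""
    return (math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se))

# Hypothetical coefficient/SE chosen to match the reported OR for 1/ln(SNR)
or_, lo, hi = odds_ratio_ci(-0.478, 0.224)
print(f"{or_:.2f} {lo:.2f} {hi:.2f}")  # → 0.62 0.40 0.96
```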

Repeatability/reproducibility—quantitative scores

The repeatability of the quantitative scoring is summarised in Table 5, with Bland–Altman plots presented in Fig. 6.

Table 5 Summary of the intra- and inter-rater correlation of quality scores for each quantitative metric. Metrics are identified according to the letters assigned in Table 2
Fig. 6

Repeatability of quantitative scoring. Bland–Altman plots illustrating the intra-rater and inter-rater agreement for each artefact/image quality issue. In each case, the mean difference is plotted with a solid black line and zero difference is plotted with a solid red line. The dotted lines indicate the 95% limits of agreement (mean difference ± 2 × SD)
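The agreement limits plotted in Fig. 6 follow the standard Bland–Altman construction. A minimal sketch with toy paired measurements (hypothetical values, not study data):

```python
from statistics import mean, stdev

def bland_altman_limits(first, second):
    """Mean difference and agreement limits (mean ± 2 × SD of the paired
    differences), matching the convention used in Fig. 6."""
    diffs = [a - b for a, b in zip(first, second)]
    md, sd = mean(diffs), stdev(diffs)
    return md, md - 2 * sd, md + 2 * sd

# Toy repeated quantitative scores from two scoring sessions
first = [10.0, 12.0, 11.0, 9.0, 13.0]
second = [11.0, 11.0, 10.0, 10.0, 12.0]
md, lower, upper = bland_altman_limits(first, second)
print(round(md, 2))  # → 0.2
```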

For the intra-rater comparison, ICC was found to be higher than 0.75 (considered to indicate good reliability [25]) and statistically significant (Bonferroni-corrected α = 0.005) for SNR (ICC = 0.91, p < 0.001), fat suppression (ICC = 0.83, p < 0.001), brain ghosting (ICC = 0.74, p = 0.004) and susceptibility artefact number of slices (ICC = 0.95, p < 0.001). For the inter-rater comparison, this was the case for SNR (ICC = 0.51, p = 0.032), brain ghosting (ICC = 0.92, p < 0.001), pelvic ghosting (ICC = 0.90, p < 0.001), brain distortion (ICC = 0.65, p < 0.007) and susceptibility artefact number of slices (ICC = 0.85, p = 0.005).

Discussion

The MY-RADS guidelines promote standardisation for WB-MRI; however, image quality using the MY-RADS protocol has not previously been assessed in a large multi-centre study. For WB-MRI to become a widely available clinical tool outside of specialist centres, good image quality must be achievable across the range of hardware and software in use. Sites participating in the MUKnine IMAGIMM sub-study were invited based on their patient population and not prior WB-MRI experience, providing an opportunity to evaluate the achievable image quality in a realistic multi-centre WB-MRI study.

Out of 121 examinations from ten varied sites, 120 were judged by a radiologist to be diagnostic with 89 of those being of excellent overall quality. The high proportion of exams rated as good or excellent shows that the MY-RADS protocol can be successfully implemented in a representative patient cohort across a variety of sites. This result was achieved despite the additional challenges of the COVID-19 pandemic, which coincided with the study.

Only one overall examination was deemed to be non-diagnostic, scoring poorly across all series. In this case, the poor image quality can be linked to non-compliance with the desired imaging protocol, with DWI acquired with only two b-values, insufficient averaging and an incorrect slice thickness (6 mm rather than 5 mm). A different scanner was used from the one that had been qualified for this site, underlining the importance of the site qualification process for establishing protocols that deliver good image quality.

One other exam was reported to have non-diagnostic DW images. In this case, the examination was compliant with the imaging protocol; however, the quality of the b900 images was degraded by a loss of SNR due to the patient’s size and a substantial susceptibility artefact in the region of a metallic implant in the spine. The excellent quality of the Dixon and spine imaging meant that the overall exam retained some diagnostic value.

The qualitative radiological image scoring found that overall exams, DWI and spine imaging are higher quality at 1.5 T than at 3 T. The degree of anterior thoracic signal loss and geometric distortion at 3 T suggests that there are still challenges related to B0 field homogeneity in the implementation of standardised protocols across the fleet of available scanners.

There were some limitations to this study, including the uneven distribution of manufacturer and field strength. 111 examinations were from a single manufacturer and 110 were conducted at 1.5 T, making it difficult to separate manufacturer, field strength and site-specific performance. No inferences have therefore been drawn regarding image quality across different scanner manufacturers. The quantitative measurements are limited by their reliance on a single imaging slice and therefore do not reflect the potential inhomogeneity of effects.

Both qualitative and quantitative scoring have a degree of subjectivity and repeatability must be assessed; however, Cohen’s kappa can be misleadingly low for small sample sizes such as this. For example, the inter-rater percentage agreement for qualitative anterior signal loss was 80%; however, the distribution of scores for this metric meant that Cohen’s kappa was 0.00, implying very poor agreement. Figure 4 demonstrates visually that the intra- and inter-rater repeatability between qualitative scores was generally good, providing reassurance that the radiological image scoring is a relatively objective measure of clinical image quality.

The quantitative metrics need to demonstrate significant correlation with a radiological assessment of clinical significance. This was the case for several of the metrics defined here, including SNR, anterior signal loss and the measures of susceptibility artefacts. Some metrics, such as fat/water swaps, were relatively common but generally did not affect diagnostic quality while others, such as eddy current distortion, occurred very rarely in these examinations. Clinical outcome was not considered in this work; however, it is assumed that radiological image quality is associated with lesion detection.

Manual assessment of image quality is time-consuming and impractical for larger cohorts, so there is potential value in developing automated quality assessment pipelines that reflect clinical interpretation of quality [26, 27]. When the overall quality of a WB-MRI examination was sub-optimal or non-diagnostic in this dataset, this was likely to be because of DWI quality issues. SNR, anterior/posterior signal ratio and brain distortion measurements were found to be statistically significant predictors of DWI quality and could therefore form an automated pipeline to predict radiological image quality. Retrospectively, this could be used to rapidly highlight sites providing poor quality imaging so that underlying issues can be addressed. An automated pipeline could also be implemented prospectively, during protocol development or routine clinical scanning, providing the user with quality feedback that informs protocol development or modification.

The SNR of b900 DW images correlates with radiological assessment of SNR, is a significant predictor of qualitative image quality and demonstrates good repeatability. It is therefore proposed that the SNR of b900 DW images is the most important factor determining the quality of WB-MRI examinations and that measurement of SNR may be used to predict exam quality. The use of simple SNR measurements should be investigated further to characterise the performance of a particular scanner or acquisition sequence for WB-DWI and to provide a benchmark for acceptable image quality in multi-centre trials.

Conclusions

This image quality assessment has shown for the first time that it is possible to successfully deliver a multi-centre WB-MRI study using the MY-RADS protocol, even from sites with a range of hardware and prior WB-MRI experience. This underlines the importance of the site qualification process [19], which established acquisition protocols that were optimised to local conditions and ensured that all sites were capable of delivering high quality imaging prior to patient enrolment. Quantitative metrics of image quality have been shown to have good repeatability and correlation with radiological assessment and could be developed further to provide a pipeline for automated QC of WB-MRI data in multi-centre studies.

Availability of data and materials

Due to privacy regulations, the data used in this study are not publicly available. The authors can be contacted to view and discuss the data and, if needed, to arrange approval to share the data with individual researchers.

Abbreviations

ADC: Apparent diffusion coefficient

ASCT: Autologous stem cell transplantation

DWI: Diffusion weighted imaging

EPI: Echo planar imaging

ICC: Intraclass correlation coefficient

MY-RADS: Myeloma Response Assessment and Diagnosis System

PET/CT: Positron emission tomography/computed tomography

QC: Quality control

qMR IBs: Quantitative magnetic resonance imaging biomarkers

ROI: Region of interest

SNR: Signal to noise ratio

WB-MRI: Whole-body magnetic resonance imaging

References

  1. Messiou C, Porta N, Sharma B et al (2021) Prospective evaluation of whole-body MRI versus FDG PET/CT for lesion detection in participants with myeloma. Radiology 3:e210048

  2. Hillengass J, Usmani S, Rajkumar SV et al (2019) International myeloma working group consensus recommendations on imaging in monoclonal plasma cell disorders. Lancet Oncol 20:e302–e312

  3. National Institute for Health and Care Excellence (NICE) NG35 (2016) Myeloma: diagnosis and management

  4. Messiou C, Hillengass J, Delorme S et al (2019) Guidelines for acquisition, interpretation, and reporting of whole-body MRI in myeloma: myeloma response assessment and diagnosis system (MY-RADS). Radiology 291:5–13

  5. Wu C, Huang J, Xu W-B et al (2018) Discriminating depth of response to therapy in multiple myeloma using whole-body diffusion-weighted MRI with apparent diffusion coefficient: preliminary results from a single-center study. Acad Radiol 25:904–914

  6. Zhang Y, Xiong X, Fu Z et al (2019) Whole-body diffusion-weighted MRI for evaluation of response in multiple myeloma patients following bortezomib-based therapy: a large single-center cohort study. Eur J Radiol 120:108695

  7. Latifoltojar A, Hall-Craggs M, Rabin N et al (2017) Whole body magnetic resonance imaging in newly diagnosed multiple myeloma: early changes in lesional signal fat fraction predict disease response. Br J Haematol 176:222–233

  8. O’Connor JP, Aboagye EO, Adams JE et al (2017) Imaging biomarker roadmap for cancer studies. Nat Rev Clin Oncol 14:169–186

  9. Schlett CL, Hendel T, Hirsch J et al (2016) Quantitative, organ-specific interscanner and intrascanner variability for 3 T whole-body magnetic resonance imaging in a multicenter, multivendor study. Invest Radiol 51:255–265

  10. Michoux NF, Ceranka JW, Vandemeulebroucke J et al (2021) Repeatability and reproducibility of ADC measurements: a prospective multicenter whole-body-MRI study. Eur Radiol 1–14

  11. Kwee TC, Vermoolen MA, Akkerman EA et al (2014) Whole-body MRI, including diffusion-weighted imaging, for staging lymphoma: comparison with CT in a prospective multicenter study. J Magn Reson Imaging 40:26–36

  12. Littooij AS, Kwee TC, de Keizer B et al (2015) Whole-body MRI-DWI for assessment of residual disease after completion of therapy in lymphoma: a prospective multicenter study. J Magn Reson Imaging 42:1646–1655

  13. Wennmann M, Thierjung H, Bauer F et al (2022) Repeatability and reproducibility of ADC measurements and MRI signal intensity measurements of bone marrow in monoclonal plasma cell disorders: a prospective bi-institutional multiscanner, multiprotocol study. Invest Radiol 57:272–281

  14. Taylor SA, Mallett S, Ball S et al (2019) Diagnostic accuracy of whole-body MRI versus standard imaging pathways for metastatic disease in newly diagnosed non-small-cell lung cancer: the prospective Streamline L trial. Lancet Respir Med 7:523–532

  15. Taylor SA, Mallett S, Beare S et al (2019) Diagnostic accuracy of whole-body MRI versus standard imaging pathways for metastatic disease in newly diagnosed colorectal cancer: the prospective Streamline C trial. Lancet Gastroenterol Hepatol 4:529–537

  16. Brown S, Sherratt D, Hinsley S et al (2021) MUKnine OPTIMUM protocol: a screening study to identify high-risk patients with multiple myeloma suitable for novel treatment approaches combined with a phase II study evaluating optimised combination of biological therapy in newly diagnosed high-risk multiple myeloma and plasma cell leukaemia. BMJ Open 11:e046225

  17. Kyle R, Rajkumar SV (2009) Criteria for diagnosis, staging, risk stratification and response assessment of multiple myeloma. Leukemia 23:3–9

  18. Shah V, Sherborne AL, Walker BA et al (2018) Prediction of outcome in newly diagnosed myeloma: a meta-analysis of the molecular profiles of 1905 trial patients. Leukemia 32:102–110

  19. Rata M, Blackledge M, Scurr E et al (2022) Implementation of whole-body MRI (MY-RADS) within the OPTIMUM/MUKnine multi-centre clinical trial for patients with myeloma. Insights Imaging 13:1–16

  20. Koh D-M, Blackledge M, Padhani AR et al (2012) Whole-body diffusion-weighted MRI: tips, tricks, and pitfalls. AJR Am J Roentgenol 199:252–262

  21. Barnes A, Alonzi R, Blackledge M et al (2018) UK quantitative WB-DWI technical workgroup: consensus meeting recommendations on optimisation, quality control, processing and analysis of quantitative whole-body diffusion-weighted imaging for cancer. Br J Radiol 91:20170577

  22. QIBA DWI Biomarker Committee (2022) Quantitative Imaging Biomarkers Alliance (QIBA) profile: diffusion-weighted magnetic resonance imaging (DWI), clinically feasible profile

    Google Scholar 

  23. Hedges LV, Gurevitch J, Curtis PS (1999) The meta-analysis of response ratios in experimental ecology. Ecology 80:1150–1156

    Article  Google Scholar 

  24. Landis JR, Koch GG (1977) The measurement of observer agreement for categorical data. Biometrics 159–74

  25. Koo TK, Li MY (2016) A guideline of selecting and reporting intraclass correlation coefficients for reliability research. J Chiropr Med 15:155–163

    Article  PubMed  PubMed Central  Google Scholar 

  26. Küstner T, Gatidis S, Liebgott A et al (2018) A machine-learning framework for automatic reference-free quality assessment in MRI. Magn Reson Imaging 53:134–147

    Article  PubMed  Google Scholar 

  27. Kastryulin S, Zakirov J, Pezzotti N, Dylov DV (2023) Image quality assessment for magnetic resonance imaging. IEEE Access 11:14154–14168

    Article  Google Scholar 

Acknowledgements

We would like to acknowledge all patients and staff from clinical trials, haematology and MRI units across the ten sites involved in this multi-centre trial.

Funding

We would like to acknowledge Janssen and Celgene (for supporting the MUKnine OPTIMUM study), as well as support from Cancer Research UK National Cancer Imaging Translational Accelerator (NCITA), the National Institute for Health and Care Research (NIHR) Biomedical Research Centre and the Clinical Research Facility in Imaging at The Royal Marsden NHS Foundation Trust and The Institute of Cancer Research, London. The views expressed are those of the author(s) and not necessarily those of the NIHR or the Department of Health and Social Care.

Author information

Contributions

SK designed the study, analysed and interpreted the data and drafted and revised the manuscript. AD designed the study, analysed and interpreted the data and revised the manuscript. MR, MB and ES set up sites, acquired the data and revised the manuscript. JMW conceived and designed the study, analysed and interpreted data and drafted and revised the manuscript. JS analysed and interpreted data and revised the manuscript. DMK conceived and designed the study, interpreted data and revised the manuscript. NP interpreted data and revised the manuscript. AC developed software and revised the manuscript. AK, WR, SG, PS, PM, AD, AN, AS, SG, MA and AD acquired data and revised the manuscript. GP, GC, SR, MJ and SB designed the multi-centre trial and revised the manuscript. MK designed the multi-centre trial, conceived the study and revised the manuscript. CM designed the multi-centre trial, conceived and designed the study, interpreted data and drafted and revised the manuscript.

Corresponding author

Correspondence to Sam Keaveney.

Ethics declarations

Ethics approval and consent to participate

All patients provided written consent for inclusion into the study. ClinicalTrials.gov identifier: NCT03188172.

Consent for publication

All authors consent to publication.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1: Supplementary Table 1. The MY-RADS recommended protocol for WB-MRI.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Keaveney, S., Dragan, A., Rata, M. et al. Image quality in whole-body MRI using the MY-RADS protocol in a prospective multi-centre multiple myeloma study. Insights Imaging 14, 170 (2023). https://doi.org/10.1186/s13244-023-01498-3

Keywords