- Critical Review
- Open Access
Factors affecting the value of diffusion-weighted imaging for identifying breast cancer patients with pathological complete response on neoadjuvant systemic therapy: a systematic review
Insights into Imaging volume 12, Article number: 187 (2021)
This review aims to identify factors causing heterogeneity in breast DWI–MRI and their impact on its value for identifying breast cancer patients with pathological complete response (pCR) on neoadjuvant systemic therapy (NST). A search was performed on PubMed until April 2020 for studies analyzing DWI for identifying breast cancer patients with pCR on NST. Technical and clinical study aspects were extracted and assessed for variability. Twenty studies representing 1455 patients/lesions were included. The studies differed with respect to study population, treatment type, DWI acquisition technique, post-processing (e.g., mono-exponential/intravoxel incoherent motion/stretched exponential modeling), and timing of follow-up studies. For the acquisition and generation of ADC-maps, various b-value combinations were used. Approaches for drawing regions of interest on longitudinal MRIs were highly variable. Biological variability due to various molecular subtypes was usually not taken into account. Moreover, definitions of pCR varied. The individual areas under the curve for the studies ranged from 0.50 to 0.92. However, overlapping ranges of mean/median ADC-values at pre- and/or during and/or post-NST were found for the pCR and non-pCR groups between studies. The technical, clinical, and epidemiological heterogeneity may explain the observed variability in the ability of DWI to predict pCR accurately. This makes implementation of DWI for pCR prediction and evaluation based on one absolute ADC threshold for all breast cancer types undesirable. Multidisciplinary consensus and appropriate clinical study design, taking biological and therapeutic variation into account, are required for obtaining standardized, reliable, and reproducible DWI measurements for pCR/non-pCR identification.
Large heterogeneity/variability in studies hampers successful clinical implementation of DWI metrics.
Technical variability was encountered in, e.g., b-value combinations, ROIs, and models.
Clinical heterogeneity was observed (e.g., scan-moment during treatment, tumor type differentiation, and NST-protocol).
Multi-disciplinary consensus/cooperation is required for proper clinical study design.
Quality control and standardization are essential for clinical and technical validation.
Women with breast cancer are increasingly treated with neoadjuvant systemic therapy (NST). The optimal response is achieved when no residual cancer is detected at subsequent surgical pathology (pathological complete response, pCR). Between subtypes, pCR rates vary widely, from 0.3% (luminal A) to 60% (HER2-type).
To identify breast tumor pCR, a diagnostic lumpectomy is currently necessary, although it may no longer be required for therapeutic reasons. Identifying pCR with imaging alone would be a significant improvement, as it would prevent needless surgical procedures. However, this requires that non-pCR is accurately detected. Only then can omitting surgery in favor of a wait-and-see strategy be accepted as a practical and reliable alternative. Such an approach has already been proposed for colorectal cancer treated with neoadjuvant chemoradiotherapy. In breast cancer, 18F-FDG PET-CT and/or dynamic contrast-enhanced (DCE) magnetic resonance imaging (MRI) have been extensively investigated for predicting and evaluating NST-response [4, 5]. Despite all these efforts, NST response assessment still needs to be improved. The percentage of correctly identified pCR on MRI appears too low to safely omit diagnostic lumpectomy. Furthermore, the accuracy of DCE-MRI seems to depend on the cancer subtype [6, 7]. In addition, the observed gadolinium deposition in the deep nuclei of the brain after repeated exposure to gadolinium-based contrast agents has raised some concerns. Therefore, other MRI-techniques, such as diffusion-weighted imaging (DWI), are being investigated.
While DCE provides information on perfusion, DWI provides information about cell density and tissue microstructure based on the diffusion of tissue water. Tumors with high cell density have a relatively low apparent diffusion coefficient (ADC), which theoretically increases when the density is reduced by chemotherapy. However, this is not observed in all tumors, since ADC depends on multiple factors.
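As a point of reference, the mono-exponential signal model that underlies a conventional ADC can be written as below. The notation (\(S_0\), \(S(b)\), \(b_1\), \(b_2\)) is introduced here for illustration only, and the two-point form is an assumption about how an ADC may be computed; individual scanners and studies may use other fitting approaches.

\[
S(b) = S_0 \, e^{-b \cdot \mathrm{ADC}}, \qquad \mathrm{ADC} = \frac{\ln\!\left(S(b_1)/S(b_2)\right)}{b_2 - b_1}, \quad b_2 > b_1 .
\]

With \(b\) expressed in s/mm², the ADC is obtained in mm²/s. In this representation, a lower ADC corresponds to less signal attenuation with increasing \(b\).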
The use of DWI might be beneficial for the response assessment of NST, as microstructural changes may be detected at an earlier stage than tumor size reduction. Previous reviews reported aggregate values on the performance of DWI–MRI for predicting or identifying pCR. Chu et al. reported a sensitivity of 0.88 and a specificity of 0.79; similarly, Gao et al. reported a sensitivity of 0.89 and a specificity of 0.72. However, reported cutoff ADC-values in the individual studies appear variable, preventing the use of a single cutoff value to achieve such performance. It is, therefore, uncertain whether these aggregate performance measures are valid. In addition, studies vary in factors such as patient selection, tumor subtypes, and NST-types. Moreover, the methodology used for quantitative analysis of DWI–MRI is not uniform. To partly solve this issue, Baltzer et al. published a EUSOBI consensus paper regarding DWI of the breast for lesion classification. However, the consensus paper does not provide insights on issues applicable to treatment monitoring using DWI for identifying patients with pCR. To shed light on the magnitude of these issues, this review aims to identify technical, clinical, and biological heterogeneity and their impact in DWI studies identifying pCR on NST. The final aim is to support a more robust implementation of quantitative DWI for NST monitoring in breast cancer patients.
Materials and methods
Search, inclusion/exclusion criteria, and quality assessment
A PubMed search was performed until April 2020, using Medical Subject Headings (MeSH) and free-text terms for breast cancer, NST, DCE, DWI, and pCR. Identified abstracts were read and selected by two researchers. Abstracts were excluded when they were: (1) not published in English; (2) not about human breast cancer; (3) studies that performed no prediction/evaluation of pCR of the breast tumor; (4) studies that did not compare outcome to histopathology; (5) studies with neoadjuvant therapy using radiotherapy; (6) comments; (7) meta-analyses; (8) case reports.
Data extraction and analysis
Data were extracted based on general parameters (e.g., first author, publication year), clinical characteristics (e.g., type of tumor, neoadjuvant treatment protocol), scan-moments (i.e., before, during (number of cycles) and/or post-NST), MRI/DWI protocol parameters (e.g., B0-field strength (T), b-values (s/mm²)), and details on the measures derived from the DWI data (e.g., ADC (mm²/s)). The reported performance measures per study were collected. For pCR prediction/detection, pCR-definitions were also extracted, since studies could permit different degrees of residual (tumor) tissue for pCR.
If performance measures were missing, we attempted to reconstruct them by extracting the data normally used in 2 × 2 contingency tables from the full text or supplementary material. In this review, pCR and non-pCR are defined as positive and negative events, respectively.
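The reconstruction of performance measures from a 2 × 2 table can be sketched as follows. This is a minimal illustration, not the procedure actually used by the authors: the function name `diagnostic_metrics` and the example counts are hypothetical, and only the convention that pCR is the positive event is taken from the text.

```python
import math

def diagnostic_metrics(tp, fp, fn, tn):
    """Performance measures from a 2 x 2 table, with pCR as the positive event.

    tp: pCR on pathology, predicted pCR      fn: pCR on pathology, predicted non-pCR
    fp: non-pCR on pathology, predicted pCR  tn: non-pCR on pathology, predicted non-pCR
    """
    def ratio(num, den):
        # An empty row or column leaves the measure undefined.
        return num / den if den else math.nan

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
    }

# Hypothetical counts, for illustration only (not taken from any included study).
example = diagnostic_metrics(tp=8, fp=4, fn=2, tn=26)
```

Returning NaN for an undefined measure, rather than raising an error, keeps a study in the overview even when only part of its table could be recovered.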
After data extraction, grouping of results based on comparable study methodologies/definitions was performed. Data were analyzed by comparing study population (-related) and MR (-related) parameters to outcomes in terms of distinguishing pCR/non-pCR.
Sub-analyses were performed on different pCR-definitions (regarding in- or exclusion of residual ductal carcinoma in situ (DCIS)), when sufficient data were available.
Due to expected heterogeneity, we did not initially intend to conduct formal data-pooling and/or meta-analysis. Post hoc analysis of the results also prohibited this.
Results
Search strategy and study selection
The search (Additional file 1: Search term combinations in PubMed) resulted in 42 unique publications. After selection, 20 publications were included (Fig. 1). QUADAS-2 assessment identified sources of bias and applicability concerns present in most studies (Table 1). In some studies, the patient selection might have introduced bias by using a non-representative study population (e.g., not describing the group as consecutive, small research populations in a large time interval, tumor diameter as exclusion criterion). Furthermore, several studies included patients who had a different number of scans within the study.
General study characteristics
A general overview of the study characteristics is presented in Tables 2 and 3. In total, approximately 1455 patients/lesions were included (range per study: 7–242) (Table 2). Most studies were retrospective single center. NST-regimens varied widely between and within the studies. The pCR-ratios varied (12.9–39.3%), reflecting the variability and wide distribution of histological and molecular subtypes in the studies. In addition, the pCR-definitions differed between studies. Taking this into account, we noted that in general higher pCR-ratios were observed with less stringent pCR-definitions (especially for DCIS), as shown in Table 4.
MRI characteristics and DWI measures to predict and evaluate NST response
Regarding MRI-scanners, coils, and acquisition parameters of the DWI sequence, large heterogeneity was observed (Table 3). For example, in ten studies, DWI was performed at 1.5 T, eight studies used a 3.0 T scanner, and two studies used MRI-scanners with both field strengths. Although most studies used single-shot echo-planar imaging (SS-EPI), a wide variety was observed within and between studies regarding echo times (TE), the use of low b-values (< 150 s/mm²), methods to calculate ADC-values, and region of interest (ROI)-definitions (Table 5). Details/study characteristics (Tables 2, 3) are reviewed in the “Discussion” section.
Furthermore, the DWI measures varied between the studies (e.g., absolute values, relative values (i.e., (percentage) change, ratios), or histogram-related values). Figures 2 and 3 illustrate, respectively, the ADC values and the percentage change in ADC over time for pCR and non-pCR. In Fig. 2, studies using scanners with a main magnetic field strength (B0) of 1.5 T or 3 T are also visually separated.
DWI–MRI at baseline
Sixteen publications investigated DWI–MRI at baseline for predicting pCR. Most studies tried to identify an ADC-threshold. The reported overall (mean or median) ADC-values varied between studies for patients who obtained pCR (0.41 × 10⁻³–1.16 × 10⁻³ mm²/s) and those who did not (0.64 × 10⁻³–1.23 × 10⁻³ mm²/s). Reported thresholds were highly variable. Figure 4 shows the results of three studies that distinguished pCR/non-pCR based on molecular subtype [16,17,18]. In general, intervals of ADC-values for pCR and non-pCR cases overlapped between studies (Fig. 5). A trend observed within studies in which residual DCIS is explicitly not allowed in the pCR-definition is that some tumors with a relatively low ADC tend to have a higher chance of showing pCR on NST (Fig. 5, category: “Invasive-, DCIS-”).
Some studies reported non-mono-exponential/non-Gaussian models (e.g., intravoxel incoherent motion (IVIM)). A mean true diffusion coefficient (D) of 0.92 × 10⁻³ mm²/s (pCR) versus 0.83 × 10⁻³ mm²/s (non-pCR) was reported (p = 0.323). Another non-Gaussian approach, stretched exponential modeling (SEM), which quantifies the intravoxel heterogeneity (i.e., the intravoxel heterogeneity index (α)) and the distributed diffusion coefficient (DDC) in a multi-exponential decay, resulted in cutoff values of α = 0.838 (AUC = 0.644) and DDC = 1.141 × 10⁻³ mm²/s (AUC = 0.756). Furthermore, one study identified skewness (p < 0.05) and entropy (p = 0.05) (both histogram-based features) as predictors of pCR. More details are presented in Table 6.
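For readers less familiar with these representations, the commonly used forms of the IVIM and stretched exponential models are given below. These are the standard textbook expressions and are assumed here; the included studies may have used variants or different fitting strategies. The symbol \(D^{*}\) (pseudo-diffusion coefficient) does not appear in the text above and is introduced for illustration, while \(f\) denotes the perfusion fraction.

\[
\frac{S(b)}{S_0} = f \, e^{-b D^{*}} + (1 - f) \, e^{-b D} \quad \text{(IVIM)}, \qquad
\frac{S(b)}{S_0} = \exp\!\left[-\left(b \cdot \mathrm{DDC}\right)^{\alpha}\right], \; 0 < \alpha \le 1 \quad \text{(SEM)} .
\]

For \(f = 0\) in the IVIM form, or \(\alpha = 1\) in the SEM form, both expressions reduce to a mono-exponential decay, in which case \(D\) or DDC coincides with the ADC.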
DWI–MRI during NST
Nine studies reported on absolute ADC-values during NST to predict pCR. The scan-moments varied widely between the studies (after 1–5 NST-cycles). Reported ADC-values were heterogeneous. Overall, increasing ADC-values during NST seem to reflect response of the tumor. However, there is no clear threshold to distinguish partial and non-responders from complete responders. The optimal scan-moment evaluating therapy during NST seems to be subtype and NST-regimen dependent.
In one study, three types of NST (start) regimens were compared to predict pCR for different molecular subtypes. Looking at the highest AUC per subtype over all NST variants, the optimal scan-moment for pCR prediction in luminal A and B after starting with taxanes or anthracyclines is suggested to be after 3 weeks of therapy. When using change in ADC, an AUC = 0.865 for luminal B (starting with taxanes) and an AUC = 0.845 for luminal A (starting with anthracyclines) are reported. The optimal scan-moment for basal-like and HER2-enriched tumors starting with anthracyclines and taxanes is suggested to be after 3 weeks, with AUC = 0.879 and AUC = 0.783, respectively, using change in ADC. For other NST-regimen and molecular subtype combinations, 6 weeks is reported as the optimal scan-moment. The optimum can thus differ depending on the specific NST-type and cancer subtype; full details are given in that study.
A difficulty is that reported series are in general small. Subdividing them into different subtypes and NST regimens leads to very small study populations. Partridge et al. reported that all subtypes were underpowered, except HR+/HER2−. For this subtype, the predictive value of DWI (\(\Delta\)ADC (%)) after 3 weeks of taxane (paclitaxel) treatment achieved an AUC of 0.61, whereas Yuan et al. reported an AUC = 0.678 for the (absolute) \(\Delta\)ADC in luminal A cancers, neglecting Ki-67 in this comparison. Furthermore, one study investigated three ROI-types in luminal cancer and defined the optimal ROIs according to the specific shrinkage pattern, achieving an AUC = 0.877 for \(\Delta\)ADC% after two cycles. In addition, ADC-ratios, related to baseline and a time point (number of cycles), were analyzed. Here, increased AUCs were observed as the evaluation moment progressed toward post-NST.
Studying IVIM, Che et al. found after two cycles a mean true diffusion coefficient (D) of 1.36 × 10⁻³ mm²/s (pCR) versus 0.98 × 10⁻³ mm²/s (non-pCR) over all subtypes (p = 0.001). For distinguishing pCR/non-pCR, they reported a cutoff value of 0.971 × 10⁻³ mm²/s, yielding 100% sensitivity at 63% specificity (AUC = 0.851). Another IVIM-parameter, the change in perfusion fraction (\(\Delta f\)), showed an AUC of 0.906 using a cutoff of 11.3%. More details are displayed in Tables 7 and 8.
DWI–MRI after NST
Four papers evaluated absolute post-NST ADC-values (Table 9). In one study, an ADC-threshold of 1.19 × 10⁻³ mm²/s to distinguish pCR/non-pCR yielded an AUC of 0.80. Another study used higher thresholds that also differed between the molecular subtypes (range: 1.33 × 10⁻³ mm²/s (luminal B) to 1.43 × 10⁻³ mm²/s (triple negative)).
Using the change in ADC between baseline and post-NST, one study suggested a threshold of a 40.7% increase to identify patients with a pCR, with 100% sensitivity, 91% specificity, and an AUC of 0.96.
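The percentage change in ADC used in such thresholds can be computed as in the sketch below. The function names and the example lesion values are hypothetical; the default threshold of 40.7% is the single-study value quoted above and is used purely for illustration, not as a validated cutoff.

```python
def percent_change_adc(adc_pre, adc_post):
    """Percentage change in ADC relative to the pre-treatment (baseline) value."""
    if adc_pre <= 0:
        raise ValueError("baseline ADC must be positive")
    return (adc_post - adc_pre) / adc_pre * 100.0

def exceeds_threshold(adc_pre, adc_post, threshold_percent=40.7):
    """True when the ADC increase reaches the threshold (default: the 40.7% quoted above)."""
    return percent_change_adc(adc_pre, adc_post) >= threshold_percent

# Hypothetical lesion values in mm^2/s, for illustration only.
change = percent_change_adc(0.90e-3, 1.30e-3)
```

Because the change is expressed relative to baseline, a constant scaling of both measurements (for instance a systematic scanner bias) cancels out, which an absolute threshold would not tolerate.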
Finally, also the ROI-methodology differed between studies for cases with and without apparent residual disease (ROI-specifications: Table 5).
Discussion
This review describes 20 studies reporting on DWI–MRI prior to/during/after NST to identify pCR of the breast. A major finding is that the studies were very heterogeneous regarding clinical, technical, and epidemiological aspects. These differences make pooling of results for meta-analysis difficult. Previous meta-analyses [12, 13] should therefore be interpreted with caution. Currently, it is impossible to define the role of DWI in identifying pCR after NST. The observed heterogeneity in type of cancers, applied treatments, and used quantification methods precludes straightforward implementation of DWI protocols for NST-monitoring in other hospitals.
Some of these limitations were also recognized for the value of DWI for lesion classification. The European Society of Breast Imaging (EUSOBI) International DWI working group recently published a consensus and mission statement to alleviate this issue for lesion classification only. Based on the findings of the current systematic review, further standardization seems essential to implement DWI for treatment monitoring.
The Quantitative Imaging Biomarkers Alliance (QIBA) of the RSNA published standards in 2018 for some organs related to the implementation of quantitative DWI biomarkers (covering reproducibility, repeatability, and measurement errors vs. real changes). The revised standard, currently under development, also includes technical breast imaging aspects. These aspects may alleviate some differences in acquisition and evaluation parameters that currently make multicenter implementation challenging. From a technical perspective, even more parameters than discussed in this review may influence measurements [30,31,32,33]. Different hardware components and MRI-protocols might also affect the precision and accuracy of the DWI metrics obtained for pCR prediction/evaluation or even of DWI in general [34,35,36]. Furthermore, interpretation factors (e.g., reading system, reader experience) may affect results. Some quality issues were already addressed in a test–retest study by Newitt et al. Strikingly, the biological variability of cancers and the differences in treatment protocols are not addressed at all by the available guidelines.
Below, we discuss some of the most eye-catching differences between studies with respect to treatment monitoring that need to be addressed shortly. We acknowledge that this list is certainly not complete.
As observed, ADC-values overlapped between pCR/non-pCR groups, and between studies. This may partly be explained by the different b-value combinations used for calculating ADC-values. For example, including perfusion-sensitive low b-values can overestimate ADC, whereas using (diffusion- and noise-sensitive) high b-values can potentially underestimate ADC (Fig. 6, Additional file 1: Figure A and B, illustrating the different slopes). Moreover, the b-values can be constructed in different ways (i.e., depending on the DWI gradient properties). Theoretically, diffusion time, and thereby the DWI image, can vary between scans, although the b-value is identical. This makes it difficult to compare b-values between scanners. Reporting differences in the gradient strength and its timing properties, which may influence measurement results, makes multi-center multi-scanner studies easier to understand. This is important as DWI, by applying a certain b-value, can be sensitive to intra- and/or extra-cellular water motion effects (i.e., restriction and hindrance, respectively) and/or perfusion/pseudo-diffusion effects. Additionally, the ADC calculation methods (e.g., the scanner or specific formulas) might not be identical.
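The dependence of a two-point ADC on the chosen b-value pair can be illustrated with a small simulation. The sketch below assumes a noise-free bi-exponential (IVIM-type) signal with hypothetical tissue parameters; it only demonstrates the perfusion-related overestimation at low b-values and does not model the noise floor or non-Gaussian effects that play a role at high b-values.

```python
import math

def ivim_signal(b, f, d_star, d):
    """Normalized bi-exponential (IVIM-type) signal S(b)/S0."""
    return f * math.exp(-b * d_star) + (1.0 - f) * math.exp(-b * d)

def two_point_adc(b_low, b_high, f, d_star, d):
    """Mono-exponential ADC estimated from two b-values (s/mm^2)."""
    s_low = ivim_signal(b_low, f, d_star, d)
    s_high = ivim_signal(b_high, f, d_star, d)
    return math.log(s_low / s_high) / (b_high - b_low)

# Hypothetical tissue parameters (assumptions, not taken from the review):
# perfusion fraction, pseudo-diffusion coefficient and diffusion coefficient (mm^2/s).
F, D_STAR, D = 0.10, 10e-3, 1.0e-3

adc_by_pair = {
    pair: two_point_adc(pair[0], pair[1], F, D_STAR, D)
    for pair in [(0, 700), (50, 700), (200, 700)]
}
for (b_low, b_high), adc in adc_by_pair.items():
    print(f"b = {b_low}/{b_high} s/mm^2 -> ADC = {adc * 1e3:.3f} x 10^-3 mm^2/s")
```

For the same simulated tissue, the estimated ADC decreases toward D as the lower b-value is raised, so identical lesions would be assigned different ADC-values under different protocols.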
The large variation between studies is well illustrated by the differences in the baseline mean ADC between Santamaria et al. and Tozaki et al., both for the non-pCR cases (1.072 × 10⁻³ mm²/s vs. 0.64 × 10⁻³ mm²/s) and the pCR cases (1.025 × 10⁻³ mm²/s vs. 0.41 × 10⁻³ mm²/s). The b-value combinations were b = 0 s/mm² and b = 700 s/mm², or b = 50 s/mm² and b = 700 s/mm², for Santamaria et al., and b = 500 s/mm² and b = 1500 s/mm² for Tozaki et al. [27, 40]. ADC cutoff values for pCR and non-pCR reported in different studies may thus be sensitive to technical heterogeneity. This makes Quality Control (QC) and Quality Assurance (QA), using DWI phantoms [41,42,43] and patient test–retest procedures, essential.
One could argue that, in a longitudinal study, using (flow-sensitive) low b-values may have an undesirable effect on the validity of ADC for measuring response in highly vascularized tumors. NST reduces vascularization within the ROI and therefore leads to a decrease in the perfusion fraction (f), which may cause a decrease in the slope of (a part of) the attenuation curve. Simultaneously, the diffusion coefficient increases and compensates for this decrease, resulting in a smaller (or even no) difference in ADC between time points. Theoretically, separating the perfusion/pseudo-diffusion and diffusion effects by using > 2 b-values and calculating IVIM-parameters could solve this. However, whether this is really beneficial could not be concluded from the studies included in this review, due to their small number and heterogeneity. The complexity of choosing the optimal scan-moments and parameters can be observed in Li et al., who suggested that tumors with a relatively high ADC during NST are more likely to show pCR, while Tozaki et al. suggested the opposite (Table 7). However, this could not clearly be explained by the DWI acquisition moment during NST.
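The compensation effect described in this paragraph can be made concrete with a short numerical sketch. It assumes a noise-free bi-exponential (IVIM-type) signal, a two-point ADC from b = 0 and b = 700 s/mm², and hypothetical pre- and post-treatment parameters; none of these values come from the included studies.

```python
import math

def adc_b0_b700(f, d, d_star=10e-3, b_high=700.0):
    """Two-point ADC (b = 0 and b_high) for a bi-exponential IVIM-type signal."""
    s_high = f * math.exp(-b_high * d_star) + (1.0 - f) * math.exp(-b_high * d)
    return -math.log(s_high) / b_high  # S(0)/S0 equals 1 by construction

# Hypothetical values (assumptions): the perfusion fraction drops while diffusion rises.
pre = {"f": 0.10, "d": 0.9e-3}
post = {"f": 0.05, "d": 1.1e-3}

delta_d = post["d"] - pre["d"]                            # true change in diffusion
delta_adc = adc_b0_b700(**post) - adc_b0_b700(**pre)      # change seen in the ADC
```

In this example the change in ADC is noticeably smaller than the underlying change in the diffusion coefficient, because the loss of the perfusion contribution partly offsets the increase in diffusion.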
Besides DWI models and b-values, ROI-selection is also crucial for a representative quantitative analysis of each lesion. Using different ROI-definitions (2D/3D) can influence the quantitative results in general, as reported by Bickel et al. These authors suggested choosing the area of the most aggressive part, i.e., the minimum ADC, for a 2D-ROI. Other methods have also been studied, such as whole-tumor versus small sub-region ROIs. However, these publications relate to lesion classification. It is even more unclear which ROI is most appropriate in a longitudinal setting. Within the ROI, partial volume effects (PVE) might influence the (mean) ADC. During therapy, tumor heterogeneity (and thus PVE) may increase and the optimal ROI selection may be affected by the various observed shrinkage patterns of breast cancer [24, 48]. Consequently, choosing a reliable ROI during and after therapy is even more difficult to standardize. Based on a systematic review, no optimal ROI technique was identified. In line with the recent study of Wielema et al. regarding the optimal ROI technique for lesion classification using DWI, more extensive research on this specific topic is also required in the setting of therapy monitoring.
For identifying the most reliable ROI in case of small regions of (residual) disease, sufficient spatial resolution and contrast-to-noise ratio (CNR) between the lesion and the breast parenchyma are required. In DWI, this can be challenging, as SS-EPI is often used with a large field-of-view (FOV) to cover both breasts, thereby compromising spatial resolution due to signal-to-noise ratio (SNR) and scan-time limitations. Therefore, DCE-images are often used as guidance for tumor localization, assisting with identifying lesion(s) on the high b-value images. It should be noted that at higher b-values the SNR decreases, so that the noise level (Rician noise floor) can be reached. To increase SNR in these cases, the number of excitations (NEX) can be increased, which directly increases the total scan-time. Balancing both (noise ratio and scan-efficiency) can be challenging and will depend on the magnitude of the high b-value. Increasing the highest b-value might result in a longer TE, causing a lower SNR, requiring more NEX, and finally a longer acquisition time. Moreover, as there is an inverse relation between image resolution and SNR, recommendations are required on the optimal use of DWI for near-complete response cases at time-points toward surgery or when small-volume lesions (< 1 cm) are detected at baseline (e.g., by using a different or additional high-resolution protocol). The development of new DWI sequences addressing this resolution aspect and the implementation of post-processing (noise filtering, using advanced DWI models/representations with their considerations) need to be investigated further for these kinds of cases. However, it should be noted that this would make standardization of DWI for treatment monitoring even more complex.
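The trade-off between SNR and scan-time can be approximated with a simple rule of thumb. The sketch below assumes that SNR scales with the square root of NEX and that acquisition time scales linearly with NEX; both are simplifying assumptions that hold for averaging of uncorrelated noise well above the noise floor, and the function names are hypothetical.

```python
import math

def required_nex(snr_current, snr_target, nex_current=1):
    """Smallest NEX reaching snr_target, assuming SNR scales with sqrt(NEX)."""
    if snr_current <= 0 or snr_target <= 0 or nex_current < 1:
        raise ValueError("SNR values must be positive and NEX at least 1")
    if snr_target <= snr_current:
        return nex_current
    return math.ceil(nex_current * (snr_target / snr_current) ** 2)

def relative_scan_time(nex_new, nex_current=1):
    """Acquisition time relative to the current protocol, assuming time scales linearly with NEX."""
    return nex_new / nex_current

# Hypothetical example: doubling the SNR of a single-average acquisition.
nex = required_nex(snr_current=5.0, snr_target=10.0, nex_current=1)
```

Under these assumptions, doubling the SNR requires four times as many excitations and therefore roughly four times the acquisition time for that b-value, which illustrates why high b-values quickly become expensive in scan-time.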
Analyzing the value of DWI requires measurements coupled to a specific pathological endpoint after NST (pCR/non-pCR). Differences in the histopathological analysis (and inter-observer differences in defining the molecular subtype of the diagnostic biopsy) and in pCR-definitions can affect this categorization, which further hampers data pooling. Some authors allowed residual DCIS within the pCR group; others classified it as non-pCR. Furthermore, Liu et al. included Miller & Payne grade 4 (> 90% loss of tumor cells) within the pCR group and Kim et al. labeled those as good responders, whereas others only included grade 5 (no viable tumor cells). Inclusion of DCIS (alone or in combination with grade 4 residual disease) in the pCR group logically leads to different ADC measures than when the pCR group consists of cases without residual DCIS. Notably, while DCIS is not always visible on DWI because of the spatial resolution, it may still affect ADC-values due to microstructural changes. With the final goal of identifying pCR of the breast after NST in mind, and thereby omitting breast surgery, it seems most appropriate to use a pCR definition of ypT0 (i.e., residual DCIS is not permitted). However, recommendations from the Breast International Group-North American Breast Cancer Group (BIG-NABCG) on the pathological evaluation of post-NST specimens still give the option to include or exclude DCIS from this definition [54, 55]. To achieve more standardization and make studies more comparable, expert consensus is required on the most suitable pCR-definition and on the definition of radiological complete response on DWI.
ADC-values can also vary widely between tumors of different morphological and molecular subtypes. Remarkably, in most studies ADC-values were not differentiated by tumor type. Likewise, differences can occur after treatment due to varying NST-regimens. Only four studies reported (absolute/change in) ADC-values for different cancer subtypes, showing differences in distinguishing pCR/non-pCR cases. In other words, all subtypes will likely have specific cutoff values that will further differ depending on the NST-type. In line with DCE-MRI and PET-CT, DWI will likely also have varying diagnostic performance for response prediction in different subtypes. Partridge et al. and Yuan et al. underlined that the optimal timing of DWI during NST also differs between molecular subtypes and types of NST. Substantial knowledge about the tumor and its initial and long-term reaction to NST (e.g., cell swelling, apoptosis, and inflammation) is required to determine the optimal timing. Therefore, future DWI research should study identical treatment regimens for specific tumors in large study populations.
Based on this review, identifying pCR seems to be more accurate with parameters that measure differences in ADC-value during NST than with an (absolute) ADC measured at one or several time point(s). This is likely because relative changes (partly) compensate for the variability in the acquisition parameters and biological properties of breast cancers. In general, treatment response is represented by an increase in the lesion’s ADC-value, although even this was not apparent in all studies.
Moreover, statistical limitations hamper the potential comparison and pooling of studies. For example, in the QUADAS-2 assessment, risks of bias were observed regarding the research populations. Furthermore, for comparing predictive statistical parameters (PPV/NPV), the prevalence of tumor subtypes needs to be identical. Only a ROC-AUC might give some statistical value to all cases, because it is reported to be prevalence independent. However, as reported in this study, this does not compensate for underlying heterogeneity.
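The prevalence dependence of PPV and NPV follows directly from Bayes' rule and can be sketched as below. The example is hypothetical: it applies the pooled sensitivity and specificity quoted from Chu et al. earlier in this review to the lowest and highest pCR-ratios reported in the included studies, a combination that no study actually reported.

```python
def ppv_npv(sensitivity, specificity, prevalence):
    """PPV and NPV from sensitivity, specificity and prevalence of the positive event (pCR)."""
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must lie strictly between 0 and 1")
    tp = sensitivity * prevalence
    fn = (1.0 - sensitivity) * prevalence
    tn = specificity * (1.0 - prevalence)
    fp = (1.0 - specificity) * (1.0 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)

# Illustrative combination only: pooled values quoted earlier, applied to the
# lowest (12.9%) and highest (39.3%) pCR-ratios among the included studies.
SENS, SPEC = 0.88, 0.79
low = ppv_npv(SENS, SPEC, 0.129)
high = ppv_npv(SENS, SPEC, 0.393)
```

With identical sensitivity and specificity, the PPV rises and the NPV falls as the pCR-ratio increases, so predictive values from populations with different pCR-ratios cannot be compared directly.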
In summary, this review unearths many sources of heterogeneity that are currently present in studies on the use of breast DWI for the prediction of response to NST. This heterogeneity is not limited to acquisition parameters, but is also caused by large differences in patient populations, biological tumor characteristics, applied therapies, and the outcome parameters used. We acknowledge that, besides the factors we specifically addressed, even more characteristics in each of these fields could influence DWI measurements. Considering the limited case and study numbers, and all heterogeneity encountered, it would be premature to define the optimal DWI parameters based upon this review. Overall, the level of evidence for response prediction and evaluation using ADC as DWI metric is moderate. However, specific details, such as the influence of the biology of tumors and the technical aspects of DWI for response prediction, only have a low level of evidence. Proper validation aimed at overcoming the translational gaps, and standardization of the study designs (from patient inclusion to analysis), require substantial consensus efforts that are crucial to accelerate optimization and potential implementation of quantitative DWI for NST-monitoring in breast cancer patients.
Finally, besides standardization and validation issues, there are also limited data about the cost-effectiveness of MRI in the NST setting. To get an overall idea of the added value of DWI in this setting, cost-effectiveness also needs to be analyzed.
By addressing these issues, this review aims to increase awareness of different sources of variability and supplements the works of EUSOBI, QIBA, Padhani et al., and O’Connor et al., to initiate a future consensus for the use of breast DWI in the treatment monitoring setting.
Conclusion
Clinical, technical, and epidemiological heterogeneity was observed in all aspects of studies correlating DWI measurements to pCR/non-pCR.
The observed methodological heterogeneity and the small patient numbers make it currently difficult to assess to what extent DWI–MRI might predict pCR. The preliminary conclusion is that the absolute ADC is not (yet) robust for distinguishing pCR/non-pCR, without considering multiple variables. Therefore, multidisciplinary cooperation/consensus is required, to obtain reliable and reproducible longitudinal DWI measurements for identifying non-pCR/pCR cases in specific and well-defined subgroups of patients.
Availability of data and materials
Abbreviations
- ADC: Apparent diffusion coefficient
- AUC (ROC): Area under the curve of the receiver operating characteristic curve
- BIG-NABCG: Breast International Group-North American Breast Cancer Group
- DCE: Dynamic contrast enhanced
- DCIS: Ductal carcinoma in situ
- DDC: Distributed diffusion coefficient
- EUSOBI: European Society of Breast Imaging
- HER2: Human epidermal growth factor receptor 2
- IVIM: Intravoxel incoherent motion
- MeSH: Medical Subject Headings
- MRI: Magnetic resonance imaging
- NEX: Number of excitations
- NPV: Negative predictive value
- NST: Neoadjuvant systemic therapy
- pCR: Pathological complete response
- PPV: Positive predictive value
- PVE: Partial volume effect
- QIBA: Quantitative Imaging Biomarkers Alliance
- ROI: Region of interest
- SEM: Stretched exponential modeling
- SS-EPI: Single-shot echo-planar imaging
The authors state that this work has not received any funding.
The authors of this manuscript declare relationships with the following companies: Ritse M. Mann, MD, PhD (research agreements: Siemens Healthineers, Bayer Healthcare, ScreenPoint Medical, Seno Medical, Koning, Medtronic, BD/Bard). All remaining authors declare no competing interest. Ritse M. Mann and Regina Beets-Tan are members of the Insights into Imaging Advisory Editorial Board. They have not taken part in the review or selection process of this article.
Cite this article
van der Hoogt, K.J.J., Schipper, R.J., Winter-Warnars, G.A. et al. Factors affecting the value of diffusion-weighted imaging for identifying breast cancer patients with pathological complete response on neoadjuvant systemic therapy: a systematic review. Insights Imaging 12, 187 (2021). https://doi.org/10.1186/s13244-021-01123-1