In this study, the reliability of visual assessment of breast density by an experienced and an inexperienced reader was evaluated against semi-automated assessment of breast density using a dedicated software program. Our results showed disagreement between the quantitative BI-RADS categorisation of the experienced and inexperienced readers. When compared to the semi-automated analysis, the experienced reader agreed with the quantitative BI-RADS classification in 58.5% of the cases. The classification was overestimated in 35.5% of the cases and underestimated in 6.0% of the cases. Results of the inexperienced reader were less accurate. Furthermore, the semi-automated assessment of breast density showed good intra- and interobserver reproducibility.
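The agreement, overestimation and underestimation percentages reported above can be derived from paired category assignments. The following is a minimal sketch only, not the analysis code used in this study: the function names are hypothetical, the five cases are invented for illustration, and the quartile cut-offs (25%, 50% and 75%) used to convert percent density into a quantitative BI-RADS category are an assumption, since the boundaries are not restated here.

```python
def birads_category(percent_density):
    """Map percent density to a quantitative BI-RADS category (1-4).

    ASSUMPTION: quartile cut-offs at 25%, 50% and 75%; the text does not
    state the boundaries used in the study.
    """
    if not 0 <= percent_density <= 100:
        raise ValueError("percent density must lie between 0 and 100")
    if percent_density < 25:
        return 1
    if percent_density < 50:
        return 2
    if percent_density < 75:
        return 3
    return 4


def compare_to_reference(visual, reference):
    """Return the percentage of cases in which the visual category agrees
    with, exceeds, or falls below the reference category."""
    if len(visual) != len(reference) or not visual:
        raise ValueError("need two non-empty lists of equal length")
    n = len(visual)
    pairs = list(zip(visual, reference))
    return {
        "agree": 100.0 * sum(v == r for v, r in pairs) / n,
        "over": 100.0 * sum(v > r for v, r in pairs) / n,
        "under": 100.0 * sum(v < r for v, r in pairs) / n,
    }


# Hypothetical data for five mammograms (not study data).
software_density = [10.0, 30.0, 55.0, 80.0, 20.0]
reference = [birads_category(d) for d in software_density]
visual = [2, 3, 3, 3, 1]  # categories assigned by a hypothetical reader
summary = compare_to_reference(visual, reference)
print(summary)
```

In this toy example the reader agrees with the reference in two of five cases, overestimates in two and underestimates in one.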
Breast density is an important risk factor for breast cancer development, independent of other breast cancer risk factors. In addition, breast cancer is more difficult to detect in mammographically dense breasts [5, 6]. At our institution, mammograms are evaluated by radiologists and/or (supervised) residents, using the BI-RADS classification for breast density. However, our study results showed disagreement between the radiologist (experienced reader) and the resident (inexperienced reader), and showed that breast density is frequently overrated, even by a highly experienced reader. These findings are in line with previously published results. In the majority of cases, the overestimation was only one BI-RADS category (data not shown). Although this may seem a negligible overestimation, a speculative (but nonetheless plausible) assumption is that overrating breast density might lead to more imaging [e.g. additional mammographic projections, ultrasound, or contrast-enhanced magnetic resonance (MR) mammography], higher costs, and more patient anxiety.
Given the improvements in MR mammography, it is worth considering which patients are at increased risk of developing breast cancer (i.e. patients with mammographically dense breasts) and might benefit from a shorter screening interval or additional MR mammography. Although this information is contained in the images, it is not used in current clinical practice or screening to identify high-risk patients, since the (visual) BI-RADS density classification is not suitable for expressing breast cancer risk. Based on our current observations, which show substantial disagreement between the visual and semi-automated assessment of breast density, we would prefer a workstation-integrated (semi-)automated analysis of breast density to identify patients at high risk of developing breast cancer or in whom breast cancer is likely to be missed.
For this purpose, several (semi-)automated systems to assess breast density have been proposed. Of these, the so-called thresholding approach using (commercially available) software packages has been studied extensively and is therefore frequently used in the quantitative assessment of breast density. Using this thresholding technique, Boyd et al. showed that women with mammographic breast density ≥75% had an increased risk of developing breast cancer when compared to women with breast density less than 10% (odds ratio 4.7, 95% confidence interval 3.0–7.4). We also used a thresholding approach, based on the Leica Qwin software package, and showed similar results for the breast densities of both breasts and both projections. In addition, our software demonstrated good intra-observer agreement (Fig. 3). A major advantage of our semi-automated analysis is that even analysts with hardly any experience using our software can achieve good reproducibility (Fig. 4).
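The principle behind the thresholding approach can be illustrated with a short sketch. This is not the Leica Qwin workflow or any commercial implementation: it assumes a greyscale image held as a list of rows, two operator-chosen grey-level thresholds (one separating breast from background, one separating dense from non-dense tissue), and a hypothetical function name. Percent density is then the dense area divided by the total breast area.

```python
def percent_density(image, breast_threshold, dense_threshold):
    """Percent density = dense pixels / breast pixels * 100.

    ASSUMPTION: pixels at or above breast_threshold belong to the breast,
    and breast pixels at or above dense_threshold count as dense tissue.
    Both thresholds stand in for values an operator would set interactively.
    """
    breast_pixels = 0
    dense_pixels = 0
    for row in image:
        for value in row:
            if value >= breast_threshold:
                breast_pixels += 1
                if value >= dense_threshold:
                    dense_pixels += 1
    if breast_pixels == 0:
        raise ValueError("no pixels at or above breast_threshold")
    return 100.0 * dense_pixels / breast_pixels


# Toy 4 x 4 "mammogram" with invented grey levels (0 = background).
toy_image = [
    [0, 0, 50, 60],
    [0, 40, 120, 200],
    [0, 45, 130, 210],
    [0, 0, 55, 65],
]
pd_value = percent_density(toy_image, breast_threshold=30, dense_threshold=100)
print(pd_value)
```

Here 10 pixels fall inside the breast and 4 of them exceed the dense threshold, giving a percent density of 40%. Because both thresholds are set by the operator, the result depends on operator input, which is why intra- and interobserver reproducibility need to be reported.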
Although the thresholding approach is promising, it has several disadvantages. Assessment of mammographic breast density using this technique is time- and labour-intensive. For example, the time we needed to assess breast density in a single patient study was estimated at 5–8 min. Recently, Stone et al. demonstrated that assessing the breast density of one breast in one projection is sufficient. This suggestion is supported by the findings of our current study and enables the inclusion of women who have undergone unilateral breast surgery of any kind. Furthermore, the thresholding approach is generally considered to require proper operator training, although our current results (based on our software program) suggest otherwise. In screening or clinical settings, this software needs to be integrated into the mammographic workstations. Because of these disadvantages, mammographic breast density is usually assessed visually, especially in screening/clinical settings and in large studies in which a great number of mammograms need to be evaluated.
Previously, Martin et al. performed a similar study using a fully automated software package. In that study, mammograms of 65 women were analysed by seven radiologists. However, on close inspection, five of the radiologists had already interacted with the software program, leaving two radiologists (albeit experienced in mammography reading) untrained for the breast density analysis. Breast density was overestimated by these two radiologists in 37% of the cases, as compared to 36% overestimation by the experienced radiologist in our study. So although it might be difficult to generalise our study results on the basis of two readers, they are in line with previously published results. In addition, there were other interesting differences between our analyses and those of Martin et al. In the study of Martin et al., 6% of the images could not be analysed due to technical difficulties, whereas we encountered no technical difficulties. Furthermore, the 95% limits of agreement between the trained radiologists and the automated analyses in the study of Martin et al. were rather large: −16% to +27%. Visual inspection of the Bland-Altman plots of our study shows that our limits of agreement were much smaller (Fig. 2).
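For readers less familiar with the Bland-Altman method, the 95% limits of agreement are conventionally computed as the mean difference between two methods plus or minus 1.96 times the standard deviation of the differences. The sketch below shows that calculation under this standard definition; the function name and the paired density values are hypothetical and are not taken from our data or from Martin et al.

```python
from statistics import mean, stdev


def limits_of_agreement(method_a, method_b):
    """Return (lower limit, bias, upper limit) of the 95% limits of agreement.

    Uses the conventional Bland-Altman definition: bias +/- 1.96 * SD of the
    paired differences (sample standard deviation).
    """
    if len(method_a) != len(method_b) or len(method_a) < 2:
        raise ValueError("need two lists of equal length with at least 2 pairs")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias - 1.96 * sd, bias, bias + 1.96 * sd


# Invented percent densities for five mammograms (not study data).
visual_density = [30, 45, 60, 20, 75]
software_density = [25, 40, 50, 22, 70]
lower, bias, upper = limits_of_agreement(visual_density, software_density)
print(lower, bias, upper)
```

A positive bias indicates that the first method (here, the visual estimate) reads higher on average than the second, and the width of the interval between the lower and upper limits reflects how far individual readings may diverge.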
Our study included relatively small numbers of dense breasts, i.e. quantitative BI-RADS categories 3 and 4. Martin et al. described similar results and included 15 additional women with dense breasts after their initial inclusion of 50 women. Previously published larger studies show slightly higher numbers of dense breasts in their populations, presumably owing to the larger size of those populations when compared to our study population. For our study, we opted not to include additional women with dense breasts, in order to prevent inclusion bias.
Our study has several limitations. First, a true gold standard for the assessment of breast density is lacking. There is no accurate way to determine breast density other than histopathologic analysis of mastectomy specimens. Such specimens were not available for this study, and previously published studies assessing breast density with various computerised methods also lack this (true) gold standard. As we have shown in this study, our software program produced results similar to those of other, more extensively validated programs. This is why we chose to use our software program as the reference standard (not gold standard) against which we compared the visual assessment of breast density.
A second limitation of our study is that the association between mammographically dense breasts and the risk of developing breast cancer has not yet been demonstrated with our software program. Because our results are comparable to those obtained with more extensively validated software programs, we expect that breast density as assessed by our software program is also associated with increased breast cancer risk. However, larger studies using our software program are in progress to test this hypothesis.
Finally, our analysis remains a semi-automated (and not fully automated) technique that requires input from an operator and is therefore at risk of introducing observer-dependent bias. Recently, a commercially available software tool (Quantra, Hologic, Bedford, MA, USA) to automatically assess breast density was compared to the Cumulus software program. This study showed a strong density correlation between both breasts and for both methods, suggesting that fully automated assessment of breast density could also aid in breast cancer risk estimation. This was confirmed in another study by Pinker et al., demonstrating that breast density, as assessed with this automated analysis, was strongly associated with breast cancer risk in women younger than 50 years, but not in women older than 50 years. In line with these developments, another fully automated software program was launched earlier this year: Volpara (Matakina International, Wellington, New Zealand). However, the major drawback of these recently introduced programs is that they are available only on GE or Hologic digital mammography systems.
In summary, our results showed disagreement in quantitative BI-RADS breast density classification between experienced and inexperienced readers and the semi-automated software analysis. In order to assess breast density accurately, reproducibly and independently of the observer, we would recommend the use of an integrated software tool that can be applied in both screening and clinical settings. Such semi-automated analysis of breast density might aid in identifying patients at high risk of developing breast cancer and/or patients who can benefit from additional MR mammography because of an unreliable mammogram. Currently, the thresholding approach to assess breast density on standard mammograms is preferable, for instance as implemented in the dedicated software program used in this study or in any other commercially available software package. In order to assess breast density rapidly and to include patients who have undergone mastectomy, assessment of a single breast in one projection (preferably the CC projection) is sufficient. This proposal is also supported by our current findings. In turn, future studies could investigate whether patients with mammographically dense breasts (i.e. those at high risk of developing breast cancer) can benefit from this accurate breast density assessment, for instance by shortening the screening interval or by performing additional (MR) imaging.