Radiomic features of breast parenchyma: assessing differences between FOR PROCESSING and FOR PRESENTATION digital mammography

Objective To assess the similarities and differences of radiomic features on full field digital mammography (FFDM) in FOR PROCESSING and FOR PRESENTATION data. Methods 165 consecutive women who underwent FFDM were included. Breasts were segmented into “dense” and “non-dense” areas using the software LIBRA. Segmentation of both FOR PROCESSING and FOR PRESENTATION images was evaluated by Bland–Altman, Dice index and Cohen’s kappa analysis. 74 textural features were computed: 18 First Order (FO) features, 24 Gray Level Co-occurrence Matrix (GLCM) features, 16 Gray Level Run Length Matrix (GLRLM) features and 16 Gray Level Size Zone Matrix (GLSZM) features. The paired Wilcoxon test, Spearman’s rank correlation, intraclass correlation and canonical correlation were used. Bilateral symmetry and percent density (PD) were also evaluated. Results Segmentation from FOR PROCESSING and FOR PRESENTATION gave very different results. Bilateral symmetry was higher when evaluated on features computed using FOR PROCESSING images. All features showed a positive Spearman’s correlation coefficient, and many FOR-PROCESSING features were moderately or strongly correlated with their corresponding FOR-PRESENTATION counterparts. In the correlation analysis between PD and textural features from FOR-PRESENTATION, a moderate correlation was obtained only for Gray Level Non Uniformity from GLRLM, on both “dense” and “non-dense” areas; between PD and features from FOR-PROCESSING, a moderate correlation was observed only for Maximal Correlation Coefficient from GLCM, on both “dense” and “non-dense” areas. Conclusions Texture features from FOR PROCESSING mammograms seem to be the most suitable for assessing breast density.


Introduction
Female breast cancer has now surpassed lung cancer as the leading cause of global cancer incidence in 2020, with an estimated 2.3 million new cases, representing 11.7% of all cancer cases. It is the fifth leading cause of cancer mortality worldwide, with 685,000 deaths [1].
The elevated incidence rates reflect a longstanding higher prevalence of reproductive and hormonal risk factors (early age at menarche, later age at menopause, advanced age at first birth, fewer children, less breastfeeding, menopausal hormone therapy, oral contraceptives) and lifestyle risk factors (alcohol intake, excess body weight, physical inactivity), as well as increased detection through organized or opportunistic mammographic screening [2]. An exceptionally high prevalence of mutations in high-penetrance genes, such as BRCA1 and BRCA2, in part accounts for the high incidence in Israel and in certain European subpopulations. However, breast cancer mortality has declined over the years due to multiple factors, including more sensitive screening techniques and improved treatment regimens [3].
In the last decade there has been growing consensus regarding the role of breast parenchyma as an independent risk factor for breast cancer [4][5][6]. Consequently, a number of approaches to breast parenchyma assessment have been proposed, among which radiomic texture feature extraction is the most widespread [7][8][9]. Radiomics is an emerging field that has attracted keen interest, especially in oncology [10][11][12]: it has been shown that radiomics could be predictive of TNM grade, histological grade, response to therapy and survival in various tumors [13][14][15]. Textural radiomic features of breast parenchyma have been shown to be useful for cancer classification, too [16].
Radiomic features, when associated with other important information and correlated with outcomes, can provide accurate and robust evidence-based clinical-decision support systems (CDSS). The main challenge is the optimal gathering and integration of multimodal data sources in a quantitative manner capable of delivering unambiguous clinical information that accurately and robustly enables outcome prediction as a function of the necessary decisions [17][18][19]. The central hypothesis of radiomics is that quantitative individual voxel-based variables are more sensitively associated with various clinical end points than the more qualitative radiologic, histopathologic, and clinical data more commonly used today [17][18][19].
Digital processing of full field digital mammography (FFDM) has enormously increased the possibility of objectively assessing textural properties of breast images. Full field digital mammography can be stored as FOR PROCESSING (original or raw images) or FOR PRESENTATION (processed images, usually produced by proprietary software that is not publicly available). Often, in the routine clinical environment, only FOR PRESENTATION images are available. However, although the latter emphasize certain image characteristics useful for the detection of masses and calcifications, they might not fully retain the original information contained in the FOR PROCESSING image, which is potentially useful for parenchyma characterization.
Previous studies [7][8][9] have evaluated a number of features for breast parenchyma assessment. However, a few recent changes in the field call for a deeper analysis. In particular, texture features have recently been standardized by the Image Biomarker Standardization Initiative (IBSI) [18]. It is therefore important to perform a comprehensive evaluation of the differences between FOR PROCESSING and FOR PRESENTATION using the standardized features, which include several additional texture features with respect to Gastounioti et al. [7]. Moreover, in Gastounioti et al. [7], texture features have been computed using a 'lattice' approach for characterization of the whole breast, and the lattice has then been summarized by an overall averaging: while that approach aims to take feature variability across the breast approximately into account, it does not give precise information about the dense/non-dense areas of the breast. A third point is that previous studies assessed equipment from only two manufacturers (Siemens and Hologic) [7][8][9]: it is of course of interest to test whether results can be extended to other manufacturers.
The objective of our study was to assess the similarity and differences of radiomics features on FFDM in FOR PROCESSING and FOR PRESENTATION. Expanding previous studies, we addressed the problem using an enlarged set of texture radiomic features, dense/nondense areas comparison and a new manufacturer; appropriate statistical analysis has been used.

Study population
The study population included 165 women who underwent mammography at the Breast Unit of the University Hospital "Luigi Vanvitelli" in Naples, Italy, from June 2020 to November 2020. The study was approved by the local ethical committee and each enrolled patient signed informed consent. Patients' characteristics are summarized in Table 1. Breast density of the sample was assessed by two expert radiologists in consensus (G.G., M.P.B.) according to the BI-RADS 5th edition published in 2013 [20]. It should be underlined that according to [20] "if the breasts are not of apparently equal density, the denser breast should be used to categorize breast density". Therefore, only one category per woman was available.

Equipment and images
Women were imaged according to current guidelines with full field digital mammography (FFDM) in both mediolateral oblique (MLO) and craniocaudal (CC) views using the Giotto Class system produced by IMS GIOTTO S.p.A. (Sasso Marconi, Bologna, Italy). The specific operating conditions of mammographic image acquisition are summarized in Table 2. Specifically, we highlight that the mammography unit was equipped with a tungsten anode, which has been shown to reduce administered dose while preserving image quality [21,22]. For this work only MLO images were considered because of the larger amount of breast parenchyma visible in this projection: a total of 330 images (left/right) were used.

Breast segmentation
Breasts have been segmented into a "dense" area (roughly corresponding to the fibroglandular tissue) and a "non-dense" area (the remaining part of the breast) using the publicly available software LIBRA [8,9], available for MATLAB (Version: 9.3.0.713579, R2017b. Natick, Massachusetts: The MathWorks Inc.). LIBRA has been specifically developed for breast segmentation, pectoral muscle removal and percent density computation. Both FOR PROCESSING and FOR PRESENTATION images from our dataset have been tested for proper segmentation. Bland-Altman, Dice index and Cohen's kappa analyses ("Statistical analysis" section) have been used to assess differences between the two types of segmentation. Subsequently, radiomic features have been computed on both "dense" and "non-dense" areas and on both FOR PROCESSING and FOR PRESENTATION images. Percent density from LIBRA has also been computed.
Before LIBRA segmentation, FOR-PROCESSING images underwent minimal pre-processing: logarithm and z-scoring; FOR-PRESENTATION images were subjected only to z-score to align image histogram to FOR-PROCESSING image [7,9].
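The pre-processing described above can be illustrated with a minimal sketch. It assumes that the logarithm is the natural logarithm applied to the raw pixel values and that the z-score is computed over the whole image; the function names and the `offset` parameter are hypothetical and are not taken from LIBRA or from the original pipeline.

```python
import numpy as np


def zscore(image):
    """Rescale an image to zero mean and unit standard deviation."""
    image = np.asarray(image, dtype=np.float64)
    std = image.std()
    if std == 0:
        raise ValueError("constant image: z-score is undefined")
    return (image - image.mean()) / std


def preprocess_for_processing(raw, offset=1.0):
    """Log-transform a raw (FOR PROCESSING) image, then z-score it.

    `offset` is an assumption of this sketch: it only keeps log()
    finite where a raw pixel value is zero.
    """
    raw = np.asarray(raw, dtype=np.float64)
    return zscore(np.log(raw + offset))


def preprocess_for_presentation(processed):
    """FOR PRESENTATION images are only z-scored."""
    return zscore(processed)


# Toy 2 x 2 "images" standing in for real mammograms.
raw = np.array([[10.0, 200.0], [3000.0, 16000.0]])
presentation = np.array([[5.0, 80.0], [900.0, 4000.0]])
raw_ready = preprocess_for_processing(raw)
presentation_ready = preprocess_for_presentation(presentation)
```

After this step both image types have comparable histograms in terms of location and scale, which is the stated purpose of the z-scoring.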
It should be emphasized that LIBRA has been developed on equipment by two specific manufacturers (Siemens and Hologic). One of the objectives of our analysis was to assess whether LIBRA could be used reliably on equipment from a different manufacturer (IMS GIOTTO S.p.A.) without any modification.

Radiomic features
Recently, the IBSI [18] has standardized a set of 174 features. Such features have been implemented in PyRadiomics [19], a library available within the Python environment [23]. Briefly, IBSI features include texture and morphological features. In this study we considered only textural features. In fact, it has been suggested in literature that [...]. See Table 3 for a list of all features. A detailed description of each textural feature is reported at https://pyradiomics.readthedocs.io/en/latest/features.html. Features have been computed on both dense and non-dense breast areas (see Fig. 1).
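To make the notion of a texture feature concrete, the following is a simplified sketch of one GLCM feature (contrast). It is not the PyRadiomics implementation: it assumes an image already discretized into integer gray levels, uses a single horizontal offset and ignores the region mask, whereas PyRadiomics handles discretization, masking and several directions. The function names are hypothetical.

```python
import numpy as np


def glcm_horizontal(image, levels):
    """Symmetric, normalized GLCM for the offset 'one pixel to the right'."""
    image = np.asarray(image, dtype=np.intp)
    counts = np.zeros((levels, levels), dtype=np.float64)
    left = image[:, :-1].ravel()
    right = image[:, 1:].ravel()
    # Count each (left, right) gray-level pair, including repeated pairs.
    np.add.at(counts, (left, right), 1.0)
    # Make the matrix symmetric so that pair order does not matter.
    counts = counts + counts.T
    return counts / counts.sum()


def glcm_contrast(p):
    """Contrast: sum over (i, j) of p(i, j) * (i - j)^2."""
    i, j = np.indices(p.shape)
    return float(np.sum(p * (i - j) ** 2))


stripes = np.array([[0, 1], [0, 1]])  # neighbouring pixels always differ
flat = np.array([[1, 1], [1, 1]])     # neighbouring pixels never differ
contrast_stripes = glcm_contrast(glcm_horizontal(stripes, levels=2))
contrast_flat = glcm_contrast(glcm_horizontal(flat, levels=2))
```

A uniform patch gives zero contrast, while a patch in which neighbouring pixels always differ gives a higher value; the other GLCM features summarize the same matrix in different ways.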

Statistical analysis
Our analysis had the objective to assess differences between features computed from FOR PROCESSING and from FOR PRESENTATION images on both 'dense' and 'non-dense' areas of the breast.
First, we assessed differences in LIBRA breast area (dense or non-dense) segmentation using Bland-Altman, Dice index and Cohen's kappa analysis [24]: while the first is mainly a graphical approach and has been performed on the area expressed in cm², the other two give an agreement measure between the two segmentations (Dice is between 0 and 1, while kappa is between −1 and 1). Bilateral symmetry (correspondence in breast area and percent density between left/right breast) was also used to evaluate goodness of segmentation. The objective of this analysis was to verify that LIBRA processing was sufficiently accurate for the equipment from IMS GIOTTO S.p.A., as this equipment has not previously been tested with LIBRA.
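The three agreement measures can be sketched as follows for two binary segmentation masks and two series of segmented areas. This is an illustrative sketch, not the code used in the study; the function names are hypothetical, and the 1.96 multiplier for the Bland-Altman limits of agreement is the conventional choice, assumed here.

```python
import numpy as np


def dice_index(mask_a, mask_b):
    """Dice index: 2 |A and B| / (|A| + |B|), between 0 and 1."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        return 1.0  # both masks empty: treated as full agreement here
    return float(2.0 * np.logical_and(a, b).sum() / total)


def cohen_kappa(mask_a, mask_b):
    """Cohen's kappa for pixel-wise agreement, between -1 and 1."""
    a = np.asarray(mask_a, dtype=bool).ravel()
    b = np.asarray(mask_b, dtype=bool).ravel()
    observed = np.mean(a == b)
    # Agreement expected by chance from the marginal frequencies.
    expected = a.mean() * b.mean() + (1 - a.mean()) * (1 - b.mean())
    if expected == 1.0:
        return 1.0  # degenerate case: both masks constant and identical
    return float((observed - expected) / (1.0 - expected))


def bland_altman(areas_a, areas_b):
    """Return bias and lower/upper limits of agreement of the differences."""
    diff = np.asarray(areas_a, dtype=float) - np.asarray(areas_b, dtype=float)
    bias = diff.mean()
    half_width = 1.96 * diff.std(ddof=1)
    return float(bias), float(bias - half_width), float(bias + half_width)


mask_raw = np.array([[1, 1], [0, 0]])
mask_processed = np.array([[1, 0], [0, 0]])
dice = dice_index(mask_raw, mask_processed)
kappa = cohen_kappa(mask_raw, mask_processed)
```

Dice and kappa need two masks defined on the same pixel grid, whereas the Bland-Altman summary only needs the two area measurements per breast.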
Second, for each feature, the paired Wilcoxon test has been applied between FOR-PROCESSING and FOR-PRESENTATION. As a further measurement, Spearman's rank correlation coefficient has been evaluated. Canonical correlation analysis has been used to assess the correlation of linear combinations of dense/non-dense features between FOR PROCESSING and FOR PRESENTATION [25].
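A per-feature comparison of this kind can be sketched with SciPy, assuming that the values of one feature are available as two paired arrays (one value per image for each image type). The function name and the returned dictionary are hypothetical; canonical correlation is not covered by this sketch.

```python
import numpy as np
from scipy.stats import spearmanr, wilcoxon


def compare_feature(values_raw, values_processed):
    """Paired Wilcoxon test and Spearman correlation for one feature."""
    x = np.asarray(values_raw, dtype=np.float64)
    y = np.asarray(values_processed, dtype=np.float64)
    # Tests whether the paired differences are symmetric around zero.
    _, wilcoxon_p = wilcoxon(x, y)
    # Measures whether the two versions rank the images in the same order.
    rho, rho_p = spearmanr(x, y)
    return {
        "wilcoxon_p": float(wilcoxon_p),
        "spearman_rho": float(rho),
        "spearman_p": float(rho_p),
    }


# Toy data: the processed values are a shifted, rescaled copy of the raw ones.
raw_values = np.arange(1.0, 11.0)
processed_values = 2.0 * raw_values + 1.0
result = compare_feature(raw_values, processed_values)
```

The toy data illustrate why the two measures are complementary: a feature can be perfectly rank-correlated between the two image types and still differ systematically in value.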
Third, percent density (PD) correlation with each feature has been assessed via Spearman's coefficient. Finally, for each feature bilateral symmetry (correspondence between the two breasts of the same woman) has been assessed using intraclass correlation coefficient (ICC) [7].
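As an illustration of bilateral symmetry, the sketch below computes an intraclass correlation coefficient between left and right breast values of one feature. The ICC form is not specified in the text; the one-way random-effects form, ICC(1,1), is assumed here, and the function name is hypothetical.

```python
import numpy as np


def icc_oneway(left, right):
    """One-way random-effects ICC(1,1) for paired left/right values."""
    data = np.column_stack([left, right]).astype(np.float64)
    n, k = data.shape  # n women, k = 2 breasts per woman
    grand_mean = data.mean()
    subject_means = data.mean(axis=1)
    # Between-subject and within-subject mean squares.
    ms_between = k * np.sum((subject_means - grand_mean) ** 2) / (n - 1)
    ms_within = np.sum((data - subject_means[:, None]) ** 2) / (n * (k - 1))
    denominator = ms_between + (k - 1) * ms_within
    if denominator == 0:
        raise ValueError("no variability in the data: ICC is undefined")
    return float((ms_between - ms_within) / denominator)


# Identical left/right values give perfect bilateral symmetry.
symmetric = icc_oneway([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0])
```

Values close to 1 indicate that the two breasts of the same woman give nearly the same feature value relative to the variability between women.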
The dependence of the correlation on equipment and patient factors such as kVp, mAs, body part thickness, body mass index (BMI), age and menopause has been assessed via linear mixed effect models [26].

Segmentation assessment
Fig. 1 shows a representative example of breast area segmentation (whole breast without pectoral muscle, dense area, non-dense area): FOR PROCESSING and FOR PRESENTATION images gave very different results. This can be further appreciated in Fig. 2a, b, which reports the Bland-Altman analysis of the whole breast and dense area. Dice index and Cohen's kappa applied to the whole breast gave an average agreement of 0.97 ± 0.02 and 0.96 ± 0.03, respectively.
As regards the dense and non-dense areas, as can be seen in Figs. 1 and 2a, the dense area segmented on FOR PRESENTATION was often very small with respect to its FOR PROCESSING counterpart. In this case it was not possible to use the Dice index or Cohen's kappa, and bilateral symmetry for breast and dense areas was evaluated instead: results are reported in Fig. 3, showing that bilateral symmetry was higher when using FOR PROCESSING images.
Recognizing these limitations and such large differences between breast areas, for subsequent feature computation we decided to use only the segmentation of dense and non-dense areas from FOR PROCESSING images.

[...] (Table 3, PD FOR PROC non-dense column), a moderate correlation was obtained for MCC of GLCM and for Gray Level Non Uniformity of GLSZM and of GLRLM (only MCC of GLCM was included among the 14 features with moderate correlation on the "dense" area).

Other findings
Fig. 4 reports the bilateral symmetry (intraclass correlation coefficient between left/right breast) for each feature on dense and non-dense areas. It can be seen that bilateral symmetry is higher when features are computed on FOR PROCESSING images. Fig. 5 reports the association of percent density (PD) with the BI-RADS category assigned by radiologists. The Kruskal-Wallis test was significant (p < 0.001). The multiple comparison test (Tukey HSD) indicates that BI-RADS density A is not significantly different from BI-RADS density B (p > 0.05).
No significant dependence of the correlation on equipment and patient factors such as kVp, mAs, body part thickness, BMI, age and menopause, assessed via linear mixed effect models, was found. Weak correlations were observed between patient variables (PD, BMI, age) and equipment variables (BPT, kVp, ED) (Fig. 6).

Discussion
In the last two decades FFDM has replaced screen film mammography (SFM) in breast cancer screening [27][28][29]. FFDM image acquisition initially generates an image which is proportional to the X-ray attenuation through the breast, known as the raw image (i.e., FOR PROCESSING; often with a 14-bit gray-level depth). Then, vendor-specific post-processing algorithms are applied to increase lesion conspicuity before radiological presentation, creating what is known as the processed image (i.e., FOR PRESENTATION; often with a 12-bit gray-level depth). It seems reasonable to assume that breast parenchyma analysis should be performed directly on raw images since they retain the original relationship with the physical properties of the breast tissue [5][6][7][8][9].
In this study we assessed differences between texture features computed on automatically segmented dense (mainly fibroglandular) and non-dense (mainly fat) areas within the breast, on both FOR PROCESSING and FOR PRESENTATION data. Our findings can be summarized as follows. All features showed a positive Spearman's correlation coefficient, and many FOR-PROCESSING features were moderately or strongly correlated with their corresponding FOR-PRESENTATION counterparts; nonetheless, the Wilcoxon test suggested differences for most of the features except for ID, IDM, Inverse Variance, Maximum Probability of GLCM and Small Area Low Gray Level Emphasis, Long Run High Gray Level Emphasis, Long Run Low Gray Level Emphasis of GLSZM.
Moreover, our results showed that segmentation from FOR PROCESSING and FOR PRESENTATION might give very different results: the breast area segmented from the FOR PRESENTATION images is different because the pectoral muscle has not been properly removed. In addition, the dense area is often very small when segmented on FOR PRESENTATION: this might cause loss of potentially important texture information. Finally, bilateral symmetry was higher when using features computed from FOR PROCESSING images.
As regards the correlation analysis between PD and textural features on FOR-PRESENTATION, a moderate correlation was obtained only for Gray Level Non Uniformity of GLRLM, on both "dense" and "non-dense" areas. On the other hand, in the correlation analysis between PD and textural features on FOR-PROCESSING, a moderate correlation was observed only for MCC of GLCM, on both "dense" and "non-dense" areas.
Our results are in line with the findings in [7]; however a number of differences with that study must be highlighted. First, in [7] a limited number of features (28) has been investigated; however, thanks to the effort of IBSI [18] it is today possible to examine a very large number of features. In our study we used 74 features (computed on the original image without wavelet transform) subdivided into four main groups. Moreover, the group of GLSZM has not been investigated at all in [7].
Second, in [7] a "lattice" approach has been used to compute features; however, an averaging over the lattice has been made to summarize the behavior of the breast. In our study, instead, we segmented the breast into two main regions, "dense" and "non-dense" [...]

Figure caption (fragment): [...] and can take values from +1 to −1. An rs of +1 indicates a perfect association of ranks, an rs of zero indicates no association between ranks and an rs of −1 indicates a perfect negative association of ranks. The closer rs is to zero, the weaker the association between the ranks. PD: percent density (%); BMI: body mass index; BPT: body part thickness (mm); KVP: kVp; ED: entrance dose (mGy). Non-significant correlations (p > 0.05) are shown in small font (red in the color version).