Development and validation of MRI-based model for the preoperative prediction of macrotrabecular hepatocellular carcinoma subtype

Background Macrotrabecular hepatocellular carcinoma (MTHCC) has a poor prognosis and is difficult to diagnose preoperatively. The purpose is to build and validate MRI-based models to predict the MTHCC subtype. Methods Two hundred eight patients with confirmed HCC were enrolled. Three models (model 1: clinicoradiologic model; model 2: fusion radiomics signature; model 3: combined model 1 and model 2) were built based on their clinical data and MR images to predict MTHCC in training and validation cohorts. The performance of the models was assessed using the area under the curve (AUC). The clinical utility of the models was estimated by decision curve analysis (DCA). A nomogram was constructed, and its calibration was evaluated. Results Model 1 is easier to build than models 2 and 3, with a good AUC of 0.773 (95% CI 0.696–0.838) and 0.801 (95% CI 0.681–0.891) in predicting MTHCC in training and validation cohorts, respectively. It performed slightly superior to model 2 in both training (AUC 0.747; 95% CI 0.689–0.806; p = 0.548) and validation (AUC 0.718; 95% CI 0.618–0.810; p = 0.089) cohorts and was similar to model 3 in the validation (AUC 0.866; 95% CI 0.801–0.928; p = 0.321) but inferior in the training (AUC 0.889; 95% CI 0.851–0.926; p = 0.001) cohorts. The DCA of model 1 had a higher net benefit than the treat-all and treat-none strategy at a threshold probability of 10%. The calibration curves of model 1 closely aligned with the true MTHCC rates in the training (p = 0.355) and validation sets (p = 0.364). Conclusion The clinicoradiologic model has a good performance in diagnosing MTHCC, and it is simpler and easier to implement, making it a valuable tool for pretherapeutic decision-making in patients. Supplementary Information The online version contains supplementary material available at 10.1186/s13244-022-01333-1.


Background
Being the most common primary liver malignancy and second-most deadly cancer [1], the precise preoperative diagnosis of hepatocellular carcinoma (HCC) has been increasingly investigated for proper management and improvement in patients' quality of life. Pretherapeutic identification of the tumor subtype is essential because different HCC subtypes respond differently to therapies due to distinct genetic, clinical, and prognostic characteristics [2,3].
Therefore, an accurate diagnosis of MTHCC can assist in effective clinical decision-making to achieve personalized patient care. Currently, pathologic examination of HCC specimens is the goal standard for diagnosing histologic subtypes. Though the subtype of many HCCs can be identified on biopsy, a routine preoperative biopsy is controversial, and the clinical risk-benefit balance remains unclear [7]. A biopsy is associated with sampling bias from tumor heterogeneity [8] because highly heterogenous lesions are characterized by varying proportions of necrotic and regressive changes, which may make sampled specimens inadequate for pathologic examination and often warrant a repeat biopsy [9]. Consequently, given the risk of the need for a repeat sampling, inadvertent hemorrhage, and tumor embolization [10], a biopsy may be unsuitable for routine preoperative HCC subtype identification. Thus, a noninvasive approach that uses CT or MRI before surgery to identify the MTHCC is crucial in patients' prognostication.
Presently, a few studies have described some MRI features of MTHCC. Mule et al. identified substantial necrosis on MRI as an independent predictor of the MTHCC subtype [11]. Additionally, Cannella et al. identified a larger tumor size and the presence of tumor-in-vein [12], Liang et al. [13] found the absence of enhancing capsule and blood products in the lesion, and Rhee et al. [14] showed arterial phase peritumoral enhancement (corona enhancement), intratumoral arteries, and rough tumor margin to be significantly associated with the MTHCC subtype on MR imaging. However, these studies were based on the qualitative evaluation of MR or CT-based imaging findings without incorporating quantitative texture features.
Radiomics is a noninvasive approach that can quantitatively and more objectively identify the histologic characteristics of a tumor by evaluating the grayscale differences in an image. Thus, radiomics can potentially aid in the preoperative identification of the macrotrabecular hepatocellular subtype. To our knowledge, only one study [15] on the quantitative assessment of HCC texture features for preoperative prediction of the MTHCC subtype has been published-it was limited by a small sample size (32 MTHCC) and the lack of a validation cohort.
Therefore, the aim of this study was to build and validate three models (clinicoradiologic model, fusion radiomics signature, and combined radiomics model) to predict MTHCC noninvasively, assess their clinical utility, and decide on the suitable model that will assist in patient management.

Patients' enrollment
The Medical Ethics Committee of our institution (Xiangya Hospital, Central South University) approved this retrospective study (Approval Number: 2018111101) and waived the requirement for patient consent.
We searched the database of our institution to retrieve the pathologic, radiologic, and clinical data of patients with histologically confirmed hepatocellular carcinoma who had hepatectomy between July 2017 and July 2020. Consecutive patients with pathologically confirmed HCC were identified and enrolled based on the inclusion criteria: (1) histologically confirmed HCC with a detailed pathologic report; (2) MR imaging performed within two months before surgery; and (3) sufficient image quality to allow accurate interpretation of radiologic features. Patients were excluded if they fell within one or more of the following: (1) unavailability of MR imaging or imaging was done in another institution; (2) suboptimal image quality making the evaluation of imaging characteristics difficult; (3) previous intervention therapy (LRT) or systemic chemotherapy in new HCC patients and those with recurrence; and (4) presence of an MTHCC and another subtype in the same patient. A total of 208 patients out of the 644 subjects with HCC were enrolled and randomly split into training and validation cohorts in a ratio of 7:3 using the non-exhaustive tenfold cross-validation (Fig. 1).

Histologic characteristics
The hepatectomy specimens were reviewed independently by two pathologists (D.F. and QQ.H. with 20 and 15 years of experience, respectively) who were blinded to the patients' clinical data. The following histologic characteristics were recorded: tumor differentiation based on Edmondson-Steiner grade (the predominant grade was allocated to lesions showing different grades); histologic subtype including MTHCC (Fig. 2) [3]; microvascular invasion, and satellite nodules. Discrepancies in assessments were resolved by consensus.

MR image acquisition
All patients had MR imaging using a 3.0T MR scanner (Discovery MR750w, GE Healthcare or Siemens Healthcare). The public protocol was adopted, and the imaging parameters are shown in detail in Table 1. One hundred and sixty-one patients had MRI with an extracellular contrast agent: 15 ml of gadodiamide (Omniscan, GE Healthcare) was injected at a rate of 0.2 ml/kg, while 47 had imaging with a hepatobiliary agent: 10 ml of gadoxetic acid disodium (Primovist; Bayer-Global) was injected, followed by 20 ml of 0.9% saline at an injection rate of 1 ml/s or 2 ml/s. Following the pre-contrast images, contrast-enhanced dynamic images and hepatobiliary phase images were acquired. The arterial phase (AP), portal venous phase (PVP), delayed phase or transitional phase (for gadoxetic acid disodium agent), and hepatobiliary phases (HBP) were acquired within 25-30 s, 65-75 s, 130-150 s, and 15-20 min, respectively, after the contrast injection.

Construction of model 1
Univariate analysis was used to compare the differences in clinical factors and qualitative MRI features between MTHCC and non-MTHCC. A p value < 0.05 indicates a significant difference. The significant predictors in the training set were entered into a logistic regression analysis to build a clinicoradiologic model (model 1). The odds ratio (OR) and 95% confidence intervals (CI) for each independent predictor were computed.

Model 2 Tumor segmentation
The ITK-SNAP software (Version 3.8.0) was used to manually segment the entire tumor from T2W, AP, and PVP images as previously reported by studies [11,14] on the imaging differences between MTHCC and non-MTHCC. Two readers (I.B. and C.J., with 4 years of experience in abdominal imaging) performed the segmentation. Only the largest lesion was included in patients with multiple tumors (the lesion with a clearer margin was used in patients with equally sized lesions). All patients with multiple lesions in the MTHCC group have only MTHCCs, while those in the non-MTHCC group have non-MTH-CCs. The whole tumor was segmented slice by slice by outlining the contour 1-2 mm within its boundary and avoiding major vessels and adjacent liver parenchyma. The tumor in the last slices was not included to avoid volume averaging with adjacent structures. Initially, both readers independently segmented 50 randomly selected tumors (including 36 MTHCCs and 14 non-MTHCCs). After that, a reader (I.B.) repeated the same segmentation on the 50 tumors a week later to obtain intra-and inter-rater intra-class correlation coefficients (ICC) as described by [22]. Texture features with ICC > 0.75 were considered to have a good agreement. Reader 1 (I.B.) continued with the remaining image segmentation.

Feature extraction
The reader (I.B.) transferred the ROIs into the radiomics platform of AK software 3.3.0 (Analysis kit, GE Healthcare). The MR images and segmented ROIs were first preprocessed. Tumors and ROIs were compared side by side to ensure all ROIs match exactly with their corresponding tumors. Images were resampled using a 1.0 mm voxel size along the X, Y, and Z coordinates, respectively. A Gaussian filter of 0.5 mm bandwidth was used to filter noise from the images. First-, second-, and higher-order features were extracted. A total of 3111 features-1037 features from each phase-were extracted (Additional file 1: Table S1).
To select the robust radiomics features, the features with outliers (under the first quartile or above the third quartile of the feature distribution) and missing values were replaced by the feature's median value in the dataset. Finally, all T2W, AP, and PVP features were standardized using zero-mean normalization to remove pixels that fall outside a specified range of gray levels.

Construction of model 2
First, the radiomics features with ICC > 0.75 from the T2W, AP, and PVP images in the training cohort were selected to train the predictive model. Subsequently, potential features that were significantly different between the MTHCC and non-MTHCC groups were obtained using the Mann-Whitney U test. Thereafter, the features were entered into the least absolute shrinkage and selection operator (LASSO) regression model to select the most valuable features by tuning the hyperparameter λ with the smallest tenfold cross-validation error in the training set for each set of the images. Model 2 (fusion radiomics signature) was thus built using the combined robust features from T2W, AP, and PVP images.
The radiomics score for each patient was calculated based on the formula in Additional file 1: Appendix E2.

Construction and validation of model 3
Variables from model 1 and model 2 were entered into a logistic regression to form a combined radiomics model (model 3) to predict MTHCC. The workflow of model construction is shown in Fig. 3. The predictive performance of the models (models 1, 2, and 3) in training and validation cohorts was decided by the maximum area under the curve (AUC) and the associated cutoff according to the Youden index.

Estimating the clinical utility of the models
A decision curve analysis (DCA) was used to estimate the clinical utility of the models by calculating the net benefits for a range of threshold probabilities (percentage risk threshold of detecting the subtype). For each decision curve, the net clinical benefit was computed using the following formula [23]: where p t is the threshold probability for detecting a positive patient.
The decision curve plots net clinical benefit (y-axis) against threshold probability (x-axis). The clinical utility of the curve is indicated by the highest net clinical benefit at the lowest threshold probability.

Nomogram construction
A nomogram for the most clinically applicable model was constructed based on the AUC performance and clinical utility at the lowest threshold probability. The process of graphical presentation of the nomogram is described in Additional file 1: Appendix E3.

Statistical analysis
Statistical analyses were performed using IBM SPSS Statistics (version 25, IBM SPSS Inc.) and R statistical software (Version 3.5.1). Univariate analyses were used to compare differences between MTHCC and non-MTHCC patients regarding clinicopathologic characteristics: using chi-square or Fisher's exact tests for categorical variables and Mann-Whitney U test for continuous variables. Continuous variables were  summarized as median and interquartile range (IQR). Cohen's kappa coefficient was used to evaluate the inter-reader agreement for each qualitative MRI feature. Intra-and inter-rater intra-class correlation coefficients (ICC) were obtained to evaluate the reliability of the manual segmentation [24]. The diagnostic performance of the models 1, 2, and 3 in differentiating MTHCC from non-MTHCC was assessed by the AUC (with 95% CI), sensitivity, specificity, and accuracy in training and validation cohorts. A model is considered to have excellent, good, or poor performance when it has an AUC of 0.85-1.0, 0.7-0.85, or < 0.7, respectively [25]. The curves of the models were compared using the Delong test. The receiver operating characteristic (ROC) curves were plotted using MedCalc software (Version 19.5.0). The "glmnet, " "rms, " and "dca" packages were used to perform LASSO regression, nomogram construction, and DCA on R, respectively. A two-tailed p value < 0.05 indicated statistical significance.
Clinical, pathologic, and imaging variables were not significantly different between the training and validation cohorts (p = 0.087-0.97). By univariate analysis, MTHCC significantly differed from non-MTHCC in the levels of serum AFP, tumor grade, tumor capsule, heterogeneity, intratumoral arteries, corona enhancement, and absence of washout (all p < 0.05). Tumor grade and intratumoral arteries were not significantly different in the two groups in the validation cohort (Tables 2 and 3).

Performance of model 1
Higher  Table 4).

Performance of model 2
A total of 1585 texture features with intra-and intercorrelation coefficients > 0.75 (951 features from T2W images, 380 from AP, and 253 from PVP) were selected for further analysis. One hundred and eighty features significantly differed between MTHCC and non-MTHCC with ANOVA (p < 0.05). And 5 optimal features were selected by LASSO, and logistic regression identified four features (AP-derived glcm_Contrast, PVP-based skewness, and T2W images-based gldm_Dependence Variance and glszm_Small AreaEmphasis) associated with MTHCC (Table 5). Radiomics scores based on the above four features are presented in Additional file 1: Appendix E4 and, Figs. S5B and C.

Clinical utility of the models
The decision curve of model 1 yielded a higher net clinical benefit than the treat-all and treat-none strategy at an MTHCC threshold probability of 10% (which is lower than the reported prevalence of MTHCC) (Fig. 5). A nomogram was constructed for model 1 as it had similar predictive performance to model 3 in the validation cohort and is easier to implement in routine clinical practice (Fig. 6a). It is predicted probabilities closely aligned with the true MTHCC rates in both training (p = 0.355) and validation (p = 0.364) cohorts (Fig. 6b, c).

Discussion
MTHCC is an aggressive HCC subtype with poor outcomes. Its identification during patient workup will assist in prognostication and selecting patients who may benefit from more vigorous therapies like radical resection with wide margins or anatomical hepatectomy, as well as strict follow-up schedules. Thus, we developed and validated various predictive models based on MRI to detect the MTHCC subtype preoperatively. The 2018 EASL clinical practice guideline on HCC management suggests that tumor subtyping has no impact on clinical decision-making [1]. But recent HCC studies indicate that different HCC subtypes exhibit distinct biologic behavior, respond differentially to HCCdirected therapies, and consequently show varied clinical outcomes. For example, the latest WHO classification of HCC [26] includes the MTHCC subtype because of its morphologic peculiarity, clinical relevance, and prognostic implications-it is highly aggressive [6,27], with greater metastasis rates [28]; has relatively more advanced Barcelona stage [11]; and is an independent indicator of poor clinical endpoints in patients (early and overall recurrence and overall survival) [3,6,27]. Therefore, pretherapeutic identification of the MTHCC subtype will be clinically crucial to deciding the best treatment approach and providing individualized patient care.
In our study, serum AFP, tumor heterogeneity, corona enhancement, and absence of the characteristic nonperipheral washout were independent predictors of MTHCC. Similar to previous reports, we found higher levels of serum AFP and greater heterogeneity in the MTHCC subtype [11,29]. Heterogeneity reflects tumor aggression and cellular and metabolic alterations [30]. Corona enhancement occurs due to the blockage of venous tributaries around the tumor, leading to compensatory peritumoral arterial flow into the surrounding stroma [31]. Also, 12 of our MTHCC did not demonstrate the characteristic washout in both training and validation sets, which was significantly different from the non-MTHCC group (12 vs 3 in the training set, respectively; p < 0.001). Washout appearance occurs due to rapid drainage of arterially delivered contrast out of the tumor extracellular compartment and a concomitant reduction in the tumor's portal supply [32]. Hypoxia-inducing proteins and other pro-angiogenic factors lead to hemodynamic alterations in the MTHCC [4,33]. Washout disappearance may be due to a reduced outflow of arterially delivered blood or improved portal supply. On univariate analysis, more MTHCCs showed intratumor arteries compared to non-MTHCCs. This is in keeping with findings by Rhee et al., which showed that intratumor arteries are significant predictors of MTHCC [14]. Overall, model 1 showed a good performance in predicting the MTHCC subtype.
Four vital radiomics features (AP-derived glcm_Contrast, PVP-based Skewness, and T2W images-based gldm_DependenceVariance and glszm_ Small AreaEmphasis) associated with MTHCC were obtained from model 2. The MTHCC group demonstrated higher glcm_Contrast, which might result from more intratumor arteries seen in the MTHCCs. Higher skewness occurs when a tumor contains regions of different intensities, implying greater tumor heterogeneity in the MTHCC group. Also, intratumor arteries might skew the lesion's gray-level distribution as areas of contrast-laden blood vessels in a less enhanced tumor background lead to greater skewness [34]. We attribute the T2W-based radiomics features to MTHCC's histologic architecture. The gldm_DependenceVariance might reflect the area corresponding to the thick macrotrabecular architecture, while the glszm_SmallAreaEmphasis represents the surrounding web-like vascular spaces [35]. Based on these independent features, model 2 also had a good performance in predicting the MTHCC subtype; its AUC being slightly inferior to model 1 in both training (AUC 0.747 vs. 0.773; p = 0.548) and validation (AUC 0.718 vs. 0.801; p = 0.089) sets. In a previous study, the fusion radiomics signature and the clinicoradiologic model were performed similarly in both cohorts regarding the preoperative prediction of microvascular invasion [36]. Also, the findings of Ma et al. showed that the HCC radiomics signature performed much lower than the clinical factor model in the validation set (AUC 0.681 vs 0.761), which is in keeping with our findings [37]. This could be because the type of segmentation technique, feature selection and  reduction methods, modeling approach, and ultimately variation in the imaging parameters all influence the robustness, reproducibility, and performance of the predictive radiomics signature [38].
Although model 3 (the combined radiomics model) performed better than models 1 and 2 in the training cohort (AUC 0.889; sensitivity: 0.866; specificity: 0.784), its performance was not significantly higher than model 1 in the validation cohort (AUC 0.866 vs 0.801; p = 0.321). This indicates that model 1 has a similar diagnostic performance comparable to model 3. Likewise, the performance of the clinicoradiologic model and combined radiomics did not differ in the validation cohorts of several other studies on preoperative HCC histologic characterization [36,37,39]. In the validation cohort, Nie et al. [22] reported that the clinicoradiologic and combined radiomics models performed similarly in differentiating focal nodular hyperplasia and HCC (AUC 0.769 vs 0.917; p = 0.376). Another study found that the two models had the same diagnostic performance in predicting microvascular invasion of HCC in the validation cohort (AUC 0.850 vs 0.943, respectively; p = 0.111) [36]. In HCC, building and clinical validation of a radiomics-based model are challenging and currently limited by the need for large datasets, heterogeneity of image acquisition protocols, as well as difficulty in harmonizing study findings [40,41]. Since there was no added diagnostic benefit of using the radiomics signature and combined radiomics model in the validation cohort compared to the more simple and easily implementable clinicoradiologic model, this study demonstrates that the radiomics model is of limited benefit in the clinical management of patients. The clinicoradiologic model is thus more suited for the everyday clinical management of MTHCC patients.
The prevalence of MTHCC in this study is 33%, which falls within the range of 15% and 35% reported in the literature [11,27,29]. According to the clinical utility analysis, model 1 is beneficial in detecting MTHCC at a lower threshold probability of 10%-a threshold lower than the reported prevalence-reiterating its value in assisting clinicians in improving pretherapeutic decision-making. As it is easier and simpler to build than models 2 and 3, it is more suitable to use in routine clinical practice to predict the MTHCC subtype.
This study has some limitations. First, our retrospective study needs to be externally validated by a larger multicenter prospective study. Second, we used only  Fig. 4 Comparison of receiver operating characteristics curves of the three models in training (a) and validation (b) cohorts three phases and did not include HBP because not all patients had gadoxetic acid-enhanced MRI; we also have not included T1W images because some lesions had inconspicuous margins making tumor segmentation difficult. DWI is a powerful MR tool for detecting and characterizing liver lesions. Likewise, we did not include DW images as not all patients underwent DWI during workup; the reliability and stability of HCC radiomics features extracted from DW images are further limited by lower spatial resolution, sensitivity to motion, and varying signal-to-noise ratio with different b values [42,43]. Third, we did not use the LI-RADS categorization in this study. The MTHCC has been associated with a tumor in the vein [12]. Our subsequent study will examine the relationship between the LI-RADS and ADC ratio in the MTHCC subtype, as well as the utility of the ADC as a biomarker for tumor recurrence. Forth, to obtain a reasonably large sample size, the study included patients imaged with two scanners and two different contrast agents, which may inevitably affect the reproducibility of segmentation and feature extraction [44]. Nevertheless, we attempted to achieve reliable segmentation and feature extraction by ensuring a good ICC. Finally, we did not incorporate genomics factors related to MTHCC due to the cost of gene assays. We intend to incorporate radiogenomics and evaluate the impact of different imaging protocols on the robustness of the model in subsequent multicentered series.
In conclusion, three models were developed to predict the MTHCC subtype preoperatively. The clinicoradiologic model (model 1) had a good diagnostic performance in predicting MTHCC in training and validation cohorts. It was slightly superior to the fusion radiomics signature (model 2) in both cohorts and similar to the combined radiomics model (model 3) in the validation cohort. Furthermore, the clinicoradiologic model is easier and simpler to build than the fusion radiomics signature and combined radiomics model in clinical work. Thus, it will be helpful in predicting the MTHCC subtype in routine clinical practice. ; the curve has a higher net benefit than treat-all strategies at a low threshold probability of 10%; (c) the decision curve comparison between models 1 (blue), 2 (red), and 3 (green) for all patients Fig. 6 Development of nomogram (a) and calibration curves (b, c) for model 1. It incorporates alpha-fetoprotein levels (AFP), tumor heterogeneity, non-peripheral washout, and corona enhancement. AFP: 0, 1, and 2 represent serum AFP levels of < 20 ng/ml, 20-400 ng/ml, and > 400 ng/ml, respectively. Tumor heterogeneity: 0 and 1 represent homogenous and heterogenous tumor appearance, respectively. 0 represents absence, and 1 represents the presence of washout and APE, respectively. The calibration curves in training (b) and validation (c) cohorts show the calibration of the nomogram. The diagonal gray dotted line represents the true MTHCC rates, while the blue line demonstrates the predictive performance of the nomogram