
Machine learning combined with radiomics and deep learning features extracted from CT images: a novel AI model to distinguish benign from malignant ovarian tumors

Abstract

Background

To develop an artificial intelligence (AI) model with radiomics and deep learning (DL) features extracted from CT images to distinguish benign from malignant ovarian tumors.

Methods

We enrolled 149 patients with pathologically confirmed ovarian tumors. A total of 185 tumors were included and divided into training and testing sets in a 7:3 ratio. All tumors were manually segmented from preoperative contrast-enhanced CT images. CT image features were extracted using radiomics and DL. Five models with different combinations of feature sets were built. Benign and malignant tumors were classified using machine learning (ML) classifiers. Model performance was compared with that of five radiologists on the testing set.

Results

Among the five models, the best-performing model was the ensemble model combining radiomics, DL, and clinical feature sets. It achieved an accuracy of 82%, specificity of 89%, and sensitivity of 68%. Compared with the junior radiologists' average results, the model had higher accuracy (82% vs 66%) and specificity (89% vs 65%) with comparable sensitivity (68% vs 67%). With the assistance of the model, the junior radiologists achieved higher average accuracy (81% vs 66%), specificity (80% vs 65%), and sensitivity (82% vs 67%), approaching the performance of the senior radiologists.

Conclusions

We developed a CT-based AI model that can differentiate benign from malignant ovarian tumors with high accuracy and specificity. The model significantly improved the performance of less-experienced radiologists in ovarian tumor assessment and may potentially guide gynecologists toward better therapeutic strategies for these patients.

Key points

  1. CT-based radiomics and deep learning features could differentiate ovarian tumors.

  2. Radiomics, deep learning features, and clinical data provided complementary tumor information.

  3. The ensemble model improved the radiologists’ performance in assessing ovarian tumors.

Background

Ovarian cancer is the leading cause of gynecological cancer-related deaths [1], and a misdiagnosis may delay treatment and worsen the prognosis. Expedited referral of patients with ovarian cancer to a gynecologic oncologist for complete surgical staging and optimal cytoreduction correlates with better survival rates [2]. In contrast, patients with benign ovarian tumors need only conservative treatment or laparoscopic cystectomy [3]. Accurate distinction between benign and malignant ovarian tumors is therefore of paramount importance in guiding treatment, yet it remains a great challenge in clinical practice.

Currently, the distinction between benign and malignant ovarian tumors is largely based on imaging appearance [4,5,6]. Ultrasound is typically the first-line screening tool. Owing to its excellent spatial resolution and wide availability, computed tomography (CT) is often ordered for further tumor characterization. However, definitive differentiation between benign and malignant ovarian tumors by CT remains challenging, especially in excluding the possibility of malignancy in multiseptated cystic tumors. Given that benign ovarian tumors greatly outnumber malignant ones, it is not uncommon for patients with tumors of indeterminate imaging features to undergo surgery for tumors later proven benign. It is estimated that approximately 28% of oophorectomies are performed for benign tumors [7]. These unnecessary surgeries represent a major clinical concern, with long-term consequences of decreased fertility and premature menopause [8, 9]. A noninvasive method that accurately distinguishes benign from malignant ovarian tumors, preventing delayed treatment in malignant cases and sparing patients with benign tumors from unnecessary surgery, would therefore have significant clinical impact.

Artificial intelligence (AI) has been shown to improve the performance of tumor detection, tumor classification, and treatment monitoring in cancer imaging [10,11,12,13]. In contrast to subjective radiological evaluation by humans, image feature extraction using radiomics or deep learning (DL) can provide quantified image information undetectable by the human eye and has shown promising results in tumor analysis [14,15,16,17,18,19,20,21,22,23,24,25]. Several recent studies applied radiomics to CT images with machine learning (ML) classifiers to differentiate ovarian tumors [26,27,28]. However, research on applying DL to differentiate ovarian tumors using CT images is limited. Christiansen et al. [29] and Wang et al. [30] applied DL for ovarian tumor differentiation using ultrasound and magnetic resonance imaging (MRI), respectively. Beyond studies that directly applied DL networks for ovarian tumor differentiation, a few studies have used DL networks to extract features from CT images, for example to predict ovarian cancer recurrence or to classify pulmonary nodule subtypes [24, 25]. To the best of our knowledge, the performance of ML based on combined radiomics and DL features extracted from CT images for differentiating ovarian tumors remains unknown.

In this study, we aimed to develop a CT-based AI model with feature extraction using radiomics and DL to distinguish benign from malignant ovarian tumors. We applied classifiers to radiomics and DL features extracted from CT images to classify benign and malignant ovarian tumors. The performance of various combinations of classifiers and feature sets was compared with that of radiologists on the classification task, using pathologic diagnosis as the gold standard. Moreover, the performance improvement of radiologists with the assistance of the optimal model was also assessed.

Methods

Study population

In this institutional review board-approved study, we retrospectively reviewed 245 consecutive patients with suspected ovarian tumors at MacKay Memorial Hospital between July 2018 and December 2019. Patients meeting the following criteria were included: (1) pathologically confirmed ovarian tumor resected by surgery, (2) contrast-enhanced CT scan performed before surgery, and (3) artifact-free CT images suitable for analysis. The final cohort consisted of 149 patients with 185 ovarian tumors (Fig. 1).

Fig. 1 Flowchart of patient selection

The data were divided into training and testing sets in a 7:3 ratio. The training set was used to develop five models with different combinations of feature sets: a radiomics model, a DL model, a clinical model, a combined radiomics and DL model, and an ensemble model (combined radiomics, DL, and clinical feature sets). The models were then tested on the unseen testing set. Figure 2 illustrates the study design; a minimal code sketch of the data split follows the figure.

Fig. 2 Workflow of study design
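For concreteness, the sketch below shows one common way to implement such a 7:3 split. This is not the authors' code (their pipeline was implemented in MATLAB); it uses Python/scikit-learn, the arrays are random placeholders, and stratification by class is our assumption, since the paper does not state how the split was balanced.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
features = rng.normal(size=(185, 20))   # placeholder: one feature row per tumor
labels = rng.integers(0, 2, size=185)   # placeholder: 0 = benign, 1 = malignant

# 7:3 training/testing split; stratify keeps the benign/malignant
# proportions similar in both sets (an assumption, not stated in the paper)
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.3, stratify=labels, random_state=42
)
```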

Image acquisition and segmentation

CT examinations were performed on four multidetector CT scanners: Siemens Somatom Definition Flash, Siemens Somatom Definition AS, Toshiba Aquilion ONE (TSX-301C), and Toshiba Aquilion PRIME (TSX-303A). The scanning parameters were as follows: tube voltage, 120 kVp; tube current, 200–230 mA; gantry rotation time, 0.5 s; beam pitch, 1.0; reconstruction thickness, 2 mm; reconstruction interval, 1.5 mm. Contrast medium (iodine concentration, 300 mg/mL; volume, 80–100 mL) was injected with a mechanical injector at a rate of 2.5–3.5 mL/s. The delay from contrast agent injection to image acquisition was 70 s.

The preoperative contrast-enhanced CT images were collected from the PACS. Tumors were manually segmented by an experienced radiologist using 3D Slicer (IEEE Cat No. 04EX821). The boundary of the whole tumor was manually delineated on each axial CT slice.

Feature extraction, selection, and tumor classification

After resolution and intensity normalization, radiomics features were extracted from the tumor images. A total of 129 radiomics features were extracted from each tumor, including 12 histogram features, 9 gray-level co-occurrence matrix (GLCM) features, 96 wavelet features, and 12 Laplacian of Gaussian (LoG) features (Additional file 1: Table S1).
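The authors implemented radiomics extraction in MATLAB; as a hedged sketch, a comparable feature set (first-order/histogram, GLCM, wavelet, and LoG features) could be obtained with the open-source pyradiomics package. The file names and the LoG sigma values below are illustrative assumptions, not the authors' settings.

```python
from radiomics import featureextractor

# Settings mirror the described preprocessing: resample to a common
# resolution and normalize intensities (exact values are assumptions)
extractor = featureextractor.RadiomicsFeatureExtractor(
    resampledPixelSpacing=[1.0, 1.0, 1.0],
    normalize=True,
)
extractor.disableAllFeatures()
extractor.enableFeatureClassByName('firstorder')  # histogram-type features
extractor.enableFeatureClassByName('glcm')        # gray-level co-occurrence matrix
# Wavelet- and LoG-filtered images yield the wavelet and LoG feature groups
extractor.enableImageTypes(Original={}, Wavelet={}, LoG={'sigma': [1.0, 3.0]})

# One CT volume and its manual segmentation mask per tumor (hypothetical paths)
features = extractor.execute('ct_volume.nrrd', 'tumor_mask.nrrd')
```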

In addition to radiomics, a 3D U-Net convolutional neural network (CNN) was applied as a feature extractor. Figure 3 illustrates the architecture of the U-net applied in this study, which consists of an encoder and a decoder. The rationale for using the U-net as a feature extractor is that the features extracted by the encoder from an input tumor image can represent the tumor if the image reconstructed from those features by the decoder is similar to the input image [31,32,33,34]. In this study, the U-net was trained on 90% and validated on 10% of the training set using the Adam optimizer with a half mean squared error loss function. A batch size of 1 was used because of the limited memory of the graphics card. The learning rate and number of training epochs were adjusted based on the average root mean squared error (RMSE) between the input and reconstructed images, to make the reconstructed images as similar as possible to the input images. By feeding the tumor images into the trained U-net, the features output by the last activation layer of the encoder were adopted as the DL features of the tumor. For each tumor, 224 DL features were extracted.

Fig. 3 The architecture of the 3D U-net used for DL feature extraction. The architecture includes an encoder network and a decoder network. The encoder extracts tumor characteristics referred to as DL features, and the decoder uses the DL features to reconstruct the original tumor image. The segmented tumor images were input into the network, and the output of the last convolutional layer in the encoder network was extracted as a 224-dimensional DL feature vector
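A minimal PyTorch sketch of this encoder-decoder feature-extraction idea follows. It is not the authors' network (they used MATLAB, a 3D U-net, and a half-MSE loss): skip connections are omitted here so that the reconstruction depends only on the encoder output, standard MSE stands in for half-MSE (the two differ only by a factor of 2), and all layer sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class AutoencoderFeatureExtractor(nn.Module):
    def __init__(self, n_features=224):
        super().__init__()
        # Encoder: three strided 3D convolutions down to a bottleneck
        self.encoder = nn.Sequential(
            nn.Conv3d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(32, n_features, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Decoder: transposed convolutions back to the input resolution
        self.decoder = nn.Sequential(
            nn.ConvTranspose3d(n_features, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose3d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose3d(16, 1, 4, stride=2, padding=1),
        )

    def forward(self, x):
        code = self.encoder(x)           # bottleneck activations = "DL features"
        return self.decoder(code), code

model = AutoencoderFeatureExtractor()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # paper: lr 0.001, 25 epochs
loss_fn = nn.MSELoss()                                     # reconstruction loss

x = torch.randn(1, 1, 64, 64, 64)       # one segmented tumor volume; batch size 1
optimizer.zero_grad()
recon, code = model(x)
loss = loss_fn(recon, x)                # train until reconstruction RMSE is low
loss.backward()
optimizer.step()

# After training, pool the bottleneck to one fixed-length vector per tumor
dl_features = code.mean(dim=(2, 3, 4)).squeeze(0)   # 224-dim DL feature vector
```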

Using radiomics and the U-net, 353 features were extracted from each tumor. However, classification performance with such a large number of features could suffer from multicollinearity and overfitting. We therefore used least absolute shrinkage and selection operator (LASSO) regression with tenfold cross-validation to eliminate irrelevant features [35]. Features with regression coefficients > 0.1 were selected for classification.
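A hedged scikit-learn equivalent of this selection step is sketched below (the authors used MATLAB). The toy data, the feature standardization, and the use of an absolute-value threshold are our assumptions; only the tenfold cross-validation and the 0.1 cutoff come from the text.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(130, 353))             # placeholder: training tumors x 353 features
y = (X[:, 0] + X[:, 1] > 0).astype(float)   # toy labels: 0 = benign, 1 = malignant

X_std = StandardScaler().fit_transform(X)   # put features on a common scale
lasso = LassoCV(cv=10).fit(X_std, y)        # LASSO with tenfold cross-validation

# Keep features whose coefficients exceed the 0.1 threshold from the text
# (applied to absolute values here, which the paper does not specify)
selected = np.where(np.abs(lasso.coef_) > 0.1)[0]
print(f"kept {selected.size} of {X.shape[1]} features")
```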

After feature selection, benign and malignant tumors were classified using four classifiers, K-nearest neighbor (KNN), support vector machine (SVM), logistic regression (LR), and random forest (RF), each with five types of feature sets: radiomics features, DL features, clinical features, combined radiomics and DL features, and ensemble features (all features combined). Each classifier output a probability (0–100%) of malignancy for each tumor. The performance of each combination of classifier and feature set was evaluated and compared on the training data with tenfold cross-validation. Feature extraction, selection, and classifier training and evaluation were implemented in MATLAB R2020a (MathWorks, Natick, MA).
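Continuing the variables from the LASSO sketch above, the classification step could look as follows in scikit-learn. All hyperparameters are library defaults, not the authors' settings.

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

y_int = y.astype(int)            # class labels for the classifiers
X_sel = X_std[:, selected]       # only the LASSO-selected features

classifiers = {
    "KNN": KNeighborsClassifier(),
    "SVM": SVC(probability=True),           # enable probability estimates
    "LR": LogisticRegression(max_iter=1000),
    "RF": RandomForestClassifier(),
}

# Tenfold cross-validation on the training data, as described above
for name, clf in classifiers.items():
    auc = cross_val_score(clf, X_sel, y_int, cv=10, scoring="roc_auc")
    print(f"{name}: mean cross-validated AUC = {auc.mean():.2f}")

# Probability of malignancy (0-100%) for each tumor from one classifier
lr = classifiers["LR"].fit(X_sel, y_int)
prob_malignant = lr.predict_proba(X_sel)[:, 1] * 100
```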

Radiologist evaluation

Based on their years of experience reading abdominal CT images, radiologists were divided into two groups: juniors (3 radiologists, < 10 years of experience) and seniors (2 radiologists, > 10 years). All radiologists were blinded to the patients’ pathologic diagnoses. They were asked to independently interpret the CT images of the testing set and record each tumor as benign or malignant, given the patient’s age and CA-125 level. After one month, they were asked to interpret the images again with the assistance of the best-performing model.

Statistical analysis

To evaluate the performance of the AI models and radiologists, the following indices were calculated: accuracy, sensitivity, specificity, the receiver operating characteristic (ROC) curve, the area under the ROC curve (AUC), and the F1 score. Interobserver reliability was assessed using Krippendorff’s alpha coefficient. When comparing clinical characteristics between groups, differences in continuous and categorical variables were examined using the independent samples t-test and the chi-squared test, respectively. p < 0.05 was considered statistically significant. Statistical analysis was performed using SPSS version 24.0 (IBM Corporation, Armonk, NY, USA).
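For concreteness, the classification indices above can be computed as in the sketch below, using hypothetical predictions and scikit-learn rather than SPSS; the malignant class is taken as positive, and sensitivity/specificity are derived from the confusion matrix.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score, roc_auc_score

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])          # placeholder ground truth
y_prob = np.array([.9, .2, .4, .8, .3, .1, .7, .6])  # model malignancy probabilities
y_pred = (y_prob >= 0.5).astype(int)                 # threshold at 50%

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("accuracy   :", accuracy_score(y_true, y_pred))
print("sensitivity:", tp / (tp + fn))                # true-positive rate
print("specificity:", tn / (tn + fp))                # true-negative rate
print("AUC        :", roc_auc_score(y_true, y_prob))
print("F1 score   :", f1_score(y_true, y_pred))
# Interobserver reliability (Krippendorff's alpha) is available in,
# e.g., the third-party 'krippendorff' package on PyPI.
```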

Results

Patient demographics

The final cohort consisted of 149 patients with 185 ovarian tumors, 112 benign and 73 malignant. The patients’ ages ranged from 18 to 80 years (mean, 46.4 ± 12.4 years). There were 78 patients (52.3%) with elevated CA-125 and 36 patients (24.2%) with bilateral tumors. There were significant differences in age (p < 0.0001), tumor volume (p < 0.0001), and CA-125 (p = 0.0003) between the benign and malignant groups (Table 1). The training and testing sets were balanced in terms of all clinical variables (Additional file 1: Table S2). Tumor histological subtypes are summarized in Table 2. For classification purposes, borderline and malignant tumors were grouped into a single category and referred to as malignant.

Table 1 Patient and tumor characteristics for the benign and malignant groups
Table 2 Summary of pathological subtypes

Feature selection and tumor classification

The features selected by the LASSO method are detailed in Table 3. In the radiomics model, 4 features were selected from the initial 129 radiomics features. For the DL features, the feature-extraction DL model (U-net) was trained with a learning rate of 0.001 for 25 epochs. The average RMSE between the input and reconstructed images was 25.45 ± 39.05. Four features were selected from the initial 224 DL features for the DL model. In the combined radiomics and DL model, 6 features were selected from the total 353 radiomics and DL features: one radiomics feature and five DL features. The clinical model had four clinical features: age, CA-125, tumor volume, and tumor side. The ensemble model consisted of 10 features: the 4 clinical features and the 6 features used in the combined radiomics and DL model. The detailed model performance on the training and testing sets using the different classifiers (KNN, SVM, LR, and RF) can be found in Additional file 1: Tables S3–S4. Because the LR classifier performed better overall than the other classifiers on the testing set, its results are presented for the remainder of the study.

Table 3 Radiomics and deep learning features selected by LASSO

Performance of AI models

The performance metrics of the AI models and radiologists on the testing set are summarized in Table 4. The accuracies of the models in descending order were: ensemble model, 82%; DL model, 73%; clinical model, 73%; combined radiomics and DL model, 71%; and radiomics model, 61%. The best-performing model was the ensemble model, with the highest accuracy (82%), sensitivity (68%), negative predictive value (85%), and F1 score (0.72). The ensemble model achieved a specificity of 89%, an AUC of 0.83, and a positive predictive value of 77%. The DL model had the highest AUC (0.89), specificity (100%), and positive predictive value (100%) but the lowest sensitivity (21%).

Table 4 Performance metrics of AI models and radiologists

Performance of radiologists

The senior radiologists achieved higher accuracy, specificity, AUC, positive predictive value, and F1 score than all junior radiologists (Table 4). With AI model assistance, all junior radiologists showed an overall improvement in performance metrics, while the senior radiologists had only mild improvement in accuracy, AUC, and F1 score. Interobserver reliability also improved with AI assistance for both junior radiologists (Krippendorff’s alpha, 0.4757 vs 0.6333) and senior radiologists (Krippendorff’s alpha, 0.4806 vs 0.7331). The average performance results of the radiologists are summarized in Table 5. With the assistance of the ensemble model, the junior radiologists achieved significant improvements in average accuracy (81% vs 66%), sensitivity (82% vs 67%), and specificity (80% vs 65%), reaching values comparable with those of the senior radiologists. The senior radiologists displayed only mild improvements in average accuracy (85% vs 83%) and specificity (87% vs 83%), with unchanged sensitivity (82%). Aided by the probabilities produced by the ensemble model, the junior radiologists also improved their AUC to a level not statistically different from that of the senior radiologists. Comparisons of AUC between radiologists can be found in Additional file 1: Tables S5–S7.

Table 5 Performance comparison of radiologists and ensemble model

Performance comparison of ensemble model and radiologists

Figure 4 shows the ROC curves of the ensemble model and the radiologists. The AUC of the ensemble model (0.83) was comparable with that of the senior radiologists (0.82–0.83) and better than that of the junior radiologists (0.61–0.73). Compared with the junior radiologists’ average results (Table 5), the ensemble model had higher accuracy (82% vs 66%) and specificity (89% vs 65%) with comparable sensitivity (68% vs 67%). Compared with the senior radiologists’ average results, the ensemble model had comparable accuracy (82% vs 83%) and higher specificity (89% vs 83%) but lower sensitivity (68% vs 82%). A comparison of AUC between the ensemble model and the radiologists can be found in Additional file 1: Table S8.

Fig. 4 ROC curves of ensemble model and radiologists

Samples misclassified by the AI model and/or radiologists

Figure 5 shows examples of tumors misclassified by the AI model and/or the radiologists under three scenarios. Figure 5a and b depict ovarian tumors that were misclassified by the AI model but correctly differentiated by all junior radiologists, selected from the 4 cases of this scenario (2 malignant and 2 benign tumors). Figure 5c shows the only tumor that was misclassified by both the AI model and all junior radiologists. Figure 5d depicts an ovarian tumor that was wrongly differentiated by all 3 junior radiologists but correctly classified by the AI model, selected from the 9 cases of this scenario (1 malignant and 8 benign tumors).

Fig. 5 Contrast-enhanced CT images of ovarian tumors that were misclassified by the AI model and/or the junior radiologists. a A malignant ovarian tumor (clear cell carcinoma) that was predicted to be benign by the AI model but malignant by all junior radiologists. The solid portion (arrow) in the tumor is a clue for malignancy in radiological evaluation. b A benign ovarian tumor (endometrioma) that was predicted to be malignant by the AI model but benign by all junior radiologists. There was no solid portion, mural nodule, or thick septa to indicate malignancy in radiological evaluation. c A benign ovarian tumor (mucinous cystadenoma) that was predicted to be malignant by both the AI model and all junior radiologists. d A benign ovarian tumor (mucinous cystadenoma) that was predicted to be malignant by all junior radiologists but benign by the AI model. The thick septa (arrows) in c and d raised the suspicion of malignancy in radiological evaluation

Discussion

In this study, we developed a CT-based AI model incorporating radiomics and DL features with clinical data to classify benign and malignant ovarian tumors using ML classifiers. The model distinguished benign from malignant ovarian tumors with high accuracy (82%) and specificity (89%) and fair sensitivity (68%), and performed better than the junior radiologists’ average results. With the probabilities provided by the model, the junior radiologists showed a significant improvement in performance, approaching that of the senior radiologists. These results demonstrate that the AI model can assist less-experienced radiologists in assessing ovarian tumors, providing evidence of the model’s clinical validity.

This is the first study to apply ML with combined radiomics and DL features extracted from CT images to differentiate between benign and malignant ovarian tumors. Research on applying DL to differentiate ovarian tumors using CT images is limited. Christiansen et al. [29] and Wang et al. [30] applied DL for ovarian tumor differentiation using ultrasound and MRI, respectively. Both studies used CNNs to build end-to-end classification models, which require large training datasets. However, in typical clinical settings, collecting a large, uniform tumor image dataset with pathological diagnoses is very difficult. DL features, quantified image features extracted through an encoder-decoder CNN [31,32,33,34], may provide an alternative approach for tumor imaging analysis on relatively small datasets. Wang et al. [24] extracted DL features from CT images to predict tumor recurrence in high-grade serous ovarian cancer, and Xia et al. [25] developed a CT-based scheme that fuses radiomics and DL features to classify ground-glass lung nodules. So far, no study has used DL features, or combined radiomics with DL features, to differentiate ovarian tumors. Since radiomics, DL features, and clinical data represent different characteristics of a tumor, we hypothesized that an AI model integrating these features could accurately distinguish benign from malignant ovarian tumors. The better performance of the ensemble model supports this hypothesis: radiomics, DL features, and clinical data appear to provide complementary information on ovarian tumors and work better together in distinguishing benign from malignant disease.

ML is often considered a black box. To understand the decisions and mistakes that the AI model and radiologists made, we analyzed three scenarios of misclassified results. In the first scenario, where tumors were misclassified by the AI model but correctly differentiated by all junior radiologists, the malignant tumor (Fig. 5a) had an obvious solid portion, while the benign one (Fig. 5b) was a hypoattenuating tumor without a solid portion or mural nodule. In traditional radiological evaluation, a solid portion, mural nodule, or thick septa in an ovarian tumor are clues for malignancy. Tumors with such typical CT features (Fig. 5a and b) would not be misdiagnosed by radiologists even when misclassified by the AI model. In the second scenario, where both the AI model and all junior radiologists were wrong, the benign tumor (Fig. 5c) was a multiseptated cystic tumor with unevenly thick septa that might raise the suspicion of malignancy in radiological evaluation. In the third scenario, where the AI model was correct but all junior radiologists were wrong, the tumor (Fig. 5d) was a benign multiseptated cystic tumor with thick septa. As mentioned above, excluding the possibility of malignancy in such multiseptated cystic ovarian tumors is challenging for radiologists. The AI model may be better than radiologists at identifying subtle features not captured by traditional radiological evaluation and may help radiologists make correct decisions in difficult cases such as the one in Fig. 5d.

The proposed model may assist radiologists and gynecologists in assessing ovarian tumors and guiding therapeutic strategies, especially in hospitals that lack experienced radiologists. With the growing global physician shortage, the availability of AI assistance systems is increasingly important. Although MRI may outperform CT in tumor differentiation owing to its superior tissue contrast [36, 37], we believe a CT-based AI model would benefit more patients, especially those in remote areas. Although the sensitivity of our model is relatively low, its intended clinical application is not screening: high specificity is considerably more important than sensitivity here, since CT usually serves as a confirmatory modality in the workup of tumors that are indeterminate on ultrasound.

This study has several limitations. First, the dataset is relatively small, lacks an external validation cohort, and was collected retrospectively. Future studies using larger datasets from multiple institutions with prospective designs are essential to improve and validate the model’s performance. Second, manual segmentation of the ovarian tumors by a single radiologist may bias the results; however, because accurate tumor segmentation is important for radiomics and DL feature extraction, we chose manual segmentation by an experienced radiologist. Third, recall of cases from the first session is a concern when radiologists reevaluate the CT images with AI assistance; to address this, we imposed a delay of at least one month between the two sessions. Fourth, we chose CT as our imaging tool because it is far more available than MRI; this remains a potential weakness, as an MRI-based model might outperform the proposed CT-based model. Fifth, we applied ML classifiers rather than an end-to-end DL method for tumor classification because of the small dataset.

Conclusions

In this study, we developed a CT-based AI model incorporating radiomics and DL features with clinical data to distinguish benign from malignant ovarian tumors using ML classifiers. The model distinguishes benign from malignant ovarian tumors with high accuracy and specificity. Moreover, the model improved the performance of less-experienced radiologists in assessing ovarian tumors and may potentially guide gynecologists to provide better therapeutic strategies for these patients.

Availability of data and materials

The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Abbreviations

AI: Artificial intelligence

AUC: Area under the ROC curve

CA-125: Cancer antigen 125

CNN: Convolutional neural network

CT: Computed tomography

DL: Deep learning

GLCM: Gray-level co-occurrence matrix

KNN: K-nearest neighbor

LASSO: Least absolute shrinkage and selection operator

LoG: Laplacian of Gaussian

LR: Logistic regression

ML: Machine learning

MRI: Magnetic resonance imaging

RF: Random forest

RMSE: Root mean squared error

ROC: Receiver operating characteristic

SVM: Support vector machine

References

  1. Siegel RL, Miller KD, Jemal A (2019) Cancer statistics, 2019. CA Cancer J Clin 69:7–34

  2. Hand R, Fremgen A, Chmiel JS et al (1993) Staging procedures, clinical management, and survival outcome for ovarian carcinoma. JAMA 269:1119–1122

  3. American College of Obstetricians and Gynecologists’ Committee on Practice Bulletins—Gynecology (2016) Practice bulletin no. 174: evaluation and management of adnexal masses. Obstet Gynecol 128(5):e210–e226

  4. Jeong YY, Outwater EK, Kang HK (2000) Imaging evaluation of ovarian masses. Radiographics 20:1445–1470

  5. Iyer VR, Lee SI (2010) MRI, CT, and PET/CT for ovarian cancer detection and adnexal lesion characterization. AJR Am J Roentgenol 194:311–321

  6. Kinkel K, Lu Y, Mehdizade A, Pelte MF, Hricak H (2005) Indeterminate ovarian mass at US: incremental value of second imaging test for characterization—meta-analysis and Bayesian analysis. Radiology 236:85–94

  7. Moore BJ, Steiner CA, Davis PH, Stocks C, Barrett ML (2016) Trends in hysterectomies and oophorectomies in hospital inpatient and ambulatory settings, 2005–2013: Statistical Brief #214. Healthcare Cost and Utilization Project (HCUP) Statistical Briefs. Agency for Healthcare Research and Quality (US), Rockville (MD)

  8. Lass A (1999) The fertility potential of women with a single ovary. Hum Reprod Update 5:546–550

  9. Parker WH, Broder MS, Liu Z, Shoupe D, Farquhar C, Berek JS (2005) Ovarian conservation at the time of hysterectomy for benign disease. Obstet Gynecol 106:219–226

  10. Bi WL, Hosny A, Schabath MB et al (2019) Artificial intelligence in cancer imaging: clinical challenges and applications. CA Cancer J Clin 69:127–157

  11. Zhou J, Zeng ZY, Li L (2020) Progress of artificial intelligence in gynecological malignant tumors. Cancer Manag Res 12:12823–12840

  12. Akazawa M, Hashimoto K (2021) Artificial intelligence in gynecologic cancers: current status and future challenges – a systematic review. Artif Intell Med 120:102164

  13. Shrestha P, Poudyal B, Yadollahi S et al (2022) A systematic review on the use of artificial intelligence in gynecologic imaging – background, state of the art, and future directions. Gynecol Oncol. https://doi.org/10.1016/j.ygyno.2022.07.024

  14. Sun R, Limkin EJ, Vakalopoulou M et al (2018) A radiomics approach to assess tumour-infiltrating CD8 cells and response to anti-PD-1 or anti-PD-L1 immunotherapy: an imaging biomarker, retrospective multicohort study. Lancet Oncol 19:1180–1191

  15. Chiappa V, Interlenghi M, Salvatore C et al (2021) Using rADioMIcs and machine learning with ultrasonography for the differential diagnosis of myometRiAL tumors (the ADMIRAL pilot study). Radiomics and differential diagnosis of myometrial tumors. Gynecol Oncol 161:838–844

  16. Chaudhary K, Poirion OB, Lu L, Garmire LX (2018) Deep learning-based multi-omics integration robustly predicts survival in liver cancer. Clin Cancer Res 24:1248–1259

  17. Chiappa V, Interlenghi M, Bogani G et al (2021) A decision support system based on radiomics and machine learning to predict the risk of malignancy of ovarian masses from transvaginal ultrasonography and serum CA-125. Eur Radiol Exp 5:28

  18. Newtson AM, Mattson JN, Goodheart MJ et al (2019) Prediction of optimal surgical outcomes with radiologic images using deep learning artificial intelligence. Gynecol Oncol 154:156

  19. Rizzo S, Botta F, Raimondi S et al (2018) Radiomics of high-grade serous ovarian cancer: association between quantitative CT features, residual tumour and disease progression within 12 months. Eur Radiol 28:4849–4859

  20. Song XL, Ren JL, Zhao D, Wang L, Ren H, Niu J (2021) Radiomics derived from dynamic contrast-enhanced MRI pharmacokinetic protocol features: the value of precision diagnosis ovarian neoplasms. Eur Radiol 31:368–378

  21. Vargas HA, Veeraraghavan H, Micco M et al (2017) A novel representation of inter-site tumour heterogeneity from pre-treatment computed tomography textures classifies ovarian cancers by clinical outcome. Eur Radiol 27:3991–4001

  22. Jian J, Li Y, Pickhardt PJ et al (2021) MR image-based radiomics to differentiate type I and type II epithelial ovarian cancers. Eur Radiol 31:403–410

  23. Zhang H, Mao Y, Chen X et al (2019) Magnetic resonance imaging radiomics in categorizing ovarian masses and predicting clinical outcome: a preliminary study. Eur Radiol 29:3358–3371

  24. Wang S, Liu Z, Rong Y et al (2019) Deep learning provides a new computed tomography-based prognostic biomarker for recurrence prediction in high-grade serous ovarian cancer. Radiother Oncol 132:171–177

  25. Xia X, Gong J, Hao W et al (2020) Comparison and fusion of deep learning and radiomics features of ground-glass nodules to predict the invasiveness risk of stage-I lung adenocarcinomas in CT scan. Front Oncol 10:418

  26. Yu XP, Wang L, Yu HY et al (2021) MDCT-based radiomics features for the differentiation of serous borderline ovarian tumors and serous malignant ovarian tumors. Cancer Manag Res 13:329–336

  27. An H, Wang Y, Wong EMF et al (2021) CT texture analysis in histological classification of epithelial ovarian carcinoma. Eur Radiol 31:5050–5058

  28. Park H, Qin L, Guerra P, Bay CP, Shinagare AB (2021) Decoding incidental ovarian lesions: use of texture analysis and machine learning for characterization and detection of malignancy. Abdom Radiol (NY) 46:2376–2383

  29. Christiansen F, Epstein EL, Smedberg E, Åkerlund M, Smith K, Epstein E (2021) Ultrasound image analysis using deep neural networks for discriminating between benign and malignant ovarian tumors: comparison with expert subjective assessment. Ultrasound Obstet Gynecol 57:155–163

  30. Wang R, Cai Y, Lee IK et al (2020) Evaluation of a convolutional neural network for ovarian tumor differentiation based on magnetic resonance imaging. Eur Radiol. https://doi.org/10.1007/s00330-020-07266-x

  31. Masci J, Meier U, Cireşan D, Schmidhuber J (2011) Stacked convolutional auto-encoders for hierarchical feature extraction. In: Honkela T, Duch W, Girolami M, Kaski S (eds) Artificial neural networks and machine learning – ICANN 2011. Springer, Berlin Heidelberg, pp 52–59

  32. Huang G, Liu Z, van der Maaten L, Weinberger KQ (2017) Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 2261–2269

  33. Dara S, Tumma P (2018) Feature extraction by using deep learning: a survey. In: 2018 Second International Conference on Electronics, Communication and Aerospace Technology (ICECA), pp 1795–1801

  34. Vununu C, Lee S-H, Kwon K-R (2019) A deep feature extraction method for HEp-2 cell image classification. Electronics 8:20

  35. Fonti V, Belitser E (2017) Feature selection using LASSO. VU Amsterdam Res Paper Bus Anal 30:1–25

  36. Hricak H, Chen M, Coakley FV et al (2000) Complex adnexal masses: detection and characterization with MR imaging—multivariate analysis. Radiology 214:39–46

  37. Foti PV, Attinà G, Spadola S et al (2016) MR imaging of ovarian masses: classification and differential diagnosis. Insights Imaging 7:21–41


Funding

The study was supported by the National Science and Technology Council (MOST 111-2314-B-039-042) and China Medical University (CMU111-MF-62).

Author information

Authors and Affiliations

Authors

Contributions

CTS and THW contributed equally to this work. YTJ, CTS, and THW conceived and designed the study, analyzed and interpreted the data, prepared the draft and gave final approval of the version to be submitted. PST, WHH, LYC, SCH, JZW, and PHL undertook data analysis and interpretation. DCL, CSY, and JPT collected the data and performed the statistical analysis. GSPM critically reviewed the intellectual content and gave final approval of the version to be submitted. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Cheng-Ting Shih or Tung-Hsin Wu.

Ethics declarations

Ethics approval and consent to participate

The Institutional Review Board approved this study and waived the requirement for patient consent.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1. Table S1.

Radiomics features extracted in this study. Table S2. Patient and tumor characteristics for the training and testing sets. Table S3. Performance metrics of AI models on training set. Table S4. Performance metrics of AI models on testing set. Table S5. Comparison of AUC between radiologists with and without AI assistance. Table S6. Comparison of AUC between junior radiologists and senior radiologists. Table S7. Comparison of AUC between junior radiologists with AI and senior radiologists. Table S8. Comparison of AUC between ensemble model and radiologists.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article


Cite this article

Jan, YT., Tsai, PS., Huang, WH. et al. Machine learning combined with radiomics and deep learning features extracted from CT images: a novel AI model to distinguish benign from malignant ovarian tumors. Insights Imaging 14, 68 (2023). https://doi.org/10.1186/s13244-023-01412-x

