Development of an artificial intelligence-based precision medicine decision support system for radiogenomics data sets

Abdulvahap Pınar; Ahmet Kadir Arslan; Emek Güldoğan

doi:10.7197/cmj.1713462

Abstract

Aim: This study applies deep learning algorithms for superpixel segmentation, Otsu thresholding, and disease reference position estimation to DICOM images and clinical data of non-small cell lung cancer (NSCLC) patients. Quantitative imaging data were integrated with clinical information. Several machine learning algorithms were used to identify biomarkers and to compare classification performance on clinical data, imaging data, and their combination, including the rate of improvement between models.

Materials and Methods: The clinical dataset comprised 211 NSCLC cases: 43 patients with and 168 without an Epidermal Growth Factor Receptor (EGFR) mutation, and 38 with and 173 without a Kirsten Rat Sarcoma Viral Oncogene Homolog (KRAS) mutation. A total of 2,231 images were analyzed. The VGG16 deep learning model was used to extract 25,088 features from each image. XGBoost, CatBoost, Random Forest, and Support Vector Machine (SVM) classifiers were used to predict mutation status.

Results: Clinical variables differed significantly by mutation status among NSCLC patients. Random Forest was used for feature selection, identifying the 50 most important variables for model training. XGBoost and CatBoost achieved the highest classification performance, with accuracy, balanced accuracy, precision, sensitivity, F1-score, and ROC-AUC of 0.965 ± 0.015, 0.954 ± 0.021, 0.953 ± 0.024, 0.994 ± 0.007, 0.973 ± 0.011, and 0.990 ± 0.005, respectively.

Conclusion: The XGBoost and CatBoost models were highly effective in predicting KRAS mutation status from imaging data. CatBoost also performed best in determining EGFR mutation status, outperforming the other machine learning methods.
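As a reading aid for the metrics listed in the abstract, the following minimal sketch shows how accuracy, balanced accuracy, precision, sensitivity, and F1-score are derived from a binary confusion matrix. The `Confusion` class, the `binary_metrics` function, and the counts are hypothetical and are not taken from the study; the counts were chosen only so that the class sizes match the reported KRAS split (38 mutated, 173 wild-type). ROC-AUC is omitted because it requires predicted scores rather than counts.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Binary confusion matrix; the positive class is 'mutation present'."""
    tp: int  # mutation present, predicted present
    fp: int  # mutation absent, predicted present
    fn: int  # mutation present, predicted absent
    tn: int  # mutation absent, predicted absent


def binary_metrics(c: Confusion) -> dict:
    """Compute the count-based metrics named in the abstract."""
    sensitivity = c.tp / (c.tp + c.fn)   # recall on the positive class
    specificity = c.tn / (c.tn + c.fp)   # recall on the negative class
    precision = c.tp / (c.tp + c.fp)
    accuracy = (c.tp + c.tn) / (c.tp + c.fp + c.fn + c.tn)
    # Balanced accuracy averages the per-class recalls, so the larger
    # class cannot dominate the score.
    balanced_accuracy = (sensitivity + specificity) / 2
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "balanced_accuracy": balanced_accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "f1": f1,
    }


# Hypothetical counts (NOT study results): 38 positives, 173 negatives.
example = Confusion(tp=30, fp=5, fn=8, tn=168)
metrics = binary_metrics(example)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With classes this imbalanced, accuracy alone can look favorable even when many mutated cases are missed, which is one reason to report balanced accuracy and sensitivity alongside it.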

Details

Primary Language

English

Subjects

Health Informatics and Information Systems

Journal Section

Research Article

Authors

Abdulvahap Pınar *
0000-0002-3662-2579
Türkiye

Ahmet Kadir Arslan
0000-0001-8626-9542
Türkiye

Emek Güldoğan
0000-0002-5436-8164
Türkiye

Publication Date

June 21, 2025

Submission Date

June 3, 2025

Acceptance Date

June 11, 2025

Published in Issue

Year: 2025 Volume: 47 Number: 2

DOI

https://doi.org/10.7197/cmj.1713462

IZ

https://izlik.org/JA97FP27TJ

Cite

AMA

1. Pınar A, Arslan AK, Güldoğan E. Development of an artificial intelligence-based precision medicine decision support system for radiogenomics data sets. CMJ. 2025;47(2):35-44. doi:10.7197/cmj.1713462