Volume 1, Issue 2
Machine learning‐based prognostic and metastasis models of kidney cancer

Yuxiang Zhang1, Na Hong2, Sida Huang3, Jie Wu1, Jianwei Gao2, Zheng Xu2, Fubo Zhang1, Shaohui Ma1, Ye Liu1,4, Peiyuan Sun1, Yanping Tang1, Chun Liu2, Jianzhong Shou1, Meng Chen1
1 National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
2 Digital Health China Technologies, Co., Ltd., Beijing, China
3 Department of Public Policy, Cornell University, Ithaca, New York, USA
4 The Key Laboratory of Geriatrics, Beijing Institute of Geriatrics, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing Hospital/National Center of Gerontology of National Health Commission, Beijing, China

Abstract

Background

Kidney cancer originates in the urinary tubular epithelial system of the renal parenchyma and accounts for 20% of all urinary system tumors. Approximately 70% of cases are localized at diagnosis, and 30% are metastatic. Most localized kidney cancers can be cured by surgery, but most patients with metastatic disease relapse after surgery and eventually die of kidney cancer. Accurately predicting patient survival and identifying patients at high risk of metastasis can therefore help guide interventions and improve prognosis.

Methods

This study used data from 12,394 patients with kidney cancer in the Surveillance, Epidemiology, and End Results (SEER) database to construct a research cohort for kidney cancer survival and metastasis. Eight machine learning models (support vector machine, logistic regression, decision tree, random forest, XGBoost, AdaBoost, K‐nearest neighbors, and multilayer perceptron) were developed to predict survival and metastasis, and six evaluation indicators (accuracy, precision, sensitivity, specificity, F1 score, and area under the receiver operating characteristic curve [AUROC]) were used to verify, evaluate, and optimize the models.
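
The comparison described above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: it assumes scikit-learn is installed, uses a synthetic dataset from `make_classification` as a stand-in for the SEER cohort, picks arbitrary hyperparameters, and covers only the seven model families that ship with scikit-learn (XGBoost is left out because it needs the separate `xgboost` package).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Assumption: synthetic binary-outcome data standing in for the real cohort.
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Hypothetical model settings; scale-sensitive models get a StandardScaler.
models = {
    "Support vector machine": make_pipeline(
        StandardScaler(), SVC(probability=True, random_state=0)
    ),
    "Logistic regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "Decision tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "Random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "K-nearest neighbors": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "Multilayer perceptron": make_pipeline(
        StandardScaler(), MLPClassifier(max_iter=1000, random_state=0)
    ),
}

# Fit each model and score it on the held-out split by AUROC.
auroc = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    positive_scores = model.predict_proba(X_test)[:, 1]
    auroc[name] = roc_auc_score(y_test, positive_scores)

# Report models from highest to lowest AUROC.
for name, value in sorted(auroc.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: AUROC = {value:.3f}")
```

Ranking by AUROC on a single held-out split is the simplest possible design; cross-validation or a separate validation set for tuning would be a natural extension.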

Results

Among the eight machine learning models, logistic regression had the highest AUROC in both prediction scenarios. For 3‐year survival prediction, the logistic regression model had an accuracy of 0.684, a sensitivity of 0.702, a specificity of 0.670, a precision of 0.686, an F1 score of 0.683, and an AUROC of 0.741. For tumor metastasis prediction, the logistic regression model had an accuracy of 0.800, a sensitivity of 0.540, a specificity of 0.830, a precision of 0.769, an F1 score of 0.772, and an AUROC of 0.804.
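
For readers less familiar with the six indicators reported above, the sketch below shows their standard single-class definitions in plain Python. The function names and the example counts and scores are hypothetical and are not taken from the study; averaged variants of these metrics (for example, class-weighted F1) can give different values from the formulas shown here.

```python
def classification_metrics(tp, fp, tn, fn):
    """Threshold-based metrics from confusion-matrix counts (positive class)."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)     # positive predictive value
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


def auroc(labels, scores):
    """AUROC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    wins = sum(
        (p > n) + 0.5 * (p == n) for p in positives for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical confusion-matrix counts, for illustration only.
metrics = classification_metrics(tp=70, fp=30, tn=170, fn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")

# Hypothetical labels and predicted scores, for illustration only.
example_auroc = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(f"auroc: {example_auroc:.3f}")
```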

Conclusion

In this study, we selected variables on the basis of both statistical and clinical significance, and developed and compared eight machine learning models for predicting 3‐year survival and metastasis of kidney cancer. The prediction and evaluation results demonstrated that our model could provide decision support for early intervention in patients with kidney cancer.

Keywords: machine learning, metastasis, survival, kidney cancer, prognostic model

Publication history

Received: 18 March 2022
Revised: 17 June 2022
Accepted: 07 July 2022
Published: 08 August 2022
Issue date: August 2022

Copyright

© 2022 The Authors.

Acknowledgements

None.

Rights and permissions

This is an open access article under the terms of the Creative Commons Attribution‐NonCommercial‐NoDerivs License, which permits use and distribution in any medium, provided the original work is properly cited, the use is non‐commercial and no modifications or adaptations are made.
