Machine Learning Algorithms for Predicting CKD Progression: A Real-World Hospital Dataset Analysis
DOI:
https://doi.org/10.65327/kidneys.v15i1.605
Keywords:
Chronic Kidney Disease, Machine Learning, Random Forest, SHAP, Clinical Decision Support.
Abstract
Background. Chronic kidney disease (CKD) is a progressive condition associated with substantial global morbidity and mortality. Early detection is critical for reducing complications and slowing progression to end-stage kidney disease. Traditional diagnostic approaches depend on laboratory markers that may not fully capture nonlinear interactions among clinical parameters. Machine learning can model such interactions and may improve early identification and support clinical decision-making.
Methods. This study developed an end-to-end machine learning framework for CKD prediction using the Early Stage CKD dataset. The workflow comprised data preprocessing, exploratory data analysis, and feature engineering, followed by model development. A Random Forest classifier was trained using an 80/20 stratified train/test split, and performance was assessed using accuracy, precision, recall, F1-score, the confusion matrix, and ROC–AUC. To enhance transparency, SHAP (SHapley Additive exPlanations) analysis was applied to interpret feature contributions and check their clinical relevance.
Results. The Random Forest model achieved an accuracy of 96.25% and a ROC–AUC of 1.00. The confusion matrix showed zero false positives and three false negatives, reflecting strong diagnostic reliability. SHAP analysis identified hemoglobin, serum creatinine, packed cell volume, and specific gravity as the most influential predictors, in line with established CKD biomarkers.
Conclusion. The proposed machine learning framework offers a robust, interpretable approach for early CKD prediction. Its strong performance and explainability make it suitable for integration into real-world clinical decision-support systems, particularly in resource-limited healthcare settings.
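The reported figures can be cross-checked with a short calculation. The sketch below is an illustration only, not the authors' code. Three errors at 96.25% accuracy imply a test set of 3 / (1 - 0.9625) = 80 records. The split of those records into 50 CKD and 30 non-CKD cases is an assumption (the abstract does not state class counts), and the scores passed to `pairwise_auc` are hypothetical values chosen to show how a ROC–AUC of 1.00 can coexist with three false negatives at a 0.5 decision threshold. The names `ConfusionMatrix` and `pairwise_auc` are likewise illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    tp: int  # CKD cases correctly flagged
    fn: int  # CKD cases missed
    fp: int  # non-CKD cases wrongly flagged
    tn: int  # non-CKD cases correctly cleared

    @property
    def total(self) -> int:
        return self.tp + self.fn + self.fp + self.tn

    def accuracy(self) -> float:
        return (self.tp + self.tn) / self.total

    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)

    def recall(self) -> float:
        return self.tp / (self.tp + self.fn)

    def f1(self) -> float:
        p, r = self.precision(), self.recall()
        return 2 * p * r / (p + r)


def pairwise_auc(pos_scores, neg_scores):
    """ROC-AUC as the share of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win.
    """
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos_scores
        for n in neg_scores
    )
    return wins / (len(pos_scores) * len(neg_scores))


# Reported in the abstract: 0 false positives, 3 false negatives.
# ASSUMPTION: 80 test records split into 50 CKD and 30 non-CKD.
cm = ConfusionMatrix(tp=47, fn=3, fp=0, tn=30)

# HYPOTHETICAL predicted probabilities: every CKD case outranks every
# non-CKD case, but three CKD cases fall below the 0.5 threshold.
pos_scores = [0.9] * 47 + [0.45, 0.40, 0.35]
neg_scores = [0.1] * 30

if __name__ == "__main__":
    print(f"accuracy : {cm.accuracy():.4f}")
    print(f"precision: {cm.precision():.4f}")
    print(f"recall   : {cm.recall():.4f}")
    print(f"F1-score : {cm.f1():.4f}")
    print(f"ROC-AUC  : {pairwise_auc(pos_scores, neg_scores):.2f}")
    print(f"missed at 0.5 threshold: {sum(s < 0.5 for s in pos_scores)}")
```

Under these assumptions the accuracy comes out at 77/80 = 96.25%, matching the abstract, with precision 1.00 and recall 0.94. Because ROC–AUC measures ranking rather than performance at one cut-off, it can equal 1.00 even when the default threshold misses some CKD cases.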

ISSN 2307-1257
ISSN 2307-1265
















