Visualizing the Full Spectrum Optimization of K-Nearest Neighbors From Data Preprocessing to Hyperparameter Tuning and K-Fold Validation for Cardiovascular Disease Prediction
Abstract
Cardiovascular disease (CVD) is a leading cause of death worldwide, which creates a pressing need for accurate machine learning models that can detect risk early and help prevent or mitigate it. This study addresses that need by enhancing the K-Nearest Neighbors (KNN) algorithm for early-stage CVD prediction, combining several data preprocessing and feature selection techniques to improve the model's efficiency. The proposed model identifies the most relevant features using Principal Component Analysis (PCA). The main innovation is fine-tuning the KNN hyperparameters, specifically the number of neighbors (K), using a data-driven approach intended to maintain accuracy across different datasets. The optimized KNN algorithm is evaluated on the Framingham heart disease dataset, where it achieved a prediction accuracy of 92.46% and outperformed methods that rely solely on traditional KNN. Because machine learning techniques play an important role in developing prediction models for the early detection and prevention of CVD, this model can serve as a valuable tool for healthcare professionals and researchers. The core contribution of this study is a comprehensive optimization of the traditional KNN algorithm: robust data preprocessing using the Hampel filter for outlier removal, feature selection through PCA, and performance enhancement using grid search for hyperparameter tuning combined with 10-fold cross-validation. Unlike prior studies that apply KNN with minimal adjustments, this research emphasizes the importance of an end-to-end machine learning pipeline.
This holistic refinement significantly improves the predictive performance and reliability of KNN for cardiovascular disease prediction, achieving 92.46% accuracy on the Framingham dataset.
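The two steps the abstract emphasizes most, Hampel-based outlier handling followed by a cross-validated search over K, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`hampel_filter`, `knn_predict`, `cross_val_accuracy`, `select_k`), the candidate K values, the 3-sigma threshold, and the toy data are all hypothetical, and feature scaling and PCA are omitted for brevity.

```python
import math
import random
import statistics
from collections import Counter


def hampel_filter(values, window=None, n_sigmas=3.0):
    """Replace points far from the median (in scaled-MAD units) with the median.

    window=None uses the whole column; an integer uses a sliding window of
    `window` points on each side, as in the classic time-series formulation.
    """
    values = list(values)
    cleaned = list(values)
    for i, v in enumerate(values):
        if window is None:
            neighborhood = values
        else:
            neighborhood = values[max(0, i - window): i + window + 1]
        med = statistics.median(neighborhood)
        # 1.4826 scales the MAD so it estimates the standard deviation
        # when the data are roughly Gaussian.
        mad = 1.4826 * statistics.median(abs(u - med) for u in neighborhood)
        if abs(v - med) > n_sigmas * mad:
            cleaned[i] = med
    return cleaned


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k training points closest to x (Euclidean)."""
    nearest = sorted(zip(train_X, train_y), key=lambda p: math.dist(p[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def cross_val_accuracy(X, y, k, n_folds=10, seed=0):
    """Mean accuracy of KNN with a given k over n_folds shuffled folds."""
    if len(X) < n_folds:
        raise ValueError("need at least one sample per fold")
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train_X = [X[i] for i in indices if i not in held_out]
        train_y = [y[i] for i in indices if i not in held_out]
        correct = sum(knn_predict(train_X, train_y, X[i], k) == y[i] for i in fold)
        scores.append(correct / len(fold))
    return sum(scores) / n_folds


def select_k(X, y, candidates=(1, 3, 5, 7, 9), n_folds=10):
    """Grid search over k: return the best k and the per-k mean accuracies."""
    scores = {k: cross_val_accuracy(X, y, k, n_folds) for k in candidates}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    rng = random.Random(42)
    # Toy two-class data standing in for a real dataset (NOT the Framingham data).
    X = [(rng.gauss(c * 3, 1.0), rng.gauss(c * 3, 1.0)) for c in (0, 1) for _ in range(50)]
    y = [c for c in (0, 1) for _ in range(50)]
    # Clean each feature column independently, then rebuild the rows.
    columns = [hampel_filter(col) for col in zip(*X)]
    X = list(zip(*columns))
    best_k, scores = select_k(X, y)
    print("best k:", best_k, "mean CV accuracy:", round(scores[best_k], 3))
```

In a fuller pipeline, scaling and PCA would be fitted inside each training fold so that no information from the held-out fold leaks into preprocessing. With scikit-learn, one way to get that behavior is to wrap `StandardScaler`, `PCA`, and `KNeighborsClassifier` in a `Pipeline` and pass it to `GridSearchCV` with `cv=10`.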
DOI: https://doi.org/10.31449/inf.v49i2.7774
License
I assign to Informatica, An International Journal of Computing and Informatics ("Journal") the copyright in the manuscript identified above and any additional material (figures, tables, illustrations, software or other information intended for publication) submitted as part of or as a supplement to the manuscript ("Paper") in all forms and media throughout the world, in all languages, for the full term of copyright, effective when and if the article is accepted for publication. This transfer includes the right to reproduce and/or to distribute the Paper to other journals or digital libraries in electronic and online forms and systems.
I understand that I retain the rights to use the pre-prints, off-prints, accepted manuscript and published journal Paper for personal use, scholarly purposes and internal institutional use.
In certain cases, I can ask for retaining the publishing rights of the Paper. The Journal can permit or deny the request for publishing rights, to which I fully agree.
I declare that the submitted Paper is original, has been written by the stated authors and has not been published elsewhere nor is currently being considered for publication by any other journal and will not be submitted for such review while under review by this Journal. The Paper contains no material that violates proprietary rights of any other person or entity. I have obtained written permission from copyright owners for any excerpts from copyrighted works that are included and have credited the sources in my article. I have informed the co-author(s) of the terms of this publishing agreement.
Copyright © Slovenian Society Informatika







