Visualizing the Full Spectrum Optimization of K-Nearest Neighbors From Data Preprocessing to Hyperparameter Tuning and K-Fold Validation for Cardiovascular Disease Prediction
Abstract
Cardiovascular disease (CVD) is a leading cause of death worldwide, which creates an urgent need for accurate machine learning models that can detect the disease early and help prevent or mitigate its risk. This study addresses that need by extending the K-Nearest Neighbors (KNN) algorithm for early CVD prediction, incorporating data preprocessing and feature selection techniques that improve the model's efficiency. The proposed model uses Principal Component Analysis (PCA) to extract the most relevant features. The main innovation is the tuning of KNN's key hyperparameter, the number of neighbors (K), through a data-driven approach intended to maintain accuracy across different datasets. The optimized KNN algorithm is evaluated on the Framingham heart disease dataset, where it achieves a prediction accuracy of 92.46% and outperforms methods that rely solely on traditional KNN. Because machine learning techniques play an important role in building prediction models for the early detection and prevention of CVD, the model can serve as a valuable tool for healthcare professionals and researchers.

The core contribution of this study is a comprehensive optimization of the traditional KNN algorithm. It comprises robust data preprocessing with the Hampel filter for outlier removal, feature selection through PCA, and performance enhancement through grid search hyperparameter tuning combined with 10-fold cross-validation. Unlike prior studies that apply KNN with minimal adjustments, this research emphasizes the importance of an end-to-end machine learning pipeline.
This holistic refinement significantly improves the predictive performance and reliability of KNN for cardiovascular disease prediction, achieving 92.46% accuracy on the Framingham dataset.
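The abstract names the pipeline stages without spelling them out. The block below is a minimal NumPy sketch of such a pipeline under stated assumptions; it is not the authors' code and does not reproduce the reported 92.46% result. Assumptions: the Hampel filter is applied as a global per-column identifier (median ± 3 scaled MADs), because rows of tabular data have no natural order for a sliding window; features are standardized before PCA; KNN uses Euclidean distance with majority voting; the grid covers K together with the number of principal components; and folds are shuffled but not stratified. The function names (`hampel_mask`, `fit_pca`, `transform`, `knn_predict`, `cv_accuracy`, `grid_search`), the grid values, and the synthetic data are all hypothetical.

```python
import numpy as np

# Hypothetical sketch of a Hampel -> standardize -> PCA -> KNN pipeline with
# grid search over (K, n_components) scored by 10-fold cross-validation.


def hampel_mask(X, n_sigmas=3.0):
    """Return a boolean row mask: True where no column is a Hampel outlier.

    Assumption: each column is treated as one window (global median and MAD)
    rather than a sliding window, since tabular rows have no natural order.
    """
    X = np.asarray(X, dtype=float)
    med = np.median(X, axis=0)
    dev = np.abs(X - med)
    mad = 1.4826 * np.median(dev, axis=0)  # scaled MAD ~ std for Gaussian data
    # Columns with MAD == 0 (e.g. mostly constant binary flags) are never flagged.
    outlier = (mad > 0) & (dev > n_sigmas * mad)
    return ~outlier.any(axis=1)


def fit_pca(X, n_components):
    """Fit standardization and PCA on X only; return (mean, std, components)."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    Z = (X - mean) / std
    # Rows of Vt are principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return mean, std, Vt[:n_components]


def transform(X, mean, std, components):
    """Project X onto the fitted principal components."""
    return ((X - mean) / std) @ components.T


def knn_predict(X_train, y_train, X_test, k):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    votes = y_train[nearest]  # shape (n_test, k); labels must be non-negative ints
    return np.array([np.bincount(row).argmax() for row in votes])


def cv_accuracy(X, y, k, n_components, n_folds=10, seed=0):
    """Mean accuracy of PCA + KNN over shuffled (non-stratified) k-fold CV."""
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, n_folds)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        # Fit scaling and PCA on the training fold only to avoid data leakage.
        params = fit_pca(X[train_idx], n_components)
        X_tr = transform(X[train_idx], *params)
        X_te = transform(X[test_idx], *params)
        pred = knn_predict(X_tr, y[train_idx], X_te, k)
        scores.append(np.mean(pred == y[test_idx]))
    return float(np.mean(scores))


def grid_search(X, y, k_values, component_values, n_folds=10):
    """Score every (K, n_components) pair on the same folds; return the best."""
    results = {(k, c): cv_accuracy(X, y, k, c, n_folds)
               for k in k_values for c in component_values}
    best = max(results, key=results.get)
    return best, results


if __name__ == "__main__":
    # Synthetic stand-in data (NOT the Framingham dataset): two Gaussian classes.
    rng = np.random.default_rng(42)
    X = np.vstack([rng.normal(0.0, 1.0, (150, 6)), rng.normal(1.5, 1.0, (150, 6))])
    y = np.array([0] * 150 + [1] * 150)
    keep = hampel_mask(X)
    X, y = X[keep], y[keep]
    (best_k, best_c), results = grid_search(
        X, y, k_values=[1, 3, 5, 7, 9, 11], component_values=[2, 4, 6])
    print(f"best K={best_k}, components={best_c}, "
          f"10-fold accuracy={results[(best_k, best_c)]:.3f}")
```

Refitting the standardization and PCA inside each training fold keeps information from the held-out fold out of the model, so the cross-validated accuracy is not optimistically biased. Using the same seed for every grid point means all (K, n_components) pairs are compared on identical folds. When the classes are imbalanced, a stratified split and metrics beyond accuracy may be preferable.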
DOI: https://doi.org/10.31449/inf.v49i2.7774

This work is licensed under a Creative Commons Attribution 3.0 License.