Prediction of Heart Diseases Using Data Mining Algorithms

Data mining has been adopted in numerous businesses and sectors following its success in highly visible areas such as e-commerce and marketing. Healthcare is one of the sectors where it has more recently been applied. The healthcare sector is "information-rich": healthcare systems have access to a multitude of datasets and can use them to find hidden links and trends in data. However, efficient analysis tools are lacking. The dataset is analyzed using various machine learning algorithms, i


Introduction
Data mining technology offers a user-focused method for discovering new and hidden patterns in medical data sets. These patterns have considerable potential for clinical diagnosis. However, the readily accessible raw medical data is dispersed, heterogeneous, and voluminous. It must be gathered in a structured manner, after which a hospital information system can be created by integrating the acquired data.
According to the World Health Organization, heart disease kills 12 million people every year. Cardiovascular illnesses are responsible for half of all fatalities in the United States and other affluent nations, and they are also the main cause of death in several other countries [1]. Heart disease is the leading cause of death worldwide; in the US, a person dies of heart disease every 34 seconds. Cardiac disorders include cardiomyopathy, coronary heart disease, and cardiovascular disease. The term "cardiovascular disease" (CVD) refers to a broad spectrum of disorders that affect the heart, the blood vessels, and how the body pumps and circulates blood. CVD causes a variety of ailments, disabilities, and fatalities.
Disease diagnosis is a crucial and complex task in medicine [2] that must be carried out precisely and efficiently. An automated medical diagnosis system would therefore be highly beneficial. Suitable computer-based information and/or decision support systems can help reduce the cost of clinical tests. Implementing automated systems effectively and precisely requires a comparative study of the available methodologies. This research examines different ways of using predictive and descriptive data mining to diagnose heart disease [3].

Technology for data mining
An artificial neural network (ANN) is a mathematical or computational model based on the structural and functional characteristics of biological neural networks. ANNs take their inspiration from the type of computation carried out by the human brain. An ANN is a network of artificial neurons that uses a connectionist method of computation to analyze input. According to the basic connectionist principle, mental processes can be modelled as networks of simple, typically uniform units that are interconnected. During the learning phase, an ANN acts as an adaptive system, changing its structure in response to external or internal data. Modern neural networks are frequently used to model complicated relationships between inputs and outputs and to find patterns in data sets [4]. An ANN can be seen as a nonlinear statistical data modelling tool made up of numerous highly interconnected small processing units (artificial neurons). Because training an ANN is an iterative process, an extensive training set is required. Its distinctive capability is to extract patterns and trends from complicated data that are too difficult for humans or other computational techniques to identify [5]. In the medical field, artificial neural networks can monitor medical devices that continuously update many measurements, such as heart rate and blood pressure. Neural networks can also be trained to learn a classification task and to predict diseases [6], [7].
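The computation performed by a network of interconnected artificial neurons can be illustrated with a minimal sketch. It assumes a single hidden layer with sigmoid activations; the weights, biases, and the three input features are hypothetical illustration values and are not taken from this study or from RapidMiner.

```python
import math


def sigmoid(z):
    """Squash a real value into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, hidden_weights, hidden_biases, output_weights, output_bias):
    """Forward pass through one hidden layer and one output neuron.

    Each neuron computes a weighted sum of its inputs plus a bias and
    passes the result through a nonlinear activation.
    """
    hidden = [
        sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)) + output_bias)


# Hypothetical, already normalized inputs (for example three patient attributes).
x = [0.5, 0.2, 0.9]
hidden_weights = [[0.1, -0.4, 0.3], [0.7, 0.2, -0.5]]  # two hidden neurons
hidden_biases = [0.0, 0.1]
output_weights = [0.6, -0.3]
output_bias = 0.05

score = forward(x, hidden_weights, hidden_biases, output_weights, output_bias)
print(f"network output: {score:.3f}")
```

Training adjusts the weights and biases iteratively so that the output matches the known class of each training record; that step is omitted here.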

Decision tree
Data mining software is essential to the process of knowledge discovery: it uncovers important hidden information, and it can process vast data collections to produce new target patterns. Decision trees are used in many fields, including machine learning, information extraction, biomedical applications, and scientific classification research. Systems that produce classifiers are among the most widely used data mining techniques. Classification algorithms in data mining can process a large volume of data [8]. They can be applied to infer conclusions about categorical class labels, to categorize information according to training sets and class descriptions, and to classify newly available data. Machine learning offers multiple algorithms for data classification, and this work used the general decision tree algorithm [9]. A decision tree can process nominal and numerical data simultaneously, can be explained and analyzed visually, and allows rules to be extracted easily. When the data set is tested, the decision tree's size is independent of the database size, its running speed is relatively fast, and it can be extended to large databases. A decision tree does not require deep domain expertise, and it is fast and simple to understand. Decision trees can handle a variety of data types, including binary, real, ordinal, and nominal values [10].
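How a decision tree chooses a split on a numerical attribute can be sketched as follows. The sketch assumes Gini impurity as the split criterion, which is one common choice and not necessarily the criterion used in this work; the attribute values are hypothetical, and the class labels follow the dataset's convention (1 for no heart disease, 2 for heart disease).

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_threshold(values, labels):
    """Return (threshold, weighted impurity) of the best binary split.

    Candidate thresholds are midpoints between consecutive distinct values.
    Returns (None, inf) if the attribute has fewer than two distinct values.
    """
    best = (None, float("inf"))
    candidates = sorted(set(values))
    for low, high in zip(candidates, candidates[1:]):
        threshold = (low + high) / 2
        left = [y for v, y in zip(values, labels) if v <= threshold]
        right = [y for v, y in zip(values, labels) if v > threshold]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if impurity < best[1]:
            best = (threshold, impurity)
    return best


# Hypothetical values of one numerical attribute and the matching class labels.
values = [120, 130, 140, 160, 170, 180]
labels = [1, 1, 1, 2, 2, 2]
print(best_threshold(values, labels))  # (150.0, 0.0): a perfectly pure split
```

A full tree repeats this search over all attributes at every node; each chosen threshold can be read off directly as a rule, which is why extracted rules are easy to interpret.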

Support vector machine
Support vector machines (SVMs) are a supervised machine learning method that functions both as a predictor and as a classifier. For classification, an SVM locates a hyperplane in the feature space that separates the classes [11]. The test data points are then mapped into the same space and categorized according to which side of the wide margin they fall on [12].
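The role of the separating hyperplane can be shown with a small sketch for the linear case. It assumes that a weight vector `w` and offset `b` have already been learned; the values used here are hypothetical, and the mapping of the positive side to class 2 (disease present) is an illustrative convention only.

```python
import math


def decision_value(w, b, x):
    """Signed score w.x + b; its sign tells on which side of the hyperplane x lies."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(w, b, x):
    """Assign class 2 on the non-negative side of the hyperplane, otherwise class 1."""
    return 2 if decision_value(w, b, x) >= 0 else 1


def margin_width(w):
    """Width of the margin between the two supporting hyperplanes: 2 / ||w||."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))


# Hypothetical learned parameters for two normalized features.
w = [3.0, 4.0]
b = -2.0

print(classify(w, b, [0.9, 0.8]))  # point on the positive side
print(classify(w, b, [0.1, 0.1]))  # point on the negative side
print(margin_width(w))
```

Training an SVM amounts to choosing `w` and `b` so that this margin is as wide as possible while the training points are still separated correctly (or nearly so).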

Heart disease data
We downloaded a dataset from the UCI Machine Learning Repository [13]. The 13 attributes in this database were extracted from a larger set of 75. The dataset, which includes 13 variables related to heart disease, was created using data from 270 individuals, some of whom were diagnosed with heart disease while others were not. The 14th attribute is treated as the class. The aim of the data analysis is to determine whether heart disease is absent (1) or present (2). Three classifiers were used to identify the condition of a new suspected patient.

Results
The classification models were applied in RapidMiner using the following steps. Split validation was used to separate training and test data: for the ANN classifier, 90% of the data is used for training and 10% for testing. The model is then optimized for maximum performance, and the ANN's class detection accuracy is assessed using a confusion matrix. The first run uses the default configuration to obtain a baseline accuracy. We then apply several configuration steps. The first step is to use only one hidden layer and increase its neuron count. The second step is to enable data shuffling. We also normalize the values, and we apply corresponding steps to the other models, SVM and decision tree. Table 1 shows the confusion matrix for ANN.
Adding the results of each new case to the training set will improve our model. Over time, the amount of training data will grow. Eventually the training data will contain two types of records: the original records, which were collected earlier but not checked by a doctor, and newer records that have been checked one by one. More verified records will make the model more accurate as the training data continues to grow, and adding verified new patient data reduces the number of errors in the training data. This patient information is classified by both the SVM and doctors.
The results of the SVM classification algorithm are much better than those of ANN classification; there is a clear difference in accuracy. When we go beyond the training data, the experiment in Table 3 shows that the SVM classifier gives the best results. Comparing the tables, SVM is a better model than the other algorithms for classifying heart disease, as shown in Table 4.
We therefore ran tests to see how accurate and how practical different classification algorithms are for making predictions about heart patients, as shown in Figure 3. In the medical field, accuracy is very important. Figure 2 shows the diagnosis model.
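The split validation, shuffling, and normalization steps described in this section can be sketched outside RapidMiner as follows. This is an illustrative approximation and not the RapidMiner operators themselves: the function names are hypothetical, min-max scaling is assumed because the text does not state which normalization was used, and the rows are synthetic placeholders standing in for the 270 patient records.

```python
import random


def split_validation(rows, train_ratio=0.9, shuffle=True, seed=0):
    """Split rows into a training part and a test part, optionally shuffling first."""
    rows = list(rows)
    if shuffle:
        random.Random(seed).shuffle(rows)
    cut = int(round(len(rows) * train_ratio))
    return rows[:cut], rows[cut:]


def fit_min_max(train_rows):
    """Learn the (min, max) range of every column from the training rows only."""
    return [(min(column), max(column)) for column in zip(*train_rows)]


def apply_min_max(rows, ranges):
    """Scale each column with the learned ranges; constant columns map to 0.0."""
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, (lo, hi) in zip(row, ranges)]
        for row in rows
    ]


# Synthetic placeholder data: 270 rows with two numerical attributes each.
rows = [[float(i), float(2 * i)] for i in range(270)]

train, test = split_validation(rows, train_ratio=0.9, shuffle=True, seed=42)
ranges = fit_min_max(train)
train_scaled = apply_min_max(train, ranges)
test_scaled = apply_min_max(test, ranges)

print(len(train), len(test))  # 243 27
```

Learning the scaling ranges from the training part only keeps information from the test records out of the model; scaled test values may therefore fall slightly outside the interval from 0 to 1.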

Compare the results
We describe three key classification models (decision trees, artificial neural networks, and support vector machines) and consider overfitting and hyperparameters when using them to forecast and identify disease. The SVM classifier gives the best result, with an accuracy of 88.89%; ANN reached a best accuracy of 82.72%, and the decision tree 81.48%.
The best model for this task is therefore the support vector machine (SVM).
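The accuracy figures compared above are derived from confusion matrices. The following sketch shows how such values are computed for a two-class problem; the counts are hypothetical illustration values and are not the counts obtained in this study.

```python
def accuracy(matrix):
    """Share of correctly classified instances: diagonal sum over total count."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def recall(matrix, class_index):
    """Share of instances of one true class that were predicted as that class."""
    row = matrix[class_index]
    return row[class_index] / sum(row)


# Rows are true classes, columns are predicted classes.
# Index 0: no heart disease (class 1); index 1: heart disease (class 2).
# Hypothetical counts, not taken from the paper's tables.
matrix = [
    [45, 5],
    [10, 40],
]

print(f"accuracy: {accuracy(matrix):.2%}")
print(f"recall for the disease class: {recall(matrix, 1):.2%}")
```

In a diagnostic setting the recall for the disease class is worth reporting alongside accuracy, because a missed case of heart disease is usually more costly than a false alarm.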

Conclusion
This project investigated one of the best-known data mining tasks. The main objective of this study was to assess whether or not a person has heart disease by comparing three classification algorithms: decision trees, artificial neural networks, and support vector machines. Since more informative models produce more accurate results, we use SVM, which is more accurate than ANN or decision trees.
Overfitting can result from laborious configuration operations, such as parameter setting. Our experimental results also showed that the choice of training and test sets affects model performance and accuracy. To evaluate the model's correctness, we employ a confusion matrix. The same factors can therefore be used to diagnose a patient's condition. The experiments in this study use the 270 instances of the dataset, are carried out in RapidMiner, and are validated using split validation. We conclude from our experiments that, for the classification problem in the heart disease data analysis task, the SVM classification model, which uses sequential minimal optimization, performs more accurately than ANN and decision trees. We conducted these tests to make predictions about a person's health and whether or not they have heart disease. According to machine learning theory, a system can predict new, unclassified cases after learning from previously classified data.

Figure 1: Confusion Matrix of ANN, SVM, and Decision Tree

Table 1: Confusion Matrix for ANN
Table 2 shows the confusion matrix for the support vector machine.

Table 2: Confusion Matrix for SVM
Table 3 shows the confusion matrix of the decision tree.

Table 3: Confusion Matrix of the Decision Tree
The comparison of the models shows that SVM is better at diagnosing heart disease than ANN and decision tree. This happens because the ANN model was trained with only a limited number of examples, which suggests that ANN models should be used in deep learning settings with large datasets to get better results. In terms of accuracy, the results of the two model-based training methods were the same, and the decision tree model was less accurate than SVM.