Abstract
Objective: The aim of this study is to predict whether patients have heart disease by applying different machine learning algorithms to the data provided by the University of Cleveland.
Method: Data from 303 patients provided by the University of Cleveland were classified using the Gaussian Bayes, K-Nearest Neighbor and Random Forest algorithms, with and without feature scaling. For each algorithm, the data were randomly divided into training and test sets, and this process was repeated 50 times. The test results were compared using the t-test to check whether the differences between the algorithms were statistically significant.
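The evaluation protocol described above can be sketched roughly as follows. This is a minimal illustration, not the study's implementation: the data are synthetic (303 generated samples stand in for the patient records), the 20% test fraction is an assumption, and the two toy classifiers (`nearest_centroid`, `majority_class`) are hypothetical stand-ins for the algorithms named in the Method. The t-test variant is not specified in the abstract, so Welch's unequal-variance form is assumed here.

```python
import math
import random
import statistics


def repeated_holdout(X, y, fit_predict, repeats=50, test_fraction=0.2, seed=0):
    """Return one test accuracy per random train/test split."""
    rng = random.Random(seed)
    n = len(X)
    n_test = max(1, round(n * test_fraction))  # assumed 20% hold-out
    accuracies = []
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        test_idx, train_idx = idx[:n_test], idx[n_test:]
        predictions = fit_predict(
            [X[i] for i in train_idx],
            [y[i] for i in train_idx],
            [X[i] for i in test_idx],
        )
        hits = sum(p == y[i] for p, i in zip(predictions, test_idx))
        accuracies.append(hits / n_test)
    return accuracies


def nearest_centroid(X_train, y_train, X_test):
    """Toy classifier: assign each test point to the closest class mean."""
    centroids = {}
    for label in set(y_train):
        rows = [x for x, t in zip(X_train, y_train) if t == label]
        centroids[label] = [statistics.mean(col) for col in zip(*rows)]
    return [min(centroids, key=lambda c: math.dist(x, centroids[c])) for x in X_test]


def majority_class(X_train, y_train, X_test):
    """Toy baseline: always predict the most frequent training label."""
    return [statistics.mode(y_train)] * len(X_test)


def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Synthetic stand-in data: 303 samples, two features, binary label.
data_rng = random.Random(1)
y = [i % 2 for i in range(303)]
X = [[data_rng.gauss(2.0 * label, 1.0), data_rng.gauss(0.0, 1.0)] for label in y]

# 50 random splits per classifier, drawn independently (different seeds).
acc_centroid = repeated_holdout(X, y, nearest_centroid, seed=0)
acc_majority = repeated_holdout(X, y, majority_class, seed=1)
t_stat, dof = welch_t(acc_centroid, acc_majority)

print(f"mean accuracy (nearest centroid): {statistics.mean(acc_centroid):.4f}")
print(f"mean accuracy (majority class):   {statistics.mean(acc_majority):.4f}")
print(f"Welch t = {t_stat:.2f}, df = {dof:.1f}")
```

The p-value would then be read from the t distribution with `dof` degrees of freedom. With SciPy available, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same comparison and returns the p-value directly.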
Results: With data scaling, accuracies of 80.52% with the K-Nearest Neighbor algorithm, 80.52% with Gaussian Bayes and 82.50% with Random Forest were observed. The three algorithms produced similar values, and the differences between them were not statistically significant (p > 0.05). Without data scaling, accuracies of 65.28% with the K-Nearest Neighbor algorithm, 80.52% with Gaussian Bayes and 82.19% with Random Forest were observed; in this case, the results of the three algorithms differed significantly.
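One plausible mechanism behind the drop in K-Nearest Neighbor accuracy without scaling is that Euclidean distance is dominated by whichever feature has the largest numeric range. The sketch below illustrates this with z-score standardization. The five rows are invented values, not patient data, and standardization is an assumed choice, since the abstract does not name the scaling method used.

```python
import math
import statistics


def standardize(rows):
    """Z-score each column: subtract the column mean, divide by its std."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    stds = [statistics.pstdev(c) for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def nearest(rows, query_index):
    """Index of the row closest (Euclidean) to rows[query_index]."""
    q = rows[query_index]
    others = [i for i in range(len(rows)) if i != query_index]
    return min(others, key=lambda i: math.dist(q, rows[i]))


# Hypothetical rows: (feature with a large range, feature with a small range).
rows = [
    [240.0, 0.2],  # query row
    [242.0, 3.8],  # close on the large-range feature only
    [265.0, 0.3],  # close on the small-range feature only
    [180.0, 2.0],
    [310.0, 1.0],
]

raw_neighbor = nearest(rows, 0)
scaled_neighbor = nearest(standardize(rows), 0)

print("nearest neighbor on raw data:   ", raw_neighbor)
print("nearest neighbor on scaled data:", scaled_neighbor)
```

On the raw values the small-range feature barely contributes to the distance, so the neighbor is chosen almost entirely by the first column; after standardization both features carry comparable weight and a different neighbor is selected. Gaussian Bayes and tree-based methods such as Random Forest are largely insensitive to per-feature rescaling, which is consistent with their nearly unchanged accuracies reported above.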
Conclusions: Although the study included data from only 303 patients, a prediction accuracy of over 80% was obtained. Extreme values (outliers) that distort the distribution of the data lead to differences between the methods used. The results confirm that the algorithms produce much closer estimates on scaled patient data. This study is an example of the use of artificial intelligence in detecting cardiac diseases, which pose a risk all over the world. With more detailed patient data, much higher accuracy rates may be obtained, and such models could in the future be included in health management processes for the pre-diagnosis of heart disease.