Data mining model and application for stroke prediction: A combination of demographic and medical screening data approach

Main Article Content

Sotarat Thammaboosadee, Teerapat Kansadub


This paper presents the data mining process that was used for building a stroke prediction model based on demographic information
and medical screening data. The data, gathered from a physical therapy center in Thailand, comprised
outpatients’ medical records, medical screening forms, and a target variable. A group of 147 stroke patients and 294 non-stroke
individuals with six demographic predictors were selected for the study. Three classification algorithms were used in the study:
Naïve Bayes, Decision Tree, and Artificial Neural Network (ANN). They were applied to the collected data, and their results
were compared using 10-fold cross-validation. The selection criteria were
primarily accuracy and the area under the ROC curve (AUC). The secondary selection criteria were the
False-Positive Rate (FPR) and the False-Negative Rate (FNR). The best-performing model was
ANN combined with integrated data, which had an overall accuracy of 0.84, an AUC of 0.90, an FPR of
0.12, and an FNR of 0.25. ANN with the integration of demographic and medical
screening data thus produced the best predictive performance among the models compared, according to
both the primary and secondary model selection criteria.
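
The reported accuracy, FPR, and FNR are linked through the class sizes, which gives a quick consistency check on the figures above. The sketch below is illustrative only and is not the authors' code. It assumes that stroke is the positive class and that the reported rates were computed over all 441 records pooled across the cross-validation folds. The function names are hypothetical.

```python
def rates_from_counts(tp, fn, fp, tn):
    """Return (accuracy, FPR, FNR) from confusion-matrix counts."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    fpr = fp / (fp + tn)  # share of non-stroke cases flagged as stroke
    fnr = fn / (fn + tp)  # share of stroke cases that were missed
    return accuracy, fpr, fnr


def accuracy_from_rates(n_pos, n_neg, fpr, fnr):
    """Accuracy implied by the class sizes and the two error rates."""
    correct = (1 - fnr) * n_pos + (1 - fpr) * n_neg
    return correct / (n_pos + n_neg)


# Class sizes and error rates as stated in the abstract.
N_STROKE = 147      # positive class (assumption: stroke = positive)
N_NON_STROKE = 294  # negative class

implied_accuracy = accuracy_from_rates(N_STROKE, N_NON_STROKE, fpr=0.12, fnr=0.25)
print(round(implied_accuracy, 2))  # 0.84
```

Under these assumptions the implied accuracy rounds to 0.84, in line with the reported value. Because the published rates are themselves rounded, the check confirms consistency only and does not recover the exact confusion matrix.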


Article Details

Research Articles