Temporal Data and Diabetes Classification in Thailand

สมภพ ปฐมนพ
กฤษฎา ศรีแผ้ว
ม.ล.กุลธร เกษมสันต์

Abstract

Diabetes mellitus is a chronic disease that reduces quality of life because it often leads to complications such as heart disease, high blood pressure, neuropathy and the loss of some organs. This work proposes a temporal feature extraction model that extracts features embedded in historical data, such as health examination records, for use in classification. The proposed model can be combined with any classification method, such as Naïve Bayes, Logistic Regression, C4.5 (J48), Bagging and SVMs. The method is evaluated on health examination data collected from factory employees in Thailand over seven years (2004-2010). The dataset covers 43,523 employees in total, of whom 28,808 have only one record and 14,715 were examined more than once. Resampling with replacement is applied to the dataset to balance the training instances among the classes before training. The features used for diabetes classification fall into three groups: Physical Examination, Urinalysis and Biochemistry. The experimental results show that data with temporal features achieves higher classification performance than data without temporal features.
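
The class-balancing step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which resampling variant or target class size was used, so this example assumes the simplest form: every class is drawn with replacement until it matches the size of the largest class. The function name `balance_with_replacement` and the toy records are hypothetical and are not taken from the article.

```python
import random
from collections import defaultdict


def balance_with_replacement(rows, labels, seed=0):
    """Resample every class with replacement up to the largest class size.

    Assumption: the target size is the size of the largest class. The
    article's abstract does not state the target it actually used.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)

    target = max(len(members) for members in by_class.values())
    balanced_rows, balanced_labels = [], []
    for label, members in by_class.items():
        # rng.choices draws with replacement, so a small class can reach
        # the target size by repeating some of its records.
        balanced_rows.extend(rng.choices(members, k=target))
        balanced_labels.extend([label] * target)
    return balanced_rows, balanced_labels


# Hypothetical toy records: (fasting glucose, BMI). Values are illustrative only.
rows = [(92, 22.1), (101, 24.3), (88, 21.0), (97, 26.4), (145, 31.2)]
labels = ["non-diabetic"] * 4 + ["diabetic"]
balanced_rows, balanced_labels = balance_with_replacement(rows, labels)
```

After the call, both classes contain the same number of training instances. Resampling should be applied to the training split only, so that duplicated records do not leak into the evaluation data.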

Article Details

How to Cite
ปฐมนพ ส., ศรีแผ้ว ก., and เกษมสันต์ ม., “Temporal Data and Diabetes Classification in Thailand”, JIST, vol. 4, no. 1, pp. 49–56, Jun. 2013.
Section
Research Article: Soft Computing (Detail in Scope of Journal)
