Improved classification techniques for imbalanced data sets of knee osteoarthritis in the elderly

พุทธิพร ธนธรรมเมธี; เยาวเรศ ศิริสถิตย์กุล

Published: Jul 25, 2019

Keywords: knee osteoarthritis; imbalanced data; SMOTE; ADASYN; Gentleboost

Abstract

This research aimed to develop a predictive model for assessing knee osteoarthritis in the elderly. A total of 370 personal health records were collected from osteoarthritis assessments at Ban Han Sub-district Health Promoting Hospital, Thasala district, Nakhon Si Thammarat province. The records fall into 4 classes: class 0, Excellent (200 records); class 1, Good (115 records); class 2, Moderate (39 records); and class 3, Poor (16 records). In medical diagnosis applications, the minority class is the class of primary interest, yet it suffers a much higher misclassification rate than the majority class. The gap between the record counts of classes 0 and 1 and those of classes 2 and 3 shows that these data are imbalanced, so a predictive model built directly on them may perform poorly. The minority classes 2 and 3 were therefore oversampled using ADASYN (adaptive synthetic sampling technique) and SMOTE (synthetic minority over-sampling technique). The data were then divided into training and testing sets using 10-fold cross-validation. In addition, two multi-class decomposition schemes for imbalanced data, one-vs-one and one-vs-all, were employed in conjunction with Gentleboost. The experimental results showed that ADASYN combined with one-vs-one achieved the best accuracy, 97.31%, on the imbalanced data. The proposed model was also tested on a second imbalanced data set from Ban Hua Ku Sub-district Health Promoting Hospital, Thasala district, Nakhon Si Thammarat province. These records also fall into 4 classes: class 0, Excellent (141 records); class 1, Good (63 records); class 2, Moderate (16 records); and class 3, Poor (12 records). On this data set, 85.78% of the records were classified correctly. The model also achieved its best performance on classes 2 and 3; in class 3 (Poor) in particular, the rate of correct classification rose from 0% to 75%. In conclusion, health promotion programs can use this model to support diagnosis and treatment planning for senior citizens.
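
The oversampling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `smote_like_oversample`, the toy feature vectors, and the parameters `k`, `n_new`, and `seed` are all assumptions made for illustration. Only the class counts come from the abstract. The sketch shows the core SMOTE idea of placing a synthetic point on the segment between a minority sample and one of its nearest minority neighbours.

```python
import math
import random

# Class sizes reported in the abstract for the Ban Han data set.
class_counts = {0: 200, 1: 115, 2: 39, 3: 16}
total_records = sum(class_counts.values())
# Ratio of the largest class to the smallest class.
imbalance_ratio = max(class_counts.values()) / min(class_counts.values())


def smote_like_oversample(samples, n_new, k=3, seed=0):
    """Return n_new synthetic points, each interpolated between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    Hypothetical helper for illustration, not a library API."""
    if len(samples) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(samples) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(samples)
        # Rank the other minority samples by Euclidean distance to `base`.
        others = [s for s in samples if s is not base]
        neighbours = sorted(others, key=lambda s: math.dist(base, s))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Made-up two-feature minority samples (not taken from the study's data).
toy_minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote_like_oversample(toy_minority, n_new=6)
```

ADASYN follows the same interpolation idea but generates more synthetic points around minority samples that have more majority-class neighbours, so the sketch above corresponds to the SMOTE variant only. How many synthetic records the study generated per class is not stated in the abstract, so `n_new=6` is arbitrary.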

Issue

Vol.27 No.6 (November - December 2019)

Section

Engineering and Architecture

Author Biographies

พุทธิพร ธนธรรมเมธี

Software Engineering Program, School of Informatics, Walailak University, Thai Buri Sub-district, Tha Sala District, Nakhon Si Thammarat Province 80160

เยาวเรศ ศิริสถิตย์กุล

Software Engineering Program, School of Informatics, Walailak University, Thai Buri Sub-district, Tha Sala District, Nakhon Si Thammarat Province 80160
