COMPARISON OF MISSING DATA ESTIMATION IN SIMPLE LINEAR REGRESSION BETWEEN SINGH AND EXPECTATION MAXIMIZATION ALGORITHM

Phitcha Khrueapaeng, Bandhita Plubin, Putipong Bookkamana, Manachai Rodchuen

Abstract

This study compared estimation methods for missing data in simple linear regression. The methods used to estimate the missing data were the Singh method and the Expectation Maximization (EM) algorithm. The comparison was carried out under sample sizes of 40, 100, 500 and 1,000; variances of 1, 10 and 50; missing-data percentages of 5%, 10% and 15%; and correlation coefficients between the dependent and independent variables of -0.3, -0.6, -0.9, 0.3, 0.6 and 0.9. The evaluation criterion was the Root Mean Square Error (RMSE). The results show that the EM method is a better estimation method than the Singh method for simple linear regression, because the EM method gave the lowest RMSE values at all levels of correlation coefficient, sample size, variance and percentage of missing data.
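The kind of simulation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: it assumes the data are bivariate normal with equal variances, that only the dependent variable has missing values, that values are deleted completely at random, and that RMSE is computed between the imputed and the true deleted values. The function names (simulate, em_impute, rmse) are hypothetical, and the Singh method is not implemented here because the abstract does not give its formula.

```python
import numpy as np

def simulate(n, rho, var, miss_frac, rng):
    """Draw (x, y) from a bivariate normal (assumed design) and flag a
    random share of the y values as missing."""
    cov = var * np.array([[1.0, rho], [rho, 1.0]])
    x, y = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
    missing = np.zeros(n, dtype=bool)
    n_miss = int(round(miss_frac * n))
    missing[rng.choice(n, size=n_miss, replace=False)] = True
    return x, y, missing

def em_impute(x, y, missing, n_iter=50):
    """EM for a bivariate normal with x fully observed and y partly missing.
    Returns y with the missing entries replaced by their conditional means."""
    obs = ~missing
    # x is complete, so its moments never change.
    mu_x, s_xx = x.mean(), x.var()
    # Starting values from the complete cases only.
    mu_y = y[obs].mean()
    s_yy = y[obs].var()
    s_xy = np.cov(x[obs], y[obs], bias=True)[0, 1]
    # Hide the true values at the missing positions.
    y_hat = np.where(missing, np.nan, y)
    for _ in range(n_iter):
        beta = s_xy / s_xx
        resid_var = s_yy - beta * s_xy
        # E-step: expected y and y^2 for the missing cases.
        y_hat[missing] = mu_y + beta * (x[missing] - mu_x)
        ey2 = y_hat ** 2
        ey2[missing] += resid_var
        # M-step: update the moments that involve y.
        mu_y = y_hat.mean()
        s_xy = np.mean(x * y_hat) - mu_x * mu_y
        s_yy = ey2.mean() - mu_y ** 2
    return y_hat

def rmse(true, est):
    """Root mean square error between two equal-length arrays."""
    return float(np.sqrt(np.mean((np.asarray(true) - np.asarray(est)) ** 2)))

# One cell of the design: n = 1000, variance 10, 10% missing, rho = 0.9.
rng = np.random.default_rng(0)
x, y, missing = simulate(1000, 0.9, 10.0, 0.10, rng)
y_em = em_impute(x, y, missing)
print("RMSE of EM imputation:", rmse(y[missing], y_em[missing]))
```

A full study along these lines would loop over every combination of sample size, variance, missing percentage and correlation, repeat each cell many times, and average the RMSE per method.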

Section
Research Articles

References

Ahmed MS, AL-Titi O, AL-Rawi Z, et al. Estimation of a population mean using different imputation methods, Statistics in Transition. 2006: 7(6); 1247-1264.

Laaksonen S. Regression-based nearest neighbor hot decking, Computational Statistics. 2000: 15(1); 65-71.

Little RJA, Rubin DB. Statistical Analysis with Missing Data. New York: John Wiley, 1987.

Singh S, Horn S. Compromised imputation in survey sampling, Metrika. 2000: 51; 266-276.

Singh GN, Kumari P, Jong MK. Estimation of population mean using imputation techniques in sample surveys, Journal of the Korean Statistical Society. 2010: 39; 67-74.

Wararit P. The Monte Carlo simulation for estimating the coefficients of skewness when observations are inverse Gaussian distributed, Kasalongkham Research Journal. 2009: 3(1); 14-23.