Feature Selection for Imputation of Missing Data to Measure the Efficiency of Nile Tilapia Production in Suphanburi Province

จารินี ศานติจรรยาพร
สุชาดา กรเพชรปาณี
พัชรี วงษ์เกษม

Abstract

The objectives of this research were: 1) to develop a new method (FSNNR) for imputing missing
data that combines feature selection with the nearest neighbor regression imputation method;
2) to compare the efficiency of FSNNR with that of three other imputation methods (RI, KNN, and
NNR) under 36 situations formed from three conditions: sample size, percentage of missing data,
and data deviation, with the data simulated using the Monte Carlo technique and repeated 1,000
times for each situation; and 3) to measure the efficiency of Nile tilapia production in
Suphanburi Province.
The results were as follows:
1) The new method for imputation of missing data consists of three steps. First, two features
of the data are selected using the nearest neighbor method. Next, the missing values are imputed
using k-nearest neighbor imputation (k = 2). Finally, the completed data obtained from the second
step are used to impute the missing data with regression imputation.
2) FSNNR performed better than the other imputation methods under 33 of the 36 combinations of
simulated conditions.
3) When missing data were replaced using the FSNNR method, the analysis of each farmer's
technical efficiency found that the Nile tilapia farmers were at the highest level of technical
efficiency (82.76%) and at a high level of technical efficiency (24.14%).
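
The three steps of the new method described in result 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not say how the nearest neighbor is used to select features, so the sketch assumes a leave-one-out 1-nearest-neighbor prediction error as the selection criterion. It also assumes that only the target variable has missing values and that distances are Euclidean on unscaled features. The names `loo_1nn_error` and `fsnnr_impute` are hypothetical.

```python
import numpy as np


def loo_1nn_error(x, y):
    """Mean squared error when each y is predicted from its nearest neighbour in x."""
    dist = np.abs(x[:, None] - x[None, :])
    np.fill_diagonal(dist, np.inf)  # a point may not be its own neighbour
    nearest = dist.argmin(axis=1)
    return float(np.mean((y - y[nearest]) ** 2))


def fsnnr_impute(X, y, n_features=2, k=2):
    """Hypothetical FSNNR sketch: X is fully observed, y holds np.nan where missing."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    missing = np.isnan(y)
    obs = ~missing

    # Step 1 (assumed criterion): keep the features with the smallest 1-NN error.
    errors = [loo_1nn_error(X[obs, j], y[obs]) for j in range(X.shape[1])]
    selected = np.sort(np.argsort(errors)[:n_features])
    Z = X[:, selected]

    # Step 2: k-nearest-neighbour imputation on the selected features.
    y_knn = y.copy()
    for i in np.flatnonzero(missing):
        d = np.linalg.norm(Z[obs] - Z[i], axis=1)
        neighbours = np.argsort(d)[:k]
        y_knn[i] = y[obs][neighbours].mean()

    # Step 3: regression imputation fitted on the completed data from step 2.
    design = np.column_stack([np.ones(len(Z)), Z])
    beta, *_ = np.linalg.lstsq(design, y_knn, rcond=None)
    y_final = y.copy()
    y_final[missing] = design[missing] @ beta
    return y_final, selected


# Usage on synthetic data (illustrative only, unrelated to the study's data).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(scale=0.1, size=60)
y_missing = y.copy()
y_missing[:6] = np.nan
y_filled, selected = fsnnr_impute(X, y_missing)
```

In this sketch the observed values are left untouched, and only the missing entries are replaced by predictions from the regression fitted in the third step. A different selection criterion in the first step would change which two features are carried forward.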

Article Details

Section
Research Articles
