Optical Character Recognition (OCR) enhancement using an approximate string matching technique

Kraisak Kesorn (ORCID: https://orcid.org/0000-0002-5195-8038), Phornsiri Phawapoothayanchai


Many researchers have sought to improve the efficiency of optical character recognition (OCR) by developing new techniques based on image processing. However, image processing techniques are complex and computationally intensive, which makes them unsuitable for some real-time applications. This paper presents a new method that enhances OCR using a simple approximate string matching technique to complement existing OCR algorithms. Experimental results show that the proposed method improves the performance of OCR algorithms as measured by precision. The accuracy of Thai word recognition increased by up to 85.72% compared with traditional OCR techniques.
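To illustrate the general idea of post-correcting OCR output with approximate string matching, the following is a minimal sketch, not the authors' implementation. The abstract does not name the similarity measure, so Levenshtein edit distance is assumed here; the function names, the `lexicon` word list, and the `max_distance` threshold are all hypothetical and chosen only for illustration.

```python
from typing import Iterable


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions, substitutions cost 1 each."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def correct_word(ocr_word: str, lexicon: Iterable[str], max_distance: int = 2) -> str:
    """Return the closest lexicon entry, or the OCR word itself if none is close.

    max_distance is an assumed threshold, not a value taken from the paper.
    """
    candidates = list(lexicon)
    if not candidates:
        return ocr_word
    best = min(candidates, key=lambda w: edit_distance(ocr_word, w))
    return best if edit_distance(ocr_word, best) <= max_distance else ocr_word


if __name__ == "__main__":
    words = ["recognition", "character", "optical"]
    print(correct_word("recogniti0n", words))  # one substitution from "recognition"
    print(correct_word("xyz", words))          # nothing within the threshold
```

Because Python strings are compared code point by code point, the same sketch accepts Thai text, although a practical system might prefer to compare whole grapheme clusters. The linear scan over the lexicon keeps the example short; it is a simplification and would be slow for a large dictionary.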


How to Cite
Kesorn, K., & Phawapoothayanchai, P. (2018). Optical Character Recognition (OCR) enhancement using an approximate string matching technique. Engineering and Applied Science Research, 45(4), 282-289. Retrieved from https://www.tci-thaijo.org/index.php/easr/article/view/99252

