Pattern-based Extraction of Named Entities in Thai News Documents

Authors

  • Nattapong Tongtep, Sirindhorn International Institute of Technology, Thammasat University
  • Thanaruk Theeramunkong, Sirindhorn International Institute of Technology, Thammasat University

Keywords

Named Entity, Information Extraction, Pattern Classification

Abstract

Named entity extraction is a nontrivial and challenging information extraction task in Thai because Thai text has no explicit word, phrase, or sentence boundaries. This paper proposes a pattern-based method that extracts Thai named entities, such as person names, organization names, locations, dates, and times, as well as action phrases, from text without relying on word segmentation or part-of-speech tagging. Experimental results show that the proposed method detects named entities with approximately 68–100% correctness, using a large-scale Thai dictionary and a set of predefined pattern-matching templates.
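
As a rough illustration of the idea described in the abstract, the sketch below applies cue-word templates directly to an unsegmented character stream. It is not the authors' implementation: the cue-word lists, the regular expressions, the sample sentence, and the names `any_of`, `TEMPLATES`, and `extract` are all assumptions made for this example, and the tiny word lists only stand in for the large-scale dictionary the paper mentions.

```python
import re

# Hypothetical cue words; a tiny stand-in for a large-scale dictionary.
PERSON_TITLES = ["นาย", "นาง", "นางสาว"]       # Mr., Mrs., Miss
ORG_PREFIXES = ["บริษัท", "ธนาคาร"]             # company, bank
LOC_PREFIXES = ["จังหวัด", "อำเภอ"]              # province, district
MONTHS = ["มกราคม", "กุมภาพันธ์", "มีนาคม"]      # January to March only

# A run of Thai letters and marks (no digits, no spaces).
THAI_RUN = "[\u0E01-\u0E4E]+"


def any_of(words):
    """Build a regex alternation that tries longer cue words first."""
    ordered = sorted(words, key=len, reverse=True)
    return "(?:" + "|".join(re.escape(w) for w in ordered) + ")"


# Assumed templates: a cue word followed by a run of Thai characters.
# No word segmentation or part-of-speech tagging is applied beforehand.
TEMPLATES = [
    ("PERSON", any_of(PERSON_TITLES) + THAI_RUN + "(?: " + THAI_RUN + ")?"),
    ("ORGANIZATION", any_of(ORG_PREFIXES) + THAI_RUN),
    ("LOCATION", any_of(LOC_PREFIXES) + THAI_RUN),
    ("DATE", "วันที่ ?[0-9]{1,2} ?" + any_of(MONTHS) + " ?[0-9]{4}"),
]
COMPILED = [(label, re.compile(pattern)) for label, pattern in TEMPLATES]


def extract(text):
    """Return (label, matched text) pairs in order of appearance."""
    hits = []
    for label, pattern in COMPILED:
        for match in pattern.finditer(text):
            hits.append((match.start(), label, match.group()))
    return [(label, span) for _, label, span in sorted(hits)]


# Made-up sentence: "Mr. Somchai Jaidee travelled to Chiang Mai
# province on 5 January 2553."
SAMPLE = "นายสมชาย ใจดี เดินทางไปจังหวัดเชียงใหม่ เมื่อวันที่ 5 มกราคม 2553"

if __name__ == "__main__":
    for label, span in extract(SAMPLE):
        print(label, span)
```

The templates lean on the spaces that Thai news text tends to place around names and dates, so a name written without a following space, or a given name followed by a space and an ordinary word, would be over-extended. A fuller system would need dictionary lookups to decide where each entity ends.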

How to Cite

Tongtep, N., & Theeramunkong, T. (2015). Pattern-based Extraction of Named Entities in Thai News Documents. Science & Technology Asia, 15(1), 70–81. Retrieved from https://ph02.tci-thaijo.org/index.php/SciTechAsia/article/view/41311

Section

Articles