Sequential Clustering and Condensing the Meaning of Texts into Centroid Terms

Maytiyanin Komkhao; Mario Kubek; Wolfgang A. Halang

Published: Jun 30, 2018

Keywords:

Clustering, Number of Clusters, Distance Measures, Sequential Clustering, Single-Linkage, Reclustering, Outlier Removal, Text Analysis, Centroid Term, Centroid Distance Measure

Maytiyanin Komkhao

Faculty of Science and Technology, Rajamangala University of Technology Phra Nakhon

Mario Kubek

Faculty of Mathematics and Computer Science, FernUniversität in Hagen, Hagen, Germany

Wolfgang A. Halang

Sino-German Technical Faculty, Qingdao University of Science and Technology, Qingdao, China

Abstract

Most traditional clustering algorithms require the number of clusters to be specified beforehand and all items to be clustered to be present when the algorithm is run. These two shortcomings, which are very serious for practical applications, are overcome by a straightforward sequential clustering algorithm. Its most crucial constituent is a distance measure, whose suitable choice is discussed. It is shown how sequentially obtained cluster sets can be improved by reclustering, and how items considered as outliers can be removed. As a case study, the feasibility of applying the method and a centroid-based distance measure to find and group semantically similar documents in text analysis is investigated.
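
To make the idea of sequential clustering concrete, the following is a minimal sketch of a generic threshold-based scheme, not the algorithm as specified in the paper. Items are processed one at a time; each item joins the nearest existing cluster if its single-linkage distance is within a threshold, and otherwise opens a new cluster, so neither the number of clusters nor the full item set has to be known in advance. The function names, the `threshold` and `min_size` parameters, the use of Euclidean distance, and the rule that small clusters count as outliers are all assumptions made for illustration.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def sequential_clustering(items, threshold):
    """Cluster items in arrival order (illustrative sketch, assumed names).

    Each item joins the nearest existing cluster if the single-linkage
    distance (distance to the closest member) is at most `threshold`;
    otherwise it starts a new cluster.
    """
    clusters = []  # each cluster is a list of points
    for item in items:
        best_index, best_distance = None, float("inf")
        for index, cluster in enumerate(clusters):
            # Single-linkage: distance to the closest member of the cluster.
            d = min(dist(item, member) for member in cluster)
            if d < best_distance:
                best_index, best_distance = index, d
        if best_index is not None and best_distance <= threshold:
            clusters[best_index].append(item)
        else:
            clusters.append([item])
    return clusters


def remove_outliers(clusters, min_size=2):
    """Drop clusters with fewer than `min_size` members (assumed outlier rule)."""
    return [cluster for cluster in clusters if len(cluster) >= min_size]


points = [(0.0, 0.0), (0.5, 0.0), (10.0, 10.0), (10.4, 10.0), (50.0, 50.0)]
clusters = sequential_clustering(points, threshold=1.0)
kept = remove_outliers(clusters)
```

In this sketch the isolated point `(50.0, 50.0)` ends up in a cluster of its own and is dropped by `remove_outliers`. For text analysis, the Euclidean `dist` would be replaced by a document distance, such as the centroid-based measure investigated in the paper.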

Issue

Vol. 14 No. 1 (2018): January - June 2018

Section

Research Paper
