Cluster Analysis to Find Sets of High-frequency Queries for Filtering in Similarity Join

Kamolwan Kunanusont
Jaruloj Chongstitvatana

Abstract

Similarity search and similarity join are important operations in text databases. In some situations, similar queries, called high-frequency queries, are repeated over a period of time. A high-frequency-queries-based filter is used to facilitate this type of query. However, the performance of this method depends mostly on which high-frequency queries are chosen. This paper proposes two methods, called DBRAN and DBSM, which are based on DBSCAN and an agglomerative hierarchical clustering algorithm, to find high-frequency queries for the filter. For evaluation, both DBRAN and DBSM are applied to various sets of queries to find high-frequency queries for three datasets. It is found that DBSM performs better than DBRAN when the variation among high-frequency queries is high. However, when the variation among high-frequency queries is low, the performance of DBRAN and DBSM is about the same.
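
As a rough illustration of the density-based idea mentioned in the abstract, the sketch below groups a small query log with plain DBSCAN over edit distance. It is not DBRAN or DBSM, whose details are not given here; the function names, the sample queries, and the parameters `eps` and `min_pts` are all assumptions chosen for the example.

```python
from typing import List

NOISE = -1
UNVISITED = -2


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def dbscan(queries: List[str], eps: int, min_pts: int) -> List[int]:
    """Plain DBSCAN; returns one cluster label per query, NOISE for outliers."""
    n = len(queries)
    # Each neighborhood includes the query itself (distance 0).
    neighbors = [
        [j for j in range(n) if edit_distance(queries[i], queries[j]) <= eps]
        for i in range(n)
    ]
    labels = [UNVISITED] * n
    cluster = -1
    for i in range(n):
        if labels[i] != UNVISITED:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        frontier = list(neighbors[i])
        while frontier:
            j = frontier.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] != UNVISITED:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:  # j is a core point: keep expanding
                frontier.extend(neighbors[j])
    return labels


# Hypothetical query log, made up for this example.
queries = [
    "similarity join", "similarity joins", "similarty join",
    "similarity search",
    "dbscan clustering", "dbscan clusterin",
    "edit distance",
]
labels = dbscan(queries, eps=2, min_pts=2)
```

With these assumed settings, the three near-duplicate "similarity join" queries form one cluster and the two "dbscan clustering" variants form another, while the remaining queries are labeled as noise. A cluster found this way could be read as a candidate set of high-frequency queries, although how the paper actually selects them is not stated in the abstract.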

How to Cite
[1] K. Kunanusont and J. Chongstitvatana, “Cluster Analysis to Find Sets of High-frequency Queries for Filtering in Similarity Join”, ECTI-CIT Transactions, vol. 10, no. 1, pp. 53–61, Apr. 2016.
Section
Artificial Intelligence and Machine Learning (AI)