In this paper, we propose an automatic clustering method for finding synonymous terms, including cross-language keywords, in Chinese and English thesis documents. First, Chinese and English keyword pairs are collected from an existing database. The system then calculates the support and confidence values of the keyword pairs and selects the pairs with high support and confidence values. Next, a clustering algorithm merges the selected keyword pairs so that pairs with similar meanings are clustered into the same subset. Finally, applications such as cross-language or synonymous queries can be built on the resulting subsets of collected words. In the experiments, the system achieved 98.4% precision in identifying correct terms across 1220 keyword pair clusters drawn from the collected subsets. These experimental results show that the system can provide effective information to users making online queries. © 2007 IEEE.
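The abstract does not give the exact support and confidence formulas, the selection thresholds, or the clustering algorithm. The following is a minimal sketch of the pipeline under stated assumptions. It treats each collected (Chinese keyword, English keyword) observation as one transaction and uses association-rule definitions: support(zh, en) = n(zh, en) / N and confidence(zh → en) = n(zh, en) / n(zh). For the unspecified clustering step it substitutes union-find (connected components), so any two selected pairs that share a keyword end up in the same subset. The function names, thresholds, and sample data are hypothetical.

```python
from collections import Counter


def score_pairs(observations):
    """Return {(zh, en): (support, confidence)} for each observed keyword pair.

    Assumed definitions (not taken from the paper):
      support(zh, en)    = n(zh, en) / N, where N = total pair observations
      confidence(zh->en) = n(zh, en) / n(zh)
    """
    total = len(observations)
    pair_counts = Counter(observations)
    zh_counts = Counter(zh for zh, _ in observations)
    return {
        pair: (count / total, count / zh_counts[pair[0]])
        for pair, count in pair_counts.items()
    }


def select_pairs(scores, min_support, min_confidence):
    """Keep only pairs meeting both (hypothetical) thresholds."""
    return [pair for pair, (sup, conf) in scores.items()
            if sup >= min_support and conf >= min_confidence]


def cluster_pairs(pairs):
    """Union-find stand-in for the clustering step: keywords linked by any
    selected pair are merged into one subset. Assumes a Chinese and an
    English keyword never share the same string."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for zh, en in pairs:
        root_zh, root_en = find(zh), find(en)
        if root_zh != root_en:
            parent[root_zh] = root_en

    clusters = {}
    for term in parent:
        clusters.setdefault(find(term), set()).add(term)
    # Deterministic order: sort terms inside each subset, then subsets.
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: c[0])


# Hypothetical sample: two Chinese variants of "data mining" and of
# "neural network", plus one noisy low-confidence pair.
observations = [
    ("資料探勘", "data mining"), ("資料探勘", "data mining"),
    ("數據挖掘", "data mining"), ("數據挖掘", "data mining"),
    ("資料探勘", "database"),
    ("類神經網路", "neural network"), ("類神經網路", "neural network"),
    ("神經網路", "neural network"), ("神經網路", "neural network"),
]

selected = select_pairs(score_pairs(observations), min_support=0.2, min_confidence=0.5)
for subset in cluster_pairs(selected):
    print(subset)
```

With these illustrative thresholds, the noisy pair ("資料探勘", "database") (support 1/9, confidence 1/3) is filtered out. The remaining pairs form two subsets, each holding an English term and its two Chinese synonyms. A connected-components merge is only one plausible reading of "merging keyword pairs". It chains synonyms transitively, which is useful for recall, but a single wrong pair can also join two unrelated subsets. The threshold selection step is what limits that risk.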
Published in: Proceedings of the Sixth International Conference on Machine Learning and Cybernetics (ICMLC 2007), Volume 4, 2007, Article number 4370454, pages 1875-1880.