Abstract

Clustering is one of the most significant research areas in data mining and an important tool in an era of rapidly growing information. Clustering systems are increasingly used in text mining, especially for analyzing texts and extracting the knowledge they contain. Data are grouped into clusters so that items in the same group are similar and items in different groups are dissimilar; the aim is to maximize intra-class similarity and inter-class dissimilarity. Clustering is useful for obtaining interesting patterns and structures from large data sets. It can be applied in many areas, including DNA analysis, marketing studies, web document analysis, and classification. This paper studies and compares three text document clustering algorithms, namely k-means, k-medoids, and SOM (self-organizing map), using the F-measure.
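
To make the evaluation measure concrete, the following is a minimal sketch of one common way to compute an F-measure for a flat clustering against reference classes: each class is matched to the cluster that gives it the best F-score, and the per-class scores are averaged with weights proportional to class size. This formulation is assumed here for illustration; the abstract does not state the exact variant used in the paper. The function name `clustering_f_measure` and its arguments are hypothetical.

```python
from collections import Counter


def clustering_f_measure(true_labels, cluster_ids):
    """Class-size-weighted best-match F-measure of a flat clustering.

    Assumed formulation (not confirmed by the paper): for class i and
    cluster j, precision = n_ij / n_j and recall = n_ij / n_i; the overall
    score is sum_i (n_i / n) * max_j F(i, j).
    """
    if len(true_labels) != len(cluster_ids):
        raise ValueError("inputs must have the same length")
    n = len(true_labels)
    if n == 0:
        raise ValueError("inputs must be non-empty")

    class_sizes = Counter(true_labels)                # n_i
    cluster_sizes = Counter(cluster_ids)              # n_j
    overlap = Counter(zip(true_labels, cluster_ids))  # n_ij

    total = 0.0
    for cls, n_i in class_sizes.items():
        best = 0.0
        for clu, n_j in cluster_sizes.items():
            n_ij = overlap.get((cls, clu), 0)
            if n_ij == 0:
                continue  # no shared documents, so F(i, j) = 0
            precision = n_ij / n_j
            recall = n_ij / n_i
            f_score = 2 * precision * recall / (precision + recall)
            best = max(best, f_score)
        total += (n_i / n) * best
    return total


# Toy example: four documents in two reference classes.
classes = ["sports", "sports", "politics", "politics"]
perfect = clustering_f_measure(classes, [0, 0, 1, 1])
merged = clustering_f_measure(classes, [0, 0, 0, 0])
```

Under this formulation a clustering that reproduces the reference classes scores 1, while merging everything into a single cluster is penalized through lower precision. The cluster assignments passed in could come from k-means, k-medoids, or SOM alike, since only the resulting labels are needed.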

Keywords

K-means, Bisecting K-means, k-medoids, SOM, F-measure

Article Details

How to Cite
Jamnezhad, M. E., & Fattahi, R. (2015). The comparative study of text documents clustering algorithms. Environment Conservation Journal, 16(SE), 133–138. https://doi.org/10.36953/ECJ.2015.SE1614
