The comparative study of text documents clustering algorithms

Mohammad Eiman Jamnezhad; Reza Fattahi

doi:10.36953/ECJ.2015.SE1614

Submitted

June 30, 2015

Published

December 5, 2015


Abstract

Clustering is one of the most significant research areas in data mining and is considered an important tool in an era of rapidly growing information. Clustering systems are increasingly used in text mining, especially to analyze texts and extract the knowledge they contain. Data are grouped into clusters so that items in the same cluster are similar and items in different clusters are dissimilar. The aim is to maximize intra-cluster similarity and inter-cluster dissimilarity. Clustering is useful for obtaining interesting patterns and structures from a large set of data. It can be applied in many areas, such as DNA analysis, marketing studies, web document analysis, and classification. This paper studies and compares three text document clustering algorithms, namely k-means, k-medoids, and self-organizing maps (SOM), using the F-measure.
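To illustrate how an F-measure can score a clustering against known document classes, the following is a minimal sketch. The abstract does not say which F-measure variant the paper uses; this sketch assumes the class-weighted form common in document clustering comparisons, in which each class is matched to the cluster that gives it the highest F value. The function name `clustering_f_measure` and the toy labels are hypothetical.

```python
from collections import Counter


def clustering_f_measure(true_labels, cluster_ids):
    """Class-weighted F-measure of a clustering (assumed variant, see lead-in).

    For class i and cluster j with n_ij shared documents:
        precision = n_ij / |cluster j|
        recall    = n_ij / |class i|
        F(i, j)   = 2 * precision * recall / (precision + recall)
    The overall score is the sum over classes of (|class i| / n) * max_j F(i, j).
    """
    if len(true_labels) != len(cluster_ids):
        raise ValueError("true_labels and cluster_ids must have the same length")
    n = len(true_labels)
    if n == 0:
        raise ValueError("at least one document is required")

    class_sizes = Counter(true_labels)
    cluster_sizes = Counter(cluster_ids)
    # Number of documents shared by each (class, cluster) pair.
    overlap = Counter(zip(true_labels, cluster_ids))

    # Best F value found so far for each class.
    best_f = {cls: 0.0 for cls in class_sizes}
    for (cls, clu), n_ij in overlap.items():
        precision = n_ij / cluster_sizes[clu]
        recall = n_ij / class_sizes[cls]
        f_value = 2 * precision * recall / (precision + recall)
        best_f[cls] = max(best_f[cls], f_value)

    return sum(class_sizes[cls] / n * best_f[cls] for cls in class_sizes)


if __name__ == "__main__":
    # Hypothetical toy data: four documents, two topics.
    topics = ["sports", "sports", "politics", "politics"]
    print(clustering_f_measure(topics, [0, 0, 1, 1]))  # clusters match topics
    print(clustering_f_measure(topics, [0, 0, 0, 0]))  # everything in one cluster
```

Under this assumption, the same function could score the output of k-means, k-medoids, or SOM on a labeled corpus, and a higher value indicates closer agreement with the reference classes.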

Keywords

K-means, Bisecting K-means, k-medoids, SOM, F-measure

License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

How to Cite

Jamnezhad, M. E., & Fattahi, R. (2015). The comparative study of text documents clustering algorithms. Environment Conservation Journal, 16(SE), 133–138. https://doi.org/10.36953/ECJ.2015.SE1614



