Research Article
Open access
Published on 22 March 2024
Hu, Z. (2024). Comparison of K-Means, K-Medoids and K-Means++ algorithms based on the Calinski-Harabasz index for COVID-19 epidemic in China. Applied and Computational Engineering, 49, 11-20.

Comparison of K-Means, K-Medoids and K-Means++ algorithms based on the Calinski-Harabasz index for COVID-19 epidemic in China

Zhengcao Hu *,1
  • 1 Taiyuan University of Technology

* Author to whom correspondence should be addressed.

https://doi.org/10.54254/2755-2721/49/20241046

Abstract

The novel coronavirus spreads from person to person through close contact and through respiratory droplets produced by coughing or sneezing. Many studies have been conducted worldwide in response to COVID-19. However, no cure for the virus has been found, and efficient data processing methods for sudden outbreaks have not yet been identified. This study compares three clustering algorithms on the same data set to determine the best data processing method. The data come from the Chinese Center for Disease Control and Prevention and include two attributes: confirmed cases and deaths. We selected data from the initial stage of the outbreak until October 31, 2021, and compared the results of clustering the spread of the novel coronavirus in China with the K-Means, K-Medoids and K-Means++ algorithms. Comparing the Calinski-Harabasz index values from K=2 to K=10 shows that the three algorithms have almost the same clustering effect when K does not exceed 6, but that K-Medoids performs significantly better when K is greater than 6. Among the three algorithms used, K-Medoids is therefore the best method for clustering the spread of the novel coronavirus outbreak in China. These results can help future researchers choose an appropriate cluster analysis method for processing data effectively in the early stages of an epidemic.
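The comparison above depends on the Calinski-Harabasz index, which is the ratio of between-cluster dispersion to within-cluster dispersion, each scaled by its degrees of freedom: CH = [B / (k - 1)] / [W / (n - k)]. A higher value indicates more compact, better separated clusters. The following is a minimal sketch of that computation in plain Python, not the author's implementation. The function name `calinski_harabasz` and the four (confirmed, deaths) pairs are made up for illustration and are not the study's data.

```python
def calinski_harabasz(points, labels):
    """Calinski-Harabasz index: [B / (k - 1)] / [W / (n - k)].

    points: list of equal-length numeric tuples
    labels: one cluster label per point
    """
    n = len(points)
    dim = len(points[0])

    # Group the points by their cluster label.
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    k = len(clusters)
    if k < 2 or k >= n:
        raise ValueError("the index is defined only for 2 <= k < n")

    # Centroid of the whole data set.
    overall = [sum(p[d] for p in points) / n for d in range(dim)]

    between = 0.0  # B: size-weighted squared distance of centroids to overall mean
    within = 0.0   # W: squared distance of points to their own centroid
    for members in clusters.values():
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        between += len(members) * sum((c - o) ** 2 for c, o in zip(centroid, overall))
        for p in members:
            within += sum((x - c) ** 2 for x, c in zip(p, centroid))

    if within == 0.0:
        return float("inf")  # every point coincides with its centroid
    return (between / (k - 1)) / (within / (n - k))


# Hypothetical toy data: (confirmed cases, deaths) for four regions.
toy_points = [(0, 0), (0, 2), (10, 0), (10, 2)]
good_labels = [0, 0, 1, 1]  # groups regions with similar case counts
poor_labels = [0, 1, 0, 1]  # mixes low-count and high-count regions

print(calinski_harabasz(toy_points, good_labels))
print(calinski_harabasz(toy_points, poor_labels))
```

In a comparison like the one described, this score would be computed for the labels each algorithm produces at every K from 2 to 10, and the algorithm with the higher score at a given K would be judged to cluster better at that K.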

Keywords

COVID-19, Calinski-Harabasz, K-Means, K-Medoids, K-Means++

Data availability

The datasets used and/or analyzed during the current study will be available from the authors upon reasonable request.

Disclaimer/Publisher's Note

The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of EWA Publishing and/or the editor(s). EWA Publishing and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

About volume

Volume title: Proceedings of the 4th International Conference on Signal Processing and Machine Learning

Conference website: https://www.confspml.org/
ISBN: 978-1-83558-343-2 (Print) / 978-1-83558-344-9 (Online)
Conference date: 15 January 2024
Editor: Marwan Omar
Series: Applied and Computational Engineering
Volume number: Vol. 49
ISSN: 2755-2721 (Print) / 2755-273X (Online)

© 2024 by the author(s). Licensee EWA Publishing, Oxford, UK. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.