Research Article
Open access
Published on 24 April 2025
Wen, L. (2025). A Comparison Between the K-Nearest Neighbors Algorithm and Logistic Regression in the Field of Cell Type Annotation. Theoretical and Natural Science, 93, 51-56.

A Comparison Between the K-Nearest Neighbors Algorithm and Logistic Regression in the Field of Cell Type Annotation

Lezhou Wen *,1
  • 1 College of Life Sciences, Sichuan University, Chengdu, Sichuan, China, 610000

* Author to whom correspondence should be addressed.

https://doi.org/10.54254/2753-8818/2025.22373

Abstract

With the advancement of single-cell sequencing technologies, high-throughput gene expression data have made cell type annotation across diverse cell populations feasible. However, these datasets are high-dimensional and complex, so traditional manual annotation has become inadequate. Machine learning algorithms offer a robust alternative, yet choosing a suitable algorithm remains a critical step. This study analyzes two classical machine learning algorithms, k-Nearest Neighbors (KNN) and Logistic Regression (LR), and compares their strengths and limitations in cell type annotation from the perspective of algorithmic principles and data characteristics, aiming to offer practical guidance for selecting machine learning approaches. KNN, a distance-based non-parametric method, excels in small-sample and nonlinear scenarios but suffers from the "curse of dimensionality" in high-dimensional spaces, where it needs dimensionality reduction or locality-sensitive hashing to remain efficient. In contrast, LR relies on a linear assumption and performs well on large-scale, high-dimensional data when regularization is used to prevent overfitting, yet its performance declines with small samples or nonlinear distributions. Neither algorithm is uniformly better; the choice should consider sample size, feature dimensionality, data quality, interpretability, and how well the true data distribution matches the algorithm’s inherent assumptions.
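The contrast between a distance-based classifier and a linear one can be illustrated with a minimal sketch. It is not the author's code and uses no single-cell data. It assumes two hypothetical standardized marker genes, synthetic Gaussian clusters standing in for cell types, and plain-Python implementations of KNN (majority vote) and L2-regularized logistic regression (batch gradient descent). All function names, cluster centers, and hyperparameters are illustrative assumptions.

```python
import math
import random

def make_clusters(centers, n_per_cluster, rng, spread=0.4):
    """Sample 2-D points around each (center, label) pair; returns (X, y)."""
    X, y = [], []
    for (cx, cy), label in centers:
        for _ in range(n_per_cluster):
            X.append([rng.gauss(cx, spread), rng.gauss(cy, spread)])
            y.append(label)
    return X, y

def knn_predict(train_X, train_y, query, k=5):
    """Majority vote among the k training points nearest to `query` (Euclidean)."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    votes = [train_y[i] for i in order[:k]]
    return max(set(votes), key=votes.count)

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, l2=0.01, lr=0.1, epochs=300):
    """Binary logistic regression with an L2 penalty on the weights."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Average gradient plus the L2 shrinkage term (bias is not penalized).
        w = [wj - lr * (gw / n + l2 * wj) for wj, gw in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def logistic_predict(w, b, x):
    """Predict class 1 when the linear score is non-negative."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else 0

def accuracy(predict, X, y):
    return sum(predict(x) == t for x, t in zip(X, y)) / len(y)

# Hypothetical layouts: one linearly separable, one XOR-like (nonlinear).
LINEAR = [((-1.5, -1.5), 0), ((1.5, 1.5), 1)]
XOR = [((-1.5, -1.5), 0), ((1.5, 1.5), 0), ((-1.5, 1.5), 1), ((1.5, -1.5), 1)]

rng = random.Random(0)
results = {}
for name, centers in (("linear", LINEAR), ("xor", XOR)):
    train_X, train_y = make_clusters(centers, 30, rng)
    test_X, test_y = make_clusters(centers, 20, rng)
    w, b = fit_logistic(train_X, train_y)
    knn_acc = accuracy(lambda x: knn_predict(train_X, train_y, x), test_X, test_y)
    lr_acc = accuracy(lambda x: logistic_predict(w, b, x), test_X, test_y)
    results[name] = (knn_acc, lr_acc)
    print(f"{name}: KNN accuracy={knn_acc:.2f}, LR accuracy={lr_acc:.2f}")
```

On the linearly separable layout both classifiers should score well. On the XOR-like layout no single straight boundary can classify more than about three of the four clusters correctly, so LR is limited by its linear assumption while KNN is not. The toy data is deliberately two-dimensional, so it does not show the high-dimensional cost of KNN that the abstract also mentions.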

Keywords

Cell Type Annotation, k-Nearest Neighbors, Logistic Regression

Data availability

The datasets used and/or analyzed during the current study will be available from the authors upon reasonable request.

Disclaimer/Publisher's Note

The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of EWA Publishing and/or the editor(s). EWA Publishing and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

About volume

Volume title: Proceedings of the 3rd International Conference on Environmental Geoscience and Earth Ecology

Conference website: https://2025.icegee.org/
ISBN: 978-1-83558-976-2 (Print) / 978-1-83558-975-5 (Online)
Conference date: 16 June 2025
Editor: Alan Wang
Series: Theoretical and Natural Science
Volume number: Vol. 93
ISSN: 2753-8818 (Print) / 2753-8826 (Online)

© 2024 by the author(s). Licensee EWA Publishing, Oxford, UK. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license. Authors who publish this series agree to the following terms:
1. Authors retain copyright and grant the series right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this series.
2. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the series's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this series.
3. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See Open access policy for details).