
The prediction of stroke and feature importance analysis based on multiple machine learning algorithms
- 1 Zhejiang University of Technology
* Author to whom correspondence should be addressed.
Abstract
Stroke is a leading cause of death and disability worldwide, and effective stroke management depends on accurate and timely diagnosis. Using the Stroke Prediction Dataset from Kaggle, this study first preprocessed the data by addressing missing values, encoding categorical variables, and normalising numerical features. It then implemented three commonly used machine learning models: logistic regression, decision tree, and random forest. Model performance was assessed using accuracy, which measures the proportion of correct predictions out of all predictions. The study also identified the features that most strongly affect stroke risk using the feature importance analysis provided by the machine learning models. In the experiments, all three models achieved accuracy above 0.92, with random forest outperforming the other two. The accuracies of random forest, decision tree, and logistic regression were 0.963, 0.925, and 0.961, respectively. Feature importance analysis revealed that age, average glucose level, and work type were the most important predictors of stroke risk. These findings suggest that machine learning algorithms, particularly the random forest model, can effectively predict the likelihood of stroke on the Stroke Prediction Dataset. They are in line with other research showing that machine learning has the potential to enhance stroke diagnosis. Identifying the features that most affect stroke risk can provide valuable insights for clinicians and researchers developing targeted interventions for stroke prevention and management.
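The accuracy metric described above can be illustrated with a minimal sketch. It is not the paper's code: the function name `accuracy` and the labels below are hypothetical, chosen only to show the calculation, and are not drawn from the Stroke Prediction Dataset. A constant "no stroke" predictor is included as a reference point for reading the score.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Proportion of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical labels (1 = stroke, 0 = no stroke); not taken from the dataset.
y_true = [0, 0, 0, 1, 0, 0, 1, 0, 0, 0]
y_pred = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]  # one stroke case is missed

# Accuracy of the hypothetical model predictions.
model_acc = accuracy(y_true, y_pred)

# Reference point: always predict the majority class ("no stroke").
baseline_acc = accuracy(y_true, [0] * len(y_true))

print(f"model accuracy:    {model_acc:.3f}")
print(f"baseline accuracy: {baseline_acc:.3f}")
```

Comparing a model's accuracy with that of a constant predictor on the same labels gives context for how much the model adds beyond the class distribution.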
Keywords
stroke prediction, machine learning, artificial intelligence
Cite this article
Li, S. (2023). The prediction of stroke and feature importance analysis based on multiple machine learning algorithms. Applied and Computational Engineering, 18, 37-41.
Data availability
The datasets used and/or analyzed during the current study will be available from the authors upon reasonable request.
About volume
Volume title: Proceedings of the 5th International Conference on Computing and Data Science
© 2024 by the author(s). Licensee EWA Publishing, Oxford, UK. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license. Authors who publish in this series agree to the following terms:
1. Authors retain copyright and grant the series right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this series.
2. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the series's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this series.
3. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (see Open access policy for details).