Research Article
Open access

Comparison of machine learning algorithms and feature importance analysis for star classification

Taiqi Zhou 1*
  • 1 Hong Kong Polytechnic University    
  • *corresponding author 21106717d@connect.polyu.hk
Published on 31 January 2024 | https://doi.org/10.54254/2755-2721/30/20230111
ACE Vol.30
ISSN (Print): 2755-273X
ISSN (Online): 2755-2721
ISBN (Print): 978-1-83558-285-5
ISBN (Online): 978-1-83558-286-2

Abstract

This study presents a comprehensive investigation of star classification on the Stellar Classification Dataset-SDSS17, employing the machine learning algorithms Random Forest, Gradient Boosting, and Support Vector Machine (SVM), together with Shapley Additive Explanations (SHAP) for feature importance analysis. Among the 17 features studied, redshift consistently emerged as the most significant: its feature importance and SHAP values were markedly higher than those of every other feature across the models, whereas the right ascension (alpha) and declination (delta) angles were the least important. Models with higher accuracy tended to assign lower importance to redshift. Among the classifiers, Random Forest achieved the highest accuracy and SVM the lowest, and most models performed best on the "star" class and worst on the "quasar" class. These findings provide valuable insights for automated star classification and underscore the critical role of redshift, in line with astronomical theory. Further research could investigate more sophisticated models, such as neural networks, and analyze feature interactions in greater depth.
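
As a rough illustration of the workflow summarized above, the Python sketch below trains the three classifiers with scikit-learn and ranks the input features by mean absolute SHAP value. It is a minimal sketch rather than the author's actual code: the file name star_classification.csv, the "class" label column, and the default model settings are assumptions made here for illustration.

import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Load the Kaggle SDSS17 table; the file name and the "class" label column
# (GALAXY / STAR / QSO) are assumptions about the dataset layout.
df = pd.read_csv("star_classification.csv")
X = df.drop(columns=["class"])   # the 17 input features
y = df["class"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

models = {
    "Random Forest": RandomForestClassifier(random_state=42),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
    # The SVM is sensitive to feature scale, so standardize inside a pipeline.
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, "accuracy:", accuracy_score(y_test, model.predict(X_test)))

# SHAP importance for the Random Forest, computed on a subsample of the test set.
sample = X_test.sample(n=min(500, len(X_test)), random_state=42)
sv = np.asarray(shap.TreeExplainer(models["Random Forest"]).shap_values(sample))
# Average |SHAP| over every axis except the feature axis (the axis whose
# length equals the number of columns) to get one global score per feature.
feature_axis = [i for i, n in enumerate(sv.shape) if n == sample.shape[1]][0]
other_axes = tuple(i for i in range(sv.ndim) if i != feature_axis)
importance = pd.Series(np.abs(sv).mean(axis=other_axes), index=sample.columns)
print(importance.sort_values(ascending=False))

On the published dataset, the redshift column would be expected to dominate the resulting ranking, consistent with the results reported in the abstract.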

Keywords:

star classification, machine learning, feature importance analysis

References

[1]. Daud, A., Ahmad, M., Malik, M. S. I., & Che, D. (2015). Using machine learning techniques for rising star prediction in co-author network. Scientometrics, 102, 1687-1711.

[2]. Stellar Classification Dataset - SDSS17 (2022). URL: https://www.kaggle.com/datasets/fedesoriano/stellar-classification-dataset-sdss17. Last accessed: 2023/07/09.

[3]. Jordan, M. I., & Mitchell, T. M. (2015). Machine learning: Trends, perspectives, and prospects. Science, 349(6245), 255-260.

[4]. Mahesh, B. (2020). Machine learning algorithms - A review. International Journal of Science and Research (IJSR), 9(1), 381-386.

[5]. Nohara, Y., Matsumoto, K., Soejima, H., & Nakashima, N. (2019). Explanation of machine learning models using improved shapley additive explanation. In Proceedings of the 10th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, 546-546.

[6]. Rigatti, S. J. (2017). Random forest. Journal of Insurance Medicine, 47(1), 31-39.

[7]. Natekin, A., & Knoll, A. (2013). Gradient boosting machines, a tutorial. Frontiers in Neurorobotics, 7, 21.

[8]. Friedman, J. H. (2002). Stochastic gradient boosting. Computational Statistics & Data Analysis, 38(4), 367-378.

[9]. Suthaharan, S. (2016). Support vector machine. In Machine learning models and algorithms for big data classification: Thinking with examples for effective learning, 207-235.

[10]. Sahlaoui, H., Nayyar, A., Agoujil, S., & Jaber, M. M. (2021). Predicting and interpreting student performance using ensemble models and shapley additive explanations. IEEE Access, 9, 152688-152703.

[11]. Ren, J., Wang, L., Zhang, S., Cai, Y., & Chen, J. (2021). Online Critical Unit Detection and Power System Security Control: An Instance-Level Feature Importance Analysis Approach. Applied Sciences, 11(12), 5460.


Cite this article

Zhou, T. (2024). Comparison of machine learning algorithms and feature importance analysis for star classification. Applied and Computational Engineering, 30, 261-270.

Data availability

The datasets used and/or analyzed during the current study will be available from the authors upon reasonable request.

Disclaimer/Publisher's Note

The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of EWA Publishing and/or the editor(s). EWA Publishing and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

About volume

Volume title: Proceedings of the 2023 International Conference on Machine Learning and Automation

ISBN: 978-1-83558-285-5 (Print) / 978-1-83558-286-2 (Online)
Editor: Mustafa İSTANBULLU
Conference website: https://2023.confmla.org/
Conference date: 18 October 2023
Series: Applied and Computational Engineering
Volume number: Vol.30
ISSN: 2755-2721 (Print) / 2755-273X (Online)

© 2024 by the author(s). Licensee EWA Publishing, Oxford, UK. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license. Authors who publish with this series agree to the following terms:
1. Authors retain copyright and grant the series right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this series.
2. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the series's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this series.
3. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See Open access policy for details).
