Project on salary classification

Research Article
Open access

Yuntian Xu 1*
  • 1 University of California    
  • *corresponding author yuntiax@uci.edu
Published on 26 December 2023 | https://doi.org/10.54254/2755-2721/29/20230744
ACE Vol.29
ISSN (Print): 2755-273X
ISSN (Online): 2755-2721
ISBN (Print): 978-1-83558-259-6
ISBN (Online): 978-1-83558-260-2

Abstract

This project applies three machine learning algorithms to salary classification. The data include variables such as education level, age, and work class, and each person is labeled as having a salary either greater than 50k or less than or equal to 50k. The work first uses a single decision tree model to visualize the data, because a tree is concise and easy to understand, and then applies a support vector machine to obtain a more accurate result. After building these two models, the accuracy was found to be about 86.32%, which is relatively high and reliable. Because a higher accuracy would be more persuasive, the project also uses a random forest model, which is generally considered highly accurate because it combines the votes of many decision trees. This model reached an accuracy of 87.03%. According to the models, a person who wants a higher wage should raise their education level as far as possible, maintain a stable marriage, and, where possible, start their own business between the ages of 20 and 60.
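The abstract does not include the implementation, so the following is only a minimal sketch of the general idea, not the author's code. It assumes synthetic data in place of the Kaggle dataset, a made-up labeling rule, one-split decision trees ("stumps"), and plain bagging with majority voting as a simplified stand-in for a random forest. All function names and parameters are hypothetical.

```python
import random

def make_data(n, rng):
    """Synthetic rows: (education_years, age) -> 1 if 'salary > 50k' else 0."""
    rows = []
    for _ in range(n):
        edu = rng.randint(8, 18)
        age = rng.randint(20, 60)
        # Assumed rule with noise: more schooling and age raise the score.
        score = 0.6 * (edu - 13) + 0.05 * (age - 40) + rng.gauss(0, 1)
        rows.append(((edu, age), 1 if score > 0 else 0))
    return rows

def fit_stump(rows):
    """Pick the (feature, threshold) split with the fewest training errors."""
    best = None
    for f in range(2):
        for t in sorted({x[f] for x, _ in rows}):
            errors = sum((x[f] > t) != bool(y) for x, y in rows)
            if best is None or errors < best[0]:
                best = (errors, f, t)
    return best[1], best[2]

def predict_stump(stump, x):
    """Predict class 1 when the chosen feature exceeds the threshold."""
    f, t = stump
    return 1 if x[f] > t else 0

def fit_forest(rows, n_trees, rng):
    """Train each stump on a bootstrap sample (drawn with replacement)."""
    return [fit_stump([rng.choice(rows) for _ in rows]) for _ in range(n_trees)]

def predict_forest(forest, x):
    """Majority vote over all stumps in the ensemble."""
    votes = sum(predict_stump(s, x) for s in forest)
    return 1 if 2 * votes > len(forest) else 0

def accuracy(predict, rows):
    """Share of rows whose predicted label matches the true label."""
    return sum(predict(x) == y for x, y in rows) / len(rows)

rng = random.Random(0)
data = make_data(600, rng)
train, test = data[:400], data[400:]  # hold out the last 200 rows for testing

stump = fit_stump(train)
forest = fit_forest(train, 25, rng)

stump_acc = accuracy(lambda x: predict_stump(stump, x), test)
forest_acc = accuracy(lambda x: predict_forest(forest, x), test)
print(f"single stump: {stump_acc:.2%}, bagged stumps: {forest_acc:.2%}")
```

Accuracy here is the share of held-out rows classified correctly, which is the kind of figure the abstract reports. The numbers printed by this sketch come from synthetic data and are not comparable to the 86.32% and 87.03% above.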

Keywords:

salary classification, machine learning, decision tree model

Xu, Y. (2023). Project on salary classification. Applied and Computational Engineering, 29, 12-18.

Data availability

The datasets used and/or analyzed during the current study will be available from the authors upon reasonable request.

Disclaimer/Publisher's Note

The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of EWA Publishing and/or the editor(s). EWA Publishing and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

About volume

Volume title: Proceedings of the 5th International Conference on Computing and Data Science

ISBN: 978-1-83558-259-6 (Print) / 978-1-83558-260-2 (Online)
Editor: Alan Wang, Marwan Omar, Roman Bauer
Conference website: https://2023.confcds.org/
Conference date: 14 July 2023
Series: Applied and Computational Engineering
Volume number: Vol.29
ISSN: 2755-2721 (Print) / 2755-273X (Online)

© 2024 by the author(s). Licensee EWA Publishing, Oxford, UK. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license. Authors who publish this series agree to the following terms:
1. Authors retain copyright and grant the series right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this series.
2. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the series's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this series.
3. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See Open access policy for details).
