
House price prediction using machine learning for Ames, Iowa
1 Georgia Institute of Technology
* Author to whom correspondence should be addressed.
Abstract
The real estate sector is pivotal to economic growth and contributes substantially to global GDP. Machine learning models that predict house prices accurately can support decision-making for homeowners, sellers, and investors alike. This study explores house price prediction for Ames, Iowa, United States. Its primary objective is to build a reliable and accurate predictive model that helps individuals estimate property values. Three machine learning algorithms are compared: linear regression, random forest, and XGBoost. The dataset is drawn from a reputable public website, which helps improve the reliability of the results. The study also examines the determinants of house prices in the Ames real estate market and finds that the total area of the house is the factor that influences the price most. Among the three models, XGBoost produces the best result, achieving an R-squared of 0.8803. Moreover, the importance of each feature is analyzed using the feature ranking algorithm in random forest, which shows that the overall quality of the house, the living area of the house, and the total area of the basement are the top three factors influencing the house price.
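For reference, the R-squared value quoted in the abstract measures the share of variance in sale prices that a model's predictions explain. A minimal sketch of that computation in plain Python follows; the function name `r_squared` and the toy prices are illustrative assumptions and are not the study's code or data.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: error left over after prediction.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: spread of the true values around their mean.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical sale prices in thousands of dollars (not from the Ames data).
actual = [200.0, 150.0, 300.0, 250.0]
predicted = [210.0, 140.0, 290.0, 260.0]

print(round(r_squared(actual, predicted), 4))
```

A value of 1 means the predictions match the observed prices exactly, while a value of 0 means the model does no better than always predicting the mean price.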
Keywords
Machine learning, house price prediction, linear regression model, random forest, XGBoost
Cite this article
Ye, Q. (2024). House price prediction using machine learning for Ames, Iowa. Applied and Computational Engineering, 55, 44-54.
Data availability
The datasets used and/or analyzed during the current study will be available from the authors upon reasonable request.
About volume
Volume title: Proceedings of the 4th International Conference on Signal Processing and Machine Learning
© 2024 by the author(s). Licensee EWA Publishing, Oxford, UK. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.