
Taxi fare prediction based on multiple machine learning models
1 University of California, Davis
* Author to whom correspondence should be addressed.
Abstract
Because taxis are a fundamental mode of everyday transportation, ride-hailing applications such as Uber and Lyft, which let users conveniently request a ride and view the predicted fare for their destination, have become increasingly popular. Accurate fare prediction is therefore important. In this study, machine learning models were used to forecast taxi fares from features such as trip distance and passenger count. Because the raw data contained only latitude and longitude values, the Haversine formula was used to compute the great-circle distance between the pickup and dropoff locations. The raw data also contained inconsistencies, such as negative fares and grossly exaggerated distances, which were addressed by applying four data-cleaning criteria. After preprocessing, three models (linear regression, decision tree, and random forest) were trained and evaluated using the root mean square error (RMSE). The random forest model achieved the lowest RMSE (1.264), followed closely by the decision tree model (1.277), while the linear regression model had the highest RMSE (1.718). The random forest model therefore performed best and is recommended for accurate fare prediction.
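The abstract names two computations without showing them: the Haversine distance derived from pickup and dropoff coordinates, and the RMSE used to compare the three models. The following is a minimal standard-library Python sketch of both, not the author's code. The Earth radius of 6371 km and the function names `haversine_km` and `rmse` are assumptions chosen for illustration; the paper does not state which radius or distance unit it used.

```python
import math

# Assumption: mean Earth radius in kilometres; the paper does not state
# which radius (or distance unit) it used.
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    # Trigonometric functions in `math` expect radians, so convert first.
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    # Haversine of the central angle between the two points.
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Guard against floating-point drift slightly above 1 for near-antipodal points.
    a = min(1.0, a)
    # Central angle; the atan2 form is numerically stable at all distances.
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
    return EARTH_RADIUS_KM * c


def rmse(y_true: list[float], y_pred: list[float]) -> float:
    """Root mean square error between observed and predicted fares."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    # Average the squared residuals, then take the square root.
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))
```

For example, `haversine_km(40.7614, -73.9776, 40.6413, -73.7781)` gives the straight-line (great-circle) distance between two hypothetical New York City coordinates; the driven road distance will generally be longer. Because RMSE is expressed in the same units as the target variable, the reported values (1.264, 1.277, 1.718) are in fare units.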
Keywords
machine learning, taxi fare prediction, linear regression
Cite this article
Huang, H. (2023). Taxi fare prediction based on multiple machine learning models. Applied and Computational Engineering, 16, 7-12.
Data availability
The datasets used and/or analyzed during the current study will be available from the authors upon reasonable request.
Disclaimer/Publisher's Note
The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of EWA Publishing and/or the editor(s). EWA Publishing and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
About volume
Volume title: Proceedings of the 5th International Conference on Computing and Data Science
© 2024 by the author(s). Licensee EWA Publishing, Oxford, UK. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license. Authors who publish in this series agree to the following terms:
1. Authors retain copyright and grant the series right of first publication, with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this series.
2. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the series's published version of the work (e.g., posting it to an institutional repository or publishing it in a book), with an acknowledgment of its initial publication in this series.
3. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as doing so can lead to productive exchanges, as well as earlier and greater citation of published work (see the Open access policy for details).