Research Article
Open access
Published on 1 November 2024

A lightweight, easy-integration reward shaping study for progress maximization in Reinforcement Learning for autonomous driving

Hongze Fu 1, Kunqiang Qing 2,*
  • 1 Shandong University
  • 2 Automotive Software Innovation Center (Chongqing)

* Author to whom correspondence should be addressed.

https://doi.org/10.54254/2977-3903/13/2024137

Abstract

This paper addresses the challenge of sample efficiency in reinforcement learning (RL) for autonomous driving, a domain characterized by long-term dependencies and complex environments. While RL has succeeded in many fields, its application to autonomous driving is hindered by the large number of samples needed to learn effective policies. We propose a novel, lightweight reward-shaping method called room-of-adjust to maximize learning progress. The approach separates rewards into continuous tendency rewards for long-term guidance and discrete milestone rewards for short-term exploration, and it is designed to integrate easily with other approaches such as efficient representation, imitation learning, and transfer learning. We evaluate it on a hill-climbing task with uneven surfaces, which simulates the spatial-temporal reasoning required in autonomous driving. Results show that room-of-adjust reward shaping achieves near-human performance (81.93%), whereas other reward-shaping and progress-maximization methods struggle. When combined with imitation learning, performance matches human levels (97.00%). The study also explores the method's effectiveness in formulating control strategies, such as 4-wheel independent drive (4WID) systems; with reduced spatial-temporal reasoning, reward shaping can match human performance (89.7%). However, the control strategy cannot be trained jointly with complicated spatial-temporal progress maximization.
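The abstract does not give the exact room-of-adjust formulation, but the split it describes, a continuous tendency term for long-horizon guidance plus discrete milestone bonuses for short-term exploration, can be illustrated with a minimal sketch. All names, weights, and the progress/checkpoint representation below (shaped_reward, w_tendency, w_milestone) are hypothetical and are not the authors' implementation.

```python
# Minimal sketch of a reward that separates a continuous "tendency" term
# from discrete "milestone" bonuses. Hypothetical names and weights; not
# the paper's room-of-adjust formulation.

def shaped_reward(prev_progress: float,
                  curr_progress: float,
                  checkpoints: list[float],
                  reached: set[int],
                  w_tendency: float = 1.0,
                  w_milestone: float = 10.0) -> float:
    """Return a combined tendency + milestone reward for one time step."""
    # Continuous tendency reward: incremental progress toward the goal,
    # giving dense, long-horizon guidance at every step.
    r_tendency = w_tendency * (curr_progress - prev_progress)

    # Discrete milestone reward: a one-time bonus the first time the agent
    # passes a checkpoint, giving sparse, short-term exploration targets.
    r_milestone = 0.0
    for i, checkpoint in enumerate(checkpoints):
        if curr_progress >= checkpoint and i not in reached:
            reached.add(i)
            r_milestone += w_milestone

    return r_tendency + r_milestone
```

In this sketch the dense term supplies gradient-like feedback over the whole episode, while the sparse bonuses mark the next reachable sub-goal; the relative weights would control how strongly short-term exploration is favored over long-term guidance.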

Keywords

Reinforcement Learning, Autonomous Driving, Proximal Policy Optimization (PPO), End to end learning, Reward Shaping

Cite this article

Fu, H.; Qing, K. (2024). A lightweight, easy-integration reward shaping study for progress maximization in Reinforcement Learning for autonomous driving. Advances in Engineering Innovation, 13, 31-43.

Data availability

The datasets used and/or analyzed during the current study are available from the authors upon reasonable request.

Disclaimer/Publisher's Note

The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of EWA Publishing and/or the editor(s). EWA Publishing and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

About volume

Journal: Advances in Engineering Innovation

Volume number: Vol. 13
ISSN: 2977-3903 (Print) / 2977-3911 (Online)

© 2024 by the author(s). Licensee EWA Publishing, Oxford, UK. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license. Authors who publish in this series agree to the following terms:
1. Authors retain copyright and grant the series right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this series.
2. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the series's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this series.
3. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See Open access policy for details).