
Dynamic Obstacle Avoidance and Trajectory Planning Based on the Soft Actor-Critic Algorithm
- 1 School of Future Technology, Tianjin University, Tianjin, 300354, China
- 2 Department of Science & Technology, City, University of London, London, EC1V 0HB, United Kingdom
- 3 Department of Lille, Hohai University, Changzhou, 213200, China
- 4 Eastern Michigan Joint College, Beibu Gulf University, Qinzhou, 535037, China
- 5 Department of International, AP&AL, Guangzhou Foreign Language School, Guangzhou, 511455, China
* Author to whom correspondence should be addressed.
Abstract
In automated factories, dynamic obstacle avoidance and trajectory planning for robotic manipulators are critical to safe and efficient operation. However, traditional obstacle avoidance methods, such as the artificial potential field, cell decomposition, visibility graph, Voronoi diagram, and probabilistic roadmap, face many challenges in complex dynamic environments, including unreachable targets and local minima. To address these problems, this paper proposes a new dynamic obstacle avoidance and trajectory planning framework based on the Soft Actor-Critic (SAC) algorithm. The framework combines the rapidly-exploring random tree (RRT) algorithm for global path planning with the SAC algorithm for local path optimization, adapting to changes in the dynamic environment. Specifically, the simulation uses Python to construct a URDF (Unified Robot Description Format) model of an open-source robot arm and applies the SAC algorithm to the model's dynamic obstacle avoidance trajectory planning. The simulation results show that the proposed framework combining the RRT and SAC algorithms achieves a high success rate in reaching the target point, and the method can effectively find a feasible trajectory in a complex dynamic environment.
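As a rough illustration of the global-planning stage described in the abstract, the sketch below implements a minimal 2D rapidly-exploring random tree. It is not the paper's implementation: the 10x10 workspace bounds, step size, goal-sampling bias, and the single circular obstacle are all illustrative assumptions, and collision checking is done only at sampled points rather than along segments.

```python
import math
import random

def rrt(start, goal, obstacles, step=0.5, max_iters=2000, goal_tol=0.5, seed=0):
    """Grow a rapidly-exploring random tree from start toward goal in a 2D workspace.

    obstacles: list of (cx, cy, r) circles treated as forbidden regions.
    Returns a list of waypoints from start to goal, or None on failure.
    """
    rng = random.Random(seed)
    nodes = [start]
    parent = {0: None}

    def collides(p):
        # Point-in-circle test; a real planner would also check the segment.
        return any(math.hypot(p[0] - cx, p[1] - cy) <= r for cx, cy, r in obstacles)

    for _ in range(max_iters):
        # Sample uniformly in the workspace, biasing 10% of samples toward the goal.
        sample = goal if rng.random() < 0.1 else (rng.uniform(0, 10), rng.uniform(0, 10))
        # Find the nearest tree node and take one fixed-length step toward the sample.
        i = min(range(len(nodes)),
                key=lambda k: math.hypot(nodes[k][0] - sample[0], nodes[k][1] - sample[1]))
        nx, ny = nodes[i]
        d = math.hypot(sample[0] - nx, sample[1] - ny) or 1e-9
        new = (nx + step * (sample[0] - nx) / d, ny + step * (sample[1] - ny) / d)
        if collides(new):
            continue
        nodes.append(new)
        parent[len(nodes) - 1] = i
        if math.hypot(new[0] - goal[0], new[1] - goal[1]) < goal_tol:
            # Walk parent pointers back to the root to recover the path.
            path, j = [], len(nodes) - 1
            while j is not None:
                path.append(nodes[j])
                j = parent[j]
            return path[::-1]
    return None  # no path found within the iteration budget

path = rrt(start=(1.0, 1.0), goal=(9.0, 9.0), obstacles=[(5.0, 5.0, 1.5)])
```

In the hybrid framework the abstract describes, a coarse path like this would serve as the global reference, with the SAC policy refining motions locally as obstacles move.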
Keywords
Trajectory planning, Dynamic obstacle avoidance, Reinforcement learning, SAC
Cite this article
Wu, Q.; Shi, W.; Fan, S.; Qu, B.; Zhang, H. (2025). Dynamic Obstacle Avoidance and Trajectory Planning Based on the Soft Actor-Critic Algorithm. Applied and Computational Engineering, 158, 17-26.
Data availability
The datasets used and/or analyzed during the current study are available from the authors upon reasonable request.
Disclaimer/Publisher's Note
The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of EWA Publishing and/or the editor(s). EWA Publishing and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
About volume
Volume title: Proceedings of CONF-SEML 2025 Symposium: Machine Learning Theory and Applications
© 2024 by the author(s). Licensee EWA Publishing, Oxford, UK. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license. Authors who publish this series agree to the following terms:
1. Authors retain copyright and grant the series right of first publication, with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this series.
2. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the series's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this series.
3. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (see Open access policy for details).