Deep Reinforcement Learning-Based Gait Optimization for Bionic Quadruped Robots

Junqing Wang

doi:10.54254/2755-2721/2025.20104

1. Introduction

With growing human exploration of nature and the desire for harmonious coexistence with it, bionic robots have become a prominent topic in international research in recent years. This study focuses on bionic quadrupedal robots, which mimic quadrupedal mammals and whose design is closely tied to the characteristics of the animals they imitate. Over millions of years of evolution, quadrupedal mammals have developed highly refined skeletal structures and locomotor abilities, and they can traverse many kinds of complex terrain, such as plains, mountains, hills, and basins. This ability has been studied and applied to the development of quadrupedal robots [1]. Quadrupedal robots adapt well to different environments, can perform tasks with limited energy, and tolerate a certain degree of disturbance while moving. In these respects they significantly outperform traditional wheeled and tracked robots, so the study of bionic quadrupedal robots is of great significance to the future development of science and technology [2].

Generally speaking, the law of foot motion is the core of gait control. It governs the gait, that is, the relationship between the swing motion of each leg, its support motion, and their relative timing. A gait specifies the modes of foot motion, and gait research mainly aims to develop planning methods that give the robot stable periodic motion. According to the balance mode, the gaits of quadrupedal robots can be roughly divided into two types: static gaits and dynamic gaits. Therefore, both static and dynamic situations are usually considered when optimizing the gait of bionic quadrupedal robots. Meanwhile, research on the design of bionic quadrupedal robots often concentrates on structural design, gait adjustment, operation algorithms, and the combination of joint rigidity and flexibility [3]. Gait optimization for bionic quadrupedal robots typically involves several steps. First, the mechanism of the legs or feet is adjusted. Then, innovative algorithms are applied to assist the process. Finally, comprehensive simulation experiments are conducted to achieve the optimization goal. This paper presents two mechanism-based and seven algorithm-based optimization methods, ranging from optimization of local mechanisms of the robot to visual navigation control of the robot as a whole, whose ideas have been validated in simulation or practice.

Although bionic quadrupedal robots have achieved some success in terms of mechanical structure and control, many challenges remain. One such challenge is addressing the end-of-foot oscillation phenomenon observed in natural animal movement, which is crucial for ensuring coherent movement and resisting ground impact [4]. In practice, gait optimization for bionic quadrupedal robots therefore remains an open problem for researchers.

2. Gait optimization for bionic quadruped robots

Robots cannot operate without both mechanics and algorithms, so researchers working on gait optimization usually improve both.

2.1. Optimization of improved mechanisms

Mechanism design is an indispensable part of gait control. Quadrupedal robots have attracted attention for their flexibility and adaptability when moving over unstructured terrain. For a bionic quadruped gait, improving stability at rest is the first step in optimization, and members of Baoling Han's group used a concept called the ‘stability margin’ to address this problem. The stability margin is the shortest distance from the vertical projection of the robot's center of gravity to the sides of the support polygon, as shown in Figure 1. The researchers found that, when the stability margin was measured with triangular supports, different gait sequences formed either obtuse or acute support triangles, with acute triangles favoring robot stability. Building on this finding, they designed a gait that requires no rearward adjustment of the moving robot body and also reduces energy expenditure during movement. To verify the feasibility of this optimized gait, they compared it with a random gait in the simulation software Adams. The analysis shows that the fluctuation of the optimized gait is significantly smaller than that of the random gait, which demonstrates the value of the optimized gait for improving the stability of the bionic quadruped robot [5]. In addition, the researchers concluded that, owing to the inherent symmetry of the quadrupedal robot's body structure, the gait sequences that move a hind leg first contain the gait sequences that move a front leg first, and that moving a hind leg first forms an acute support triangle, which resists some lateral impacts while improving the stability of the bionic quadrupedal robot.


Figure 1: Stability Margin [5]
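The stability margin defined above can be computed directly from the foot positions. The following is a minimal sketch, assuming a flat ground plane, a convex support polygon given as 2D foot coordinates, and a sign convention (negative when the projection falls outside the polygon) that is an illustrative choice rather than something taken from [5]. The function names are hypothetical.

```python
import math

def point_segment_distance(p, a, b):
    """Shortest distance from 2D point p to the segment from a to b."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Parameter of the closest point on the segment, clamped to [0, 1].
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def is_inside_convex(p, polygon):
    """True if p lies inside or on a convex polygon with ordered vertices."""
    sign = 0
    n = len(polygon)
    for i in range(n):
        ax, ay = polygon[i]
        bx, by = polygon[(i + 1) % n]
        cross = (bx - ax) * (p[1] - ay) - (by - ay) * (p[0] - ax)
        if cross != 0:
            current = 1 if cross > 0 else -1
            if sign == 0:
                sign = current
            elif current != sign:
                return False
    return True

def stability_margin(cog_xy, support_feet_xy):
    """Shortest distance from the projected center of gravity to the polygon sides.

    Assumed convention: the value is negative when the projection lies outside
    the support polygon, i.e. when the robot is statically unstable.
    """
    n = len(support_feet_xy)
    distance = min(
        point_segment_distance(cog_xy, support_feet_xy[i], support_feet_xy[(i + 1) % n])
        for i in range(n)
    )
    return distance if is_inside_convex(cog_xy, support_feet_xy) else -distance

# Example: three supporting feet (a support triangle) and a projected center of gravity.
triangle = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
print(stability_margin((1.0, 1.0), triangle))
```

Comparing this value across candidate gait sequences is one simple way to rank them by static stability.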

Bionic quadrupedal robots are often used for demanding jobs that traditional vehicles cannot handle, such as transport in mountainous areas or mine detection in conflict zones. Researchers therefore need to adjust the leg mechanism in addition to improving the overall gait configuration. Members of Jiu-Peng Chen's team improved the stiffness and strength of the robot by designing an anti-parallelogram leg mechanism based on the linkage principle, which also reduced the control requirements for the robot. Using Denavit-Hartenberg (D-H) parameters, the researchers analyzed the kinematics of each of the robot's legs. They employed a laser tracker and polynomial planning of the foot trajectory during the leg's swing and support phases to integrate the leg with the foot and guarantee that the foot could make no contact with the ground. At the same time, the team used the stability margin at rest to rework the ideal stable gait. Combining these methods with simulations, the researchers showed that the new mechanism design notably improved the robot's performance and achieved effective motion control with a reduced need for actuators.
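To illustrate how D-H parameters lead to leg kinematics, the sketch below chains standard D-H transforms for a generic planar two-link leg. It does not model the anti-parallelogram mechanism of [6]. The link lengths, joint angles, and function names are hypothetical values chosen only for the example.

```python
import math

def dh_transform(theta, d, a, alpha):
    """Homogeneous transform for one joint in the standard D-H convention."""
    ct, st = math.cos(theta), math.sin(theta)
    ca, sa = math.cos(alpha), math.sin(alpha)
    return [
        [ct, -st * ca, st * sa, a * ct],
        [st, ct * ca, -ct * sa, a * st],
        [0.0, sa, ca, d],
        [0.0, 0.0, 0.0, 1.0],
    ]

def matmul4(m1, m2):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(m1[i][k] * m2[k][j] for k in range(4)) for j in range(4)] for i in range(4)]

def foot_position(joint_angles, link_lengths):
    """Forward kinematics of a planar serial leg (all d = 0 and alpha = 0)."""
    transform = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    for theta, length in zip(joint_angles, link_lengths):
        transform = matmul4(transform, dh_transform(theta, 0.0, length, 0.0))
    # The last column holds the foot position in the hip frame.
    return transform[0][3], transform[1][3], transform[2][3]

# Hypothetical leg: thigh and shank of 0.2 m each, hip at 90 degrees, knee at -90 degrees.
print(foot_position((math.pi / 2, -math.pi / 2), (0.2, 0.2)))
```

With the forward kinematics in hand, a planned foot trajectory can be checked point by point against the positions the leg can actually reach.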

Furthermore, the study optimized the duty cycle, that is, the fraction of the gait period during which each leg is on the ground. Increasing the duty cycle from 0.75 to 0.875 keeps the foot of the bionic quadruped robot in contact with the ground for longer, which improves the stability of the robot to some extent while mitigating the vibrations caused by collisions [6]. By combining the design of legs and feet, this study opens up more possibilities for practical applications in various difficult environments.
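The effect of the duty cycle on ground support can be checked with a small timing model. The sketch below assumes a crawl gait in which the four legs are offset by a quarter of the gait period each. That offset pattern is an assumption for illustration and is not taken from [6].

```python
def legs_in_stance(t, duty_cycle, phase_offsets):
    """Number of legs on the ground at normalized gait time t in [0, 1)."""
    return sum(1 for offset in phase_offsets if (t - offset) % 1.0 < duty_cycle)

def support_profile(duty_cycle, phase_offsets, samples=800):
    """Minimum and average number of supporting legs over one gait period."""
    counts = [legs_in_stance(k / samples, duty_cycle, phase_offsets) for k in range(samples)]
    return min(counts), sum(counts) / samples

# Assumed crawl gait: each leg starts its stance a quarter period after the previous one.
CRAWL_OFFSETS = (0.0, 0.25, 0.5, 0.75)

for duty_cycle in (0.75, 0.875):
    min_legs, mean_legs = support_profile(duty_cycle, CRAWL_OFFSETS)
    print(duty_cycle, min_legs, mean_legs)
```

Under this assumed pattern, at least three legs support the body at all times for both values, while the average number of supporting legs rises from 3.0 to 3.5, which is consistent with the longer ground contact described above.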

2.2. Optimization of improved algorithms

Algorithmic optimization is also crucial for gait control. Like mechanism-based optimization, it keeps foot and leg control as its primary focus. Jihoon Kim's research group proposed a foot trajectory optimization technique based on Evolutionary Computation (EC). The study concluded that conventional gait optimization techniques typically depend on gait parameters, such as swing height and step length, that restrict the deformation of the foot trajectory. To make up for these drawbacks, the study offers an optimization method based on foot position ingestion, which constructs an atypical search range generated by independent processes that cover each location of the foot trajectory. The researchers also designed and built an actual quadrupedal robotic walking system to verify the feasibility of the algorithm. While the algorithm was running, the robot's feet moved along a set trajectory without any deliberate control of its stance. To test the optimization results, the research team compared the optimized algorithm with the traditional one. They found that the optimized trajectory not only reduced the stride length but also prevented large swings of the robot's body. Although the two are comparable in running speed, the new algorithm's adaptability is significantly improved, which provides a new direction for gait optimization [7].
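The idea of searching over individual trajectory points, rather than over a few gait parameters, can be sketched with a very small evolutionary loop. This is not the algorithm of [7]: the cost function, the waypoint count, and all numeric settings below are placeholders, and on a real robot the cost would come from walking trials or simulation.

```python
import random

def trajectory_cost(heights, clearance=0.05):
    """Toy cost: penalize a jagged swing curve and insufficient foot clearance."""
    padded = [0.0] + list(heights) + [0.0]  # the foot starts and ends on the ground
    roughness = sum(
        (padded[i - 1] - 2.0 * padded[i] + padded[i + 1]) ** 2
        for i in range(1, len(padded) - 1)
    )
    shortfall = max(0.0, clearance - max(heights))
    return roughness + 10.0 * shortfall ** 2

def evolve(n_points=7, pop_size=20, n_elite=5, generations=60, sigma=0.01, seed=0):
    """Elitist evolutionary search over the heights of the swing waypoints."""
    rng = random.Random(seed)
    population = [[rng.uniform(0.0, 0.1) for _ in range(n_points)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=trajectory_cost)
        history.append(trajectory_cost(population[0]))
        elites = population[:n_elite]
        children = []
        while len(elites) + len(children) < pop_size:
            parent = rng.choice(elites)
            # Each waypoint is mutated independently of the others.
            children.append([max(0.0, h + rng.gauss(0.0, sigma)) for h in parent])
        population = elites + children
    population.sort(key=trajectory_cost)
    history.append(trajectory_cost(population[0]))
    return population[0], history

best, history = evolve()
print(history[0], history[-1])
```

Because the elite individuals are carried over unchanged, the best cost never increases from one generation to the next, which makes the progress of the search easy to monitor.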

Yuliu Wang's research team shifted the optimization target to the robot's legs. They proposed an approach that uses Multi-Agent Reinforcement Learning (MARL) to synchronize the robot's movements, which allows better control of the behavior of bionic quadrupedal robots. In the course of their research, they found that training with this method could overcome the shortcomings of traditional reinforcement learning and give robots more complex dynamic locomotion capabilities. For example, robots could selectively walk on only one, two, or three legs in some special situations. The researchers also found that this approach can help address the difficulties of implementing Riemannian Motion Policies (RMPs) in some systems. By treating each drive component of the robot as a separate agent, the researchers greatly simplified the process of learning complex motion patterns, allowing MARL to be used to train an individual robot [8]. This not only reduces the range of motion but also gives these components some autonomy and increases the flexibility of leg motion. The results of the study showed that this approach outperformed the traditional method in terms of training efficiency and stability, as shown in Figure 2. In conclusion, the bionic quadrupedal robot enhanced with this optimization strategy has many possible uses.


Figure 2: Three-legged, two-legged and one-legged stability tests [8]
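The decomposition into one agent per leg can be illustrated with a toy example. The sketch below swaps in independent tabular learners with a shared team reward. It is far simpler than the distributed MARL framework with RMPs in [8], and the gait phases, actions, and reward pattern are all hypothetical.

```python
import random

N_LEGS = 4
N_PHASES = 4      # hypothetical discretized gait phase observed by every agent
ACTIONS = (0, 1)  # 0 = swing, 1 = stance

class LegAgent:
    """One independent learner per leg, each with its own small value table."""

    def __init__(self, leg_index, epsilon=0.1, learning_rate=0.1):
        self.leg_index = leg_index
        self.epsilon = epsilon
        self.learning_rate = learning_rate
        self.q = [[0.0, 0.0] for _ in range(N_PHASES)]

    def act(self, phase, rng, greedy=False):
        if not greedy and rng.random() < self.epsilon:
            return rng.choice(ACTIONS)
        row = self.q[phase]
        return 0 if row[0] > row[1] else 1

    def learn(self, phase, action, reward):
        # Move the estimate for the chosen action toward the shared reward.
        self.q[phase][action] += self.learning_rate * (reward - self.q[phase][action])

def team_reward(phase, joint_action):
    """Fraction of legs matching a toy pattern: leg number `phase` swings, the rest stand."""
    correct = sum(
        1 for leg, action in enumerate(joint_action)
        if action == (0 if leg == phase else 1)
    )
    return correct / N_LEGS

def train(steps=20000, seed=0):
    rng = random.Random(seed)
    agents = [LegAgent(i) for i in range(N_LEGS)]
    for step in range(steps):
        phase = step % N_PHASES
        joint_action = [agent.act(phase, rng) for agent in agents]
        reward = team_reward(phase, joint_action)
        for agent, action in zip(agents, joint_action):
            agent.learn(phase, action, reward)  # every agent sees the same reward
    return agents

agents = train()
rng = random.Random(1)
for phase in range(N_PHASES):
    print(phase, [agent.act(phase, rng, greedy=True) for agent in agents])
```

Each agent only ever chooses between two actions for its own leg, so the joint action space of sixteen combinations is never searched directly. That is the simplification the per-component decomposition is meant to provide.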

Ci Chen's team, on the other hand, has a different goal from Yuliu Wang's team: they optimize leg algorithms to help small robots walk more naturally. These robots are distinguished by their ability to carry out tasks with low power consumption and a limited number of degrees of actuation (DoAs). The group proposed two layers of optimization. The lower layer solves the gait controller problem using a Central Pattern Generator (CPG) in conjunction with a Deep Reinforcement Learning (DRL) algorithm. The CPG significantly speeds up training and eases the transfer of simulation results to real-world settings. The higher layer uses dual-network Bayesian Optimization (BO). This approach significantly increases optimization efficiency by building base strategies from prior experimental data, which saves the researchers from having to train new strategies for subsequent candidate patterns [9]. The study's findings show that the trained gait is substantially faster than the standard gait and that the robot walks more quickly in configurations where its front legs are shorter than its back legs. These findings support the algorithm's efficacy.
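A CPG of the kind used in the lower layer can be sketched as a set of coupled phase oscillators. The example below is a generic illustration rather than the controller of [9]: the trot phase offsets, the leg ordering, the frequency, the coupling gain, and the output amplitude are all assumed values.

```python
import math

# Assumed leg order: front-left, front-right, rear-left, rear-right.
# In a trot, diagonal legs move together and the two pairs are half a cycle apart.
TROT_OFFSETS = (0.0, math.pi, math.pi, 0.0)

def wrap(angle):
    """Wrap an angle to the interval [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi

def step_cpg(phases, offsets, freq_hz, coupling, dt):
    """One Euler step of phase oscillators coupled toward the desired offsets."""
    n = len(phases)
    updated = []
    for i in range(n):
        pull = sum(
            math.sin(phases[j] - phases[i] - (offsets[j] - offsets[i]))
            for j in range(n) if j != i
        )
        updated.append(phases[i] + dt * (2.0 * math.pi * freq_hz + coupling * pull))
    return updated

def run_cpg(initial_phases, offsets=TROT_OFFSETS, freq_hz=1.5, coupling=2.0,
            dt=0.005, steps=2000):
    phases = list(initial_phases)
    for _ in range(steps):
        phases = step_cpg(phases, offsets, freq_hz, coupling, dt)
    return phases

def joint_targets(phases, amplitude=0.3):
    """Map each oscillator phase to a rhythmic joint target (radians)."""
    return [amplitude * math.sin(phase) for phase in phases]

# Start from the trot pattern with a disturbance on every leg.
start = [offset + bump for offset, bump in zip(TROT_OFFSETS, (0.4, -0.3, 0.2, -0.5))]
final = run_cpg(start)
print([round(wrap(p - final[0]), 3) for p in final])
```

Because the oscillators already produce a coordinated rhythm, a learning algorithm placed on top of such a generator only needs to adjust a few parameters such as frequency and amplitude, which is one plausible reason a CPG shortens training.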

In the design of bionic quadrupedal robots, the capacity of the foot or leg to recover promptly after a fall is just as important to gait optimization as flexibility or stability. Jemin Hwangbo's team considered these factors. They trained neural network policies in simulation and then applied them to the advanced quadruped ANYmal. ANYmal goes beyond previous quadrupedal robots in that it can precisely and efficiently follow high-level body velocity commands, allowing it to achieve faster running speeds. At the same time, its advanced construction allows it to recover quickly if it accidentally falls while working in harsh conditions. In the early stages of their research, the team found that previous control theories were unable to resolve the uncertainty in the dynamics caused by inaccuracies in the robot's analytical model, so they proposed an optimal training method. This method, which they called the ‘actuator network’, incorporates the classical articulated system model and allows the robot to autonomously learn and transfer dynamic motor skills of complex leg systems, for example by helping the robot convert algorithmic commands into torque outputs. To verify the feasibility of the method, the researchers compared it with previous methods and found that the optimized method not only reduced the error in executing commands by nearly 95% but also improved the efficiency of torque and mechanical power utilization [10]. The results of this study show that the training method can greatly improve the locomotion performance of bionic quadruped robots and make them more reliable in their work.

For practical applications of bionic quadrupedal robots, researchers have to consider the robot's visual perception in addition to the gait of its legs. Jorge Vásquez's team took this into account by exploring how to combine the control problem of a quadrupedal robot with the navigation problem in an obstacle-filled environment. Drawing on complex visual sensing techniques, the team formulated the robot motion problem as a Partially Observable Markov Decision Process (POMDP) and solved the gait optimization problem with Proximal Policy Optimization (PPO). The approach is called the VAE-CPG architecture. To increase the real-world utility of the optimized robot, it combines a Central Pattern Generator (CPG) for motion planning and a Variational Auto-Encoder (VAE), which reduces the complexity of the action and observation spaces. To validate the approach, the researchers used the Unitree Laikago robot in simulation experiments set on a construction site. The experiments showed that transforming the movement space of the legs into a cyclic pattern and optimizing the associated gait based on sensory feedback could improve the efficiency of the robot in the workplace. In addition, the researchers compared the optimized approach with plain PPO: PPO alone could only reach its goals in an obstacle-free environment, while the new architecture learned to walk and avoid obstacles simultaneously in an obstacle-filled environment [11]. This study demonstrated that the VAE-CPG architecture enables the robot to learn in richer virtual environments.

Similarly, Christyan Mario Cruz's team carried out research that also focused on combining the navigational capabilities of bionic quadrupedal robots with gait optimization. The team used ARTU-R (A1 Rescue Task UPM Robot), which is based on a Central Pattern Generator (CPG), to optimize the gait patterns of a quadrupedal robot working in complex terrain. The method determines the walking parameters by simulating the gait of a dog and then adjusts the output with a neural network. The visual information included in the method also helps the robot autonomously analyze the characteristics of the terrain and the types of obstacles at the workplace, so that it can perform its tasks more efficiently and accurately. To demonstrate the feasibility of the method, the research team validated the findings through simulations with ROS-Gazebo and then applied them to real-world scenarios. The results show that the optimization model achieves more than 93% efficiency on complex surfaces and a 91% success rate in overcoming obstacles. This study demonstrates the importance of vision-based adjustment methods for gait optimization of bionic quadrupedal robots and opens the way for subsequent sensory development based on LiDAR systems [12].
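For readers unfamiliar with PPO, which the VAE-CPG architecture described above builds on, the core of the algorithm is its clipped surrogate objective. The sketch below computes that objective for a small batch. It is a generic illustration of PPO, not code from [11], and the clipping value of 0.2 and the sample numbers are assumptions.

```python
import math

def ppo_clip_objective(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Average clipped surrogate objective over a batch of sampled actions."""
    total = 0.0
    for new_lp, old_lp, advantage in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio between the updated policy and the data-collecting policy.
        ratio = math.exp(new_lp - old_lp)
        clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Taking the minimum removes any incentive to move the ratio outside the clip range.
        total += min(ratio * advantage, clipped_ratio * advantage)
    return total / len(advantages)

# Three hypothetical samples with probability ratios 1.5, 0.5, and 1.0.
new_lp = [math.log(1.5), math.log(0.5), 0.0]
old_lp = [0.0, 0.0, 0.0]
print(ppo_clip_objective(new_lp, old_lp, advantages=[1.0, -1.0, 2.0]))
```

The clipping keeps each policy update close to the policy that collected the data, which helps when the observations are noisy or only partially informative, as in a POMDP setting.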

The gait optimization methods reviewed above all rely on electric drive. However, pneumatic drive is another source of motion power for this type of robot, so researchers need to consider gait optimization for pneumatic quadrupedal robots as well. Soofiyan Atar's team proposed an auto-learning optimization method for pneumatic bionic quadrupedal robots using sample-efficient deep Q-learning. The method can learn a neural network with minimal tuning and few trials. The researchers believe that training a quadrupedal robot with this method not only reduces the use of simulators but also yields a more stable gait. To test this theory, the researchers conducted simulations and obtained a stable jumping gait as well as a set of periodic, synchronized locomotion gaits. To explore the adaptability of the method, they also tested the optimized robot on different terrains and found that the success rate of running on a flat surface was higher than on a steep slope [13]. Although the results are not perfect, the study still shows the feasibility of this optimization method. In the future, researchers need to analyze and improve this method further to contribute to the gait optimization of pneumatic bionic quadruped robots.
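One common way to make Q-learning sample-efficient is to reuse stored transitions several times. The sketch below shows this with a lookup table in place of the neural network, so it is a tabular stand-in rather than the deep Q-learning method of [13]. The states, rewards, learning rate, and discount factor are hypothetical.

```python
def q_update(q, state, action, reward, next_state, done, alpha=0.5, gamma=0.9):
    """One Q-learning update on a table q[state][action]."""
    target = reward if done else reward + gamma * max(q[next_state])
    q[state][action] += alpha * (target - q[state][action])

def replay(q, buffer, passes, alpha=0.5, gamma=0.9):
    """Reuse every stored transition `passes` times, in stored order."""
    for _ in range(passes):
        for state, action, reward, next_state, done in buffer:
            q_update(q, state, action, reward, next_state, done, alpha, gamma)
    return q

# Two recorded transitions: state 0 leads to state 1, and state 1 ends with reward 1.
buffer = [
    (0, 0, 0.0, 1, False),
    (1, 0, 1.0, 2, True),
]
q_table = [[0.0, 0.0] for _ in range(3)]
replay(q_table, buffer, passes=2)
print(q_table[0][0], q_table[1][0])
```

After the first pass only the state next to the reward has a nonzero value. The second pass propagates that value one step back without any new trial on the robot, which is the sense in which replay saves samples.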

3. Conclusion

It is well known that bionic quadrupedal robots show strong structural stability compared with ordinary robots when working in different environments, and that their good grip supports movement on different surfaces, so they have great potential for development and a wide range of applications. This paper introduces several gait optimization methods for bionic quadrupedal robots based on reinforcement learning and explores the effects of different research methods on the work of bionic quadrupedal robots in complex terrain. The studies reviewed show that the ultimate goal of gait optimization is invariably to improve the flexibility and adaptability of the robot at work, enhance the stability of its movement, and reduce unnecessary wear and tear while ensuring work efficiency. They also demonstrate the effective application of deep reinforcement learning to bionic quadrupedal robots, especially the progression of innovative ideas from frameworks to simulation experiments and then to practice in real scenarios, which provides a direction for the development of robotics technology. Currently, these optimized bionic quadrupedal robots are active in scenarios such as emergency rescue, environmental monitoring, military reconnaissance, and construction mapping. Their ability to navigate difficult terrains makes them invaluable in assisting humans in complex and high-risk environments. As bionic quadrupedal robots gain increasing attention, future research should focus on improving gait optimization strategies while integrating various technologies. This could lead to more intelligent and autonomous robots capable of adapting to diverse environments and performing a wider range of tasks, from search and rescue operations to space exploration.

References

[1]. Chen, J., Li, C., San, H., He, C., & Luo, Y. (2024). Biomimetic trajectory planning and implementation of quadruped robots based on biological motion characteristics. Advanced Engineering Sciences. https://doi.org/10.1016/j.aes.2024.07.002

[2]. Zhao, J., Gong, S., & Wang, J. (2022). Optimization of gait parameters and exploratory walking strategy for quadruped robot. Journal of Beijing Institute of Technology, 42(4), 407-414. https://doi.org/10.15918/j.tbit1001-0645.2020.215

[3]. Wang, H., Chen, M., & Cao, W. (2024). Stability study on gait control of bionic quadruped robot. Wireless Internet Technology, 21(15), 60-62.

[4]. Chen, M., Zhang, K., Wang, S., Liu, F., Liu, J., & Zhang, Y. (2020). Analysis and Optimization of Interpolation Points for Quadruped Robots Joint Trajectory. Complexity, 2020, 1-17. https://doi.org/10.1155/2020/3507679

[5]. Han, B., Luo, X., Zhao, R., Luo, Q., & Liang, G. (2019). The Optimization Algorithm for Gait Planning and Foot Trajectory on the Quadruped Robot. In Advanced Computational Methods in Life System Modeling and Simulation (pp. 1123-1132). Springer.

[6]. Chen, J. P., San, H. J., Wu, X., & Xiong, B. Z. (2021). Structural design and gait research of a new bionic quadruped robot. Proceedings of the Institution of Mechanical Engineers, Part B: Journal of Engineering Manufacture, 236(4), 236-248. https://doi.org/10.1177/0954405421995663

[7]. Kim, J., Ba, D., Yeom, H., & Bae, J. (2021). Gait Optimization of a Quadruped Robot Using Evolutionary Computation. Journal of Bionic Engineering, 18(2), 306-318. https://doi.org/10.1007/s42235-021-0026-y

[8]. Wang, Y., Sagawa, R., & Yoshiyasu, Y. (2024). Learning Advanced Locomotion for Quadrupedal Robots: A Distributed Multi-Agent Reinforcement Learning Framework with Riemannian Motion Policies. Robotics, 13(6), 86. https://doi.org/10.3390/robotics13060086

[9]. Chen, C., Xiang, P., Zhang, J., Xiong, R., Wang, Y., & Lu, H. (2023). Deep Reinforcement Learning Based Co-Optimization of Morphology and Gait for Small-Scale Legged Robot. IEEE/ASME Transactions on Mechatronics, PP, 1-12. https://doi.org/10.1109/TMECH.2023.3330427

[10]. Hwangbo, J., Lee, J., Dosovitskiy, A., Bellicoso, D., Tsounis, V., Koltun, V., & Hutter, M. (2019). Learning agile and dynamic motor skills for legged robots. Science Robotics, 4(26), eaau5872. https://doi.org/10.1126/scirobotics.aau5872

[11]. Vásquez, J., Adams, M., & Dassori, I. (2024). Four-Legged Gait Control via the Fusion of Computer Vision and Reinforcement Learning. Information Fusion, 103(1), 101-112.

[12]. Cruz, C., Sánchez, L., Cerro, J., & Barrientos, A. (2023). Deep Learning Vision System for Quadruped Robot Gait Pattern Regulation. Biomimetics, 8(3), 289. https://doi.org/10.3390/biomimetics8030289

[13]. Atar, S., Shaikh, A., Rajpurkar, S., Bhalala, P., Desai, A., & Siddavatam, I. (2021). Gaits Stability Analysis for a Pneumatic Quadruped Robot Using Reinforcement Learning. International Journal of Robotics Research, 40(8), 976-991.

Cite this article

Wang, J. (2025). Deep Reinforcement Learning-Based Gait Optimization for Bionic Quadruped Robots. Applied and Computational Engineering, 126, 109-115.

Data availability

The datasets used and/or analyzed during the current study will be available from the authors upon reasonable request.

Disclaimer/Publisher's Note

The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of EWA Publishing and/or the editor(s). EWA Publishing and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

About volume

Volume title: Proceedings of the 5th International Conference on Materials Chemistry and Environmental Engineering

ISBN: 978-1-83558-911-3 (Print) / 978-1-83558-912-0 (Online)

Editor: Harun CELIK

Conference website: https://2025.confmcee.org/

Conference date: 17 January 2025

Series: Applied and Computational Engineering

Volume number: Vol.126

ISSN: 2755-2721 (Print) / 2755-273X (Online)

© 2024 by the author(s). Licensee EWA Publishing, Oxford, UK. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license. Authors who publish this series agree to the following terms:
1. Authors retain copyright and grant the series right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this series.
2. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the series's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this series.
3. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See Open access policy for details).
