
The Effect of Hyperparameters on the Model Convergence Rate of Cliff Walking Problem Based on Q-Learning
- 1 Institute of Future Technology, Nanjing University of Information Science & Technology, Chengdu, 610041, China
* Author to whom correspondence should be addressed.
Abstract
With the rapid development of AI, machine learning has become a major research focus, and reinforcement learning is one of its important branches. Sustained research has produced a steady stream of new algorithms. Q-Learning is a classic reinforcement learning algorithm that underlies many of them. It iteratively updates a Q-table so that the agent can choose the best action in each state and thereby approach the optimal solution. In essence, Q-Learning is an off-policy temporal-difference method. Off-policy learning involves two distinct policies: the target policy, which is being learned, and the behavior policy, which selects actions during training. To balance exploration and exploitation, an ε-greedy behavior policy is used so that the agent keeps exploring with some probability, and hyperparameters such as the learning rate (alpha) and the discount factor (gamma) must be set. However, the effect of these hyperparameters is not yet well understood. This paper studies how the hyperparameters of Q-Learning affect its convergence speed on a relatively simple model, the cliff walking problem.
Keywords
Q-Learning, machine learning, cliff walking.
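The Q-table update and ε-greedy action selection described in the abstract can be sketched as follows. This is a minimal illustration, not the author's implementation. The 4×12 grid, the rewards (-1 per step, -100 for stepping into the cliff), the hyperparameter values, and all function names are assumptions made for the example. The update applied is the standard tabular rule Q(s, a) ← Q(s, a) + alpha · [r + gamma · max Q(s', ·) − Q(s, a)].

```python
import random

ROWS, COLS = 4, 12                 # assumed grid size (classic textbook layout)
START, GOAL = (3, 0), (3, 11)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right


def step(state, action):
    """Apply one move; return (next_state, reward, done)."""
    r = min(max(state[0] + ACTIONS[action][0], 0), ROWS - 1)
    c = min(max(state[1] + ACTIONS[action][1], 0), COLS - 1)
    if r == ROWS - 1 and 0 < c < COLS - 1:      # stepped into the cliff
        return START, -100, False               # assumed: reset to start, no termination
    return (r, c), -1, (r, c) == GOAL


def epsilon_greedy(q, state, epsilon, rng):
    """Behavior policy: random action with probability epsilon, else greedy."""
    if rng.random() < epsilon:
        return rng.randrange(len(ACTIONS))
    values = q[state]
    return values.index(max(values))


def train(alpha, gamma, epsilon, episodes=500, seed=0, max_steps=10_000):
    """Run tabular Q-learning; return the Q-table and per-episode returns."""
    rng = random.Random(seed)
    q = {(r, c): [0.0] * len(ACTIONS) for r in range(ROWS) for c in range(COLS)}
    returns = []
    for _ in range(episodes):
        state, total = START, 0
        for _ in range(max_steps):
            action = epsilon_greedy(q, state, epsilon, rng)
            nxt, reward, done = step(state, action)
            # Target policy is greedy: bootstrap from the max over next actions.
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][action] += alpha * (target - q[state][action])
            state, total = nxt, total + reward
            if done:
                break
        returns.append(total)
    return q, returns


def greedy_path_length(q, max_steps=100):
    """Follow the greedy policy from START; return steps to GOAL, or None."""
    state = START
    for n in range(1, max_steps + 1):
        state, _, done = step(state, q[state].index(max(q[state])))
        if done:
            return n
    return None


if __name__ == "__main__":
    # Compare a few learning rates with the other hyperparameters held fixed.
    for alpha in (0.1, 0.5, 0.9):
        q, returns = train(alpha=alpha, gamma=0.9, epsilon=0.1)
        print(alpha, sum(returns[-50:]) / 50, greedy_path_length(q))
```

Sweeping alpha, gamma, or epsilon in the final loop and comparing the per-episode returns is one simple way to observe how each hyperparameter affects convergence speed. The fixed seed keeps runs repeatable so that differences come from the hyperparameters rather than from random exploration.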
Cite this article
Zhu, J. (2024). The Effect of Hyperparameters on the Model Convergence Rate of Cliff Walking Problem Based on Q-Learning. Applied and Computational Engineering, 110, 6-12.
Data availability
The datasets used and/or analyzed during the current study will be available from the authors upon reasonable request.
About volume
Volume title: Proceedings of CONF-MLA 2024 Workshop: Securing the Future: Empowering Cyber Defense with Machine Learning and Deep Learning
© 2024 by the author(s). Licensee EWA Publishing, Oxford, UK. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.