Curiosity-Driven Multi-Level Intrinsic Reward DQN for Enhanced Exploration in Reinforcement Learning


Zheyuan Cao 1 , Jiyu Jiang 2 , Hengyan Liu 3*
  • 1 School of Advanced Technology, Xi'an Jiaotong-Liverpool University, Suzhou, China    
  • 2 School of Artificial Intelligence and Advanced Computing, Xi'an Jiaotong-Liverpool University, Suzhou, China    
  • 3 School of Artificial Intelligence and Advanced Computing, Xi'an Jiaotong-Liverpool University, Suzhou, China    
  • *corresponding author Hengyan.Liu@xjtlu.edu.cn
Published on 8 November 2024 | https://doi.org/10.54254/2755-2721/103/20241121
ACE Vol.103
ISSN (Print): 2755-273X
ISSN (Online): 2755-2721
ISBN (Print): 978-1-83558-695-2
ISBN (Online): 978-1-83558-696-9

Abstract

Reinforcement learning (RL) has shown great potential in solving complex decision-making tasks. However, efficient exploration remains a significant challenge, particularly in environments with sparse or deceptive rewards. This paper introduces DQN-Mult-Cur, a novel reinforcement learning algorithm that enhances exploration by integrating curiosity-driven and multi-level intrinsic rewards within the Deep Q-Network (DQN) framework. The proposed method addresses the limitations of conventional exploration strategies by incentivizing agents to explore novel and meaningful states, thereby improving learning efficiency and performance. Extensive experiments across three standard environments—CartPole-v1, MountainCar-v0 and Acrobot-v1—demonstrate that DQN-Mult-Cur outperforms traditional DQN variants, achieving faster convergence, higher rewards, and greater stability. An ablation study further highlights the importance of each intrinsic reward component, confirming the robustness of the proposed approach. The results suggest that DQN-Mult-Cur offers a comprehensive solution to the exploration-exploitation trade-off in reinforcement learning, making it applicable to a wide range of challenging environments.

Keywords:

Reinforcement Learning, Deep Q-Network (DQN), Curiosity-Driven Exploration, Multi-Level Intrinsic Reward.


1. Introduction

Reinforcement learning (RL) has emerged as a powerful tool for addressing complex decision-making tasks by enabling agents to learn optimal behaviors through direct interaction with their environments [1, 2]. Despite these advancements, a significant challenge remains: efficient exploration. In high-dimensional or sparse reward environments, conventional exploration strategies such as ϵ-greedy [3] often lead to suboptimal outcomes, as these strategies rely heavily on randomness and lack the sophistication needed to discover meaningful states that contribute to better learning [4, 5]. To overcome this limitation, researchers have increasingly turned to intrinsic motivation mechanisms, particularly curiosity-driven exploration [6, 7], which provides agents with a more directed and intelligent approach to discovering novel and informative states.

Curiosity is a well-known intrinsic motivator in human and animal learning [8], where the drive to understand and explore the unknown plays a crucial role in cognitive development [9]. In RL, curiosity can be harnessed as a mechanism to encourage agents to explore their environment by rewarding them for encountering unfamiliar or unpredictable states [10, 11]. The fundamental idea behind curiosity-driven exploration is the use of a predictive model that allows the agent to estimate the expected outcome of its actions [7]. Specifically, this model predicts the next state based on the current state and the selected action. When the agent’s prediction deviates significantly from the actual outcome—indicating high prediction error—it receives an intrinsic reward [12]. This reward acts as a signal that the state is novel or not yet well understood, prompting the agent to explore it further [10]. This mechanism effectively addresses the issue of local optima, where an agent might otherwise become trapped in a repetitive cycle of exploring only well-known states with little informational value [13]. By focusing on states that yield high prediction errors, curiosity-driven exploration ensures that the agent continuously seeks out and learns from new and challenging parts of the environment [14]. This approach not only enhances the agent’s ability to explore efficiently but also leads to better overall learning performance, particularly in environments where external rewards are sparse or misleading [4, 5].

While curiosity-driven exploration provides a robust framework for guiding agents towards novel states, it can be further enhanced by introducing a multi-level intrinsic reward structure [15]. The multi-level approach acknowledges that learning is a multi-faceted process, requiring different forms of motivation at various stages [16, 17]. A single type of intrinsic reward, such as prediction error, might not fully capture the complexity of the learning process, particularly in environments with diverse challenges [18].

In a multi-level intrinsic reward framework, the agent receives several types of rewards, each designed to encourage exploration in a specific dimension [5]. The first level might focus on state novelty, where the agent is rewarded for discovering states that are significantly different from those it has previously encountered [16]. This encourages broad exploration and prevents the agent from becoming overly focused on a narrow subset of the state space [19]. Another level might be based on goal-oriented rewards, where the agent is incentivized to reach states that are closer to achieving a specific objective, thereby balancing exploration with progress towards the overall goal [20].

By integrating these multiple layers of intrinsic rewards, the agent benefits from a more structured and nuanced exploration strategy [21]. Each layer of reward addresses a different aspect of the learning process, ensuring that the agent not only explores widely but also gains a deep understanding of the environment [22]. This multi-level approach allows the agent to dynamically adapt its exploration strategy based on the current stage of learning, making it more versatile and effective across a variety of environments [21].

In this paper, we introduce a novel reinforcement learning algorithm that synergistically integrates curiosity-driven exploration with a multi-level intrinsic reward framework within the Deep Q-Network (DQN) architecture [2]. Our approach is designed to significantly enhance the exploration capabilities of RL agents by (1) combining multiple intrinsic rewards: we integrate curiosity-driven intrinsic rewards based on prediction error with state novelty rewards, creating a more comprehensive exploration strategy that addresses multiple dimensions of the learning process, and (2) empirical validation: we validate the effectiveness of our approach through extensive experiments across multiple environments, including CartPole-v1, MountainCar-v0, and Acrobot-v1 [23]. Our results demonstrate that the proposed algorithm significantly improves exploration efficiency and learning performance compared to baseline methods, highlighting its versatility and robustness in diverse RL settings.

2. Related Work

2.1. Exploration Strategies in Reinforcement Learning

Exploration strategies are a fundamental component of reinforcement learning (RL) algorithms, determining how agents balance exploration and exploitation. Traditional approaches, such as ϵ-greedy [1] and Boltzmann exploration [3], rely on randomness to ensure that agents explore the state space. However, these methods often struggle in environments with sparse rewards, where random exploration is insufficient for finding optimal policies. More sophisticated techniques, such as Upper Confidence Bound (UCB) [24] and Thompson sampling [25], have been proposed to address this issue by guiding exploration based on the uncertainty of the agent’s knowledge.

In comparison, our method leverages curiosity-driven exploration, which dynamically adjusts exploration based on prediction errors from a learned model. This approach offers a more targeted exploration strategy, particularly in high-dimensional or sparse environments, where conventional methods tend to falter. By integrating multi-level intrinsic rewards, our algorithm further enhances exploration efficiency by encouraging the agent to explore both novel states and those that are crucial for task completion.

2.2. Intrinsic Motivation in Reinforcement Learning

Intrinsic motivation, inspired by cognitive science, has gained traction in RL as a means of augmenting external rewards with internal signals that drive exploration. Curiosity-based methods, such as those proposed by Pathak et al. [10] and Burda et al. [11], have shown that agents can achieve superior exploration by seeking states that maximize prediction error. These methods use predictive models to generate intrinsic rewards, encouraging agents to explore states that are less understood.

Our work builds upon these ideas by not only incorporating curiosity-driven intrinsic rewards but also extending them with a multi-level reward structure. This structure includes state novelty rewards [5, 16], which push the agent to explore previously unvisited states, and task-oriented rewards that guide the agent toward achieving specific goals. This multi-faceted approach results in more robust exploration and improved learning outcomes across various environments.

2.3. Deep Q-Network Enhancements

The Deep Q-Network (DQN) [2] has become a cornerstone in deep reinforcement learning, achieving impressive results in a wide range of tasks. However, standard DQN suffers from limitations related to exploration and sample efficiency, leading to several proposed enhancements. Double DQN [26], Prioritized Experience Replay [27], and Dueling DQN [28] are notable improvements that address overestimation bias, sample efficiency, and learning stability, respectively.

Our proposed algorithm enhances DQN by integrating curiosity-driven exploration and multi-level intrinsic rewards, which together tackle the exploration challenges that standard DQN faces. Unlike traditional DQN enhancements that focus primarily on exploitation improvements, our method provides a complementary exploration strategy that is critical for solving environments with sparse or deceptive rewards.

3. Methodology

In this section, we introduce the two primary methods proposed in this work: the Deep Q-Network with Curiosity-Driven Exploration (DQN-Cur) and the Multi-Level Intrinsic Reward DQN (DQN-Mult-Cur). Both methods aim to enhance the exploration capabilities of reinforcement learning agents in environments with sparse or deceptive rewards. The overall algorithm framework and processing logic of both methods are depicted in figure 1.

3.1. DQN-Cur

The first method, DQN-Cur, integrates curiosity-driven exploration into the standard DQN framework. The core idea behind this approach is to provide intrinsic rewards to the agent based on the prediction error of a learned forward model. This prediction error serves as a curiosity signal, guiding the agent to explore states that are less predictable or novel.


Figure 1. System Architecture of Multi-Level Intrinsic Reward DQN.

Forward Model: The forward model is implemented as a neural network that takes the current state and action as inputs and predicts the next state. Formally, let st be the state at time step t and at the corresponding action. The forward model fθ is trained to minimize the loss function:

\( {L_{FM}}(θ)=E[∥{s_{t+1}}-{f_{θ}}({s_{t}},{a_{t}})∥^{2}]\ \ \ (1) \)

where st+1 is the true next state. The prediction error δt is computed as the squared difference between the predicted and actual next state:

\( {δ_{t}}=∥{s_{t+1}}-{f_{θ}}({s_{t}},{a_{t}})∥^{2}\ \ \ (2) \)

Intrinsic Reward: The intrinsic reward rint at time step t is derived directly from the prediction error:

\( r_{t}^{int}={δ_{t}}\ \ \ (3) \)

The total reward rttotal used to update the Q-values in the DQN is the sum of the extrinsic reward rt provided by the environment and the intrinsic reward:

\( r_{t}^{total}={r_{t}}+βr_{t}^{int}\ \ \ (4) \)

where β is a scaling factor that balances the influence of intrinsic and extrinsic rewards.
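
As a concrete illustration, the following PyTorch sketch shows how a forward model of this kind can be trained and how its prediction error can be turned into the intrinsic and total rewards of Eqs. (1)-(4). It is not the authors' implementation; the network sizes, the one-hot action encoding, and the default value of β are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ForwardModel(nn.Module):
    """Predicts the next state from the current state and a (one-hot) discrete action."""

    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.n_actions = n_actions
        self.net = nn.Sequential(
            nn.Linear(state_dim + n_actions, hidden),
            nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        # state: (batch, state_dim) floats; action: (batch,) long tensor of action indices
        a_onehot = F.one_hot(action, self.n_actions).float()
        return self.net(torch.cat([state, a_onehot], dim=-1))


def train_forward_model(fm, optimizer, s, a, s_next):
    """One gradient step on the forward-model loss L_FM of Eq. (1)."""
    pred = fm(s, a)
    loss = ((s_next - pred) ** 2).sum(dim=-1).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


def curiosity_reward(fm, s, a, s_next):
    """Prediction error delta_t = ||s_{t+1} - f_theta(s_t, a_t)||^2, used as r_t^int (Eqs. 2-3)."""
    with torch.no_grad():
        pred = fm(s, a)
    return ((s_next - pred) ** 2).sum(dim=-1)


def total_reward(r_ext, r_int, beta: float = 0.1):
    """Total reward of Eq. (4); beta = 0.1 is a placeholder, not a tuned value."""
    return r_ext + beta * r_int
```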

Training Process: The DQN-Cur algorithm follows the standard DQN training process, with the addition of the intrinsic reward signal. The agent updates its Q-values using the Bellman equation, with the total reward as defined above:

\( Q({s_{t}},{a_{t}})←Q({s_{t}},{a_{t}})+α[r_{t}^{total}+γ\underset{a}{max}Q({s_{t+1}},a)-Q({s_{t}},{a_{t}})]\ \ \ (5) \)

where α is the learning rate and γ is the discount factor.
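
In a deep Q-network, the tabular update of Eq. (5) is realized as a gradient step on the temporal-difference error, with the total reward substituted for the purely extrinsic one. The sketch below assumes a conventional replay-buffer setup with a separate target network; the Huber loss and the batch layout are assumptions rather than details taken from the paper.

```python
import torch
import torch.nn.functional as F


def dqn_update(q_net, target_net, optimizer, batch, gamma: float = 0.99):
    """One TD update using the total (extrinsic + intrinsic) reward, the deep analogue of Eq. (5)."""
    s, a, r_total, s_next, done = batch        # tensors sampled from a replay buffer (assumed)
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        q_next = target_net(s_next).max(dim=1).values
        target = r_total + gamma * (1.0 - done) * q_next   # done masks the bootstrap term
    loss = F.smooth_l1_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```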

3.2. Multi-Level Intrinsic Reward DQN (DQN-Mult-Cur)

While curiosity-driven exploration provides a powerful mechanism for guiding agents toward novel states, it is often insufficient in environments that present complex challenges or sparse rewards. To address these limitations, we propose the Multi-Level Intrinsic Reward DQN (DQN-Mult-Cur), which extends the standard DQN-Cur approach by incorporating additional layers of intrinsic rewards. This multi-level structure is designed to provide the agent with a more comprehensive and nuanced exploration strategy, enabling it to navigate through both simple and complex environments with greater efficiency and effectiveness.

The DQN-Mult-Cur method introduces three distinct layers of intrinsic rewards: curiosity-driven rewards, state novelty rewards, and goal-oriented rewards. Each layer is intended to address different aspects of the exploration process, ensuring that the agent not only discovers new states but also gains a deeper understanding of the environment and progresses toward achieving specific objectives.

Curiosity-Driven Reward: The first layer in the DQN-Mult-Cur framework is the curiosity-driven reward, which is inherited from the DQN-Cur method. As previously discussed, this reward is based on the prediction error generated by a forward model. The curiosity-driven reward encourages the agent to explore states that are less predictable or that deviate significantly from the agent’s expectations. This mechanism is particularly effective in guiding the agent away from local optima and toward regions of the state space that are under-explored or highly informative. The curiosity-driven reward rint at time step t is computed as:

\( r_{t}^{int}=∥{s_{t+1}}-{f_{θ}}({s_{t}},{a_{t}})∥^{2}\ \ \ (6) \)

where st+1 is the actual next state, and fθ(st, at) is the predicted next state given the current state st and action at.

State Novelty Reward: The second layer, the state novelty reward, is introduced to incentivize the agent to explore states that it has not encountered frequently. In many environments, especially those with sparse rewards, agents tend to revisit familiar states, which can lead to suboptimal exploration and learning. To mitigate this issue, the state novelty reward encourages the agent to seek out and explore less frequently visited states, promoting a broader and more diverse exploration strategy. The novelty of a state st is measured using a count-based approach, where the reward is inversely proportional to the visit count N (st):

\( r_{t}^{nov}=\frac{1}{\sqrt{N({s_{t}})}}\ \ \ (7) \)

As the agent interacts with the environment, states that have been visited less frequently will yield higher novelty rewards, thereby driving the agent to explore these regions more thoroughly. This approach is particularly useful in large state spaces where the potential for discovering valuable states is high, but the likelihood of encountering them through random exploration is low.
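
Eq. (7) presupposes a visit count N(st), which is well defined for discrete states; for the continuous observations of environments such as CartPole-v1, some state abstraction is needed before counting. The sketch below uses simple rounding-based binning as an illustrative choice; the binning precision is an assumption, not a detail specified in the paper.

```python
from collections import defaultdict

import numpy as np


class NoveltyBonus:
    """Count-based novelty reward r_t^nov = 1 / sqrt(N(s_t)), Eq. (7)."""

    def __init__(self, precision: int = 1):
        self.precision = precision            # coarseness of the binning (illustrative)
        self.counts = defaultdict(int)

    def __call__(self, state: np.ndarray) -> float:
        key = tuple(np.round(state, self.precision))  # discretize continuous observations
        self.counts[key] += 1
        return 1.0 / np.sqrt(self.counts[key])
```

With precision=1, two observations that agree to one decimal place in every dimension are counted as visits to the same abstract state; coarser or finer binning trades off generalization of the count against novelty resolution.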

Goal-Oriented Reward: The final layer in the DQN-Mult-Cur framework is the goal-oriented reward. This reward is designed to guide the agent toward achieving specific objectives or reaching predefined goal states. While curiosity and novelty are critical for encouraging exploration, they do not necessarily ensure that the agent will make meaningful progress toward the task at hand. The goal-oriented reward addresses this by providing a direct incentive for the agent to move closer to a target state sg. The goal-oriented reward is defined as:

\( r_{t}^{goal}=-∥{s_{t}}-{s_{g}}∥\ \ \ (8) \)

where sg represents the goal state, and st is the current state. By minimizing the distance to the goal state, the agent is incentivized to focus its exploration efforts on regions of the state space that are not only novel but also relevant to the task at hand.

Total Reward in DQN-Mult-Cur: The total reward rtotal used to update the Q-values in the DQN-Mult-Cur framework is a weighted sum of the extrinsic reward rt provided by the environment and the three layers of intrinsic rewards. This comprehensive reward structure allows the agent to balance the need for exploration with the goal of task completion:

\( r_{t}^{total}={r_{t}}+{β_{1}}r_{t}^{int}+{β_{2}}r_{t}^{nov}+{β_{3}}r_{t}^{goal}\ \ \ (9) \)

where β1, β2, and β3 are scaling factors that determine the relative importance of each reward component. These factors can be tuned based on the specific characteristics of the environment and the task, allowing for flexible adaptation to different learning scenarios.
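
Assembling the three layers, the per-step reward of Eq. (9) can be computed as in the sketch below; the goal state sg and the weights β1, β2, and β3 are environment-specific, and the values shown are placeholders rather than the tuned settings used in the experiments.

```python
import numpy as np


def goal_reward(state: np.ndarray, goal_state: np.ndarray) -> float:
    """Negative Euclidean distance to the goal state, Eq. (8)."""
    return -float(np.linalg.norm(state - goal_state))


def multi_level_reward(r_ext, r_cur, r_nov, r_goal,
                       beta1=0.1, beta2=0.05, beta3=0.01):
    """Weighted total reward of Eq. (9); the beta values are illustrative defaults."""
    return r_ext + beta1 * r_cur + beta2 * r_nov + beta3 * r_goal
```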

Training Process: The training process for DQN-Mult-Cur follows the same structure as DQN-Cur, with the Q-values updated based on the total reward, which now includes multiple intrinsic components. The Bellman equation is modified to incorporate the total reward, ensuring that all three layers of intrinsic motivation are considered during the learning process:

\( Q({s_{t}},{a_{t}})←Q({s_{t}},{a_{t}})+α[r_{t}^{total}+γ\underset{a}{max}Q({s_{t+1}},a)-Q({s_{t}},{a_{t}})]\ \ \ (10) \)

The multi-level reward structure described above, illustrated in figure 1, enables the agent to explore the environment more effectively, particularly in complex or sparse reward scenarios. By combining curiosity, novelty, and goal-oriented rewards, DQN-Mult-Cur offers a robust and flexible approach to exploration that can be tailored to a wide range of tasks and environments. The result is an agent that not only explores more efficiently but also learns more effectively, achieving higher performance across diverse reinforcement learning challenges.

4. Experiment

In this section, we comprehensively evaluate the performance of our proposed DQN-Mult-Cur method across different reinforcement learning environments. We aim to demonstrate the effectiveness of our method in enhancing exploration and learning efficiency. Additionally, we conduct a thorough ablation study to understand the contribution of each component of our method. The experiments were carried out in three standard environments: CartPole-v1, MountainCar-v0, and Acrobot-v1. Each environment presents unique challenges, allowing us to assess the robustness and generalizability of the proposed method. For a fair comparison, we implemented and tested the following methods:

•Baseline DQN: The standard Deep Q-Network (DQN) implementation, which serves as our reference point. This method relies solely on external rewards without any form of intrinsic motivation, highlighting the challenges of exploration in sparse reward settings.

•DQN-Cur: This variation of DQN incorporates curiosity-driven intrinsic rewards based on prediction error. By rewarding the agent for exploring less predictable states, it provides more directed exploration than the baseline DQN.

•DQN-Mult-Cur: Our proposed method that integrates both curiosity-driven and multi-level intrinsic rewards. This approach not only encourages exploration of novel states but also provides structured guidance throughout the learning process, making it effective in complex environments.

•Double DQN: An improved version of the baseline DQN that addresses the overestimation bias inherent in standard Q-learning by using separate networks for action selection and value estimation.

•PER: DQN with Prioritized Experience Replay (PER) prioritizes important experiences during training, thus improving sample efficiency and speeding up the learning process.

4.1. Comparison Across Different Environments

We conducted experiments across three well-known reinforcement learning environments—CartPole-v1, MountainCar-v0, and Acrobot-v1—to compare the effectiveness of each method. These environments vary in terms of difficulty, reward structure, and the nature of the optimal policy, providing a comprehensive assessment of each method’s performance.

In the CartPole-v1 environment, our proposed DQN-Mult-Cur method consistently outperforms the baseline and other comparison methods. This environment, characterized by relatively dense rewards and a straightforward objective, still benefits significantly from the enhanced exploration capabilities of our method. According to the performance curves shown in figure 2, the DQN-Mult-Cur method demonstrates its ability to quickly stabilize at higher reward levels, while other methods, including DQN-Cur and Double DQN, show slower convergence and greater variability in performance. The superior performance of DQN-Mult-Cur can be attributed to its ability to efficiently balance exploration and exploitation, enabling the agent to rapidly identify and refine optimal policies.

The MountainCar-v0 environment presents a more challenging scenario, with sparse rewards and a difficult-to-reach goal. In this setting, the DQN-Mult-Cur method again demonstrates its superiority by achieving higher and more stable rewards compared to the other methods in figure 3. Notably, while the baseline DQN and PER methods struggle to consistently make progress towards the goal, DQN-Mult-Cur shows a clear advantage by leveraging its multi-level intrinsic rewards. These rewards help the agent overcome the initial exploration challenges, guiding it towards more productive exploration strategies that ultimately lead to more successful episodes. The stability observed in the later episodes further underscores the robustness of the DQN-Mult-Cur approach in handling environments where reward signals are sparse and delayed.

In the Acrobot-v1 environment, which is known for its non-linear dynamics and challenging control tasks, DQN-Mult-Cur continues to maintain a significant advantage over the other methods. The complex nature of this environment makes it difficult for standard DQN approaches to discover effective strategies quickly. However, as shown in figure 4, the structured exploration facilitated by DQN-Mult-Cur allows the agent to effectively navigate the environment’s challenges, resulting in faster convergence to higher rewards. The performance of DQN-Cur and Double DQN, while better than the baseline, still lags behind DQN-Mult-Cur, highlighting the importance of the multi-level reward system in environments with intricate dynamics.


Figure 2. Reward Comparison of Different DQN-based Methods on CartPole-v1.

Figure 3. Reward Comparison of Different DQN-based Methods on MountainCar-v0.

4.2. Ablation Study

To further investigate the impact of each component in our proposed method, we conducted an ablation study. This study involves systematically removing different intrinsic reward mechanisms to evaluate their individual contributions to the overall performance of DQN-Mult-Cur.

As illustrated in figure 5, the complete DQN-Mult-Cur method, which includes both curiosity-driven and multi-level intrinsic rewards, achieves the highest performance. When curiosity-driven rewards are removed, we observe a noticeable drop in performance, suggesting that the agent struggles with effective exploration without these rewards. Similarly, the removal of multi-level rewards leads to a less structured exploration process, resulting in lower overall rewards and greater variability. The most significant decline is observed when all intrinsic rewards are removed, which causes the agent’s performance to approach that of the baseline DQN, demonstrating the critical role these rewards play in guiding exploration and improving learning efficiency.


Figure 4. Reward Comparison of Different DQN-based Methods on Acrobot-v1.

Figure 5. Impact of Different Reward Mechanism Components on DQN-Mult-Cur.

4.3. Discussion

The experimental results across multiple environments clearly demonstrate the effectiveness and robustness of our proposed DQN-Mult-Cur method. By integrating multi-level intrinsic rewards with curiosity-driven exploration, our method successfully addresses the exploration-exploitation trade-off, leading to faster convergence and higher rewards compared to traditional methods. The ablation study further confirms the importance of each component in our method, showing that the full model provides the best performance by far. These findings suggest that our approach is well-suited for a wide range of reinforcement learning tasks, particularly those involving sparse rewards or complex environments. Future work could explore the application of DQN-Mult-Cur to even more challenging scenarios, as well as its potential integration with other advanced RL techniques.

5. Conclusion

In this paper, we introduced DQN-Mult-Cur, a reinforcement learning algorithm that enhances exploration by integrating curiosity-driven and multi-level intrinsic rewards within the Deep Q-Network (DQN) framework. Our approach addresses the challenges of sparse rewards and inefficient exploration in complex environments. Through experiments in CartPole-v1, MountainCar-v0, and Acrobot-v1 environments, DQN-Mult-Cur consistently outperformed traditional DQN and its variants, demonstrating faster convergence, higher rewards, and greater stability. The ablation study further validated the critical role of each intrinsic reward component in achieving optimal performance. DQN-Mult-Cur offers a comprehensive and robust solution to the exploration-exploitation trade-off, making it applicable to a wide range of reinforcement learning tasks. Future research could explore its scalability to more complex environments and integration with advanced techniques such as hierarchical reinforcement learning to further enhance its adaptability.


References

[1]. Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: An introduction. MIT Press.

[2]. Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., Graves, A., Riedmiller, M., Fidjeland, A. K., & Ostrovski, G. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540), 529-533.

[3]. Kaelbling, L. P., Littman, M. L., & Moore, A. W. (1996). Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4, 237-285.

[4]. Osband, I., Blundell, C., Pritzel, A., & Van Roy, B. (2016). Deep exploration via bootstrapped DQN. In Advances in Neural Information Processing Systems (pp. 4026-4034).

[5]. Bellemare, M. G., Srinivasan, S., Ostrovski, G., Schaul, T., Saxton, D., & Munos, R. (2016). Unifying count-based exploration and intrinsic motivation. In Advances in Neural Information Processing Systems (pp. 1471-1479).

[6]. Oudeyer, P. Y., Kaplan, F., & Hafner, V. V. (2007). Intrinsic motivation systems for autonomous mental development. IEEE Transactions on Evolutionary Computation.

[7]. Schmidhuber, J. (1991). A possibility for implementing curiosity and boredom in model-building neural controllers. In Proceedings of the International Conference on Simulation of Adaptive Behavior: From Animals to Animats (pp. 222-227).

[8]. Berlyne, D. E. (1960). Conflict, arousal, and curiosity. McGraw-Hill Book Company.

[9]. Gopnik, A., Meltzoff, A. N., & Kuhl, P. K. (1999). The scientist in the crib: What early learning tells us about the mind. William Morrow & Co.

[10]. Pathak, D., Agrawal, P., Efros, A. A., & Darrell, T. (2017). Curiosity-driven exploration by self-supervised prediction. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (pp. 16-17).

[11]. Burda, Y., Edwards, H., Pathak, D., Storkey, A., Darrell, T., & Efros, A. A. (2018). Exploration by random network distillation. In International Conference on Learning Representations.

[12]. Houthooft, R., Chen, X., Isola, P., Stadie, B. C., Wolski, F., Ho, J., & Abbeel, P. (2016). VIME: Variational information maximizing exploration. In Advances in Neural Information Processing Systems (pp. 1109-1117).

[13]. Singh, S., Barto, A. G., & Chentanez, N. (2004). Intrinsically motivated reinforcement learning. In Advances in Neural Information Processing Systems (pp. 1281-1288).

[14]. Stadie, B. C., Levine, S., & Abbeel, P. (2015). Incentivizing exploration in reinforcement learning with deep predictive models. In International Conference on Learning Representations.

[15]. Burda, Y., Edwards, H., Pathak, D., Storkey, A., Darrell, T., & Efros, A. A. (2018). Large-scale study of curiosity-driven learning. In International Conference on Learning Representations.

[16]. Achiam, J., Edwards, H., Amodei, D., & Abbeel, P. (2017). Surprise-based intrinsic motivation for deep reinforcement learning. In Advances in Neural Information Processing Systems.

[17]. Stanton, C., Tachet des Combes, R., Wang, G., Roberts, M., Mozer, M. C., Cho, K., & Bengio, Y. (2021). RL-Square: Decoupling strategy and reward for generalization in reinforcement learning. In International Conference on Learning Representations.

[18]. Frank, M., Leitner, D., Zambanini, S., & Vincze, M. (2014). Curiosity-driven exploration for knowledge discovery. Autonomous Robots, 37(1), 87-104.

[19]. Eysenbach, B., Gupta, A., Ibarz, J., & Levine, S. (2018). Diversity is all you need: Learning skills without a reward function. In International Conference on Learning Representations.

[20]. Andrychowicz, M., Wolski, F., Ray, A., Schneider, J., Fong, R., Welinder, P., McGrew, B., Tobin, J., Abbeel, P., & Zaremba, W. (2017). Hindsight experience replay. In Advances in Neural Information Processing Systems (pp. 5048-5058).

[21]. Aubret, A., Matignon, L., & Hassas, S. (2019). A survey on intrinsic motivation in reinforcement learning. arXiv preprint arXiv:1908.06976.

[22]. Schmidhuber, J. (2010). Formal theory of creativity, fun, and intrinsic motivation (1990-2010). IEEE Transactions on Autonomous Mental Development, 2(3), 230-247.

[23]. Brockman, G., Cheung, V., Pettersson, L., Schneider, J., Schulman, J., Tang, J., & Zaremba, W. (2016). OpenAI gym. arXiv preprint arXiv:1606.01540.

[24]. Auer, P., Cesa-Bianchi, N., & Fischer, P. (2002). Finite-time analysis of the multi-armed bandit problem. Machine Learning, 47(2), 235-256.

[25]. Agrawal, S., & Goyal, N. (2012). Analysis of Thompson sampling for the multi-armed bandit problem. In Conference on Learning Theory (pp. 39-1). PMLR.

[26]. Van Hasselt, H., Guez, A., & Silver, D. (2016). Deep reinforcement learning with double Q-learning. Proceedings of the AAAI Conference on Artificial Intelligence, 30(1).

[27]. Schaul, T., Quan, J., Antonoglou, I., & Silver, D. (2015). Prioritized experience replay. In International Conference on Learning Representations.

[28]. Wang, Z., Schaul, T., Hessel, M., Van Hasselt, H., Lanctot, M., & De Freitas, N. (2016). Dueling network architectures for deep reinforcement learning. In International Conference on Machine Learning (pp. 1995-2003). PMLR.


Cite this article

Cao,Z.;Jiang,J.;Liu,H. (2024). Curiosity-Driven Multi-Level Intrinsic Reward DQN for Enhanced Exploration in Reinforcement Learning. Applied and Computational Engineering,103,142-151.

Data availability

The datasets used and/or analyzed during the current study will be available from the authors upon reasonable request.

Disclaimer/Publisher's Note

The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of EWA Publishing and/or the editor(s). EWA Publishing and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

About volume

Volume title: Proceedings of the 2nd International Conference on Machine Learning and Automation

ISBN:978-1-83558-695-2(Print) / 978-1-83558-696-9(Online)
Editor:Mustafa ISTANBULLU
Conference website: https://2024.confmla.org/
Conference date: 12 January 2025
Series: Applied and Computational Engineering
Volume number: Vol.103
ISSN:2755-2721(Print) / 2755-273X(Online)

© 2024 by the author(s). Licensee EWA Publishing, Oxford, UK. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license. Authors who publish this series agree to the following terms:
1. Authors retain copyright and grant the series right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this series.
2. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the series's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this series.
3. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See Open access policy for details).
