Machine learning algorithm and training in Go—Take three influential program as example

Qiao Zhang

doi:10.54254/2755-2721/6/20230845

Research Article

Open access

Published on 14 June 2023

Zhang, Q. (2023). Machine learning algorithm and training in Go—Take three influential program as example. Applied and Computational Engineering, 6, 391-399.

Qiao Zhang ¹,*

¹ University of Science and Technology of China, 96 Jinzhai Road, Hefei City, Anhui Province, China

* Author to whom correspondence should be addressed.

Abstract

In 2017, AlphaGo, a Go-playing artificial intelligence, beat Ke Jie, the world's top-ranked Go player, 3-0. The result surprised the world and brought artificial intelligence back to public attention. In this article, we take three influential Go programs, AlphaGo, AlphaGo Zero and KataGo, as examples to discuss how Go-playing artificial intelligence works. We discuss their structures and training methods one by one in chronological order, which also traces the course of their development. In addition, some of these structures and training methods are instructive, and we expect that they can be applied in other fields.

Keywords

machine learning, neural network, Monte Carlo tree search, AlphaGo

Data availability

The datasets used and/or analyzed during the current study will be available from the authors upon reasonable request.

Disclaimer/Publisher's Note

The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of EWA Publishing and/or the editor(s). EWA Publishing and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

About volume

Volume title: Proceedings of the 3rd International Conference on Signal Processing and Machine Learning

Conference website: http://www.confspml.org

ISBN: 978-1-915371-59-1 (Print) / 978-1-915371-60-7 (Online)

Conference date: 25 February 2023

Editor: Omer Burak Istanbullu

Series: Applied and Computational Engineering

Volume number: Vol.6

ISSN: 2755-2721 (Print) / 2755-273X (Online)

© 2024 by the author(s). Licensee EWA Publishing, Oxford, UK. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license. Authors who publish in this series agree to the following terms:
1. Authors retain copyright and grant the series the right of first publication, with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this series.
2. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the series's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this series.
3. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See Open access policy for details).