
Comparison of Convergence for Different Models and Loss Function: Evidence from Transformer, Diffusion and RNN
1 School of Mathematics and Physics, Xi’an Jiaotong-Liverpool University, Suzhou, China
* Author to whom correspondence should be addressed.
Abstract
Machine learning is increasingly used to support everyday tasks and decision-making. However, different models can produce different predictions and results for the same task. To improve model performance, this paper investigates the influence of the learning rate hyperparameter on each model, using the convergence of the loss function as the measure. The experiments show that different models differ significantly in performance when processing the same dataset, and that the learning rate also has a significant impact on a model's performance. Therefore, after selecting an appropriate model, one should also tune the learning rate so that training converges more smoothly. The analysis gives a basic understanding of the optimal learning rates for Transformer, diffusion, and RNN models trained on MNIST, which can help practitioners set better hyperparameters and obtain better prediction and decision-making results with these three models.
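The effect of the learning rate on loss convergence can be illustrated with a minimal sketch. It assumes plain gradient descent on a one-dimensional quadratic loss, which is only a stand-in for the Transformer, diffusion, and RNN models trained on MNIST in the paper. The function names, the learning-rate values, and the starting point are all hypothetical and chosen for illustration.

```python
# Toy illustration only: a 1-D quadratic loss, not the paper's models or data.

def loss(w):
    """Assumed toy objective L(w) = 0.5 * w^2, minimised at w = 0."""
    return 0.5 * w * w


def grad(w):
    """Gradient of the toy objective: dL/dw = w."""
    return w


def run_gradient_descent(learning_rate, steps=50, w0=5.0):
    """Run plain gradient descent and return the loss after every step."""
    w = w0
    history = [loss(w)]
    for _ in range(steps):
        w -= learning_rate * grad(w)  # standard gradient-descent update
        history.append(loss(w))
    return history


# Hypothetical learning rates: too small, moderate, oscillating, too large.
learning_rates = [0.001, 0.1, 1.9, 2.5]
results = {lr: run_gradient_descent(lr) for lr in learning_rates}

for lr, history in results.items():
    print(f"lr={lr:<6} initial loss={history[0]:.4f} final loss={history[-1]:.4e}")
```

In this toy setting, a very small learning rate barely reduces the loss within the step budget, a moderate one converges steadily, and a learning rate above 2 makes the loss grow at every step. Real models have far less regular loss surfaces, so the thresholds seen here do not carry over to them.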
Keywords
Machine learning, loss function, learning rate, convergence.
Cite this article
Fang, X. (2024). Comparison of Convergence for Different Models and Loss Function: Evidence from Transformer, Diffusion and RNN. Applied and Computational Engineering, 82, 173-180.
Data availability
The datasets used and/or analyzed during the current study will be available from the authors upon reasonable request.
Disclaimer/Publisher's Note
The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of EWA Publishing and/or the editor(s). EWA Publishing and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
About volume
Volume title: Proceedings of the 2nd International Conference on Machine Learning and Automation
© 2024 by the author(s). Licensee EWA Publishing, Oxford, UK. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license. Authors who publish this series agree to the following terms:
1. Authors retain copyright and grant the series right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this series.
2. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the series's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this series.
3. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See Open access policy for details).