The metamorphosis of machine translation: The rise of neural machine translation and its challenges

Yuduo Chen

doi:10.54254/2755-2721/43/20230815

Research Article

Open access

Published on 26 February 2024

Download pdf

Chen,Y. (2024). The metamorphosis of machine translation: The rise of neural machine translation and its challenges. Applied and Computational Engineering,43,99-106.

Export citation

The metamorphosis of machine translation: The rise of neural machine translation and its challenges

Yuduo Chen *^,1,

¹ School of Mathematics, Sichuan University

* Author to whom correspondence should be addressed.

https://doi.org/10.54254/2755-2721/43/20230815

Abstract

Machine translation refers to the process of using computers to translate source language into target language, which has undergone significant transformations since its inception, with the current mainstream neural machine translation achieving satisfactory translation performance. This paper overviews the three developmental stages of machine translation: rule-based machine translation, statistical machine translation, and neural machine translation, with a focus on neural machine translation. It introduces the key models that emerged in the development process of neural machine translation, namely the recurrent neural network encoder-decoder model, recurrent neural network search model, and Transformer, and compares their strengths and limitations. Other relevant technologies and models developed alongside neural machine translation are also discussed. Addressing the current challenges of neural machine translation, the paper delves into issues of overfitting, low-resource translation, structural optimization of Transformer models, and enhancement of neural machine translation interpretability. Finally, the paper explores the prospects of applying neural machine translation to multimodal translation.

Keywords

Machine Translation, Neural Machine Translation, Attention Mechanism, Model Training, Domain Adaptability

View pdf

References

[1]. Weaver W 1952 Translation Proc. of the Conf. on Mechanical Translation (Carlsbad)

[2]. Zarechnak M 1979 The history of machine translation Trends in Linguistics: Studies and Monographs vol 11, ed Chiara Gianollo and Daniel Van Olmen (Berlin: De Gruyter Mouton) part 1 pp 3-87

[3]. Brown P F, Cocke J, Della Pietra S A, Della Pietra V J, Jelinek F, Lafferty J D, Mercer R L and Roossin P S 1990 A statistical approach to machine translation Comput. Linguistics 16(2) 79-85

[4]. Kalchbrenner N and Blunsom P 2013 October Recurrent continuous translation models Proc. of the 2013 Conf. on Empirical Methods in Natural Language Processing (Seattle: Association for Computational Linguistics) pp 1700-09

[5]. Sutskever I, Vinyals O and Le Q V 2014 Sequence to sequence learning with neural networks Adv. neural inf. process syst. 27

[6]. Bahdanau D, Cho K and Bengio Y 2015 Neural machine translation by jointly learning to align and translate 3rd Int. Conf. on Learning Representations (San Juan)

[7]. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez A N, Kaiser L and Polosukhin I 2017 Attention is all you need Adv. Neural Inf. Process. Syst. 30

[8]. Devlin J, Chang M W, Lee K and Toutanova K 2019 Bert: Pre-training of deep bidirectional transformers for language understanding Proc. of NAACL-HLT (Minneapolis) pp 4171-86

[9]. Liu X, Duh K, Liu L and Gao J 2020 Very deep transformers for neural machine translation Preprint arXiv:2008.07772

[10]. Neco R P and Forcada M L 1997 June Asynchronous translations with recurrent neural nets Proc. of Int. Conference on Neural Networks (Houston) vol 4 pp 2535-40

[11]. Castano A and Casacuberta F 1997 A connectionist approach to machine translation 5th European Conf. on Speech Communication and Technology (Rhodes)

[12]. Hochreiter S 1998 The vanishing gradient problem during learning recurrent neural nets and problem solutions Int. J. Uncertain. Fuzz. 6(02) 107-16

[13]. Bengio Y, Simard P and Frasconi P 1994 Learning long-term dependencies with gradient descent is difficult IEEE Trans. Neural Netw. 5(2) 157-66

[14]. Mikolov T, Sutskever I, Chen K, Corrado G S and Dean J 2013 Distributed representations of words and phrases and their compositionality Adv in Neural Inf. Process. Syst. 26

[15]. Luong M T, Pham H and Manning C D 2015 Effective approaches to attention-based neural machine translation Proc. of the 2015 Conf. on Empirical Methods in Natural Language Processing (Lisbon) pp 1412-21

[16]. Kingma D P and Ba J 2014 Adam: A method for stochastic optimization Preprint arXiv:1412.6980

[17]. Wu Y et al. 2016 Google's neural machine translation system: Bridging the gap between human and machine translation Preprint arXiv:1609.08144

[18]. Gehring J, Auli M, Grangier D, Yarats D and Dauphin Y N 2017 July Convolutional sequence to sequence learning In Int. Conf. on Machine Learning pp 1243-52

[19]. Gū J, Shavarani H S and Sarkar A 2018 Top-down tree structured decoding with syntactic connections for neural machine translation and parsing Proc. of the 2018 Conf. on Empirical Methods in Natural Language Processing pp 401-13

[20]. Wang T and Xiong D 2022 Enhancing Neural Machine Translations with Pre-Defined Bilingual Pairs J. Chin. Inf. Process 6(36)

[21]. Miao G, Liu M, Chen Y, Xu j, Zhang Y and Feng W 2022 Incorporating Clause Alignment Knowledge into Chinese-English Neural Machine Translation Acta Sci. Nat. Univ. Pekin 58(1) 8

[22]. Zhu J, Yang F, Yu Z, Zou X and Zheng Z 2022 Low Resource Neural Machine Translation with Enhanced Representation of Rare Words J. Chin. Inf. Process 6(36)

[23]. Szegedy C, Vanhoucke V, Ioffe S, Shlens J and Wojna Z 2016 Rethinking the inception architecture for computer vision Proc. of the IEEE Conf. on computer vision and pattern recognition pp 2818-26

[24]. Hinton G E, Srivastava N, Krizhevsky A, Sutskever I and Salakhutdinov R R 2012 Improving neural networks by preventing co-adaptation of feature detectors Preprint arXiv:1207.0580

[25]. Fan A, Grave E and Joulin A 2019 September Reducing transformer depth on demand with structured dropout Int. Conf. on Learning Representations (New Orleans)

[26]. Sennrich R, Haddow B and Birch A 2015 Improving neural machine translation models with monolingual data Proc. of the 54th Annu. Meeting of the Association for Computational Linguistics (Berlin) vol 1 pp 86-96

[27]. Lample G, Conneau A, Denoyer L and Ranzato M A 2018 Unsupervised machine translation using monolingual corpora only Int. Conf. on Learning Representations (Vancouver)

[28]. Artetxe M, Labaka G, Agirre E and Cho K 2018 February Unsupervised neural machine translation Int. Conf. o Learning Representations (Vancouver)

[29]. Ruiter D, Espana-Bonet C and van Genabith J 2019 July Self-supervised neural machine translation Proc. of the 57th Annu. Meeting of the Association for Computational Linguistics (Florence) pp 1828-34

[30]. Zoph B and Le Q V 2016 Neural architecture search with reinforcement learning Int. Conf. on Learning Representations (San Francisco)

[31]. Gu J, Hassan H, Devlin J and Li V O 2018 Universal neural machine translation for extremely low resource languages Proc. of NAACL-HLT 2018 (New Orleans) vol 1 pp 344-54

[32]. Cheng Y, Yang Q, Liu Y, Sun M and Xu W 2019 Joint training for pivot-based neural machine translation Proc. of the 26th Int. Joint Conf. on Artificial Intelligence (Melbourne) vol 1 pp 41-54

[33]. Leng Y, Tan X, Qin T, Li X Y and Liu T Y 2019 July Unsupervised pivot translation for distant languages Proc. of the 57th Annu. Meeting of the Association for Computational Linguistics (Florence) pp 175-83

[34]. Shaw P, Uszkoreit J and Vaswani A 2018 Self-attention with relative position representations Proc. of NAACL-HLT 2018 (New Orleans) pp 464-68

[35]. Gulati A et al. 2020 Conformer: Convolution-augmented transformer for speech recognition Interspeech (Shanghai)

[36]. Qiu J, Ma H, Levy O, Yih S W T, Wang S and Tang J 2020 Blockwise self-attention for long document understanding Findings of the Association for Computational Linguistics: EMNLP 2020 pp 2555-65

[37]. Kitaev N, Kaiser L and Levskaya A 2019 September Reformer: The efficient transformer Int. Conf. on Learning Representations (New Orleans)

[38]. Li J, Xiong D, Tu Z, Zhu M, Zhang M and Zhou G 2017 Modeling source syntax for neural machine translation Proc. of the 55th Annu. Meeting of the Association for Computational Linguistics (Vancouver) vol 1 pp 688-97

[39]. Eriguchi A, Hashimoto K and Tsuruoka Y 2016 August Tree-to-sequence attentional neural machine translation Proc. of the 54th Annu. Meeting of the Association for Computational Linguistics (Berlin) pp 823-33

[40]. Chen H, Huang S, Chiang D and Chen J 2017 Improved neural machine translation with a syntax-aware encoder and decoder Preprint arXiv:1707.05436

[41]. Aharoni R and Goldberg Y 2017 July Towards string-to-tree neural machine translation Proc. of the 55th Annu. Meeting of the Association for Computational Linguistics (Vancouver) vol 2 pp 132-40

[42]. Wu S, Zhang D, Yang N, Li M and Zhou M 2017 July Sequence-to-dependency neural machine translation Proc. of the 55th Annu. Meeting of the Association for Computational Linguistics (Vancouver) vol 1 pp 698-707

[43]. Voita E, Talbot D, Moiseev F, Sennrich R and Titov I 2019 Analyzing multi-head self-attention: Specialized heads do the heavy lifting, the rest can be pruned Proc. of the 57th Annu. Meeting of the Association for Computational Linguistics (Florence) pp 5797-808

[44]. Dalvi F, Durrani N, Sajjad H, Belinkov Y, Bau A and Glass J 2019 July What is one grain of sand in the desert? analyzing individual neurons in deep nlp models Proc. of the AAAI Conf. on Artificial Intelligence vol 33 pp 6309-17

[45]. Raganato A and Tiedemann J 2018 An analysis of encoder representations in transformer-based machine translation Proc. of the 2018 EMNLP workshop BlackboxNLP: analyzing and interpreting neural networks for NLP pp 287-97

Cite this article

Chen,Y. (2024). The metamorphosis of machine translation: The rise of neural machine translation and its challenges. Applied and Computational Engineering,43,99-106.

Data availability

The datasets used and/or analyzed during the current study will be available from the authors upon reasonable request.

Disclaimer/Publisher's Note

The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of EWA Publishing and/or the editor(s). EWA Publishing and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

About volume

Volume title: Proceedings of the 2023 International Conference on Machine Learning and Automation

Conference website: https://2023.confmla.org/

ISBN：978-1-83558-311-1(Print) / 978-1-83558-312-8(Online)

Conference date: 18 October 2023

Editor：Mustafa İSTANBULLU

Series: Applied and Computational Engineering

Volume number: Vol.43

ISSN：2755-2721(Print) / 2755-273X(Online)

© 2024 by the author(s). Licensee EWA Publishing, Oxford, UK. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license. Authors who publish this series agree to the following terms:
1. Authors retain copyright and grant the series right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this series.
2. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the series's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this series.
3. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See Open access policy for details).