Research Article
Open access

Research on the intersection of natural language processing and deep learning

Hanwen Cai 1*
  • 1 East China University of Science and Technology    
  • *corresponding author 3316672469@qq.com
Published on 23 February 2024 | https://doi.org/10.54254/2755-2721/42/20230685
ACE Vol.42
ISSN (Print): 2755-273X
ISSN (Online): 2755-2721
ISBN (Print): 978-1-83558-309-8
ISBN (Online): 978-1-83558-310-4

Abstract

In the past ten years, Natural Language Processing (NLP) has made surprising progress thanks to the rapid development of Deep Learning (DL), which has further opened up possibilities for future development. This article briefly introduces the NLP field, the basic structure of DL, and the impact that combining DL with NLP has had on the NLP field. Finally, it reviews the limitations of both under the constraints of current science and technology and looks forward to the possibilities and directions of their future development. Appropriately applying DL to NLP can indeed bring great progress to the core areas and applications of NLP, but at the same time, the development of NLP will also be limited by the shortcomings of DL. Continuously optimizing DL models while leveraging their strengths is the most critical factor in promoting the future development of DL and NLP.

Keywords:

Artificial Intelligence, Natural Language Processing, Deep Learning


1. Introduction

Artificial Intelligence (AI) is a field of study in computer science that involves using computers to simulate human thought processes in order to handle complex tasks in different domains. AI covers a very wide range of research and applications, so its definition is not unique. Generally speaking, definitions of AI can be summarized into four aspects: systems that think like humans, systems that act like humans, systems that think rationally, and systems that act rationally [1].

1.1. A Brief History of Artificial Intelligence

Early theoretical stage (1950s): The earliest concept of artificial intelligence was proposed by computer scientist Alan Turing, who introduced the "Turing Test" in "Computing Machinery and Intelligence," published in 1950, as a way to judge whether a machine possesses human intelligence, prompting researchers to explore how machines could simulate human thinking [2].

Initial development period (1956-early 1960s): After the Dartmouth Conference in 1956, artificial intelligence ushered in a period of development and achieved several breakthroughs, such as the concept of machine learning, the advent of the first industrial robot, and the birth of the programming language Lisp. At this stage, scientists also ambitiously proposed four major predictions about AI [2].

Development stagnation period (early 1960s-early 1970s): After the enthusiasm of the early development stage, constrained by the science and technology of the time, the development of AI entered a trough.

Application development period (early 1970s-mid-1980s): As countries placed greater emphasis on scientific research, AI returned to the public's attention and gradually moved from theoretical research to practice.

Downturn period (mid-1980s-mid-1990s): Due to the technological limitations of the time, AI still could not handle complex tasks. Expectations had been set too high, and with the advent of the PC, large amounts of funding were withdrawn; artificial intelligence officially entered a cold winter [3].

Steady development period (mid-1990s-2010): During this period, network and computer technology developed rapidly, greatly advancing AI. Machine learning became the main research direction in the AI field and made great progress.

Booming development period (2010-present): Around 2010, with the vigorous development of computer technology, deep learning achieved breakthroughs [4]. It has achieved remarkable results in many fields and set off another upsurge in the development of artificial intelligence. In recent years, reinforcement learning and autonomous intelligence have gradually become popular, providing more possibilities for the development of AI.

Since its inception, AI has been applied in many fields, such as natural language processing, computer vision, machine learning, deep learning, reinforcement learning, healthcare, finance, manufacturing, and transportation, and it will continue to bring innovation and change. The wide application of deep learning techniques has brought great progress to many NLP tasks, continuously improving the ability of computers to understand and generate human natural language. This article presents the ongoing progress of natural language processing and the impact of deep learning on it.

1.2. Two Methods in Artificial Intelligence

Natural Language Processing (NLP) constitutes a crucial segment of artificial intelligence, aspiring to employ computers for the recognition, comprehension, and generation of human natural language [5]. It covers a wide range of fields, from basic text processing to advanced semantic analysis. Its purpose is to enable computers to help humans handle various complex tasks in text or speech so that computers can interact effectively and naturally with humans.

Deep Learning (DL) is a family of machine learning algorithms focused on learning representations of data. It builds multi-level neural network models by imitating the neural networks of the human brain and has good processing and learning capabilities when facing large-scale data [6], helping people deal with complex data and tasks. In recent years, advances in hardware and software have further facilitated the development of DL, and applying DL more deeply to NLP will bring greater progress to the field.

Section 2 of this article covers NLP, introducing its definition and classification (Section 2.1) and its applications (Section 2.2). Section 3 covers DL, briefly introducing it (Section 3.1) and its impact on several areas of NLP (Section 3.2). Section 4 concludes with a discussion of the current limitations of NLP and DL and an outlook on their future development.

2. Definition and Applications in Natural Language Processing

2.1. Definition of NLP

Natural language processing (NLP), also known as computational linguistics, is an important branch of modern computing technology. It is an interdisciplinary subject combining artificial intelligence, computer science, and linguistics. NLP can recognize and analyze human language and translate it into a format comprehensible by machines, thereby realizing a series of tasks and applications similar to human language processing. The main process of NLP can be divided into two key facets: natural language understanding (NLU) and natural language generation (NLG) [7]. NLP that recognizes semantics is closer to NLU. As technology developed, researchers also aimed to empower computers to produce natural language, leading to the inception of NLG. However, achieving comprehensive NLU remains the ultimate objective of NLP [5].

Based on technical distinctions, NLP can be categorized into two types: rule-based and statistics-based approaches. Rule-based NLP relies on manually formulated rules and standards. It works well for structured or well-defined tasks in specific fields, but it cannot cope with the scale, open-endedness, and ambiguity of natural language, and formulating rules consumes considerable human effort. Statistics-based NLP can process more complex and large-scale texts by learning from large amounts of real data, using statistical models, neural networks, deep learning, and other technologies to complete tasks. The two approaches are complementary, and mixing them can lead to better performance [8].
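
To make the contrast concrete, here is a minimal, illustrative Python sketch of the two paradigms applied to toy sentiment classification. The hand-written patterns and the tiny training set are invented for illustration and are not drawn from the paper.

```python
import re

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Rule-based: hand-written patterns only cover the cases the author anticipated.
def rule_based_sentiment(text: str) -> str:
    if re.search(r"\b(great|excellent|love)\b", text, re.IGNORECASE):
        return "positive"
    if re.search(r"\b(bad|terrible|hate)\b", text, re.IGNORECASE):
        return "negative"
    return "unknown"  # rules cannot enumerate open-ended natural language

# Statistics-based: the model estimates word-class statistics from labeled data.
train_texts = ["I love this film", "an excellent story", "terrible acting", "I hate it"]
train_labels = ["positive", "positive", "negative", "negative"]
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)

print(rule_based_sentiment("what a great movie"))  # -> positive (a rule fires)
print(model.predict(["I love the story"])[0])      # -> positive (learned statistically)
```

In practice, as noted above, the two are often mixed: rules handle narrow, well-specified cases, while the statistical model covers the long tail of open-ended language.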

2.2. Applications in NLP

Natural language processing finds applications in various text and speech tasks. Its primary applications in text information processing include:

1) Information Retrieval: IR matches content in a document library against the keywords or sentences a user queries, returning related texts or text fragments (a minimal retrieval sketch follows after this list);

2) Information Extraction: IE systems assist individuals in identifying, marking, and extracting key information from large amounts of text so as to obtain the information people are interested in;

3) Machine Translation: MT stands as one of the earliest and most emblematic applications within the realm of NLP. It uses a series of algorithms and models to translate one language into a target language. Today's machine translation can easily translate individual words, but the consistency of sentence meaning before and after translation cannot be guaranteed;

4) Text Classification: Text classification assigns input text to predetermined categories;

5) Text Generation: Many NLP-related tasks need to generate human-like text, such as dialogue systems, which require machines to communicate with the humans talking to them. Machines not only need to complete the tasks assigned by humans but also need to be able to express the results of those tasks;

6) Summarization: When faced with large amounts of text, NLP can identify, extract, and summarize it. Two types of summarization can be distinguished: one is simple extractive summarization, which only simplifies the text and proposes keywords; the other understands the content of the text and may even use words outside the text to summarize it;

7) Question Answering: QA gives accurate and direct answers corresponding to the questions people raise. This differs from IR, which returns a series of documents related to the user's query; QA combines IR and summarization, extracting key information from the retrieved documents and responding to the user with a precise answer [9];

8) Dialogue System: Dialogue systems realize natural language interaction between humans and computers, simulating natural dialogue between human beings, and cover many fields in NLP [5]. They can be divided into two types: domain-specific systems, which can only address particular topics, and open-ended systems, which can be used to discuss a variety of topics.
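
As referenced in item 1, the following is a minimal information-retrieval sketch based on TF-IDF vectors and cosine similarity; the toy document collection and query are invented for illustration, and a real system would index far larger collections.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy document library; a real IR system would index millions of documents.
docs = [
    "Deep learning improves natural language processing.",
    "Stock markets fell sharply on Monday.",
    "Neural networks learn distributed representations of text.",
]
query = "neural language models"

vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(docs)   # index the collection once
query_vector = vectorizer.transform([query])   # embed the query the same way

scores = cosine_similarity(query_vector, doc_vectors)[0]
best = scores.argmax()
print(f"best match (score {scores[best]:.2f}): {docs[best]}")
```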

3. Definition and Impact of Deep Learning in NLP

3.1. Definition of DL

Deep learning, an important branch of machine learning, is a method based on deep neural networks that aims to realize feature learning and representation of data through the neural network structure [4]. Deep learning can dispense with manual labeling by learning features in an unsupervised manner, and its pre-training process is more in line with how the human cerebral cortex works [6]. With its robust advancement in recent years, it has achieved excellent results in computer vision, pattern recognition, and other fields. Under such a trend, more attention can be paid to using deep learning to improve the ability of computers to process natural language. The following lists several accomplishments of deep learning that can be employed across the spectrum of NLP (core domains as well as applications).

3.2. The impact of DL on NLP

1) Word vector representation: Word vectors, also known as word embeddings [5], are the preferred representation for learning in NLP [10]. Through word vectors, we can generalize to distributed representations, which can express not only words but also phrases, sentences, documents, and so on. Indeed, deep learning-based models invariably represent their words, phrases, and even sentences using such embeddings [11], and these representations bring better results for NLP. Word vector techniques based on deep learning, such as Word2Vec and GloVe [10], convert high-dimensional discrete vocabulary representations into compact, low-dimensional continuous vector spaces. This approach mitigates the dimensionality challenge faced by the traditional one-hot representation [6]. Mapping words into a continuous space also enables a model to better understand the relationships between words and then combine contextual information to obtain higher-quality word vector expressions.
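
As a concrete illustration, here is a minimal sketch of training dense word vectors with Word2Vec via the gensim 4.x API; the toy corpus is far too small to yield meaningful vectors and only shows the mechanics of moving from one-hot indices to a low-dimensional continuous space.

```python
from gensim.models import Word2Vec

# Toy tokenized corpus; real embeddings are trained on billions of tokens.
corpus = [
    ["deep", "learning", "improves", "language", "models"],
    ["neural", "networks", "learn", "word", "representations"],
    ["language", "models", "predict", "the", "next", "word"],
]

model = Word2Vec(sentences=corpus, vector_size=50, window=2, min_count=1, epochs=50)

vec = model.wv["language"]   # a 50-dimensional dense vector, unlike a sparse one-hot
print(vec.shape)             # (50,)
print(model.wv.most_similar("language", topn=2))  # nearest neighbors in embedding space
```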

2) Language Modeling: Language Modeling (LM) is a task central to NLP and language understanding [9]. Before deep learning was applied to NLP, LM faced many challenges. For example, as the vocabulary grows, the number of parameters in an N-gram model grows exponentially, and the model faces the curse of dimensionality [8]; N-gram models also suffer from weak use of long-range context, limited predictive power, and vocabulary ambiguity. As deep learning advanced, neural networks swiftly gained widespread usage within language modeling. For example, Convolutional Neural Networks (CNN) [12] can capture local features of the text to better understand the context, and combined with max pooling they can grasp the rich meaning of a sentence and understand the text at a deeper level. The Recurrent Neural Network (RNN) is another important deep learning model. It benefits from its internal cyclic connections: each step depends on the computation and results of the previous step, as if the network had a kind of "memory" [10]. This feature allows it to capture long-distance context and the dependencies between words in a text, greatly improving the model's history-based prediction performance. Long Short-Term Memory (LSTM) [9], a variant of the RNN, introduces a "gating" mechanism specifically designed to solve the gradient vanishing and explosion problems of traditional RNNs, and it can complete natural language generation tasks with great potential. It is worth mentioning that CNNs and RNNs perform differently on different tasks, and an appropriate model should be selected for each task.
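
To make the LSTM discussion concrete, below is a minimal PyTorch sketch of an LSTM language model; the architecture and hyperparameters are arbitrary choices for illustration, not a prescription from the surveyed work.

```python
import torch
import torch.nn as nn

class LSTMLanguageModel(nn.Module):
    def __init__(self, vocab_size: int, embed_dim: int = 64, hidden_dim: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)  # token id -> dense vector
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, vocab_size)     # hidden state -> next-token logits

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        x = self.embed(token_ids)     # (batch, seq, embed_dim)
        states, _ = self.lstm(x)      # gated recurrence carries long-range "memory"
        return self.head(states)      # (batch, seq, vocab_size)

vocab_size = 1000
model = LSTMLanguageModel(vocab_size)
tokens = torch.randint(0, vocab_size, (2, 10))  # two dummy sequences of 10 tokens

logits = model(tokens)
# Train by predicting token t+1 from the hidden state at position t.
loss = nn.functional.cross_entropy(
    logits[:, :-1].reshape(-1, vocab_size), tokens[:, 1:].reshape(-1)
)
print(logits.shape, loss.item())
```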

3) Text Generation: Text generation is one of the NLP applications mentioned above. DL helps achieve text generation tasks with little or no input data to transform; examples include generating poems, jokes, and stories [9]. In text generation, RNNs and LSTMs can generate logically more reasonable and coherent text. Using generative adversarial networks (GANs), it is possible to measure how human-like the generated text is [9], making it more realistic. With the Transformer model, the attention mechanism can be used to capture more closely related context. Reinforcement Learning (RL) [13] can also be applied: an agent interacts with the environment and learns to maximize rewards, which can make the generated text more accurate, natural, and personalized; it can likewise be used to create the poems, jokes, stories, and other creative texts mentioned above.
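
Continuing the LSTM language-model sketch above (and reusing its `model` and `vocab_size`), the following hypothetical sampling loop shows the basic autoregressive generation step; temperature is a common knob that trades diversity against determinism.

```python
import torch

@torch.no_grad()
def sample(model, prefix_ids: torch.Tensor, steps: int = 20, temperature: float = 1.0):
    """Autoregressively extend prefix_ids by feeding each sampled token back in."""
    ids = prefix_ids.clone()
    for _ in range(steps):
        logits = model(ids)[:, -1]                         # next-token logits only
        probs = torch.softmax(logits / temperature, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)  # sample instead of argmax
        ids = torch.cat([ids, next_id], dim=1)
    return ids

prefix = torch.randint(0, vocab_size, (1, 3))  # dummy prompt (untrained model -> noise)
print(sample(model, prefix, steps=5, temperature=0.8))
```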

4) Machine Translation: MT is also one of the NLP applications mentioned above. Encoder-decoder models can be introduced in MT to handle sequence-to-sequence (Seq2seq) tasks [14]. The encoder maps the words of a sentence into a high-level vector space so that these vectors capture the semantic information of the input sentence, and the decoder uses the encoded sentence vector to produce the translation in the target language. Pre-training techniques in DL, such as the Generative Pre-trained Transformer (GPT) [15], can be used to initialize the encoder or decoder of the translation model to achieve better performance, making machine translation more accurate and flexible. Using the distributed representations mentioned above, the relationships between words and their semantics can be captured, solving problems such as the lexical ambiguity encountered in machine translation. Models such as the RNN and the Transformer can also be introduced to capture the relationship between the source language and the target language, further improving the accuracy of machine translation [16].
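
Below is a minimal encoder-decoder sketch in PyTorch in the spirit of sequence-to-sequence learning [14]; the use of GRUs, teacher forcing on the target side, and all dimensions are illustrative assumptions rather than the surveyed systems' exact designs.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, src_vocab: int, tgt_vocab: int, embed_dim: int = 64, hidden_dim: int = 128):
        super().__init__()
        self.src_embed = nn.Embedding(src_vocab, embed_dim)
        self.tgt_embed = nn.Embedding(tgt_vocab, embed_dim)
        self.encoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.decoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, tgt_vocab)

    def forward(self, src_ids: torch.Tensor, tgt_ids: torch.Tensor) -> torch.Tensor:
        # The encoder compresses the source sentence into a hidden state ...
        _, state = self.encoder(self.src_embed(src_ids))
        # ... which seeds the decoder that emits the target-language tokens.
        out, _ = self.decoder(self.tgt_embed(tgt_ids), state)
        return self.head(out)  # per-position logits over the target vocabulary

model = Seq2Seq(src_vocab=5000, tgt_vocab=6000)
src = torch.randint(0, 5000, (2, 12))  # dummy source-language token ids
tgt = torch.randint(0, 6000, (2, 9))   # dummy target-side inputs (teacher forcing)
print(model(src, tgt).shape)           # torch.Size([2, 9, 6000])
```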

4. Conclusion

NLP has undoubtedly made amazing progress in the past few decades, and it is one of the most popular directions in the current AI field. Integrating neural networks and deep learning into NLP has led to substantial enhancements in the training, learning, and overall performance of NLP tasks. But today's NLP and DL are imperfect, and they continue to show disadvantages and limitations alongside their significant progress. Most current NLP research is based on English, but there are thousands of languages in the world, and NLP has not yet mastered such a large and complex landscape of language systems [9]. This problem also touches on a major shortcoming of deep-model-based NLP: deep learning requires large amounts of data, and it is still unknown whether it can master languages for which samples are insufficient. Although one of the main purposes of NLP is to understand natural language, NLP's ability to understand and reason about context still has many deficiencies. Current NLP also shows considerable gaps in common-sense knowledge, which affects its understanding of natural language [7]. The security of NLP likewise needs to be strengthened: attackers can affect a model through minor perturbations, posing hidden dangers for future NLP applications. Deep learning also relies on large amounts of data; when data is scarce, a deep learning model may overfit or underfit, and its generalization ability is insufficient. At the same time, deep learning models require high-performance hardware and large amounts of energy, consuming substantial computing resources and incurring higher operating costs [4].

In future research and development, NLP and DL can be improved more comprehensively based on their current deficiencies. For NLP, few-shot and zero-shot learning can be developed, with better models used to handle small and scarce data resources. Multimodal NLP is also an important direction: NLP models combining text, voice, and images can understand natural language more comprehensively and bring richer and more practical applications. NLP should also be extended to more languages and even cultures, improving its degree of personalization to meet the needs of more users and contribute to global communication and progress. For DL, the generalization and learning abilities of models should be further improved, focusing on few-shot and zero-shot learning to improve prediction accuracy. At the same time, DL should further improve model interpretability and fairness, strengthen understanding of the model's decision-making process, and eliminate bias in training data to avoid model bias. DL should also focus on optimizing its consumption of computing resources, designing more lightweight and concise models, using more pre-trained models and multi-distribution learning, and adding energy-conscious training strategies to further reduce energy consumption [15].

In summary, with joint progress in algorithms and hardware performance, NLP and DL will usher in more vigorous development [9] and combine to bring more advanced and intelligent applications. At the same time, they provide more possibilities for the development of AI in various fields and promote AI truly entering everyday reality.


References

[1]. Kok J N, Boers E J, Kosters W A, et al. (2009) Artificial intelligence: definition, trends, techniques, and cases[J]. Artificial intelligence, 1: 270-299.

[2]. Haenlein M, Kaplan A. (2019) A brief history of artificial intelligence: On the past, present, and future of artificial intelligence[J]. California management review, 61(4): 5-14.

[3]. Muthukrishnan N, Maleki F, Ovens K, et al. (2020) Brief history of artificial intelligence[J]. Neuroimaging Clinics, 30(4): 393-399.

[4]. LeCun Y, Bengio Y, Hinton G. (2015) Deep learning[J]. Nature, 521(7553): 436-444.

[5]. Liddy E D. (2001) Natural language processing[J].

[6]. Xi X, Zhou G. (2016) Research on deep learning for natural language processing[J]. Acta Automatica Sinica, 42(10): 1445-1465.

[7]. Khurana D, Koli A, Khatter K, et al. (2023) Natural language processing: State of the art, current trends and challenges[J]. Multimedia tools and applications, 82(3): 3713-3744.

[8]. Nadkarni P M, Ohno-Machado L, Chapman W W. (2011) Natural language processing: an introduction[J]. Journal of the American Medical Informatics Association, 18(5): 544-551.

[9]. Otter D W, Medina J R, Kalita J K. (2021) A survey of the usages of deep learning for natural language processing[J]. IEEE Transactions on Neural Networks and Learning Systems, 32(2): 604-624. doi: 10.1109/TNNLS.2020.2979670.

[10]. Young T, Hazarika D, Poria S, Cambria E. (2018) Recent trends in deep learning based natural language processing[J]. IEEE Computational Intelligence Magazine, 13(3): 55-75. doi: 10.1109/MCI.2018.2840738.

[11]. Pennington J, Socher R, Manning C D. (2014) Glove: Global vectors for word representation[C]//Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP): 1532-1543.

[12]. Yin W, Schütze H. (2015) Convolutional neural network for paraphrase identification[C]//Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: 901-911.

[13]. Jiang Y, Jiang Z P. (2017) Robust adaptive dynamic programming[M]. John Wiley & Sons.

[14]. Sutskever I, Vinyals O, Le Q V. (2014) Sequence to sequence learning with neural networks[J]. Advances in neural information processing systems, 27.

[15]. Mathew A, Amudha P, Sivakumari S. (2021) Deep learning techniques: an overview[J]. Advanced Machine Learning Technologies and Applications: Proceedings of AMLTA 2020: 599-608.

[16]. Jozefowicz R, Vinyals O, Schuster M, et al. (2016) Exploring the limits of language modeling[J]. arXiv preprint arXiv:1602.02410.


Cite this article

Cai,H. (2024). Research on the intersection of natural language processing and deep learning. Applied and Computational Engineering,42,61-66.

Data availability

The datasets used and/or analyzed during the current study will be available from the authors upon reasonable request.

Disclaimer/Publisher's Note

The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of EWA Publishing and/or the editor(s). EWA Publishing and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

About volume

Volume title: Proceedings of the 2023 International Conference on Machine Learning and Automation

ISBN: 978-1-83558-309-8 (Print) / 978-1-83558-310-4 (Online)
Editor: Mustafa İSTANBULLU
Conference website: https://2023.confmla.org/
Conference date: 18 October 2023
Series: Applied and Computational Engineering
Volume number: Vol.42
ISSN: 2755-2721 (Print) / 2755-273X (Online)

© 2024 by the author(s). Licensee EWA Publishing, Oxford, UK. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license. Authors who publish this series agree to the following terms:
1. Authors retain copyright and grant the series right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this series.
2. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the series's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this series.
3. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See Open access policy for details).
