
The evolution, applications, and future prospects of large language models: An in-depth overview
Xiamen University Malaysia
* Author to whom correspondence should be addressed.
Abstract
Natural language processing has evolved through three primary phases, with large language models profoundly transforming the field. These models have enhanced machines' ability to understand, generate, and interact with human language in unprecedented ways. The progression from RNNs to Transformer architectures, the shift from encoder-decoder frameworks to decoder-only designs, and the journey from BERT to the ChatGPT series have marked significant turns in the academic discourse. These sophisticated models have been adopted across a range of sectors, including finance, healthcare, biology, and education, reshaping both traditional and emerging domains. However, alongside these celebrated advances, the ethical and economic challenges they introduce must also be addressed. Confronting these pivotal issues and harnessing the technology for societal benefit has become a priority for academia and industry alike, sparking intense research efforts in recent years. This review traces the history of natural language processing, highlighting the pivotal developments and core principles of large language models. It provides a comprehensive perspective on their adoption and influence within the financial sector, offering a detailed account of their deployment. Finally, it reflects on the current challenges posed by these models and discusses potential solutions. The study serves as a guide, offering readers an in-depth understanding of the development, applications, and future trajectories of large language models.
Keywords
large language models, natural language processing, challenges and opportunities
Cite this article
Li, J. (2024). The evolution, applications, and future prospects of large language models: An in-depth overview. Applied and Computational Engineering, 35, 234-244.
Data availability
The datasets used and/or analyzed during the current study are available from the authors upon reasonable request.
Disclaimer/Publisher's Note
The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of EWA Publishing and/or the editor(s). EWA Publishing and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
About volume
Volume title: Proceedings of the 2023 International Conference on Machine Learning and Automation
© 2024 by the author(s). Licensee EWA Publishing, Oxford, UK. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.