Research Article
Open access
Published on 26 November 2024

Unleashing the Potential of Compact Language Models: A Context-Optimized Soft Prompting Approach

Zhanxu Jiang 1,*
  • 1 Institute of Artificial Intelligence, Beihang University, Beijing, China

* Author to whom correspondence should be addressed.

https://doi.org/10.54254/2755-2721/97/20241352

Abstract

The field of Natural Language Processing (NLP) has seen remarkable advances with the development of large pre-trained language models, which excel at a wide range of tasks, especially through in-context learning. However, the increasing size of these models poses significant challenges for widespread deployment, particularly in resource-constrained environments. This study introduces Context-Optimized Soft Prompts (COSP), a new approach that improves the performance of smaller language models in few-shot learning scenarios. COSP uses information from the in-context demonstrations to initialize soft prompts, effectively addressing the limitations smaller models face when performing in-context learning. COSP is evaluated on multiple tasks from the SuperGLUE benchmark and shows significant performance improvements. The results show that COSP not only enhances model performance but also produces more diverse and evenly distributed soft prompts, contributing to robust and generalizable model behavior. Additionally, COSP accelerates the training process, potentially reducing the computational resources required for model adaptation. By enabling smaller models to perform complex tasks competitively, COSP opens up new possibilities for deploying advanced language understanding techniques in resource-constrained environments.
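The full method is not reproduced on this page, so the following is only a minimal, hypothetical sketch of the general idea the abstract describes: initializing trainable soft prompts from the token embeddings of few-shot demonstrations while keeping the backbone language model frozen. It uses PyTorch and Hugging Face Transformers; the class name SoftPromptModel, the gpt2 backbone, the prompt length, and the initialization details are illustrative assumptions, not the authors' COSP implementation.

```python
# Hypothetical sketch only: demonstration-initialized soft prompt tuning with a
# frozen backbone. The backbone choice, prompt length, and all names are
# illustrative assumptions, not the paper's COSP implementation.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM


class SoftPromptModel(torch.nn.Module):
    def __init__(self, model_name="gpt2", prompt_length=20, demo_text=None):
        super().__init__()
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.backbone = AutoModelForCausalLM.from_pretrained(model_name)
        for p in self.backbone.parameters():  # freeze the backbone; only the prompt is trained
            p.requires_grad = False
        embed = self.backbone.get_input_embeddings()
        if demo_text is not None:
            # Initialize the soft prompt from the embeddings of demonstration tokens
            ids = self.tokenizer(demo_text, return_tensors="pt").input_ids[0][:prompt_length]
            init = embed(ids).detach().clone()
        else:
            # Fallback: small random initialization
            init = torch.randn(prompt_length, embed.embedding_dim) * 0.02
        self.soft_prompt = torch.nn.Parameter(init)

    def forward(self, input_ids, labels=None):
        tok_embeds = self.backbone.get_input_embeddings()(input_ids)
        prompt = self.soft_prompt.unsqueeze(0).expand(input_ids.size(0), -1, -1)
        inputs_embeds = torch.cat([prompt, tok_embeds], dim=1)
        if labels is not None:
            # Mask the prompt positions so they do not contribute to the loss
            pad = torch.full((input_ids.size(0), prompt.size(1)), -100, dtype=labels.dtype)
            labels = torch.cat([pad, labels], dim=1)
        return self.backbone(inputs_embeds=inputs_embeds, labels=labels)


# Usage: only model.soft_prompt receives gradient updates during tuning.
model = SoftPromptModel(demo_text="question: is the sky blue? answer: yes")
batch = model.tokenizer("question: is grass green? answer: yes", return_tensors="pt")
out = model(batch.input_ids, labels=batch.input_ids)
out.loss.backward()
```

The design point this sketch illustrates is that, compared with random prompt initialization, starting from demonstration-token embeddings gives the tuned prompt a task-informed starting point, which is consistent with the abstract's claim of faster training and better few-shot performance for smaller models.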

Keywords

Pretrained Language Models, Prompt Tuning, In-context Learning.


Cite this article

Jiang, Z. (2024). Unleashing the Potential of Compact Language Models: A Context-Optimized Soft Prompting Approach. Applied and Computational Engineering, 97, 31-36.

Data availability

The datasets used and/or analyzed during the current study are available from the authors upon reasonable request.

Disclaimer/Publisher's Note

The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of EWA Publishing and/or the editor(s). EWA Publishing and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

About volume

Volume title: Proceedings of the 2nd International Conference on Machine Learning and Automation

Conference website: https://2024.confmla.org/
ISBN: 978-1-83558-673-0 (Print) / 978-1-83558-674-7 (Online)
Conference date: 21 November 2024
Editor: Mustafa ISTANBULLU
Series: Applied and Computational Engineering
Volume number: Vol. 97
ISSN: 2755-2721 (Print) / 2755-273X (Online)

© 2024 by the author(s). Licensee EWA Publishing, Oxford, UK. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license. Authors who publish in this series agree to the following terms:
1. Authors retain copyright and grant the series the right of first publication, with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this series.
2. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the series' published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this series.
3. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as this can lead to productive exchanges as well as earlier and greater citation of the published work (see the Open access policy for details).