
Cura-LLaMA: Evaluating an open-source large language model's question answering capability in the medical domain
1 Institute of Shanghai Qibao Dwight High School, 3233 Hongxin Road, Minhang District, Shanghai, China
* Author to whom correspondence should be addressed.
Abstract
This paper presents the development and evaluation of "Cura-LLaMA," an open-source large language model (LLM) tailored for the medical domain. Built on the TinyLlama model, Cura-LLaMA was fine-tuned on the PubMedQA dataset to improve its ability to address complex medical queries. Its performance was compared with that of the original TinyLlama, focusing on accuracy and relevance in medical question-answering tasks. Despite the improvements, the study highlights the challenges of using keyword-detection methods for evaluation and the limitations of omitting non-essential columns during fine-tuning. The findings underscore the potential of fine-tuning open-source models for specialized applications, particularly in resource-limited settings, while pointing to the need for more sophisticated evaluation metrics and more comprehensive datasets to further enhance accuracy and relevance.
Keywords
Medical LLM, PubMedQA, fine-tuning, open-source model
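The abstract describes fine-tuning TinyLlama on PubMedQA and then comparing the fine-tuned and original models with a keyword-detection check. The sketch below illustrates what such a pipeline could look like with the Hugging Face transformers and datasets libraries; the checkpoint name, PubMedQA subset, prompt format, hyperparameters, and the keyword_score helper are illustrative assumptions, not the authors' exact setup.

# Minimal sketch, assuming the Hugging Face releases of TinyLlama and PubMedQA.
# Checkpoint name, dataset subset, prompt template, hyperparameters, and the
# keyword_score helper are illustrative assumptions, not the paper's exact setup.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

MODEL_NAME = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # assumed base checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
tokenizer.pad_token = tokenizer.eos_token  # TinyLlama has no dedicated pad token
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

# PubMedQA's expert-labeled subset pairs a research question with a long answer
# and a yes/no/maybe decision; non-essential columns are dropped during mapping.
dataset = load_dataset("qiaojin/PubMedQA", "pqa_labeled", split="train")

def format_and_tokenize(example):
    prompt = (
        f"Question: {example['question']}\n"
        f"Answer: {example['long_answer']}\n"
        f"Decision: {example['final_decision']}"
    )
    return tokenizer(prompt, truncation=True, max_length=512)

tokenized = dataset.map(format_and_tokenize, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="cura-llama",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=8,
        num_train_epochs=1,
        learning_rate=2e-5,
        logging_steps=50,
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()

def keyword_score(generated_answer: str, reference_keywords: list[str]) -> float:
    """Fraction of reference keywords found in a generated answer, i.e. the kind
    of keyword-detection metric whose limitations the abstract points out."""
    text = generated_answer.lower()
    hits = sum(1 for keyword in reference_keywords if keyword.lower() in text)
    return hits / len(reference_keywords) if reference_keywords else 0.0

Applying keyword_score to answers from both the fine-tuned and the original TinyLlama gives the kind of side-by-side comparison the abstract reports, although, as the paper notes, substring matching is a coarse proxy for answer quality.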
Cite this article
Zhu, J. (2024). Cura-LLaMA: Evaluating an open-source large language model's question answering capability in the medical domain. Applied and Computational Engineering, 90, 52-60.
Data availability
The datasets used and/or analyzed during the current study are available from the authors upon reasonable request.
Disclaimer/Publisher's Note
The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of EWA Publishing and/or the editor(s). EWA Publishing and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
About volume
Volume title: Proceedings of the 6th International Conference on Computing and Data Science
© 2024 by the author(s). Licensee EWA Publishing, Oxford, UK. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license. Authors who publish with this series agree to the following terms:
1. Authors retain copyright and grant the series right of first publication, with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this series.
2. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the series's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this series.
3. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (see the Open access policy for details).