Research Article
Open access
Published on 21 February 2025
Liang, J. Z. (2025). Constructing a mental health analysis system for social media using large language models. Advances in Engineering Innovation, 16(1), 13–22.

Constructing a mental health analysis system for social media using large language models

Jinming Zion Liang 1, *
  • 1 Sendelta International Academy

* Author to whom correspondence should be addressed.

https://doi.org/10.54254/2977-3903/2025.21189

Abstract

As social media platforms have penetrated every aspect of daily life, many mental health problems have arisen alongside them. In today’s digital age, analyzing mental health trends through these platforms has become critical. In this study, we present a system designed to identify the mental health trends of Weibo users by extracting and analyzing the content they post on Weibo, China’s leading social media platform. The system consists of two main parts: a data acquisition module and an analysis module. The data acquisition module uses the Python-based web scraping framework Scrapy to collect comments from popular topics on Weibo. At the heart of the analysis module is a large language model fine-tuned on a psychological database. The module assesses the topic and specific content of each post, scoring comments on criteria such as positivity, alignment with mood disorders, and potential signs of psychoactive substance use. The results are stored and managed in the relational database MySQL, then analyzed and visualized with data analysis tools. Through this method, we can monitor the mental health status of social media users in a timely and comprehensive manner, providing a solid foundation for further academic research on public mental health.
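The scoring step of the analysis module can be sketched in Python as follows. Note that the criterion names, the 0–10 scale, and the JSON reply format are illustrative assumptions for this sketch, not the paper’s actual prompt or model interface; the real system queries a fine-tuned ChatGLM model, which is mocked here with a fixed reply.

```python
import json

# Hypothetical rubric mirroring the scoring criteria named in the abstract.
CRITERIA = ["positivity", "mood_disorder_alignment", "substance_use_signs"]

def build_scoring_prompt(comment: str) -> str:
    """Assemble an instruction asking the LLM to score one Weibo comment."""
    return (
        "Rate the following Weibo comment from 0 to 10 on each criterion: "
        + ", ".join(CRITERIA)
        + ". Reply with a JSON object.\nComment: "
        + comment
    )

def parse_scores(llm_reply: str) -> dict:
    """Validate the model's JSON reply before it is written to MySQL."""
    scores = json.loads(llm_reply)
    for key in CRITERIA:
        value = scores[key]
        if not 0 <= value <= 10:
            raise ValueError(f"{key} out of range: {value}")
    return scores

# Example with a mocked model reply (no API call is made here):
reply = '{"positivity": 3, "mood_disorder_alignment": 7, "substance_use_signs": 0}'
print(parse_scores(reply)["mood_disorder_alignment"])  # → 7
```

Validating the reply before insertion keeps malformed or out-of-range model output from reaching the database, where the visualization tools expect clean numeric scores.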

Keywords

ChatGLM, Scrapy, Large Language Models, Artificial Intelligence

Data availability

The datasets used and/or analyzed during the current study will be available from the authors upon reasonable request.

Disclaimer/Publisher's Note

The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of EWA Publishing and/or the editor(s). EWA Publishing and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

About volume

Journal: Advances in Engineering Innovation

Volume number: Vol. 16
ISSN: 2977-3903 (Print) / 2977-3911 (Online)

© 2024 by the author(s). Licensee EWA Publishing, Oxford, UK. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license. Authors who publish in this series agree to the following terms:
1. Authors retain copyright and grant the series right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this series.
2. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the series's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this series.
3. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See Open access policy for details).