
A Machine Learning-Enhanced Chat Application for the Identification of Mental Disorders
- 1 School of Computer Science, Wuhan University, Wuhan 430072, China
- 2 Bishop Guertin High School, Nashua, NH 03060
- 3 Magee Secondary School, Vancouver, BC, Canada, V6M 4M2
- 4 Westridge School for Girls, Pasadena, CA 91105
* Author to whom correspondence should be addressed.
Abstract
The prevalence of mental disorders is increasing, yet they remain underdiagnosed and undertreated. Social media platforms offer new opportunities for detecting potential mental health issues through the analysis of user-generated content. This paper presents a chat-based program built on machine learning models trained on a dataset of comments from Reddit users. Given user input, the program predicts the type of mental illness. The study compares several classification algorithms in detail: Naïve Bayes, Logistic Regression (LR), Support Vector Machines (SVM), and Random Forests (RF). It also discusses relevant machine learning techniques from previous studies. The results indicate that the LR model, particularly with a unigram feature representation, outperforms the other models, reaching an accuracy of 0.81 with the fastest processing speed. Future research directions include the integration of Large Language Models and the development of a multilingual chat interface.
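To make the best-performing configuration concrete, the following is a minimal sketch of multinomial logistic regression over unigram counts, written with the Python standard library only. It is an illustration under stated assumptions, not the authors' implementation: the four training posts, the two class labels, the whitespace tokenizer, and the hyperparameters (`epochs`, `lr`) are all hypothetical, and the Reddit dataset used in the study is not reproduced here.

```python
import math
from collections import Counter

# Hypothetical toy posts and labels (assumption; not the paper's Reddit data).
TRAIN = [
    ("i feel sad and hopeless every day", "depression"),
    ("i worry all the time and cannot relax", "anxiety"),
    ("nothing brings me joy and i feel empty", "depression"),
    ("my heart races and i panic in crowds", "anxiety"),
]


def unigrams(text):
    """Return unigram (single-word) counts for a lowercased text."""
    return Counter(text.lower().split())


def softmax(scores):
    """Turn a dict of class scores into a dict of probabilities."""
    top = max(scores.values())
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}


def class_scores(counts, weights, bias):
    """Linear score per class; words unseen in training contribute nothing."""
    return {
        c: bias[c] + sum(weights[c].get(tok, 0.0) * n for tok, n in counts.items())
        for c in weights
    }


def train(examples, epochs=200, lr=0.1):
    """Fit multinomial logistic regression with plain stochastic gradient descent."""
    classes = sorted({label for _, label in examples})
    weights = {c: {} for c in classes}
    bias = {c: 0.0 for c in classes}
    for _ in range(epochs):
        for text, label in examples:
            counts = unigrams(text)
            probs = softmax(class_scores(counts, weights, bias))
            for c in classes:
                # Gradient of the cross-entropy loss w.r.t. this class score.
                grad = probs[c] - (1.0 if c == label else 0.0)
                bias[c] -= lr * grad
                for tok, n in counts.items():
                    weights[c][tok] = weights[c].get(tok, 0.0) - lr * grad * n
    return weights, bias


def predict(text, weights, bias):
    """Return the highest-scoring class for a text."""
    scores = class_scores(unigrams(text), weights, bias)
    return max(scores, key=scores.get)


def accuracy(examples, weights, bias):
    """Fraction of examples whose predicted label matches the true label."""
    hits = sum(predict(text, weights, bias) == label for text, label in examples)
    return hits / len(examples)


weights, bias = train(TRAIN)
train_accuracy = accuracy(TRAIN, weights, bias)
guess = predict("i worry and panic", weights, bias)
print(train_accuracy, guess)
```

The `accuracy` helper computes the same kind of metric as the 0.81 figure reported above, but here it is measured on the toy training posts, so it says nothing about real-world performance. A practical system would evaluate on held-out data and would likely rely on an established library rather than a hand-written optimizer.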
Keywords
Social media, mental disorder, machine learning, classification algorithm
Cite this article
Cao, Q.; Yan, Z.; Gong, Z.; Huang, J. (2025). A Machine Learning-Enhanced Chat Application for the Identification of Mental Disorders. Applied and Computational Engineering, 132, 11-19.
Data availability
The datasets used and/or analyzed during the current study will be available from the authors upon reasonable request.
Disclaimer/Publisher's Note
The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of EWA Publishing and/or the editor(s). EWA Publishing and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
About volume
Volume title: Proceedings of the 2nd International Conference on Machine Learning and Automation
© 2024 by the author(s). Licensee EWA Publishing, Oxford, UK. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.