Research Article
Open access
Published on 19 December 2024
Zhang, X.; Feng, H.; Li, S.; Yang, Y. (2024). Prediction of Stock Return Based on Sentiment. Theoretical and Natural Science, 56, 137-150.

Prediction of Stock Return Based on Sentiment

Xianyin Zhang *,1, Haoran Feng 2, Shuyu Li 3, Yiqiao Yang 4
  • 1 University of Rochester
  • 2 Xidian University
  • 3 Basis International School Park Lane Harbour
  • 4 Beijing National Day School

* Author to whom correspondence should be addressed.

https://doi.org/10.54254/2753-8818/2024.18489

Abstract

In the rapidly evolving field of financial forecasting, accurate prediction of stock returns remains a significant challenge. This paper leverages Natural Language Processing (NLP) algorithms to develop a predictive model for stock returns. The research uses return labels derived from stock price returns and sentiment data extracted from comments on StockTwits (a financial social media platform) posted between January 2020 and March 2022. A comparative analysis assesses the performance of a traditional statistical model (Logistic Regression), supervised machine learning models (Random Forest, Gradient Boosting, XGBoost, and Naïve Bayes), and an ensemble model (Majority Vote) on the prediction tasks. The objective is to identify the most effective model and to provide precise predictions of future stock returns. Our simulations show that (1) sentiment can serve as an effective proxy for predicting stock returns; (2) the "likes" that users give to comments are a suitable signal for price prediction; (3) Logistic Regression did not perform well in prediction, even when combined with other techniques; (4) the Random Forest and Gradient Boosting models outperform the simpler models and show promising predictive results; (5) the ensemble model effectively reduces the influence of potential overfitting in the individual models. These findings underline the potential of sentiment-based models as a tool for more accurate financial forecasting.
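As an illustration of the two ideas the abstract relies on, the sketch below derives binary return labels from a price series and combines the predictions of several classifiers by majority vote. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not say how labels are defined or how votes are counted, so the positive-return threshold, hard (unweighted) voting, and the function names `return_labels` and `majority_vote` are all hypothetical.

```python
from collections import Counter
from typing import Sequence


def return_labels(prices: Sequence[float]) -> list[int]:
    """Label each period 1 if its simple return is positive, else 0.

    Assumption: a positive-return threshold; the paper may define labels differently.
    """
    labels = []
    for prev, curr in zip(prices, prices[1:]):
        simple_return = (curr - prev) / prev
        labels.append(1 if simple_return > 0 else 0)
    return labels


def majority_vote(model_predictions: Sequence[Sequence[int]]) -> list[int]:
    """Combine per-model label predictions by hard (unweighted) majority vote.

    Each inner sequence holds one model's predictions for the same samples.
    With an odd number of models and binary labels, no ties can occur.
    """
    combined = []
    for votes in zip(*model_predictions):
        # most_common(1) returns the label with the highest vote count
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


# Toy data, made up for illustration only.
prices = [100.0, 101.0, 100.5, 102.0]
labels = return_labels(prices)

# Hypothetical predictions from three different classifiers.
predictions = [
    [1, 0, 0],
    [1, 1, 1],
    [0, 0, 1],
]
ensemble = majority_vote(predictions)
accuracy = sum(p == y for p, y in zip(ensemble, labels)) / len(labels)
print(labels, ensemble, accuracy)
```

In this toy case each individual model makes one error, while the vote agrees with the labels on every sample, which is the intuition behind using an ensemble to dampen the errors of any single model.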

Keywords

Stock Return Prediction, NLP, Sentiment Analysis, Machine Learning Models

Data availability

The datasets used and/or analyzed during the current study will be available from the authors upon reasonable request.

About volume

Volume title: Proceedings of the 2nd International Conference on Applied Physics and Mathematical Modeling

Conference website: https://2024.confapmm.org/
ISBN: 978-1-83558-679-2 (Print) / 978-1-83558-680-8 (Online)
Conference date: 20 September 2024
Editor: Marwan Omar
Series: Theoretical and Natural Science
Volume number: Vol. 56
ISSN: 2753-8818 (Print) / 2753-8826 (Online)

© 2024 by the author(s). Licensee EWA Publishing, Oxford, UK. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.