Deep reinforcement learning for stock prediction

Mingkai Wang

doi:10.54254/2755-2721/69/20241453

1. Introduction

Predicting stock market prices is key to giving investors a basis for optimal trading decisions, but it is also a challenging task. Because the market is chaotic and highly volatile, price movement is usually regarded as a stochastic process influenced by many real-world uncertainties, such as macroeconomic factors and market anticipation and confidence.

Many previous studies on stock market prediction fall into two groups: fundamental analysis and technical analysis. Fundamental analysis estimates intrinsic value by analyzing internal and external variables, whereas technical analysis looks for patterns in the stock price on which predictions can be based.

Before the appearance of machine learning methods, early researchers used statistical methods such as time series analysis and multivariate statistics, as well as various econometric methods, to build models and make predictions. However, because stock data are chaotic, highly volatile and nonparametric, machine learning methods, which cope better with these properties, came to be applied. Even so, it remains difficult to predict exact stock prices, so it is often better to treat the task as a classification problem rather than as the prediction of an exact value. Many machine learning methods have been applied in this setting, such as support vector machine (SVM), logistic regression, and random forest.
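The classification framing can be illustrated with a minimal sketch that turns a price series into up/down labels. The function name `direction_labels`, the sample prices, and the rule that an unchanged price counts as "down" are assumptions made for illustration; they are not taken from any of the cited studies.

```python
def direction_labels(prices):
    """Label each day 1 if the next price is higher, otherwise 0.

    Assumption: an unchanged price is labelled 0 ("not up").
    """
    return [1 if nxt > cur else 0 for cur, nxt in zip(prices, prices[1:])]


prices = [10.0, 10.5, 10.2, 10.2, 11.0]  # hypothetical closing prices
labels = direction_labels(prices)
print(labels)
```

A classifier such as SVM or logistic regression would then be trained to predict these labels from features available on each day, instead of predicting the price itself.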

Deep neural networks (DNNs) are an important class of machine learning models because of their advantages over conventional methods. The high degree of nonlinearity of a DNN can approximate the complexity of the influencing factors, since stacking hidden layers of nonlinear activation functions builds deep nonlinear topologies that can represent complex high-dimensional functions. In addition, DNNs are numeric, data-driven and adaptive, which lets them analyze inaccurate and noisy data, and they have been used extensively to predict time series.
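As a rough illustration of stacking hidden layers of nonlinear activations, the sketch below runs a forward pass through a small fully connected network. The helper names (`dense`, `relu`, `forward`), the ReLU and sigmoid choices, and the hand-picked weights are all assumptions for illustration; a real model would learn its weights from data.

```python
import math


def dense(x, weights, bias):
    """Affine map: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(v):
    """Element-wise nonlinear activation."""
    return [max(0.0, a) for a in v]


def forward(x, hidden_layers, out_w, out_b):
    """Pass x through each hidden layer, then a sigmoid output unit."""
    h = x
    for weights, bias in hidden_layers:
        h = relu(dense(h, weights, bias))
    z = dense(h, [out_w], [out_b])[0]
    return 1.0 / (1.0 + math.exp(-z))  # probability-like score in (0, 1)


# Hypothetical two-hidden-layer network with hand-picked weights.
hidden_layers = [
    ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),
    ([[2.0, 0.0], [0.0, 2.0]], [0.0, 0.0]),
]
p_up = forward([1.0, -2.0], hidden_layers, out_w=[1.0, 1.0], out_b=-2.0)
print(p_up)
```

Each additional `(weights, bias)` pair in `hidden_layers` adds one more nonlinear layer to the composed function.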

These advantages support good performance of DNNs on prediction problems, so it is feasible to apply deep learning to financial prediction. In this paper, we mainly discuss how effective DNNs are for prediction and where their advantages appear. (Where possible, the capability for decision-making may be strengthened by combining DNNs with reinforcement learning.) For comparison, we also look at some traditional machine learning methods, such as SVM and logistic regression, and show how the results differ.

The rest of this paper is organized as follows. Section 2 introduces related work on this topic. Section 3 discusses reinforcement learning, including its characteristics and advantages, and compares it with traditional machine learning methods to show where it improves on them. Finally, Section 4 draws conclusions and discusses future development.

2. Literature review

In the early days, it was widely believed that predicting future trends in stock market prices contradicted a basic principle known as the Efficient Market Hypothesis (Fama and Malkiel) [1]. Later, however, more researchers rejected this disputed theory by using more advanced algorithms to model the more complex dynamics of the financial system. Representative work in this field has been carried out by Lo and MacKinlay [2]. Examples of studies on the predictability of long-term investment returns include those by De Bondt and Thaler [3-5].

With the rapid development of computing, machine learning methods built on statistics have contributed a great deal to stock price prediction. They handle complex nonlinear analysis better than studies that use traditional statistics alone, which has enabled a variety of approaches. Representative methods include logistic regression, genetic algorithms, fuzzy theory, SVM, decision trees, and adaptive boosting (AdaBoost) [6-9]. For example, L Khaidem et al. successfully used ensemble learning to improve the performance of random forest for stock prediction. As for logistic regression, J Gong and S Sun introduced innovative feature index variables into the prediction model and proposed a special optimization process for selecting the regression parameters [10-11]. In 2012, A Upadhyay et al. used seven independent financial ratios to construct a multinomial logistic regression model [12]. SVM has also been widely studied. J Heo and JY Yang evaluated the stock price predictability of SVM and the length of time over which it keeps a good performance, in support of the efficient market hypothesis; meanwhile, F Wen et al. proposed singular spectrum analysis to improve the performance of SVM [13-14].

In the 1980s, neural networks (NNs) were applied by IBM to predict changes in daily stock prices and returns. An autoregressive integrated moving average (ARIMA) model was combined with an artificial neural network (ANN) to predict time series in stock markets. In recent years, deep learning has developed rapidly and has been widely applied to predicting stock market prices. Different problems call for different deep learning methods, such as the convolutional neural network (CNN), long short-term memory (LSTM) and the recurrent neural network (RNN), but here we focus on the most frequently used methods, namely the deep neural network (DNN) and reinforcement learning (RL).

DNNs can be used to predict stock trends because they perform well on prediction problems with large amounts of data and nonlinear mapping relations. For this reason, much work has been done recently. Shen et al. constructed a deep belief network using a continuous restricted Boltzmann machine and used it to predict exchange rates. In a comparison, they found that the deep learning network was better than conventional feedforward NNs [15]. Song, Y. proposed a plunge filtering technique for a DNN model to improve its accuracy, and the proposed model was highly profitable [16]. Naik, N. proposed a DNN using the Boruta feature selection technique to solve the problem of selecting indicator features and identifying the relevant indicators, and it performed much better than the ANN and SVM models [17]. Chatzis, S.P. proposed a DNN model with boosted approaches to predict stock market crisis episodes [18]. His research showed that predicting prices was meaningful for anticipating stock market crises. Nakagawa, K. proposed a deep factor model and a shallow model with DNN; the former performed better than the linear model, which implied a nonlinear relationship between stock returns and the factors in the financial market [19]. The deep factor model also performed better than other machine learning methods, including SVR and random forest. Chong, E. examined the effects of three feature extraction methods, namely PCA, the auto-encoder, and the restricted Boltzmann machine, on the ability of a DNN to predict future market behavior [20]. It was shown that additional information could be extracted from the residuals of the autoregressive model to improve the final performance of the DNN.

RL learns locally optimal trade timing from the response of the stock market by observing the information surrounding each transaction, which can be regarded as the environment in reinforcement learning, and much work has been done in this field. Shin, H.-G. proposed an RL model combined with LSTM and CNN, which generated various charts from stock trading data and used them as input layers [21]. The features extracted by the CNN were used to construct the LSTM layer. The RL component defined the agent's policy network structure and reward, and provided the final output. Jia, W. proposed an RL model with an LSTM-based agent to sense the dynamics of the stock market and reduce the difficulty of designing indicators from massive data [22]. Carapuço, J. proposed an RL Q-network model, in which three hidden layers of ReLU neurons were trained as RL agents through the Q-learning algorithm [23]. The framework could consistently induce stable learning that generalized to out-of-sample data. Kang, Q. proposed applying the Asynchronous Advantage Actor-Critic (A3C) algorithm to the portfolio management problem, and later designed a deep RL model [24]. H Yang proposed a novel ensemble strategy combining three deep reinforcement learning algorithms to find the optimal trading strategy in a complex and dynamic stock market. The strategy can adjust to maximize return subject to a risk constraint [25]. There are also methods that combine DNN and RL, known as Deep Reinforcement Learning (DRL), which joins the perception ability of DNN with the decision-making ability of RL. For example, Y Li proposed introducing DRL into financial applications and showed its advantage in improving prediction ability [26].

3. Discussion

3.1. Characteristics and advantages

Traditional research methods can be broadly divided into two viewpoints: time series analysis and factor analysis. The former considers how a single stock evolves over time and uses this to predict its future trend; the features considered are mainly those that are strongly influenced by time. The latter explores the underlying factors that determine the value of stocks, by comparing the values of different stocks and their main influencing factors at a fixed time.

Comparing these two views, methods based on time series may fall short in mining hidden information from related features, whereas methods based on factor analysis cannot easily track how a trend develops over time. DRL can combine the advantages of the two approaches while avoiding their disadvantages.

3.2. Deep and reinforcement learning

Deep learning can uncover complex, intrinsic, and deep-seated information in data. In stock prediction, the complex features and factors related to the stock price can be described well by a DNN because of its high degree of nonlinearity. Thanks to these advantages, DNNs achieve high accuracy on complex prediction problems in many practical fields, such as image classification and natural language processing. The research results reviewed above also show that deep learning algorithms work well for time series analysis and prediction. Overall, deep learning models perform well across broad and diverse research fields, which makes it feasible to predict the future trend of stock markets with deep learning.

However, although DNNs have the advantages stated above, they also have limitations, and depth cannot be increased blindly. First, increasing the depth increases the complexity of the model, and the required number of training samples increases accordingly. Second, the feedforward computation of a single layer is already relatively costly, so adding layers raises the overall complexity further. Third, gradients may explode or vanish. Combining the DNN with another machine learning method offers a way to improve it.
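The third limitation can be made concrete with a small numerical sketch. It assumes, purely for illustration, that every layer multiplies the backpropagated gradient by the same factor; real networks have a different factor per layer. The value 0.25 is the maximum derivative of the sigmoid function, while 1.5 is an arbitrary factor above 1, and the function name `gradient_scale` is hypothetical.

```python
def gradient_scale(per_layer_factor, depth):
    """Gradient magnitude after passing back through `depth` layers,
    assuming each layer scales it by the same factor (a simplification)."""
    return per_layer_factor ** depth


for depth in (5, 20, 50):
    vanishing = gradient_scale(0.25, depth)  # factor below 1 shrinks the gradient
    exploding = gradient_scale(1.5, depth)   # factor above 1 grows the gradient
    print(depth, vanishing, exploding)
```

Under this assumption the gradient shrinks or grows geometrically with depth, which is why simply adding layers can make training unstable.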

Reinforcement learning learns an optimal policy through repeated interaction with the environment. This process is widely applied to sequential decision-making problems. In such models, the key concept is the value function, which determines the learning target; control algorithms are applied to find optimal policies; and the final goal is to learn behavior strategies over multiple stages. Reinforcement learning enhances the model's ability to learn from data and improves feature dimensions. It also has the advantage of being able to minimize transaction costs.
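As one common example of a value-function-based control algorithm, the sketch below performs a single tabular Q-learning update. The state names ("up", "down"), the action set, the reward, and the values of `alpha` and `gamma` are assumptions chosen for illustration; the studies cited in this paper use their own state, action and reward definitions.

```python
def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One Q-learning step: move Q(s, a) toward
    reward + gamma * max over a' of Q(s', a')."""
    best_next = max(q[next_state].values())
    target = reward + gamma * best_next
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]


actions = ("buy", "hold", "sell")
# Hypothetical two-state market: the last price move was up or down.
q = {s: {a: 0.0 for a in actions} for s in ("up", "down")}
q["down"]["hold"] = 1.0  # pretend this value was learned earlier

new_value = q_update(q, "up", "buy", reward=0.5, next_state="down")
print(new_value)
```

Repeating this update over many interactions with the environment is what lets the value function, and the policy derived from it, improve over multiple stages.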

Comparing the two methods above, each has its strengths and weaknesses. We therefore use a DNN to approximate the components of reinforcement learning, including the value function, the policy, and the model. Although reinforcement learning can handle decision-making problems, it is weak at perception, which motivates combining reinforcement learning with DNNs.

DRL integrates the perception ability of the DNN with the decision-making ability of reinforcement learning, with the goal of simulating human cognition and learning. In this method, different kinds of information can be fed in as input, and actions are output directly by the DNN, so control is driven by the input data without any outside supervision.
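A minimal sketch of this input-to-action mapping follows: a small network scores each action for a given state vector, and the highest-scoring action is chosen. The two input features, the network size, the fixed weights in `params`, and the function names are hypothetical; in an actual DRL system the weights would be trained from rewards rather than set by hand.

```python
def dense(x, weights, bias):
    """Affine map: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def q_values(state, params):
    """Small network mapping a state vector to one value per action."""
    hidden = [max(0.0, h) for h in dense(state, params["w1"], params["b1"])]
    return dense(hidden, params["w2"], params["b2"])


def greedy_action(state, params, actions=("buy", "hold", "sell")):
    """Pick the action whose estimated value is largest."""
    values = q_values(state, params)
    best = max(range(len(actions)), key=lambda i: values[i])
    return actions[best]


# Hand-picked weights for illustration only.
params = {
    "w1": [[1.0, 0.0], [0.0, 1.0]],
    "b1": [0.0, 0.0],
    "w2": [[1.0, -1.0], [0.0, 0.0], [-1.0, 1.0]],
    "b2": [0.0, 0.1, 0.0],
}
state = [0.8, 0.2]  # two hypothetical, non-negative market features
print(greedy_action(state, params))
```

The network plays the perception role by turning raw inputs into action values, and the greedy choice plays the decision role.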

4. Conclusion and future development

Generally speaking, models have shortcomings in real applications, and DRL is no exception. One key point is that current DRL models mainly use the historical data of a single stock and process it as a single modality, so they fail to explore other dimensions that may influence the final result. It is therefore worth making the model more compatible with multimodal data structures and improving its performance in general situations. Better generalization is important because it is closely related to how universally the model can be applied.

Apart from that, current DRL models still cannot handle portfolio strategy problems, because they mainly focus on prediction. In practice, however, it is more common to build a portfolio strategy on top of the prediction results, and treating prediction and portfolio construction as two separate problems may give different results from treating them as a single integrated problem. How to connect the portfolio strategy to the current model and treat the problem as a whole may be a meaningful topic for prediction models, including DRL models.

Finally, the attention mechanism has become popular in recent years and is especially suitable for stock prediction. Because prices at different points in the past influence the future to different degrees, it is reasonable to add an attention mechanism to current stock prediction models.
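The idea of weighting past time points differently can be sketched with scaled dot-product attention for a single query. This is one widely used form of attention, chosen here as an assumption since the paper does not specify a variant; the feature vectors in `keys`, the past returns in `values`, and the function names are likewise hypothetical.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for one query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    context = sum(w * v for w, v in zip(weights, values))
    return weights, context


# One key per past day (hypothetical features) and that day's return.
keys = [[0.0, 1.0], [0.5, 0.5], [1.0, 0.0]]
values = [0.01, -0.02, 0.03]
weights, context = attend([1.0, 0.0], keys, values)
print(weights, context)
```

Days whose features align more closely with the query receive larger weights, so they contribute more to the summary passed on to the prediction model.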

References

[1]. Schwartz, R. A. (1970). Efficient capital markets: A review of theory and empirical work: Discussion. The Journal of Finance, 25(2), 421-423.

[2]. Malkiel, B. G. (2003). The efficient market hypothesis and its critics. Journal of Economic Perspectives, 17(1), 59-82.

[3]. De Bondt, W. F., & Thaler, R. (1985). Does the stock market overreact? The Journal of Finance, 40(3), 793-805.

[4]. Lim, K. P., & Brooks, R. (2011). The evolution of stock market efficiency over time: A survey of the empirical literature. Journal of Economic Surveys, 25(1), 69-108.

[5]. Sewell, M. (2011). History of the efficient market hypothesis. Rn, 11(04), 04.

[6]. Hadavandi, E., Shavandi, H., & Ghanbari, A. (2010). Integration of genetic fuzzy systems and artificial neural networks for stock price forecasting. Knowledge-Based Systems, 23(8), 800-808.

[7]. Jegadeesh, N., & Titman, S. (1993). Returns to buying winners and selling losers: Implications for stock market efficiency. The Journal of Finance, 48(1), 65-91.

[8]. Pai, P. F., & Lin, C. S. (2005). A hybrid ARIMA and support vector machines model in stock price forecasting. Omega, 33(6), 497-505.

[9]. Wu, M. C., Lin, S. Y., & Lin, C. H. (2006). An effective application of decision tree to stock trading. Expert Systems with Applications, 31(2), 270-274.

[10]. Khaidem, L., Saha, S., & Dey, S. R. (2016). Predicting the direction of stock market prices using random forest. arXiv preprint arXiv:1605.00003.

[11]. Gong, J., & Sun, S. (2009, June). A new approach of stock price prediction based on logistic regression model. In 2009 International conference on new trends in information and service science (pp. 1366-1371). IEEE.

[12]. Upadhyay, A., Bandyopadhyay, G., & Dutta, A. (2012). Forecasting stock performance in Indian market using multinomial logistic regression. Journal of Business Studies Quarterly, 3(3), 16.

[13]. Heo, J., & Yang, J. Y. (2016). Stock price prediction based on financial statements using SVM. International Journal of Hybrid Information Technology, 9(2), 57-66.

[14]. Wen, F., Xiao, J., He, Z., & Gong, X. (2014). Stock

[15]. Shen, F., Chao, J., & Zhao, J. (2015). Forecasting exchange rate using deep belief networks and conjugate gradient method. Neurocomputing, 167, 243-253.

[16]. Song, Y., Lee, J. W., & Lee, J. (2019). A study on novel filtering and relationship between input-features and target-vectors in a deep learning model for stock price prediction. Applied Intelligence, 49, 897-911.

[17]. Naik, N., & Mohan, B. R. (2019). Stock price movements classification using machine and deep learning techniques: The case study of Indian stock market. In Engineering Applications of Neural Networks: 20th International Conference, EANN 2019, Xersonisos, Crete, Greece, May 24-26, 2019, Proceedings 20 (pp. 445-452). Springer International Publishing.

[18]. Chatzis, S. P., Siakoulis, V., Petropoulos, A., Stavroulakis, E., & Vlachogiannakis, N. (2018). Forecasting stock market crisis events using deep and statistical machine learning techniques. Expert Systems with Applications, 112, 353-371.

[19]. Nakagawa, K., Uchida, T., & Aoshima, T. (2018, September). Deep factor model: Explaining deep learning decisions for forecasting stock returns with layer-wise relevance propagation. In Workshop on Mining Data for Financial Applications (pp. 37-50). Cham: Springer International Publishing.

[20]. Chong, E., Han, C., & Park, F. C. (2017). Deep learning networks for stock market analysis and prediction: Methodology, data representations, and case studies. Expert Systems with Applications, 83, 187-205.

[21]. Shin, H. G., Ra, I., & Choi, Y. H. (2019, October). A deep multimodal reinforcement learning system combined with CNN and LSTM for stock trading. In 2019 International conference on information and communication technology convergence (ICTC) (pp. 7-11). IEEE.

[22]. Jia, W. U., Chen, W. A. N. G., Xiong, L., & Hongyong, S. U. N. (2019, July). Quantitative trading on stock market based on deep reinforcement learning. In 2019 International Joint Conference on Neural Networks (IJCNN) (pp. 1-8). IEEE.

[23]. Carapuço, J., Neves, R., & Horta, N. (2018). Reinforcement learning applied to Forex trading. Applied Soft Computing, 73, 783-794.

[24]. Kang, Q., Zhou, H., & Kang, Y. (2018, October). An asynchronous advantage actor-critic reinforcement learning method for stock selection and portfolio management. In Proceedings of the 2nd International Conference on Big Data Research (pp. 141-145).

[25]. Yang, H., Liu, X. Y., Zhong, S., & Walid, A. (2020, October). Deep reinforcement learning for automated stock trading: An ensemble strategy. In Proceedings of the first ACM international conference on AI in finance (pp. 1-8).

[26]. Li, Y., Ni, P., & Chang, V. (2020). Application of deep reinforcement learning in stock trading strategies and stock forecasting. Computing, 102(6), 1305-1322.

Cite this article

Wang, M. (2024). Deep reinforcement learning for stock prediction. Applied and Computational Engineering, 69, 85-90.

Data availability

The datasets used and/or analyzed during the current study will be available from the authors upon reasonable request.

About volume

Volume title: Proceedings of the 6th International Conference on Computing and Data Science

ISBN: 978-1-83558-459-0 (Print) / 978-1-83558-460-6 (Online)

Editor: Alan Wang, Roman Bauer

Conference website: https://www.confcds.org/

Conference date: 12 September 2024

Series: Applied and Computational Engineering

Volume number: Vol.69

ISSN: 2755-2721 (Print) / 2755-273X (Online)

© 2024 by the author(s). Licensee EWA Publishing, Oxford, UK. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.