Exploring Sentiment-based GANs for Stock Price Prediction

Yuxin Wang

doi:10.54254/2755-2721/2025.21280

1. Introduction

Market dynamics are inherently complex and volatile, which makes predicting stock market trends a longstanding challenge. Several traditional machine learning methods have been applied to stock price prediction. Illa et al. demonstrate the use of the Random Forest (RF) algorithm and the Support Vector Machine (SVM) for this task [1]. Statistical models such as ARIMA and GARCH are also common choices because they capture historical price trends and volatility patterns. More advanced methods, such as neural networks and Long Short-Term Memory (LSTM) models, can capture complex relationships in sequential data [2]. Much of this research relies on historical price data and technical indicators. However, information from the internet, social media, and news has in many cases been shown to strongly influence stock prices [3].

Sentiment analysis is one of the most powerful ways to measure public opinion. It extracts information from a range of text sources, such as analyst reports, financial news, and social media posts. Its usefulness lies mainly in its ability to capture market information related to investors' emotions and reactions, through sentiment scores that express the overall attitude of the market toward particular stocks or toward the market as a whole [4].

Generative Adversarial Networks (GANs) offer a promising framework for using sentiment data to predict stock prices. GANs consist of two networks: a generator, which creates synthetic data, and a discriminator, which assesses its realism [5]. A sentiment-based GAN takes sentiment scores as input alongside historical data, so that the model simulates stock price movements based on both quantitative trends and qualitative sentiment shifts.

To show the benefits of sentiment-based GANs over conventional machine learning techniques, this paper reviews their use in stock prediction, covering current developments, popular datasets, and evaluation metrics. It then discusses several open issues, namely data quality and noise, computational complexity, and ethical concerns, and suggests possible future directions for increasing the robustness and usability of sentiment-based GANs in financial forecasting. By combining quantitative and qualitative data, the paper aims to clarify the role of sentiment-based GANs as a potentially useful instrument for contemporary financial markets.

2. Overview of Recent Mainstream Methods

2.1. Sentiment Analysis Techniques

Sentiment analysis, also known as opinion mining, takes qualitative information such as public opinion from social media, news, and analyst reports and transforms it into quantifiable sentiment scores that act as a proxy for the prevailing mood toward a stock or the market. Early sentiment analysis methods were lexicon-based and determined opinion from predefined word lists. These methods tended to struggle with context, especially in complex financial language that is full of jargon and subtle qualification. Recent deep learning approaches, especially transformer-based models such as BERT and FinBERT, the latter being better suited to the financial domain, have made sentiment analysis much more accurate and context-aware.

As shown in Figure 1, the FinBERT language model is used to classify a piece of news. FinBERT analyzes a textual input and outputs a sentiment label (positive, negative, or neutral) together with a score between 0 and 1. A higher score indicates higher confidence in the label. The resulting score gives prediction models a useful data layer for capturing the influence of investor behavior, media coverage, and similar factors on market trends.

Figure 1: Sentiment Analysis Diagram Flow [6].
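The label-and-score output described for FinBERT can be illustrated without loading the model itself. The following is a minimal sketch, assuming the classifier exposes three raw logits per input; the logit values and the label order are hypothetical and would need to be checked against the configuration of the actual checkpoint.

```python
import math

# Assumed label order for illustration only; a real checkpoint defines its own.
LABELS = ("positive", "negative", "neutral")

def label_and_score(logits):
    """Apply a softmax to three class logits and return (label, confidence)."""
    peak = max(logits)                      # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]

# Hypothetical logits for one headline.
label, score = label_and_score([2.0, 0.1, 0.5])
```

The confidence is the softmax probability of the winning class, so it always lies between 0 and 1, which matches the score range described above.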

2.2. GAN Application in Finance

As shown in Figure 2, a GAN [5] is made up of two neural networks: a generator and a discriminator. The generator produces synthetic data samples, while the discriminator tries to distinguish the synthetic samples from real data. Because the two networks compete, the generator is pushed to produce increasingly realistic data that reflects the structure of the training data [5, 7], which makes GANs attractive for financial forecasting.

Figure 2: GAN Architecture. (Picture credit: Original)
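
The competition between the two networks in Figure 2 is commonly written as the minimax objective of the original GAN formulation [5]. In the expression below, G is the generator, D is the discriminator, x is a real sample, and z is a noise vector; the symbols \( p_{data} \) and \( p_{z} \) are notation introduced here for illustration and are not used elsewhere in this paper.

\( \min_{G}\max_{D}V(D,G)=\mathbb{E}_{x\sim p_{data}}[\log D(x)]+\mathbb{E}_{z\sim p_{z}}[\log (1-D(G(z)))] \)

The discriminator raises V by assigning high probability to real samples and low probability to generated ones, while the generator lowers V by producing samples that the discriminator accepts as real.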

2.3. Integration of Sentiment Analysis with GANs

Sentiment-based GANs integrate sentiment analysis with GANs: sentiment data is supplied as an extra input to the GAN model to help improve predictive accuracy [4]. In these models, historical price data is combined with sentiment scores derived from real-time social media and financial news, and both are fed into the generator network. This dual-input structure allows the GAN to learn to simulate price movements that reflect both quantitative trends and qualitative sentiment shifts, adding a psychological and behavioral dimension to the model [8].

Because sentiment-based GANs use sentiment as an input, they can detect patterns that are often missed by models that use historical prices only. For example, when sentiment changes suddenly because of a major news event or a social media trend, these models can adjust projected price movements to reflect plausible market reactions to the new information. This flexibility is particularly useful in periods of extreme volatility, when sentiment-driven price fluctuations can cause significant changes in market behavior [6].
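
The dual-input structure can be made concrete with a small sketch of how a generator input vector might be assembled. This is an illustration under stated assumptions, not the construction used by any of the cited models: the function name `build_generator_input`, the window length, the use of simple returns, and the noise dimension are all hypothetical choices.

```python
import random

def build_generator_input(prices, sentiment, window=3, noise_dim=2, seed=0):
    """Concatenate recent returns, aligned sentiment scores, and a noise vector."""
    if len(prices) != len(sentiment):
        raise ValueError("prices and sentiment must be aligned day by day")
    if len(prices) < window + 1:
        raise ValueError("need at least window + 1 observations")
    # Quantitative branch: simple returns over the last `window` days.
    recent = prices[-(window + 1):]
    returns = [(b - a) / a for a, b in zip(recent, recent[1:])]
    # Qualitative branch: sentiment scores for the same days.
    moods = list(sentiment[-window:])
    # Noise vector, as in a standard GAN generator input.
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
    return returns + moods + noise

# Hypothetical closing prices and daily sentiment scores in [-1, 1].
vec = build_generator_input([100.0, 102.0, 101.0, 103.0], [0.1, 0.4, -0.2, 0.6])
```

A model that sees only the first block of this vector corresponds to a price-only baseline; the sentiment block is what lets the generator react to a mood shift before it shows up in prices.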

3. Common Datasets and Evaluation Metrics

3.1. Datasets

Stock prediction with sentiment-based GANs combines sentiment data with historical market data to interpret market behavior. Extensive historical data for indexes such as the S&P 500, Dow Jones, and NASDAQ is available from Yahoo Finance or Bloomberg and can be used to identify price patterns [9]. Sentiment drawn from social media such as Reddit and Twitter is available in real time, reflects public opinion, and may therefore affect price movements. General news outlets sometimes favor sensational headlines to attract views; by contrast, Reuters and Dow Jones Newswires are considered more reliable because of their expert curation. For example, FinBERT can score text from several different sources, and the aggregated sentiment supports more accurate financial forecasting [10].
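
Before such data can be used together, post-level sentiment has to be aggregated to the frequency of the price series. The sketch below shows one simple way to do this, assuming daily closing prices and per-post scores in [-1, 1]; the sample values, the function names, and the choice to fill days without posts with a neutral 0.0 are illustrative assumptions.

```python
from collections import defaultdict
from statistics import mean

def daily_sentiment(posts):
    """posts: iterable of (date_str, score) pairs. Returns {date: mean score}."""
    by_day = defaultdict(list)
    for day, score in posts:
        by_day[day].append(score)
    return {day: mean(scores) for day, scores in by_day.items()}

def align(closes, sentiment, fill=0.0):
    """Pair each trading day with its sentiment; days without posts get `fill`."""
    return [(day, close, sentiment.get(day, fill))
            for day, close in sorted(closes.items())]

# Hypothetical sample data.
posts = [("2024-01-02", 0.5), ("2024-01-02", -0.1), ("2024-01-03", 0.25)]
closes = {"2024-01-02": 100.0, "2024-01-03": 101.5, "2024-01-04": 99.0}
rows = align(closes, daily_sentiment(posts))
```

Each row then holds a date, a closing price, and a sentiment value, which is the aligned form that a sentiment-based model needs as training input.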

3.2. Evaluation Metrics

The performance of sentiment-based GANs is measured with metrics that reflect both accuracy and practical relevance. For regression tasks, such as predicting stock price changes in a time series, Mean Squared Error (MSE), given in Equation (1), measures the average squared difference between predicted and actual values and therefore penalizes larger errors more heavily. Root Mean Squared Error (RMSE), given in Equation (2), expresses the same error in the original data units [5]. Mean Absolute Error (MAE), shown in Equation (3), quantifies the average magnitude of prediction errors irrespective of direction and is more robust to outliers. This robustness matters when validating model stability through sudden market changes, which is essential for time series forecasting [9].

\( MSE=\frac{1}{N}\sum_{t=1}^{N}(y_{t}-\widetilde{y}_{t})^{2}\ \ \ (1) \)

\( RMSE=\sqrt{\frac{1}{N}\sum_{t=1}^{N}(y_{t}-\widetilde{y}_{t})^{2}}\ \ \ (2) \)

\( MAE=\frac{1}{N}\sum_{t=1}^{N}|y_{t}-\widetilde{y}_{t}|\ \ \ (3) \)

Where: N is the total number of observations, \( y_{t} \) is the actual value at time t, and \( \widetilde{y}_{t} \) is the predicted value at time t.
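
Equations (1) to (3) translate directly into a few lines of code. The sketch below is a plain-Python illustration with hypothetical price values; the function name `regression_errors` is introduced here for the example.

```python
import math

def regression_errors(actual, predicted):
    """Return (MSE, RMSE, MAE) following Equations (1)-(3)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual)
    residuals = [y - y_hat for y, y_hat in zip(actual, predicted)]
    mse = sum(r * r for r in residuals) / n
    rmse = math.sqrt(mse)
    mae = sum(abs(r) for r in residuals) / n
    return mse, rmse, mae

# Hypothetical actual and predicted closing prices; the last prediction is far off.
mse, rmse, mae = regression_errors([100.0, 102.0, 101.0, 105.0],
                                   [101.0, 102.0, 99.0, 101.0])
```

In this example the single large error of 4 contributes 16 of the 21 units of squared error, so RMSE (about 2.29) sits well above MAE (1.75), which illustrates why MAE is described as less sensitive to outliers.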

For classification tasks, such as predicting whether a stock's price will rise or fall at a specific time point, accuracy, shown in Equation (4), measures the percentage of correct predictions and so assesses overall performance. However, accuracy can be misleading on imbalanced datasets, where more specialized metrics are required.

A more detailed evaluation is given by Precision, Recall, and the F1 Score, defined in Equations (5), (6), and (7). High precision means that predicted upward movements are reliable, while recall reflects the model's ability to detect all actual upward price changes, in other words its sensitivity to profit opportunities.

The F1 Score, the harmonic mean of precision and recall, balances the two metrics. This balance is crucial in trading, where both false negatives and false positives can be costly. Together, these metrics support a comprehensive evaluation of stock trend predictions.

\( Accuracy=\frac{TP+TN}{TP+TN+FP+FN}\times 100\ \ \ (4) \)

\( Precision=\frac{TP}{TP+FP}\ \ \ (5) \)

\( Recall=\frac{TP}{TP+FN}\ \ \ (6) \)

\( F1=2\times \frac{Precision\times Recall}{Precision+Recall}\ \ \ (7) \)

Where: True Positives (TP) are the number of correctly predicted positive cases, True Negatives (TN) are the number of correctly predicted negative cases, False Positives (FP) are the number of instances incorrectly predicted as positive but actually negative, and False Negatives (FN) are the number of instances incorrectly predicted as negative but actually positive.
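
The classification metrics can be computed from the four confusion-matrix counts. The following sketch assumes binary labels where 1 means the price rose and 0 means it fell; the label sequences are hypothetical, and returning 0.0 when a denominator is zero is a convention chosen for this example.

```python
def classification_metrics(actual, predicted):
    """Return accuracy (in percent), precision, recall, and F1 for binary labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    accuracy = 100.0 * (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1

# Hypothetical up (1) / down (0) labels for eight trading days.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 0, 0, 1, 0, 0, 1]
accuracy, precision, recall, f1 = classification_metrics(actual, predicted)
```

Here the model finds only two of the four actual upward days, so recall is 0.5 even though accuracy is 62.5 percent, which shows how accuracy alone can hide missed opportunities.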

The Sharpe Ratio, given in Equation (8), measures risk-adjusted return: an investment's excess return over the risk-free rate divided by its volatility. In performance evaluation, it is used to assess trading strategies and portfolios by comparing the return obtained with the risk taken.

\( Sharpe\ Ratio=\frac{R_{p}-R_{f}}{\sigma_{p}}\ \ \ (8) \)

Where: \( R_{p} \) is the mean return of the portfolio or investment, \( R_{f} \) is the risk-free rate, and \( \sigma_{p} \) is the standard deviation of the portfolio's returns.
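
Equation (8) can be evaluated on a series of periodic returns as follows. This is a minimal sketch: the return values are hypothetical, the ratio is per period rather than annualized, and the use of the sample standard deviation is an assumption, since the equation does not specify which estimator is intended.

```python
from statistics import mean, stdev

def sharpe_ratio(returns, risk_free=0.0):
    """Per-period Sharpe Ratio: (mean return - risk-free rate) / volatility."""
    if len(returns) < 2:
        raise ValueError("need at least two returns to estimate volatility")
    volatility = stdev(returns)          # sample standard deviation (assumed)
    if volatility == 0:
        raise ValueError("volatility is zero; the ratio is undefined")
    return (mean(returns) - risk_free) / volatility

# Hypothetical per-period strategy returns and a zero risk-free rate.
ratio = sharpe_ratio([0.01, 0.03, -0.02, 0.02], risk_free=0.0)
```

With these values the mean return is 0.01 and the sample standard deviation is about 0.0216, giving a ratio of roughly 0.46.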

Together, these metrics provide a comprehensive view of GAN performance in stock forecasting.

4. Current State of Sentiment-based GANs in Stock Prediction

Sentiment-based GANs have improved significantly in stock prediction by integrating real-time sentiment data with historical prices, which makes the models much more sensitive to changes in market sentiment. One example is SentimentGAN, which is useful in high-frequency trading because it generates synthetic prices conditioned on incoming real-time tweet data as a representation of sentiment. In contrast to the standard GAN, Price-Sentiment-WGAN builds on the Wasserstein GAN for more stable training and combines sentiment and price data into a better-performing stock prediction model, especially in highly volatile markets [3].
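
For background, the standard Wasserstein GAN objective replaces the logarithmic loss of the original GAN with a difference of critic scores, where the critic D is constrained to be 1-Lipschitz. This is the general Wasserstein formulation, given here only as context; the exact loss used by Price-Sentiment-WGAN is not stated in this review, and the symbols \( p_{data} \) and \( p_{z} \) are notation introduced for illustration.

\( \min_{G}\max_{\|D\|_{L}\le 1}\ \mathbb{E}_{x\sim p_{data}}[D(x)]-\mathbb{E}_{z\sim p_{z}}[D(G(z))] \)

Because the critic's output is not passed through a sigmoid and a logarithm, its gradients are less likely to vanish when real and generated samples are easy to tell apart, which is the usual explanation for the improved training stability.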

Other models add further mechanisms. The Attention-based Sentiment GAN (AS-GAN) incorporates attention mechanisms that focus on major changes in sentiment and improve the model's response to market-moving events. The Conditional GAN with Sentiment (cGAN-S) uses sentiment scores as conditional inputs to generate predictions for specific scenarios, and the Financial Sentiment GAN (F-SentGAN) focuses on structured news sentiment and is designed to predict news-induced price fluctuations. Compared with single-source sentiment GANs, MS-GAN averages the sentiment from various sources, which reduces its sensitivity to the biases of any single source.
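
The multi-source averaging described for MS-GAN can be illustrated with a small helper. This is a sketch of the general idea only, not the published model: the function name `fuse_sources`, the source names, the scores, and the optional weights are all hypothetical.

```python
def fuse_sources(scores, weights=None):
    """Weighted mean of per-source sentiment scores; equal weights by default."""
    if not scores:
        raise ValueError("at least one source is required")
    if weights is None:
        weights = {name: 1.0 for name in scores}
    total = sum(weights[name] for name in scores)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(scores[name] * weights[name] for name in scores) / total

# Hypothetical daily sentiment per source, each in [-1, 1].
daily = {"twitter": 0.8, "reddit": 0.4, "news": -0.3}
plain = fuse_sources(daily)
weighted = fuse_sources(daily, {"twitter": 1.0, "reddit": 1.0, "news": 2.0})
```

In this example, a strongly positive reading from one platform is tempered by the other sources, and giving curated news a larger weight pulls the fused score further toward the news signal.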

Sentiment-based GANs have several advantages over other models for predicting stock prices. With sentiment data feeds, they can adjust their predictions to real-time changes in mood, which standard models that rely only on historical prices cannot do. Because sentiment-based GANs are trained to learn the correlation between sentiment and price shifts, they can anticipate market reactions during fluctuations [5]. They are also less prone to overfitting: since GANs produce multiple synthetic data sets, they do not rely excessively on details of past data [7]. These properties allow sentiment-based GANs to combine qualitative analysis with quantitative modeling, which makes them effective in real-world stock forecasting [11].

Overall, sentiment-based GANs are a recent advance in financial forecasting that uses historical data and market sentiment to produce more accurate and flexible predictions. Their use is expected to spread, with greater scalability and robustness, as preprocessing and computing capabilities improve, solidifying their position as important tools for today's financial markets [12].

5. Challenges and Future Directions

5.1. Challenges

Despite their promise, sentiment-based GANs have a number of drawbacks. Integrating unstructured sentiment sources such as social media raises the main issues of data noise and data quality. Biased or irrelevant material may also distort predictions by supplying misleading sentiment signals [8].

Another problem is computational complexity: sentiment-based GANs need significant resources to analyze sentiment data in real time while also performing adversarial training. This raises latency and limits scalability, which reduces their feasibility for high-frequency trading or continuous forecasting systems [9].

Finally, sentiment-based GANs face ethical challenges, including privacy, fairness, and misuse. Because the data comes from public sources such as social media, privacy becomes a concern [13]. Biased sentiment data can lead to incorrect predictions or reinforce stereotypes [14]. Furthermore, these models are open to misuse, such as manipulation of markets or public opinion, and their black-box nature reduces transparency in critical applications [15, 16].

5.2. Potential Solutions and Future Directions

To address the current challenges and unlock the full potential of sentiment-based GANs, several forward-looking research directions offer promising solutions.

First, because social media posts can carry a high degree of bias and noise, the quality of sentiment inputs can be improved through stricter sentiment validation and data selection. Reliability can be increased by using multi-stage filtering techniques that cross-validate sentiment signals from social media against more structured sources, such as financial news. This approach refines sentiment data and selects relevant information for accurate prediction using context-aware NLP models, including BERT [8, 17].
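
A two-stage version of this filtering idea can be sketched as follows. It is an illustrative heuristic rather than a method taken from the cited work: the confidence threshold, the sign-agreement rule, the fallback to the news score, and the function name `filter_sentiment` are all assumptions.

```python
def filter_sentiment(social_posts, news_score, min_conf=0.6):
    """Stage 1: drop low-confidence posts. Stage 2: cross-check against news.

    social_posts: list of (score, confidence) pairs with scores in [-1, 1].
    """
    # Stage 1: keep only posts the classifier was reasonably confident about.
    kept = [score for score, conf in social_posts if conf >= min_conf]
    if not kept:
        return news_score
    social = sum(kept) / len(kept)
    # Stage 2: if social media and news disagree in sign, trust the structured source.
    if social * news_score < 0:
        return news_score
    return (social + news_score) / 2

# Hypothetical posts as (score, confidence) pairs.
posts = [(0.9, 0.95), (-0.8, 0.3), (0.5, 0.7)]
agree = filter_sentiment(posts, news_score=0.3)
conflict = filter_sentiment(posts, news_score=-0.2)
```

When the two sources agree, the result blends them; when they conflict, the noisier social signal is discarded, which is one simple way to keep a burst of unreliable posts from dominating the model input.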

Second, composite model architectures can reduce computational costs in sentiment-based GANs. Combining GANs with lightweight RNNs such as GRUs, or with attention mechanisms, can lower the processing demands of time series data. For instance, substituting some of the LSTM layers in Price-Sentiment-WGAN with attention layers yields a similarly performing model at nearly half the cost. Asynchronous processing enables real-time handling of high-frequency data streams. Model distillation offers another way to reduce model size with little loss of accuracy. Collectively, these techniques increase computational efficiency and support accurate real-time prediction in dynamic financial environments [12].

Finally, as sentiment-based models become increasingly integrated into financial decision-making, ethical guardrails and manipulation prevention become important. Sentiment instability markers, together with measures for detecting sentiment data that has been incorrectly flipped to negative or positive, would help models avoid overreacting to biased or unreliable inputs. Such safeguards help ensure that sentiment-based GANs contribute responsibly to financial forecasting, in line with the increasing demand for ethical implementation of AI.
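
One possible form of a sentiment instability marker is a count of sign changes within a rolling window. The sketch below is a hypothetical heuristic, not a technique from the cited literature; the window length, the flip threshold, and the function name `instability_flags` are assumptions made for illustration.

```python
def instability_flags(scores, window=4, max_flips=2):
    """Flag each position whose trailing window contains more than max_flips sign changes."""
    flags = []
    for i in range(len(scores)):
        recent = scores[max(0, i - window + 1): i + 1]
        # A flip is a consecutive pair of scores with opposite signs.
        flips = sum(1 for a, b in zip(recent, recent[1:]) if a * b < 0)
        flags.append(flips > max_flips)
    return flags

# Hypothetical sentiment series that starts stable and then oscillates.
series = [0.2, 0.3, -0.4, 0.5, -0.6, 0.1]
flags = instability_flags(series)
```

A downstream model could reduce the weight of sentiment inputs on flagged days, so that rapid oscillation, which may indicate noise or deliberate manipulation, does not translate directly into a price forecast.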

6. Conclusion

This paper reviews sentiment-based GANs for stock price prediction, focusing on three main areas: how sentiment analysis is combined with GANs, the benefits of GANs over traditional ML models, and their limitations. In addition to historical market data, sentiment-based GANs integrate real-time qualitative sentiment data (such as social media and financial news) to form a more dynamic and sophisticated prediction model. This integration allows models to better capture market movements caused by changes in sentiment, which is particularly useful when sentiment-driven moves are unusual or volatile.

Despite these advances, the wider applicability and robustness of sentiment-based GANs are hampered by a small number of key limitations. Data quality remains a problem, since sentiment data pulled from platforms such as Twitter often includes bias, noise, and irrelevant information that can cloud predictions. A further problem is computational complexity: adversarial training and real-time sentiment monitoring are resource-intensive, so GANs are not well suited to high-frequency trading scenarios. Ethical considerations, such as privacy concerns, the possibility that the models could be used to manipulate sentiment, and the opacity of model decisions, all make implementation in real financial applications more difficult.

Future research should therefore prioritize the optimization of data preprocessing and filtering methods to increase the reliability of sentiment inputs. If efficient architectures such as transformers or lightweight RNNs could be merged with GANs to form hybrid models, they might reduce processing requirements while maintaining the same predictive performance. In addition, the use of these models requires ethical safeguards to make sure they are applied correctly and responsibly, including model transparency mechanisms and systems for detecting sentiment manipulation.

This paper contributes to the understanding and application of sentiment-based GANs in financial forecasting. These models show great promise for improving stock price prediction by connecting quantitative market data with qualitative sentiment signals. In addition to presenting their advantages, this paper highlights areas for development and opens the door to further research on scalability, dependability, and ethical use in sophisticated and dynamic financial markets.

References

[1]. Illa, P. K., Parvathala, B., & Sharma, A. K. (2021). Stock price prediction methodology using random forest algorithm and support vector machine. Materials Today: Proceedings.

[2]. Fischer, T., & Krauss, C. (2018). Deep learning with long short-term memory networks for financial market predictions. European Journal of Operational Research, 270(2), 654–669.

[3]. Asgarian, S., Ghasemi, R., & Momtazi, S. (2022). Generative adversarial network for sentiment‐based stock prediction. Concurrency and Computation: Practice and Experience, 35(2).

[4]. Zhang, Y., Li, J., Wang, H., & Choi, S.-C. T. (2021). Sentiment-Guided Adversarial Learning for Stock Price Prediction. Frontiers in Applied Mathematics and Statistics, 7.

[5]. Goodfellow, I. J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., & Bengio, Y. (2014, June 10). Generative Adversarial Networks. ArXiv.org.

[6]. Jain, J. K., & Agrawal, R. (2024). FB-GAN: A Novel Neural Sentiment-Enhanced Model for Stock Price Prediction. ACL Anthology, 85–93. https://aclanthology.org/2024.finnlp-1.9/

[7]. Creswell, A., White, T., Dumoulin, V., Arulkumaran, K., Sengupta, B., & Bharath, A. A. (2018). Generative Adversarial Networks: An Overview. IEEE Signal Processing Magazine, 35(1), 53–65. https://doi.org/10.1109/msp.2017.2765202.

[8]. Bollen, J., Mao, H., & Zeng, X. (2011). Twitter mood predicts the stock market. Journal of Computational Science, 2(1), 1–8.

[9]. Chen, T., & He, Z. (2020). The use of social media sentiment in stock prediction. IEEE Transactions on Knowledge and Data Engineering.

[10]. Huang, W., Nakamori, Y., & Wang, S. Y. (2005). Forecasting stock market movement direction with support vector machine. Computers & Operations Research, 32(10), 2513–2522.

[11]. Bhardwaj, M., Roy, A., & Saurabh Bilgaiyan. (2024). StockGAN: Enhancing Stock Price Prediction with GAN and Sentiment Analysis. 1–6.

[12]. Gudla, A., Reddy, P. C. S., & Praveen, P. (2024). Enhancing time series stock predictions using GANs with technical indicators and Twitter sentiment: Challenges with low-popularity tickers. Journal of Computational Analysis and Applications, 33(2), 438-451.

[13]. Rawat, A., Kumar, S., & Surender Singh Samant. (2024). Hate speech detection in social media: Techniques, recent trends, and future challenges. WIREs Computational Statistics, 16(2).

[14]. Liu, Q., & Son, H. (2024). Data selection and collection for constructing investor sentiment from social media. Humanities and Social Sciences Communications, 11(1).

[15]. Yin, R., Wu, J., Tian, R., & Gan, F. (2022). Topic modeling and sentiment analysis of Chinese people’s attitudes toward volunteerism amid the COVID-19 pandemic. Frontiers in Psychology, 13. https://doi.org/10.3389/fpsyg.2022.1064372.

[16]. You, G., Gan, S., Guo, H., & Dagestani, A. A. (2022). Public Opinion Spread and Guidance Strategy under COVID-19: A SIS Model Analysis. Axioms, 11(6), 296.

[17]. Logeswaran, L., Chang, M.-W., Lee, K., Toutanova, K., Devlin, J., & Lee, H. (2019). Zero-Shot Entity Linking by Reading Entity Descriptions. ArXiv.org. https://arxiv.org/abs/1906.07348.

Cite this article

Wang, Y. (2025). Exploring Sentiment-based GANs for Stock Price Prediction. Applied and Computational Engineering, 125, 239-246.

Data availability

The datasets used and/or analyzed during the current study will be available from the authors upon reasonable request.

About volume

Volume title: Proceedings of the 3rd International Conference on Mechatronics and Smart Systems

ISBN: 978-1-83558-909-0 (Print) / 978-1-83558-910-6 (Online)

Editor: Mian Umer Shafiq

Conference website: https://2025.confmss.org/

Conference date: 16 June 2025

Series: Applied and Computational Engineering

Volume number: Vol.125

ISSN: 2755-2721 (Print) / 2755-273X (Online)

© 2024 by the author(s). Licensee EWA Publishing, Oxford, UK. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.