
Research Article
Open access

Applications of BERT in sentiment analysis

Zihan Su 1*
  • 1 Northeastern University
  • *Corresponding author: Suzihan18@gmail.com
Published on 9 October 2024 | https://doi.org/10.54254/2755-2721/92/20241711
ACE Vol.92
ISSN (Print): 2755-273X
ISSN (Online): 2755-2721
ISBN (Print): 978-1-83558-595-5
ISBN (Online): 978-1-83558-596-2

Abstract

This study examines the application of Bidirectional Encoder Representations from Transformers (BERT) to sentiment analysis within Natural Language Processing (NLP). BERT's bidirectional Transformer architecture, pre-trained with Masked Language Modeling (MLM) and Next Sentence Prediction (NSP), has driven substantial advances in NLP. This paper describes BERT's architecture, its pre-training methods, and its fine-tuning for sentiment analysis tasks. The study then compares BERT's performance with traditional rule-based techniques, machine learning algorithms, and other deep learning models, highlighting the limited ability of earlier approaches to handle linguistic nuance and context. Studies demonstrating the consistency and accuracy of BERT-based sentiment analysis are reviewed, along with the challenges of handling irony, sarcasm, and domain-specific data. Finally, the study examines the ethical and privacy concerns that sentiment analysis inherently raises, makes recommendations for further research, and shows how integrating sentiment analysis with other domains can lead to multidisciplinary breakthroughs that offer more comprehensive insights and applications.

Keywords:

BERT, Sentiment Analysis, Natural Language Processing, Deep Learning, Comparative Analysis


1. Introduction

Natural language processing (NLP) aims to enable computers to understand, generate, and recognize human language. Pre-trained language models have demonstrated improvements in various NLP tasks, including question answering and text summarization. Traditional pre-training methods are limited because they rely on unidirectional context, which constrains the choice of fine-tuning strategies. In contrast, BERT uses a "masked language model" pre-training objective, enabling it to capture both left and right context and pre-train deep bidirectional representations, as outlined by Devlin et al. [1]. This innovative approach has allowed BERT to achieve state-of-the-art results across multiple NLP tasks, setting a new benchmark in the field.

This paper specifically investigates the application of BERT in sentiment analysis, comparing its performance with other approaches such as traditional rule-based methods, machine learning algorithms, and other deep learning models, as highlighted by Howard and Ruder [2], and Zhang and Yang [3]. The study aims to highlight the advancements brought by BERT and discuss its limitations and future research directions. Sentiment analysis is crucial in understanding public opinion, customer feedback, and social media interactions, making it an essential tool in various fields such as marketing, politics, and healthcare, as noted by Alaparthi and Mishra [4].

The primary research question addressed in this paper is how BERT's performance in sentiment analysis compares to traditional rule-based methods, machine learning algorithms, and other deep learning models. To explore this, the paper is structured as follows: an overview of BERT's architecture and training, methodologies of sentiment analysis, a comparative analysis of approaches, key study results, and an examination of the challenges and limitations faced by BERT in sentiment analysis. The paper concludes with future directions for research and applications.

By thoroughly analyzing these aspects, this study seeks to demonstrate the significant impact of BERT on sentiment analysis and propose ways to further enhance its effectiveness and application in various domains. This research contributes to the growing body of knowledge on NLP and sentiment analysis, offering insights into the practical and theoretical implications of using advanced models like BERT.

2. Background: BERT Architecture and Training

2.1. Model architecture

BERT uses a multi-layer bidirectional Transformer encoder consisting of stacked Transformer blocks, parameterized by the number of layers (L), the number of self-attention heads (A), and the hidden size (H), as implemented by Vaswani et al. [5]. Two sizes are available: BERT-Base (L=12, H=768, A=12, 110 million parameters in total) and BERT-Large (L=24, H=1024, A=16, 340 million parameters in total) [1]. Unlike the constrained self-attention in models such as GPT, where each token attends only to its left context, BERT's bidirectional self-attention mechanism lets every token attend to all tokens in the input sequence.
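
These hyperparameters can be inspected programmatically. The sketch below assumes the Hugging Face `transformers` library and the public `bert-base-uncased` checkpoint; neither is prescribed by this paper, and they stand in for any BERT-Base implementation.

```python
# Sketch: inspect BERT-Base's architectural hyperparameters (assumes the
# Hugging Face `transformers` package and the `bert-base-uncased` checkpoint).
from transformers import BertConfig, BertModel

config = BertConfig.from_pretrained("bert-base-uncased")
print(config.num_hidden_layers)    # L = 12 Transformer blocks
print(config.hidden_size)          # H = 768
print(config.num_attention_heads)  # A = 12 self-attention heads

model = BertModel.from_pretrained("bert-base-uncased")
print(sum(p.numel() for p in model.parameters()))  # roughly 110 million parameters
```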

2.2. Input/output representations

BERT's input representation is designed to handle a range of downstream tasks: a single token sequence can represent either one sentence or a pair of sentences packed together. The model uses WordPiece embeddings with a 30,000-token vocabulary. Each input sequence begins with a special classification token ([CLS]), and sentences within a sequence are separated by a special token ([SEP]) [1]. The input representation of each token is the sum of its token, segment, and position embeddings.
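
As a hedged illustration of this sum, the sketch below uses the Hugging Face `transformers` implementation, whose embedding module mirrors the description above; the example sentences are arbitrary.

```python
# Sketch: a token's input representation is the SUM of its token (WordPiece),
# segment (token type), and position embeddings, before LayerNorm and dropout.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

enc = tokenizer("The movie was great.", "I would watch it again.",
                return_tensors="pt")   # adds [CLS] and [SEP] automatically
emb = model.embeddings                  # holds the three embedding tables

positions = torch.arange(enc["input_ids"].size(1)).unsqueeze(0)
summed = (emb.word_embeddings(enc["input_ids"])
          + emb.token_type_embeddings(enc["token_type_ids"])
          + emb.position_embeddings(positions))
print(summed.shape)  # (1, sequence_length, 768)
```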

2.3. Pre-training BERT

In contrast to conventional left-to-right or right-to-left language models, BERT is pre-trained on two unsupervised tasks: Masked Language Modeling (MLM) and Next Sentence Prediction (NSP).

2.4. Masked Language Modeling (MLM)

In MLM, a portion of the input tokens is randomly masked and the model predicts the masked tokens. Standard language models cannot be conditioned bidirectionally, because each word would indirectly be able to "see" the token it is predicting. To address this, 15% of the WordPiece tokens in each sequence are masked at random and BERT predicts them. A mismatch arises, however, between pre-training and fine-tuning, since the [MASK] token never appears during fine-tuning. To counter this, a selected token is replaced with [MASK] 80% of the time, with a random token 10% of the time, and left unchanged 10% of the time [1].
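
A simplified sketch of this masking scheme is shown below; the helper function and its parameters are illustrative and are not taken from the original BERT codebase.

```python
# Sketch of the MLM corruption scheme: select ~15% of positions; of those,
# replace 80% with [MASK], 10% with a random token, and leave 10% unchanged.
import random

def mask_tokens(token_ids, mask_id, vocab_size, mask_prob=0.15):
    masked = list(token_ids)
    labels = [-100] * len(token_ids)      # -100 = position ignored by the loss
    for i, tok in enumerate(token_ids):
        if random.random() < mask_prob:
            labels[i] = tok               # the model must predict the original
            r = random.random()
            if r < 0.8:
                masked[i] = mask_id                       # 80%: [MASK]
            elif r < 0.9:
                masked[i] = random.randrange(vocab_size)  # 10%: random token
            # else: 10% of the time the token is left unchanged
    return masked, labels
```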

2.5. Next Sentence Prediction (NSP)

Next Sentence Prediction (NSP) is intended to help BERT model sentence relationships, which are essential for tasks such as Question Answering (QA) and Natural Language Inference (NLI). In pre-training, sentence pairs are formed so that in 50% of cases the second sentence (B) is the sentence that actually follows the first (A) and is labeled "IsNext"; in the other 50% of cases, B is a random sentence drawn from the corpus and labeled "NotNext". Training on this task helps BERT perform better on downstream tasks that depend on sentence relationships.
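
A minimal sketch of how such pairs could be constructed follows; the helper below is an illustrative assumption, not code from the paper or the original BERT release.

```python
# Sketch: build an NSP training pair from a corpus of documents,
# where each document is a list of sentences.
import random

def make_nsp_pair(document, corpus):
    i = random.randrange(len(document) - 1)  # document needs >= 2 sentences
    sentence_a = document[i]
    if random.random() < 0.5:
        return sentence_a, document[i + 1], "IsNext"          # actual next sentence
    random_doc = random.choice(corpus)
    return sentence_a, random.choice(random_doc), "NotNext"   # random sentence
```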

2.6. Pre-Training data

The pre-training corpus consists of 800 million words from BooksCorpus and 2,500 million words from English Wikipedia; lists, tables, and headers are excluded. Selecting an appropriate document-level corpus is crucial for extracting long, contiguous sequences, which is necessary for effective language model pre-training [1].

2.7. Fine tuning BERT

The Transformer architecture's self-attention mechanism makes fine-tuning BERT for specific tasks straightforward. Because BERT encodes text pairs with bidirectional self-attention, the mechanism effectively provides cross-attention between the two sentences. For each downstream task, the task-specific inputs and outputs are simply plugged into BERT and all parameters are fine-tuned end-to-end [1]. This approach builds on the generative pre-training (GPT) technique of Radford et al. [6], which improves language understanding through generative pre-training followed by discriminative fine-tuning.

2.8. Application of BERT in Sentiment Analysis

Sentiment analysis (SA) is the task of identifying, extracting, and classifying subjective elements from textual data and interpreting opinions, attitudes, and sentiments as neutral, negative, or positive [7]. The effectiveness of BERT in this domain is also supported by the development of other pre-training models, such as XLNet, which combines autoregressive pre-training and can be adapted to improve the generalization of BERT-style models for sentiment analysis tasks [8].

3. Methodologies

According to Gunasekaran [7], a range of techniques has been developed to facilitate more effective sentiment analysis, leading to gains in efficacy and accuracy. This progress is attributed to the growing popularity of social media platforms such as Twitter, which provide a wealth of real-time text data.

3.1. Traditional approaches

3.1.1. Rule-Based methods. Although rule-based methods are attractive for their simplicity, they rely on fixed linguistic rules and therefore often fail to capture the complex grammar and structure of natural language [3].

3.1.2. Machine learning algorithms. This class includes models such as Support Vector Machines (SVM) and Naive Bayes trained on labeled datasets. They adapt to changing data better than rule-based techniques, but they still depend on hand-crafted features and struggle with tasks that require contextual understanding [2].

3.1.3. Deep learning models. Sentiment analysis has gained considerable traction with the introduction of convolutional neural networks (CNNs) and recurrent neural networks (RNNs). These models are more effective at detecting nuances of language and can identify complex patterns and relations in text data.

3.2. Leveraging BERT for Sentiment Analysis

BERT has become a vital sentiment analysis tool thanks to its strong contextual understanding and its pre-training on large text corpora. Applying BERT to sentiment analysis involves data preprocessing, model fine-tuning, and evaluation of model performance.

3.2.1. Data preprocessing. BERT's tokenizer splits text into WordPiece tokens and adds the special tokens [CLS] (classification token) and [SEP] (separator token), ensuring that the input is formatted as BERT expects. The main steps, illustrated in the sketch after this list, are:

• Tokenization: Utilizing BERT's tokenizer to divide text into smaller units (tokens).

• Addition of Special Tokens: Inserting [SEP] at the end of the input sequence and [CLS] at the start.

• Padding and Truncation: Modifying the tokenized sequences' length to guarantee consistency between batches.
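
The following sketch illustrates these steps, assuming the Hugging Face `BertTokenizer` (the paper does not prescribe a particular toolchain) and arbitrary example reviews.

```python
# Sketch: tokenization, special tokens, and padding/truncation for BERT input.
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
texts = ["The plot was gripping from start to finish.",
         "Two hours of my life I will never get back."]

encodings = tokenizer(
    texts,
    padding="max_length",  # pad every sequence to the same length
    truncation=True,       # cut off sequences longer than max_length
    max_length=128,
    return_tensors="pt",
)
print(tokenizer.convert_ids_to_tokens(encodings["input_ids"][0])[:6])
# ['[CLS]', 'the', 'plot', 'was', 'gripping', 'from']
```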

3.2.2. Model Fine-tuning. To fine-tune BERT, the pre-trained model is adapted to the specific sentiment analysis task using labeled datasets. As per Gunasekaran [7], this procedure (sketched in the example after this list) consists of:

• Using the [CLS] Token: For classification tasks, the final hidden state of the [CLS] token is typically used as the aggregate sequence representation.

• Training: The model is trained on sentiment-labeled data, and the weights are updated to minimize a classification loss such as cross-entropy.

• Optimization: To improve model performance, hyperparameters like learning rate, batch size, and number of epochs are adjusted.
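
A condensed fine-tuning sketch is given below, assuming PyTorch and the Hugging Face `transformers` library; the toy data, label convention, and hyperparameters are illustrative assumptions rather than the paper's setup.

```python
# Sketch: fine-tune BERT for binary sentiment classification.
import torch
from torch.optim import AdamW
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)   # classification head over the [CLS] state

texts = ["Great film, would recommend.", "Terrible pacing and weak acting."]
labels = torch.tensor([1, 0])            # 1 = positive, 0 = negative (assumed)
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

optimizer = AdamW(model.parameters(), lr=2e-5)  # a typical BERT learning rate
model.train()
for epoch in range(3):                   # a few epochs over the toy batch
    outputs = model(**batch, labels=labels)  # computes cross-entropy loss
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```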

3.2.3. Performance evaluation. Several metrics are used to determine how effective BERT is in sentiment analysis, including accuracy, F1 score, and the confusion matrix [7]. Accuracy measures the percentage of correct predictions, the F1 score balances precision and recall, and the confusion matrix provides a detailed breakdown of true and false positives and negatives, offering insight into the model's strengths and weaknesses. A short evaluation sketch follows.
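
The brief sketch below computes these metrics with scikit-learn; the label vectors are placeholders standing in for a model's true and predicted sentiments.

```python
# Sketch: evaluation metrics for a binary sentiment classifier.
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score

y_true = [1, 0, 1, 1, 0, 1]   # gold sentiment labels (placeholder)
y_pred = [1, 0, 0, 1, 0, 1]   # model predictions (placeholder)

print("Accuracy:", accuracy_score(y_true, y_pred))
print("F1 score:", f1_score(y_true, y_pred))
print("Confusion matrix:\n", confusion_matrix(y_true, y_pred))
```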

3.3. Comparison of approaches

Several benefits and difficulties become apparent when contrasting BERT-based sentiment analysis with conventional techniques:

3.3.1. Rule-Based vs. BERT. BERT considerably surpasses rule-based techniques by capturing the context and intricacies of language, even though rule-based techniques are simpler to implement and require less training data. BERT’s ability to understand context and handle complex language structures makes it more effective in sentiment analysis. Rule-based methods often fail to account for the subtleties and variations in natural language, leading to lower accuracy [3].

3.3.2. Machine Learning vs. BERT. While traditional machine learning algorithms such as Naive Bayes and SVM are efficient, they frequently struggle to handle context and subtleties, an area in which BERT's deep learning capabilities shine [2]. BERT’s pre-training on large datasets allows it to capture more nuanced patterns and relationships in text data, leading to higher accuracy in sentiment analysis. Machine learning models often rely on feature engineering, which can be time-consuming and less effective in capturing the full complexity of language.

3.3.3. Deep Learning vs. BERT. CNN and RNN models have long been favorite choices among NLP practitioners for deep learning-based sentiment analysis, but BERT's bidirectional approach and pre-trained knowledge give it an advantage in recognizing context and producing more accurate results. Its ability to use both left and right context in a sentence makes it superior to traditional deep learning models in sentiment analysis. CNNs and RNNs often require large amounts of labeled data for training, whereas BERT can leverage its extensive pre-training to achieve high performance with less labeled data [3]. BERT's architecture also benefits from the advances in deep contextualized word representations introduced by Peters et al. [9], further enhancing its ability to capture nuanced meaning. BERT has driven sentiment analysis technology forward and set the standard for the field.

3.4. Results of some key studies

Studies by Sun et al. [10] demonstrate BERT's superior performance in sentiment analysis, highlighting its stability and accuracy. Their experiments showed improvements in precision and accuracy compared with traditional methods after BERT was fine-tuned on data from a variety of sources labeled for sentiment, validating BERT's competence in handling the complex components of sentiment analysis, such as comprehending context and exploiting text features. Alaparthi and Mishra [4] examine sentiment classification on a labeled dataset of 50,000 IMDB movie reviews using accuracy, precision, recall, and F1 score. Their two main conclusions are that the four compared sentiment analysis approaches differ in relative effectiveness and that the deep learning model BERT is clearly superior for text sentiment classification.

4. Challenges and prospects

Using BERT for sentiment analysis has its challenges: contextual misunderstandings still arise, especially when sarcasm and irony are involved. Managing multilingual and domain-specific data is also difficult, since models must adapt to distinct language traits and industry-specific jargon. The need to preserve ethics and public privacy further complicates its application [7]. These issues make it evident that additional study and development are needed to raise the reliability and efficacy of SA approaches in a variety of real-world settings. Additionally, managing multilingual data involves adapting BERT to different languages and cultural contexts, which requires significant resources and expertise.

Ethical and privacy concerns are also significant challenges in sentiment analysis. The use of sentiment analysis tools in sensitive areas such as healthcare and finance raises questions about data privacy and ethical considerations. Ensuring that these tools are used responsibly and that personal data is protected is crucial for maintaining public trust.

Future research should address handling complex language features, developing robust multilingual models, and addressing ethical and privacy issues. Integrating sentiment analysis with other domains can lead to multidisciplinary breakthroughs [7]. For instance, sentiment analysis can be employed in healthcare to monitor patient feedback and enhance services, in politics to gauge public opinion, and in finance to assess market sentiment. Developing more robust and adaptable models will enhance the effectiveness of sentiment analysis across various fields.

One promising direction is the development of hybrid models that combine the strengths of rule-based, machine learning, and deep learning approaches. Such models could leverage the simplicity and interpretability of rule-based methods, the efficiency of machine learning algorithms, and the contextual understanding of deep learning models like BERT.

Another critical area for future research is the development of more sophisticated techniques for handling sarcasm, irony, and other complex linguistic phenomena. This may involve integrating sentiment analysis with other NLP tasks such as emotion detection and sarcasm detection to provide a more comprehensive understanding of textual data.

Finally, addressing the ethical and privacy implications of sentiment analysis is paramount. Future research should focus on developing techniques for anonymizing data, ensuring that sentiment analysis models do not perpetuate biases, and establishing guidelines for the responsible use of these tools. By overcoming these challenges, researchers can enhance the accuracy, dependability, and ethical standards of sentiment analysis, ensuring its ongoing utility as a tool for managing public opinion and improving various applications.

5. Conclusion

BERT has revolutionized the field of NLP with its bidirectional representations and innovative pre-training strategies, such as Next Sentence Prediction (NSP) and Masked Language Modeling (MLM). These advancements have set a new standard for sentiment analysis and other NLP tasks, enabling models to achieve unprecedented levels of accuracy and contextual understanding. BERT’s ability to capture nuanced patterns and relationships within text data has proven superior to traditional methods and other deep learning models, making it an ideal tool for sentiment analysis. However, despite these advancements, challenges remain. Handling complex language features like sarcasm and irony continues to be a significant hurdle. Additionally, managing multilingual data and addressing ethical and privacy concerns are crucial areas that require further attention. These challenges highlight the need for ongoing research to enhance BERT’s applicability across diverse contexts and domains.

To fully unlock BERT's potential, future research should focus on developing more robust and adaptable models. This includes integrating sentiment analysis with other domains, such as healthcare, finance, and social sciences, which could open new avenues for interdisciplinary applications and insights. Moreover, addressing ethical issues, particularly concerning data privacy and bias, is essential to ensure the responsible use of sentiment analysis tools. By overcoming these challenges through innovative research and the development of hybrid models that combine rule-based, machine learning, and deep learning approaches, BERT's application in sentiment analysis will continue to evolve. This will not only enhance its accuracy and reliability but also solidify its role as a transformative tool in understanding and interpreting human language across various contexts.


References

[1]. J. Devlin, M. W. Chang, K. Lee, and K. Toutanova. (2018) BERT: Pre-training of deep bidirectional transformers for language understanding, arXiv preprint arXiv:1810.04805.

[2]. J. Howard and S. Ruder. (2018) Universal language model fine-tuning for text classification, arXiv preprint arXiv:1801.06146.

[3]. Y. Zhang and Q. Yang. (2020) A survey on multi-task learning. IEEE Transactions on Knowledge and Data Engineering, vol. 34, no. 12, pp. 5586-5609.

[4]. S. Alaparthi and M. Mishra. (2021) BERT: A sentiment analysis odyssey. Journal of Marketing Analytics, vol. 9, no. 2, pp. 118-126.

[5]. A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin. (2017) Attention is all you need. Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS 2017), pp. 6000-6010.

[6]. A. Radford, K. Narasimhan, T. Salimans, and I. Sutskever. (2018) Improving language understanding by generative pre-training, OpenAI, 2018.

[7]. K. P. Gunasekaran. (2023) Exploring sentiment analysis techniques in natural language processing: A comprehensive review, arXiv preprint arXiv:2305.14842, 2023.

[8]. Z. Yang, Z. Dai, Y. Yang, J. Carbonell, R. R. Salakhutdinov, and Q. V. Le. (2019) XLNet: Generalized autoregressive pretraining for language understanding, Advances in Neural Information Processing Systems, vol. 32.

[9]. M. E. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark, K. Lee, and L. Zettlemoyer. (2018) Deep contextualized word representations, arXiv preprint arXiv:1802.05365. https://arxiv.org/pdf/1802.05365

[10]. C. Sun, L. Huang, and X. Qiu. (2019) Utilizing BERT for aspect-based sentiment analysis via constructing auxiliary sentence, arXiv preprint arXiv:1903.09588, 2019.


Cite this article

Su, Z. (2024). Applications of BERT in sentiment analysis. Applied and Computational Engineering, 92, 147-152.

Data availability

The datasets used and/or analyzed during the current study will be available from the authors upon reasonable request.

Disclaimer/Publisher's Note

The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of EWA Publishing and/or the editor(s). EWA Publishing and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

About volume

Volume title: Proceedings of the 6th International Conference on Computing and Data Science

ISBN: 978-1-83558-595-5 (Print) / 978-1-83558-596-2 (Online)
Editor: Alan Wang, Roman Bauer
Conference website: https://2024.confcds.org/
Conference date: 12 September 2024
Series: Applied and Computational Engineering
Volume number: Vol. 92
ISSN: 2755-2721 (Print) / 2755-273X (Online)

© 2024 by the author(s). Licensee EWA Publishing, Oxford, UK. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license. Authors who publish this series agree to the following terms:
1. Authors retain copyright and grant the series right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this series.
2. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the series's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this series.
3. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See Open access policy for details).
