
Spam email classification based on SVM, Transformer and Naive Bayes
1 Harbin Institute of Technology (Weihai)
* Author to whom correspondence should be addressed.
Abstract
With the rapid growth of information and big data, people's e-mail accounts have received a large volume of unwanted messages, known as "spam", in recent years. Spam causes many problems, including tying up public resources and causing financial losses, so effective spam-filtering techniques are needed. Previous studies have shown that machine learning methods are highly effective for spam filtering. This study therefore reviews machine learning algorithms used in spam filtering, takes a spam e-mail dataset from Kaggle.com, and implements three algorithms on it: Naive Bayes, a support vector machine (SVM), and a Transformer. The results show that the Transformer and SVM perform better than Naive Bayes on this dataset, with SVM performing best. The current limitations of the study are discussed, and future prospects are outlined.
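As a rough illustration of the simplest of the three compared methods, the sketch below implements a multinomial Naive Bayes spam filter with Laplace (add-alpha) smoothing in plain Python. It is not the author's implementation and does not reproduce the Kaggle dataset: the four training messages, the regular-expression tokenizer, and the names `tokenize` and `NaiveBayesSpamFilter` are hypothetical, chosen only to show how class priors and per-word likelihoods are combined in log space.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase, keep runs of letters, digits and apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesSpamFilter:
    """Illustrative multinomial Naive Bayes classifier with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        # Vocabulary = every token seen in any training message.
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        # log P(class): share of training messages carrying each label.
        doc_counts = Counter(labels)
        self.log_prior = {c: math.log(doc_counts[c] / len(labels)) for c in self.classes}
        # Total token count per class, used in the smoothed denominator.
        self.total = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def _log_likelihood(self, word, c):
        # log P(word | class) = log((count + alpha) / (total + alpha * |V|))
        denom = self.total[c] + self.alpha * len(self.vocab)
        return math.log((self.word_counts[c][word] + self.alpha) / denom)

    def predict(self, text):
        # Words never seen in training are ignored rather than smoothed.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        scores = {
            c: self.log_prior[c] + sum(self._log_likelihood(t, c) for t in tokens)
            for c in self.classes
        }
        return max(scores, key=scores.get)


# Hypothetical toy training data (not from the paper's dataset).
train_texts = [
    "win a free prize now",
    "claim your free cash reward",
    "meeting moved to monday",
    "please review the attached report",
]
train_labels = ["spam", "spam", "ham", "ham"]

clf = NaiveBayesSpamFilter().fit(train_texts, train_labels)
print(clf.predict("free prize cash"))              # spam
print(clf.predict("review the report on monday"))  # ham
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied, and the smoothing term keeps a single unseen-in-class word from driving a class probability to zero. An SVM or Transformer would instead learn a decision function over vectorized or embedded text rather than relying on the word-independence assumption.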
Keywords
Spam email, machine learning, Naive Bayes, Transformer, SVM
Cite this article
Qiao, Y. (2024). Spam email classification based on SVM, Transformer and Naive Bayes. Applied and Computational Engineering, 48, 161-167.
Data availability
The datasets used and/or analyzed during the current study will be available from the authors upon reasonable request.
Disclaimer/Publisher's Note
The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of EWA Publishing and/or the editor(s). EWA Publishing and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
About volume
Volume title: Proceedings of the 4th International Conference on Signal Processing and Machine Learning
© 2024 by the author(s). Licensee EWA Publishing, Oxford, UK. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license. Authors who publish in this series agree to the following terms:
1. Authors retain copyright and grant the series right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this series.
2. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the series' published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this series.
3. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (see the Open Access policy for details).