Research Article
Open access
Published on 19 March 2024
Qiao, Y. (2024). Spam email classification based on SVM, Transformer and Naive Bayes. Applied and Computational Engineering, 48, 161-167.

Spam email classification based on SVM, Transformer and Naive Bayes

Yijun Qiao *,1
1 Harbin Institute of Technology (Weihai)

* Author to whom correspondence should be addressed.

https://doi.org/10.54254/2755-2721/48/20241337

Abstract

With the growth of information and big data in recent years, large volumes of unwanted e-mail, known as “spam”, are sent to people’s e-mail accounts. Spam causes a range of problems, including occupying public resources and causing financial loss, so effective spam filtering techniques are needed. Previous analysis indicates that machine learning methods are very effective in spam filtering. This study reviews the background of machine learning algorithms in spam filtering, selects a spam e-mail dataset from Kaggle.com, and implements three algorithms on it: SVM, Transformer and Naive Bayes. According to the analysis, Transformer and SVM perform better on the dataset than Naive Bayes, with SVM performing best. Current limitations and future prospects are also discussed.
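
To make the classification task concrete, the following is a minimal sketch of one of the three algorithm families named above, a multinomial Naive Bayes classifier with add-one smoothing, written in plain Python. It is not the implementation used in this study: the function names, the whitespace tokenizer and the four toy messages are illustrative assumptions, and the Kaggle dataset, the SVM and the Transformer are not reproduced here.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase whitespace tokenization; a real pipeline would clean text further.
    return text.lower().split()


def train_naive_bayes(messages, labels):
    """Fit a multinomial Naive Bayes model from labeled messages."""
    class_docs = Counter(labels)
    word_counts = {c: Counter() for c in class_docs}
    for text, label in zip(messages, labels):
        word_counts[label].update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    # Log prior of each class: share of training messages with that label.
    log_priors = {c: math.log(n / len(labels)) for c, n in class_docs.items()}
    return log_priors, word_counts, vocab


def predict(model, text):
    """Return the class with the highest log posterior score."""
    log_priors, word_counts, vocab = model
    scores = {}
    for c, log_prior in log_priors.items():
        total = sum(word_counts[c].values())
        score = log_prior
        for w in tokenize(text):
            if w not in vocab:
                continue  # ignore words never seen during training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            score += math.log((word_counts[c][w] + 1) / (total + len(vocab)))
        scores[c] = score
    return max(scores, key=scores.get)


# Hypothetical toy messages, not taken from the Kaggle dataset.
train_messages = [
    "win a free prize now",
    "free money claim your prize",
    "meeting agenda for monday",
    "please review the project report",
]
train_labels = ["spam", "spam", "ham", "ham"]

model = train_naive_bayes(train_messages, train_labels)
print(predict(model, "claim your free prize"))
print(predict(model, "monday project meeting"))
```

Scores are accumulated in log space so that products of many small word probabilities do not underflow on longer messages.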

Keywords

Spam email, machine learning, Naive Bayes, Transformer, SVM

Data availability

The datasets used and/or analyzed during the current study will be available from the authors upon reasonable request.

About volume

Volume title: Proceedings of the 4th International Conference on Signal Processing and Machine Learning

Conference website: https://www.confspml.org/
ISBN: 978-1-83558-336-4 (Print) / 978-1-83558-338-8 (Online)
Conference date: 15 January 2024
Editor: Marwan Omar
Series: Applied and Computational Engineering
Volume number: Vol. 48
ISSN: 2755-2721 (Print) / 2755-273X (Online)

© 2024 by the author(s). Licensee EWA Publishing, Oxford, UK. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.