Detection of malicious websites across multiple classes using n-gram features and VGG based on URL analysis

Research Article
Open access

Qichen Liu 1*
  • 1 The University of Sydney    
  • *corresponding author z12321a@mail.nwpu.edu.cn
Published on 23 October 2023 | https://doi.org/10.54254/2755-2721/18/20230965
ACE Vol.18
ISSN (Print): 2755-273X
ISSN (Online): 2755-2721
ISBN (Print): 978-1-83558-027-1
ISBN (Online): 978-1-83558-028-8

Abstract

Because the internet is ubiquitous, cyber-attacks delivered through websites have become a serious problem, occurring frequently and causing appreciable financial damage overall. Detecting malicious URLs is one of the most common countermeasures; it is widely deployed in commercial products and actively researched. Inspired by related work on URL classification that uses n-gram techniques and on convolutional neural networks in other research areas, this paper develops a method for detecting malicious websites that combines n-gram statistical features of URLs with a VGG-style neural network. The method aims to classify websites into multiple classes and to accept URLs of arbitrary length. Experimental results show that the proposed method achieves an average accuracy of 96.60% on the 5-class ISCX-URL2016 dataset and 96.33% on the 4-class Malicious URLs dataset, which is 1.5 times larger than ISCX-URL2016. A further comparison shows that these accuracies are competitive with those of similar binary-classification methods that also use either n-gram features or a VGG-based network.
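
To illustrate how n-gram statistics can turn a URL of any length into a fixed-size input, the following is a minimal sketch and not the paper's implementation. The function names, the bigram default, the 64-bucket vector size, and the use of CRC32 hashing to assign n-grams to buckets are all assumptions made for this example; the abstract does not specify the actual feature design.

```python
import zlib
from collections import Counter


def char_ngrams(url, n):
    """Return all overlapping character n-grams of the URL, in order."""
    return [url[i:i + n] for i in range(len(url) - n + 1)]


def ngram_feature_vector(url, n=2, dim=64):
    """Map a URL of any length to a fixed-length vector of n-gram frequencies.

    Assumption of this sketch: each n-gram is hashed into one of `dim`
    buckets, and bucket counts are divided by the total number of n-grams,
    so the output length does not depend on the URL length.
    """
    grams = char_ngrams(url, n)
    vec = [0.0] * dim
    if not grams:
        # URL shorter than n characters: no n-grams, return all zeros.
        return vec
    for gram, count in Counter(grams).items():
        bucket = zlib.crc32(gram.encode("utf-8")) % dim
        vec[bucket] += count
    total = len(grams)
    return [v / total for v in vec]


short_vec = ngram_feature_vector("http://a.io")
long_vec = ngram_feature_vector("http://example.com/login?user=alice&redirect=/home")
print(len(short_vec), len(long_vec))  # both vectors have the same length
```

A vector like this could then be reshaped or stacked across several values of n before being passed to a convolutional classifier; how the paper arranges its features for the VGG-style network is not stated in the abstract.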

Keywords:

URL, multi-class, n-gram features, VGG

Liu, Q. (2023). Detection of malicious websites across multiple classes using n-gram features and VGG based on URL analysis. Applied and Computational Engineering, 18, 66-72.

References

[1]. Ortega O B, Segura J R. Protocolo básico de ciberseguridad para pymes[J]. Interfases, 2022 (016): 168-186.

[2]. Wang C, Chen Y. TCURL: Exploring hybrid transformer and convolutional neural network on phishing URL detection[J]. Knowledge-Based Systems, 2022, 258: 109955.

[3]. Sharif M H U, Mohammed M A. A literature review of financial losses statistics for cyber security and future trend[J]. World Journal of Advanced Research and Reviews, 2022, 15(1): 138-156.

[4]. Gupta B B, Arachchilage N A G, Psannis K E. Defending against phishing attacks: taxonomy of methods, current issues and future directions[J]. Telecommunication Systems, 2018, 67: 247-267.

[5]. Lakshmi V. Beginning Security with Microsoft Technologies[J]. Beginning Security with Microsoft Technologies, 2019.

[6]. Day G. Security in the Digital World: For the home user, parent, consumer and home office[M]. IT Governance Ltd, 2017.

[7]. Dong R, Zhang Y, Zhao J. How green are the streets within the sixth ring road of Beijing? An analysis based on tencent street view pictures and the green view index[J]. International journal of environmental research and public health, 2018, 15(7): 1367.

[8]. Wang L, Guo S, Huang W, et al. Places205-vggnet models for scene recognition[J]. arXiv preprint arXiv:1508.01667, 2015.

[9]. Vecile S, Lacroix K, Grolinger K, et al. Malicious and Benign URL Dataset Generation Using Character-Level LSTM Models[C]//2022 IEEE Conference on Dependable and Secure Computing (DSC). IEEE, 2022: 1-8.

[10]. Ren F, Jiang Z, Liu J. A bi-directional LSTM model with attention for malicious URL detection[C]//2019 IEEE 4th Advanced Information Technology, Electronic and Automation Control Conference (IAEAC). IEEE, 2019, 1: 300-305.

[11]. Bozkir A S, Dalgic F C, Aydos M. GramBeddings: A New Neural Network for URL Based Identification of Phishing Web Pages Through N-gram Embeddings[J]. Computers & Security, 2023, 124: 102964.

[12]. Alshingiti Z, Alaqel R, Al-Muhtadi J, et al. A Deep Learning-Based Phishing Detection System Using CNN, LSTM, and LSTM-CNN[J]. Electronics, 2023, 12(1): 232.

[13]. Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition[J]. arXiv preprint arXiv:1409.1556, 2014.

[14]. Li J, Wang D, Zhao C, et al. MUI-VB: Malicious URL Identification Model Combining VGG and Bi-LSTM[C]//Proceedings of the 2022 3rd International Conference on Control, Robotics and Intelligent System. 2022: 141-148.

[15]. Korkmaz M, Kocyigit E, Sahingoz O K, et al. Phishing web page detection using N-gram features extracted from URLs[C]//2021 3rd International Congress on Human-Computer Interaction, Optimization and Robotic Applications (HORA). IEEE, 2021: 1-6.

[16]. Jolliffe I T, Cadima J. Principal component analysis: a review and recent developments[J]. Philosophical transactions of the royal society A: Mathematical, Physical and Engineering Sciences, 2016, 374(2065): 20150202.

[17]. Maas A L, Hannun A Y, Ng A Y. Rectifier nonlinearities improve neural network acoustic models[C]//Proc. icml. 2013, 30(1): 3.

[18]. Url T. Gesamtwirtschaftliche Auswirkungen der Exportgarantien in Österreich[J]. WIFO Studies, 2016.

[19]. Johnson C, Khadka B, Basnet R B, et al. Towards Detecting and Classifying Malicious URLs Using Deep Learning[J]. J. Wirel. Mob. Networks Ubiquitous Comput. Dependable Appl., 2020, 11(4): 31-48.


Data availability

The datasets used and/or analyzed during the current study will be available from the authors upon reasonable request.

Disclaimer/Publisher's Note

The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of EWA Publishing and/or the editor(s). EWA Publishing and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

About volume

Volume title: Proceedings of the 5th International Conference on Computing and Data Science

ISBN:978-1-83558-027-1(Print) / 978-1-83558-028-8(Online)
Editor:Marwan Omar, Roman Bauer, Alan Wang
Conference website: https://2023.confcds.org/
Conference date: 14 July 2023
Series: Applied and Computational Engineering
Volume number: Vol.18
ISSN:2755-2721(Print) / 2755-273X(Online)

© 2024 by the author(s). Licensee EWA Publishing, Oxford, UK. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license. Authors who publish this series agree to the following terms:
1. Authors retain copyright and grant the series right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this series.
2. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the series's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this series.
3. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See Open access policy for details).
