Research Article
Open access
Published on 30 April 2024
Liu, H. (2024). Dual attention-enhanced SSD: A novel deep learning model for object detection. Applied and Computational Engineering, 57, 26-39.

Dual attention-enhanced SSD: A novel deep learning model for object detection

Haotian Liu *,1
  • 1 Fudan University

* Author to whom correspondence should be addressed.

https://doi.org/10.54254/2755-2721/57/20241308

Abstract

Object detection is a fundamental task in computer vision, with applications in autonomous driving, surveillance, and image understanding. Accurate and efficient detection of objects within images is crucial for enabling machines to interpret visual information and make informed decisions. In this paper, we present an enhanced version of the Single Shot MultiBox Detector (SSD) that incorporates dual attention mechanisms. The proposed approach, named SSD-Dual Attention, inserts dual attention layers between the SSD feature maps and the prediction convolutions, improving the model's ability to capture informative feature representations across a wide range of object scales and backgrounds. Experimental results on the PASCAL VOC 2007 and 2012 datasets validate the effectiveness of the approach. SSD-Dual Attention achieves a mean Average Precision (mAP) of 78.1%, surpassing SSD variants enhanced with other attention mechanisms (SSD-ECA, SSD-CBAM, SSD-Non-local attention, and SSD-SE attention) as well as the original SSD. These results indicate improved detection accuracy over the compared baselines. Code is available at https://github.com/AlexHunterLeo/Dual-attention-Enhanced-SSD-A-Novel-Deep-Learning-Model-for-Object-Detection
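
To make the idea of a dual attention layer concrete, the following is a minimal NumPy sketch of a position attention branch and a channel attention branch applied to a single feature map, in the general style of dual attention networks. It is an illustration under stated assumptions, not the implementation from the linked repository: the function names, the weight matrices `wq`, `wk`, `wv` (standing in for 1x1 convolutions), the scalars `gamma` and `beta`, the plain softmax used in the channel branch, and the fusion of the two branches by element-wise sum are all hypothetical choices made for this example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def position_attention(x, wq, wk, wv, gamma):
    """Spatial self-attention over a feature map x of shape (C, H, W).

    wq and wk have shape (C_r, C) and wv has shape (C, C); each acts like
    a 1x1 convolution applied to every spatial position (assumed layout).
    """
    c, h, w = x.shape
    flat = x.reshape(c, h * w)            # (C, N) with N = H * W
    q = wq @ flat                         # (C_r, N) queries
    k = wk @ flat                         # (C_r, N) keys
    v = wv @ flat                         # (C, N) values
    attn = softmax(q.T @ k, axis=-1)      # (N, N): row i weights all positions j
    out = v @ attn.T                      # (C, N): weighted sum of values per position
    return (gamma * out + flat).reshape(c, h, w)  # residual connection

def channel_attention(x, beta):
    """Channel self-attention: each channel attends to all other channels."""
    c, h, w = x.shape
    flat = x.reshape(c, h * w)            # (C, N)
    attn = softmax(flat @ flat.T, axis=-1)  # (C, C) channel affinities
    out = attn @ flat                     # (C, N): mix channels by affinity
    return (beta * out + flat).reshape(c, h, w)   # residual connection

def dual_attention(x, wq, wk, wv, gamma, beta):
    # Assumed fusion: element-wise sum of the two branch outputs.
    return position_attention(x, wq, wk, wv, gamma) + channel_attention(x, beta)

# Toy usage on a random 8-channel, 5x5 feature map.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 5, 5))
wq = rng.standard_normal((2, 8))
wk = rng.standard_normal((2, 8))
wv = rng.standard_normal((8, 8))
y = dual_attention(x, wq, wk, wv, gamma=0.5, beta=0.5)
print(y.shape)
```

In a detector along the lines described in the abstract, a layer of this kind would be applied to each feature map before it reaches the prediction convolutions, so the output shape must match the input shape, as it does here. Note that the position branch builds an N x N attention matrix with N = H * W, so its memory cost grows quadratically with spatial resolution; for a 38 x 38 map, as in the standard SSD300 configuration, that is 1444 x 1444 entries.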

Keywords

Object Detection, SSD, Dual Attention, Position Attention, Channel Attention


Data availability

The datasets used and/or analyzed during the current study will be available from the authors upon reasonable request.

Disclaimer/Publisher's Note

The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of EWA Publishing and/or the editor(s). EWA Publishing and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

About volume

Volume title: Proceedings of the 6th International Conference on Computing and Data Science

Conference website: https://www.confcds.org/
ISBN: 978-1-83558-393-7 (Print) / 978-1-83558-394-4 (Online)
Conference date: 12 September 2024
Editors: Alan Wang, Roman Bauer
Series: Applied and Computational Engineering
Volume number: Vol.57
ISSN: 2755-2721 (Print) / 2755-273X (Online)

© 2024 by the author(s). Licensee EWA Publishing, Oxford, UK. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license. Authors who publish in this series agree to the following terms:
1. Authors retain copyright and grant the series right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this series.
2. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the series' published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this series.
3. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (see the Open access policy for details).