Single-loss hash image retrieval method based on improved visual transformer

Research Article
Open access

Huanjie Pei 1*, Zhijie Wang 2
  • 1 Donghua University
  • 2 Donghua University
  • *Corresponding author: woshipeihuanjie@163.com
Published on 26 February 2024 | https://doi.org/10.54254/2755-2721/43/20230850
ACE Vol.43
ISSN (Print): 2755-273X
ISSN (Online): 2755-2721
ISBN (Print): 978-1-83558-311-1
ISBN (Online): 978-1-83558-312-8

Abstract

Deep hashing methods have gained popularity in image retrieval owing to their low storage requirements and high efficiency. However, existing deep hashing methods for large-scale image retrieval suffer from weakly discriminative binary hash codes, losses that are difficult to optimize, and low retrieval accuracy. This paper proposes a single-loss hash image retrieval method based on an improved visual transformer to address these issues. The method uses a Vision Transformer (ViT) pre-trained on ImageNet as the backbone network, augmented with a hash coding layer, to extract image features more comprehensively. In addition, we design a loss function with a single learning objective that jointly accounts for the discriminative power of the hash codes and the quantization error, eliminating the need to tune weights among multiple losses. Experiments on the ImageNet100, NUS-WIDE, CIFAR10, and MS-COCO datasets show that the proposed method outperforms contemporary methods and adapts well to diverse data.
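The abstract does not give the loss in closed form, so the following is only an illustrative sketch of the general idea, not the authors' exact formulation: a single objective that combines a pairwise code-discriminability term with a quantization penalty pulling real-valued codes toward binary values. The function name, the pairwise similarity target, and the weight `lam` are all assumptions for illustration.

```python
import numpy as np

def single_loss(codes, labels, lam=0.1):
    """Illustrative single-objective hashing loss (NOT the paper's exact loss).

    codes:  (N, K) real-valued outputs of a hash layer (e.g. tanh activations)
    labels: (N,)   class labels used to build a pairwise similarity target
    lam:    weight of the quantization term folded into the single objective
    """
    # Pairwise similarity target: +1 for same-class pairs, -1 otherwise.
    sim = np.where(labels[:, None] == labels[None, :], 1.0, -1.0)
    # Inner products of length-normalized codes approximate Hamming affinity.
    norm = codes / np.linalg.norm(codes, axis=1, keepdims=True)
    affinity = norm @ norm.T
    # Discriminability term: pull same-class codes together, push others apart.
    discr = np.mean((affinity - sim) ** 2)
    # Quantization term: drive real-valued codes toward the binary vertices +/-1.
    quant = np.mean((np.abs(codes) - 1.0) ** 2)
    return discr + lam * quant
```

Codes that are both near-binary and well separated by class should score lower under such an objective than diffuse codes near the origin, which is the behavior a single combined loss of this kind is meant to encourage.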

Keywords:

Hashing; Image Retrieval; Discriminative Power; Quantization Error

Pei, H.; Wang, Z. (2024). Single-loss hash image retrieval method based on improved visual transformer. Applied and Computational Engineering, 43, 300-306.

References

[1]. Bengio Y, Léonard N, Courville A C. Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation. arXiv preprint, 2013.

[2]. Hanjiang Lai, Yan Pan, Ye Liu, and Shuicheng Yan. Simultaneous feature learning and hash coding with deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3270–3278, 2015.

[3]. Yair Weiss, Antonio Torralba, and Rob Fergus. Spectral hashing. In D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, editors, Advances in Neural Information Processing Systems, volume 21. Curran Associates, Inc., 2009.

[4]. Yunchao Gong, Svetlana Lazebnik, Albert Gordo, and Florent Perronnin. Iterative quantization: A procrustean approach to learning binary codes for large-scale image retrieval. IEEE transactions on pattern analysis and machine intelligence, 35(12):2916–2929, 2012.

[5]. Piotr Indyk and Rajeev Motwani. Approximate nearest neighbors: Towards removing the curse of dimensionality. In Proceedings of the Thirtieth Annual ACM Symposium on Theory of Computing, STOC '98, pages 604–613, New York, NY, USA, 1998. Association for Computing Machinery.

[6]. Brian Kulis and Kristen Grauman. Kernelized locality-sensitive hashing for scalable image search. In 2009 IEEE 12th international conference on computer vision, pages 2130–2137. IEEE, 2009.

[7]. Lu B L, Zhang L, Kwok J, editors. Proceedings of the 18th International Conference on Neural Information Processing, Volume Part II[C]. Springer-Verlag, 2011.

[8]. Cao Z, Long M, Wang J, et al. HashNet: Deep Learning to Hash by Continuation[J]. IEEE Computer Society, 2017.

[9]. Su S, Tian Y. Greedy Hash: Towards Fast Optimization for Accurate Hash Coding in CNN[C]// Neural Information Processing Systems, 2018.

[10]. Zheng X, Zhang Y, Lu X. Deep Balanced Discrete Hashing for Image Retrieval[J]. Neurocomputing, 2020, 403(3).

[11]. Dubey S R, Singh S K, Chu W T. Vision Transformer Hashing for Image Retrieval[J]. arXiv e-prints, 2021.

[12]. Dosovitskiy A, Beyer L, Kolesnikov A, et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale[C]// International Conference on Learning Representations. 2021.

[13]. Chen X, Yan B, Zhu J, et al. High-Performance Transformer Tracking[J]. 2022.

[14]. Zhang T, Zhu L, Zhao Q, et al. Neural Networks Weights Quantization: Target None-retraining Ternary (TNT)[J]. 2019.

[15]. Kulis B, Darrell T. Learning to Hash with Binary Reconstructive Embeddings[C]// International Conference on Neural Information Processing Systems. Curran Associates Inc., 2009.



Data availability

The datasets used and/or analyzed during the current study are available from the corresponding author upon reasonable request.

Disclaimer/Publisher's Note

The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of EWA Publishing and/or the editor(s). EWA Publishing and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

About volume

Volume title: Proceedings of the 2023 International Conference on Machine Learning and Automation

ISBN: 978-1-83558-311-1 (Print) / 978-1-83558-312-8 (Online)
Editor: Mustafa İSTANBULLU
Conference website: https://2023.confmla.org/
Conference date: 18 October 2023
Series: Applied and Computational Engineering
Volume number: Vol. 43
ISSN: 2755-2721 (Print) / 2755-273X (Online)

© 2024 by the author(s). Licensee EWA Publishing, Oxford, UK. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license. Authors who publish in this series agree to the following terms:
1. Authors retain copyright and grant the series the right of first publication, with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this series.
2. Authors may enter into separate, additional contractual arrangements for the non-exclusive distribution of the version of the work published in this series (e.g., posting it to an institutional repository or publishing it in a book), with an acknowledgment of its initial publication in this series.
3. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as this can lead to productive exchanges as well as earlier and greater citation of the published work (see the Open access policy for details).
