Research Article
Open access
Published on 8 November 2024
Xu, C. (2024). Neural Networks for Audio Classification: Multi-scale CNN-LSTM Approach to Animal Sound Recognition. Applied and Computational Engineering, 89, 172-177.

Neural Networks for Audio Classification: Multi-scale CNN-LSTM Approach to Animal Sound Recognition

Cong Xu *,1
  • 1 Hong Kong University, Hong Kong, China

* Author to whom correspondence should be addressed.

https://doi.org/10.54254/2755-2721/89/20241128

Abstract

This research explores the application of neural networks, specifically CNN-LSTM models, to classifying sound signals from dogs, frogs, and cats selected from the ESC-50 dataset. The sound data was represented using Mel-frequency cepstral coefficients (MFCCs) and augmented through time stretching, pitch shifting, and noise addition to improve model generalization across varied acoustic environments. We compared two deep learning models: a traditional CNN-LSTM and an improved version with multi-scale feature extraction, which captures both short-term and long-term sound patterns. The multi-scale CNN-LSTM architecture outperforms the traditional CNN-LSTM, achieving a test accuracy of 86.11% compared with 80.56%. These results highlight the effectiveness of multi-scale feature extraction for handling complex audio signals. The work is relevant to bioacoustics and has broader applications in areas such as environmental sound monitoring, wildlife preservation, and animal behavior analysis.
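
The abstract does not specify the layer configuration, so the following is only a minimal sketch of the multi-scale idea, not the author's architecture. It assumes NumPy and an MFCC matrix of shape (frames, coefficients); the function name `multi_scale_features`, the kernel sizes 3, 5, and 7, and the fixed averaging kernels are all illustrative assumptions. In a trained CNN the kernels would be learned, and the concatenated output would be passed on to the LSTM stage.

```python
import numpy as np

def multi_scale_features(mfcc, kernel_sizes=(3, 5, 7)):
    """Filter each MFCC coefficient track at several temporal scales.

    mfcc: array of shape (n_frames, n_coeffs).
    Returns an array of shape (n_frames, n_coeffs * len(kernel_sizes)).
    """
    branches = []
    for k in kernel_sizes:
        # Fixed averaging kernel: a stand-in for learned convolution weights.
        kernel = np.ones(k) / k
        # mode="same" keeps the frame count, so all branches stay aligned in time.
        filtered = np.stack(
            [np.convolve(mfcc[:, c], kernel, mode="same")
             for c in range(mfcc.shape[1])],
            axis=1,
        )
        branches.append(filtered)
    # Concatenate the branches along the feature axis.
    return np.concatenate(branches, axis=1)

# Hypothetical input: 100 frames of 13 MFCCs (random values, not real audio).
rng = np.random.default_rng(0)
mfcc = rng.normal(size=(100, 13))
features = multi_scale_features(mfcc)
print(features.shape)
```

Small kernels respond to short-term changes between neighbouring frames, while larger kernels summarize longer spans; concatenating them gives the downstream sequence model both views of each frame.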

Keywords

Audio classification, CNN-LSTM, Multi-scale convolution, Animal sound recognition, MFCC.

Data availability

The datasets used and/or analyzed during the current study will be available from the authors upon reasonable request.

Disclaimer/Publisher's Note

The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of EWA Publishing and/or the editor(s). EWA Publishing and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

About volume

Volume title: Proceedings of the 2nd International Conference on Functional Materials and Civil Engineering

Conference website: https://2024.conffmce.org/
ISBN: 978-1-83558-605-1 (Print) / 978-1-83558-606-8 (Online)
Conference date: 23 August 2024
Editor: Ömer Burak İSTANBULLU, Alan Wang
Series: Applied and Computational Engineering
Volume number: Vol. 89
ISSN: 2755-2721 (Print) / 2755-273X (Online)

© 2024 by the author(s). Licensee EWA Publishing, Oxford, UK. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license. Authors who publish in this series agree to the following terms:
1. Authors retain copyright and grant the series right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this series.
2. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the series' published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this series.
3. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See Open access policy for details).