Research Article
Open access
Published on 19 December 2024
Yi, J. (2024). Research and Application Analysis on Key Problems of Automatic Speech Recognition for Dysarthria. Applied and Computational Engineering, 115, 110-116.

Research and Application Analysis on Key Problems of Automatic Speech Recognition for Dysarthria

Jie Yi *,1
1 Institute of International Education, Zhengzhou University of Light Industry, Zhengzhou, China

* Author to whom correspondence should be addressed.

https://doi.org/10.54254/2755-2721/2025.18486

Abstract

Automatic speech recognition (ASR) has developed rapidly, and deep-learning-based ASR now achieves high recognition accuracy. As a result, recognizing the speech of people with dysarthria has attracted increasing attention in recent years. However, because of the individual characteristics of patients with dysarthria and the high variability of their speech, available datasets are very scarce, and current general-purpose recognition models adapt poorly to dysarthric speech. To advance the field, this paper draws on an extensive review of the literature to summarize dataset construction, methods for addressing speech variability, and the key issues and development trends of multi-feature speech data fusion. It also outlines current applications of dysarthric ASR in several fields. The author hopes this work will help meet, as soon as possible, the social communication and smart-living needs of people with dysarthria, who form a minority group.

Keywords

Dysarthria, Automatic Speech Recognition, Speech Variability, Speech Data Fusion

Data availability

The datasets used and/or analyzed during the current study will be available from the authors upon reasonable request.

Disclaimer/Publisher's Note

The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of EWA Publishing and/or the editor(s). EWA Publishing and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

About volume

Volume title: Proceedings of the 5th International Conference on Signal Processing and Machine Learning

Conference website: https://2025.confspml.org/
ISBN: 978-1-83558-789-8 (Print) / 978-1-83558-790-4 (Online)
Conference date: 12 January 2025
Editor: Stavros Shiaeles
Series: Applied and Computational Engineering
Volume number: Vol. 115
ISSN: 2755-2721 (Print) / 2755-273X (Online)

© 2024 by the author(s). Licensee EWA Publishing, Oxford, UK. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license. Authors who publish with this series agree to the following terms:
1. Authors retain copyright and grant the series right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this series.
2. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the series's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this series.
3. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See Open access policy for details).