
Research and Application Analysis on Key Problems of Automatic Speech Recognition for Dysarthria
1 Institute of International Education, Zhengzhou University of Light Industry, Zhengzhou, China
* Author to whom correspondence should be addressed.
Abstract
Automatic speech recognition (ASR) technology has developed rapidly, and deep-learning-based ASR now achieves very high recognition accuracy. Consequently, speech recognition for patients with dysarthria has attracted increasing attention in recent years. However, owing to the particular characteristics of dysarthric patients and the strong variability of their speech, relevant datasets are scarce, and dysarthric speech is difficult to handle with current general-purpose recognition models. To promote progress in this field, this paper draws on an extensive literature survey to summarize the key issues and development trends in dataset construction, handling speech variability, and multi-feature speech data fusion. It also reviews current applications of dysarthric ASR in several fields, in the hope that the social-communication and intelligent-living needs of patients with dysarthria, who constitute a minority group, can be met as soon as possible.
Keywords
Dysarthria, Automatic Speech Recognition, Speech Variability, Speech Data Fusion
Cite this article
Yi, J. (2024). Research and Application Analysis on Key Problems of Automatic Speech Recognition for Dysarthria. Applied and Computational Engineering, 115, 110-116.
Data availability
The datasets used and/or analyzed during the current study are available from the authors upon reasonable request.
Disclaimer/Publisher's Note
The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of EWA Publishing and/or the editor(s). EWA Publishing and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
About volume
Volume title: Proceedings of the 5th International Conference on Signal Processing and Machine Learning
© 2024 by the author(s). Licensee EWA Publishing, Oxford, UK. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license. Authors who publish this series agree to the following terms:
1. Authors retain copyright and grant the series right of first publication, with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this series.
2. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the series's published version of the work (e.g., posting it to an institutional repository or publishing it in a book), with an acknowledgment of its initial publication in this series.
3. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (see Open access policy for details).