Projects in the Joint Academic Partnership Health


Whispered and alaryngeal speech conversion

Whispered utterances and alaryngeal speech, i.e., the speech produced by a substitute voice after surgical larynx removal (laryngectomy), have similar characteristics. Primarily due to the absence of pitch, whispered and alaryngeal speech is perceived as less natural and intelligible than regular laryngeal voice production.

While whispering is a common method of speech communication that is usually applied only for a limited period (e.g. in areas where loud noises are prohibited), surgical treatments necessitated by laryngeal cancer force the affected individuals to use the substitute voice as their permanent method of speech communication. In everyday situations, the properties of alaryngeal speech can become an obstacle, ultimately resulting in a lower quality of life.

Recently, deep learning methods have been successfully employed to recover prosodic information from whispered speech signals. These methods usually combine a vocoder for analysis and synthesis of the speech signal with deep neural networks for the prediction of speech features. This work aims to develop more efficient systems by integrating the transformation of whispered/alaryngeal inputs into voiced outputs directly into the vocoder, thereby removing the need for a separate feature predictor.
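
To make the conventional two-stage setup concrete, the following is a minimal sketch in PyTorch: a recurrent feature predictor estimates frame-level log-F0 and voicing from a whispered mel-spectrogram, and its output would then condition a separate vocoder. Module names, feature dimensions, and the network layout are illustrative assumptions, not the project's actual implementation.

    import torch
    import torch.nn as nn

    class F0Predictor(nn.Module):
        """Predicts per-frame log-F0 and a voiced/unvoiced logit from mel frames."""
        def __init__(self, n_mels: int = 80, hidden: int = 256):
            super().__init__()
            self.rnn = nn.GRU(n_mels, hidden, batch_first=True, bidirectional=True)
            self.head = nn.Linear(2 * hidden, 2)  # outputs: [log-F0, voicing logit]

        def forward(self, mel: torch.Tensor) -> torch.Tensor:
            # mel: (batch, frames, n_mels) -> (batch, frames, 2)
            hidden_states, _ = self.rnn(mel)
            return self.head(hidden_states)

    # Dummy whispered input: 200 frames of an 80-band mel-spectrogram.
    predictor = F0Predictor()
    mel = torch.randn(1, 200, 80)
    prosodic_features = predictor(mel)  # would condition a separate neural vocoder

Integrating this transformation into the vocoder itself removes the separate predictor and its additional inference cost, which is the efficiency gain the project targets.
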
The developed systems are evaluated based on their ability to reconstruct voiced speech and to create realistic pitch contours. The goal is to apply speech conversion techniques not only to whispered signals, but also to recordings obtained from patients who underwent laryngectomy.
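
As an illustration of how the evaluation of pitch contours could be set up, the sketch below computes two simple contour-level measures from frame-level pitch tracks: the voicing decision error and the RMSE of log-F0 over jointly voiced frames. The function and the choice of metrics are assumptions for illustration, not the project's actual evaluation protocol.

    import numpy as np

    def pitch_contour_metrics(f0_ref: np.ndarray, f0_pred: np.ndarray):
        """Compare two frame-level F0 contours given in Hz (0 = unvoiced frame)."""
        voiced_ref, voiced_pred = f0_ref > 0, f0_pred > 0
        # Voicing decision error: fraction of frames with mismatched voicing.
        vde = float(np.mean(voiced_ref != voiced_pred))
        # RMSE of log-F0 over frames that both contours mark as voiced.
        both = voiced_ref & voiced_pred
        rmse = float(np.sqrt(np.mean((np.log(f0_ref[both]) - np.log(f0_pred[both])) ** 2)))
        return vde, rmse

    # Dummy contours standing in for a reference recording and converted speech.
    f0_ref = np.array([0.0, 110.0, 115.0, 120.0, 0.0, 125.0])
    f0_pred = np.array([0.0, 108.0, 118.0, 0.0, 0.0, 130.0])
    print(pitch_contour_metrics(f0_ref, f0_pred))
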
The methods applied in this work are generative models, such as Generative Adversarial Networks (GANs). Since these systems are not specifically designed for the transformation of speech, adjustments to their architecture and training criteria need to be made.
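
The sketch below illustrates, under assumed architectures and loss choices, what such an adjustment could look like: a convolutional generator maps whispered mel frames directly to a waveform, and the adversarial criterion is combined with an L1 reconstruction term against a parallel voiced target. All module layouts, hyperparameters, and loss weights are hypothetical.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class Generator(nn.Module):
        """Upsamples whispered mel frames directly to a waveform (toy layout)."""
        def __init__(self, n_mels: int = 80):
            super().__init__()
            self.net = nn.Sequential(  # total upsampling factor: 8 * 8 * 4 = 256
                nn.ConvTranspose1d(n_mels, 128, kernel_size=16, stride=8, padding=4),
                nn.LeakyReLU(0.1),
                nn.ConvTranspose1d(128, 32, kernel_size=16, stride=8, padding=4),
                nn.LeakyReLU(0.1),
                nn.ConvTranspose1d(32, 1, kernel_size=8, stride=4, padding=2),
                nn.Tanh(),
            )

        def forward(self, mel):   # mel: (batch, n_mels, frames)
            return self.net(mel)  # (batch, 1, frames * 256)

    class Discriminator(nn.Module):
        """Scores whether a waveform resembles real voiced speech."""
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv1d(1, 32, 15, stride=4, padding=7), nn.LeakyReLU(0.1),
                nn.Conv1d(32, 64, 15, stride=4, padding=7), nn.LeakyReLU(0.1),
                nn.Conv1d(64, 1, 3, padding=1),
            )

        def forward(self, wav):
            return self.net(wav)

    gen, disc = Generator(), Discriminator()
    opt_g = torch.optim.Adam(gen.parameters(), lr=2e-4)
    opt_d = torch.optim.Adam(disc.parameters(), lr=2e-4)

    # Dummy parallel training pair: whispered mel input and voiced target waveform.
    whisper_mel = torch.randn(2, 80, 64)
    voiced_wav = torch.randn(2, 1, 64 * 256)

    # Discriminator step: distinguish real voiced speech from converted speech.
    fake_wav = gen(whisper_mel).detach()
    real_score, fake_score = disc(voiced_wav), disc(fake_wav)
    d_loss = F.mse_loss(real_score, torch.ones_like(real_score)) + \
             F.mse_loss(fake_score, torch.zeros_like(fake_score))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator step: least-squares adversarial loss plus an L1 reconstruction
    # term against the voiced target, encouraging realistic voicing and pitch.
    fake_wav = gen(whisper_mel)
    fake_score = disc(fake_wav)
    g_loss = F.mse_loss(fake_score, torch.ones_like(fake_score)) + \
             F.l1_loss(fake_wav, voiced_wav)
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

Note that the reconstruction term in this sketch assumes time-aligned whispered and voiced training pairs; the adversarial term itself does not require such alignment.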

MEMBER IN THE JOINT ACADEMIC PARTNERSHIP

Publications

Bayerl, S. P., Wagner, D., Baumann, I., Hönig, F., Bocklet, T., Nöth, E. & K. Riedhammer (2023):
A Stutter Seldom Comes Alone – Cross-Corpus Stuttering Detection as a Multi-label Problem. Proc. INTERSPEECH 2023, 1538–1542.

Bayerl, S. P., Wagner, D., Baumann, I., Bocklet, T. & K. Riedhammer (2023):
Detecting Vocal Fatigue with Neural Embeddings. Journal of Voice.

Wagner, D., Baumann, I., Braun, F., Bayerl, S. P., Nöth, E., Riedhammer, K. & T. Bocklet (2023):
Multi-class Detection of Pathological Speech with Latent Features: How does it perform on unseen data? Proc. INTERSPEECH 2023, 2318–2322.

Baumann, I., Wagner, D., Braun, F., Bayerl, S. P., Nöth, E., Riedhammer, K. & T. Bocklet (2023):
Influence of Utterance and Speaker Characteristics on the Classification of Children with Cleft Lip and Palate. Proc. INTERSPEECH 2023, 4648–4652.

Wagner, D., Bayerl, S. P., & T. Bocklet (2023):
Implementing Easy-to-Use Recipes for the Switchboard Benchmark. In C. Draxler (Ed.), Studientexte zur Sprachkommunikation: Elektronische Sprachsignalverarbeitung 2023 (pp. 150–157). TUDpress, Dresden.

Wagner, D., Churchill, A., Sigtia, S., Georgiou, P., Mirsamadi, M., Mishra, A. & E. Marchi (2023):
Multimodal Data and Resource Efficient Device-Directed Speech Detection with Large Foundation Models. Third Workshop on Efficient Natural Language and Speech Processing (ENLSP-III) at NeurIPS 2023.

Riedhammer, K., Baumann, I., Bayerl, S. P., Bocklet, T., Braun, F. & D. Wagner (2023):
Medical Speech Processing for Diagnosis and Monitoring: Clinical Use Cases. Fortschritte der Akustik - DAGA 2023.

Wagner, D., Baumann, I., Bayerl, S. P., Riedhammer, K. & T. Bocklet (2023):
Speaker Adaptation for End-To-End Speech Recognition Systems in Noisy Environments. 2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).

Wagner, D., Bayerl, S. P., Maruri, H. C. & T. Bocklet (2022):
Generative Models for Improved Naturalness, Intelligibility, and Voicing of Whispered Speech. In 2022 IEEE Spoken Language Technology Workshop (SLT).

Baumann, I., Wagner, D., Bayerl, S. P., & T. Bocklet (2022):
Nonwords Pronunciation Classification in Language Development Tests for Preschool Children. In Proc. Interspeech 2022 (pp. 3643–3647).

Bayerl, S. P., Wagner, D., Nöth, E., & K. Riedhammer (2022):
Detecting Dysfluencies in Stuttering Therapy Using wav2vec 2.0. In Proc. Interspeech 2022 (pp. 2868–2872).

Bayerl, S. P., Wagner, D., Nöth, E., Bocklet, T. & K. Riedhammer (2022):
The Influence of Dataset Partitioning on Dysfluency Detection Systems. In P. Sojka, A. Horák, I. Kopeček, & K. Pala (Eds.), Text, Speech, and Dialogue (pp. 423–436). Springer International Publishing.

Wagner, D. (2019): Latent Representations of Transaction Network Graphs in Continuous Vector Spaces as Features for Money Laundering Detection. In SKILL 2019 - Studierendenkonferenz Informatik (pp. 143–154). Gesellschaft für Informatik e.V.

Dominik Wagner

Nuremberg Institute of Technology

Coordinator

Get in contact with us. We look forward to receiving your questions and suggestions on the Joint Academic Partnership Health.

Dr. Sabine Fütterer-Akili

Coordinator of the BayWISS Joint Academic Partnership Health and the BayWISS Joint Academic Partnership Economics and Business