Department of Information and Communications Engineering

Speech communication technology

Speech communication technology aims at describing, explaining and reproducing communication by speech.

The team focuses on fundamental research questions in speech communication. Our research has always been interdisciplinary: joint work has been conducted across disciplinary boundaries, especially with physicians, brain researchers, phoneticians and mathematicians. Some of the topics studied are application-oriented and have been investigated jointly with the ICT industry.

The research topics vary, but all of them address speech in one form or another. The main topics of our research (both past and current) are:

  • analysis and parameterization of speech production
  • artificial bandwidth extension of speech
  • brain functions in speech perception
  • occupational voice care
  • robust feature extraction in speech and speaker recognition
  • spectral modelling of speech
  • speech-based biomarking of human health
  • speech intelligibility improvement
  • statistical parametric speech synthesis

The team has acquired funding from the Academy of Finland, the EU, Nokia, Huawei, Tekes and Aalto University.

Examples of our recent articles:

  • Farhad Javanmardi, Sudarsana Reddy Kadiri, Paavo Alku: Pre-trained models for detection and severity level classification of dysarthria from speech. Speech Communication, Vol. 158, Article 103047, 2024.
  • Farhad Javanmardi, Sudarsana Reddy Kadiri, Paavo Alku: Exploring the impact of fine-tuning the Wav2vec2 model in database-independent detection of dysarthric speech. IEEE Journal of Biomedical and Health Informatics, Vol. 28, Issue 8, pp. 4951-4962, 2024.
  • Mittapalle Kiran Reddy, Paavo Alku: Classification of phonation types in singing voice using wavelet scattering network-based features. JASA Express Letters, Vol. 4, Issue 6, Article 065201, 2024.
  • Anne-Maria Laukkanen, Sudarsana Reddy Kadiri, Shrikanth Narayanan, Paavo Alku: Can a machine distinguish high and low amount of social creak in speech? Journal of Voice, 2024 (In press).
  • Madhu Keerthana Yagnavajjula, Kiran Reddy Mittapalle, Paavo Alku, Sreenivasa Rao, Pabitra Mitra: Automatic classification of neurological voice disorders using wavelet scattering features. Speech Communication, Vol. 157, Article 103040, 2024.
  • Paavo Alku, Manila Kodali, Laura Laaksonen, Sudarsana Reddy Kadiri: AVID: A speech database for machine learning studies on vocal intensity. Speech Communication, Vol. 157, Article 103039, 2024.
  • Mittapalle Kiran Reddy, Yagnavajjula Madhu Keerthana, Paavo Alku: Classification of functional dysphonia using the tunable Q wavelet transform. Speech Communication, Vol. 155, Article 102989, 2023.
  • Sudarsana Reddy Kadiri, Farhad Javanmardi, Paavo Alku: Investigation of self-supervised pre-trained models for classification of voice quality from speech and neck surface accelerometer signals. Computer Speech and Language, Vol. 83, Article 101550, 2023.
  • Manila Kodali, Sudarsana Reddy Kadiri, Paavo Alku: Automatic classification of the severity level of Parkinson’s disease: A comparison of speaking tasks, features, and classifiers. Computer Speech and Language, Vol. 83, Article 101548, 2023.
  • Paavo Alku, Sudarsana Reddy Kadiri, Dhananjaya Gowda: Refining a deep learning-based formant tracker using linear prediction methods. Computer Speech and Language, Vol. 81, Article 101515, 2023.
  • Saska Tirronen, Sudarsana Reddy Kadiri, Paavo Alku: Hierarchical multi-class classification of voice disorders using self-supervised models and glottal features. IEEE Open Journal of Signal Processing, Vol. 4, pp. 80-88, 2023.
  • Mittapalle Kiran Reddy, Paavo Alku: Exemplar-based sparse representations for detection of Parkinson’s disease from speech. IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 31, pp. 1386-1396, 2023.
  • Sudarsana Reddy Kadiri, Paavo Alku, Bayya Yegnanarayana: Analysis of instantaneous frequency components of speech signals for epoch extraction. Computer Speech and Language, Vol. 78, Article 101443, 2023.
  • Yuanyuan Liu, Mittapalle Kiran Reddy, Nelly Penttilä, Tiina Ihalainen, Paavo Alku, Okko Räsänen: Automatic assessment of Parkinson’s disease using speech representations of phonation and articulation. IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 31, pp. 242-255, 2023.
  • Mittapalle Kiran Reddy, Yagnavajjula Madhu Keerthana, Paavo Alku: End-to-end pathological speech detection using wavelet scattering network. IEEE Signal Processing Letters, Vol. 29, pp. 1863-1867, 2022.
  • Mittapalle Kiran Reddy, Hilla Pohjalainen, Pyry Helkkula, Kasimir Kaitue, Mikko Minkkinen, Heli Tolppanen, Tuomo Nieminen, Paavo Alku: Glottal flow characteristics in vowels produced by speakers with heart failure. Speech Communication, Vol. 137, pp. 35-43, 2022.
  • Sudarsana Reddy Kadiri, Paavo Alku, Bayya Yegnanarayana: Extraction and utilization of excitation information of speech: A review. Proceedings of the IEEE, Vol. 109, Issue 12, pp. 1920-1941, 2021.

The research team is led by Professor Paavo Alku.
