Self-imitating feedback generation using GAN for computer-assisted pronunciation training

Yang SH, Chung M (2019)


Publication Type: Conference contribution

Publication year: 2019

Publisher: International Speech Communication Association

Book Volume: 2019-September

Pages Range: 1881-1885

Conference Proceedings Title: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

Event location: Graz, AT

DOI: 10.21437/Interspeech.2019-1478

Abstract

Self-imitating feedback is an effective and learner-friendly method for non-native learners in Computer-Assisted Pronunciation Training. Acoustic characteristics of native utterances are extracted, transplanted onto the learner's own speech input, and given back to the learner as corrective feedback. Previous work focused on speech conversion using prosodic transplantation techniques based on the PSOLA algorithm. Motivated by the visual differences found in spectrograms of native and non-native speech, we investigated applying a GAN to generate self-imitating feedback, using the generator's ability acquired through adversarial training. Because this mapping is highly under-constrained, we also adopt a cycle consistency loss to encourage the output to preserve the global structure that is shared by native and non-native utterances. Trained on 97,200 spectrogram images of short utterances produced by native and non-native speakers of Korean, the generator is able to transform a non-native spectrogram input into a spectrogram with the properties of self-imitating feedback. Furthermore, the transformed spectrogram shows segmental corrections that cannot be obtained by prosodic transplantation. A perceptual test comparing the self-imitating and correcting abilities of our method with the baseline PSOLA method shows that the generative approach with cycle consistency loss is promising.
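
The cycle consistency idea mentioned in the abstract can be illustrated with a small sketch. The abstract does not give the exact loss, so this assumes the common L1 formulation, in which a spectrogram mapped to the other domain and back should match the original. The names `g`, `f`, `scale`, and `cycle_consistency_loss` are hypothetical, and the toy scaling functions only stand in for trained generator networks.

```python
from typing import Callable, List

# A spectrogram as a plain 2-D list: rows = frequency bins, columns = time frames.
Spectrogram = List[List[float]]
Generator = Callable[[Spectrogram], Spectrogram]


def l1_distance(a: Spectrogram, b: Spectrogram) -> float:
    """Mean absolute difference between two equally shaped spectrograms."""
    total, count = 0.0, 0
    for row_a, row_b in zip(a, b):
        for x, y in zip(row_a, row_b):
            total += abs(x - y)
            count += 1
    return total / count


def cycle_consistency_loss(
    g: Generator,  # assumed: non-native -> native mapping
    f: Generator,  # assumed: native -> non-native mapping
    non_native: Spectrogram,
    native: Spectrogram,
) -> float:
    """Assumed L1 cycle loss: |F(G(x)) - x| + |G(F(y)) - y|."""
    forward = l1_distance(f(g(non_native)), non_native)
    backward = l1_distance(g(f(native)), native)
    return forward + backward


def scale(k: float) -> Generator:
    """Toy stand-in for a generator: multiplies every bin by k."""
    return lambda s: [[k * v for v in row] for row in s]


x = [[1.0, 2.0], [3.0, 4.0]]  # toy "non-native" spectrogram
y = [[2.0, 4.0], [6.0, 8.0]]  # toy "native" spectrogram

# The two mappings invert each other, so the round trip is lossless.
perfect = cycle_consistency_loss(scale(2.0), scale(0.5), x, y)
# The reverse mapping does not undo the forward one, so the loss is positive.
imperfect = cycle_consistency_loss(scale(2.0), scale(1.0), x, y)
```

In a training setup this term would typically be added to the adversarial loss, which is what discourages the generator from discarding the structure of the input utterance.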

How to cite

APA:

Yang, S.H., & Chung, M. (2019). Self-imitating feedback generation using GAN for computer-assisted pronunciation training. In Gernot Kubin, Thomas Hain, Bjorn Schuller, Dina El Zarka, Petra Hodl (Eds.), Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH (pp. 1881-1885). Graz, AT: International Speech Communication Association.

MLA:

Yang, Seung Hee, and Minhwa Chung. "Self-imitating feedback generation using GAN for computer-assisted pronunciation training." Proceedings of the 20th Annual Conference of the International Speech Communication Association: Crossroads of Speech and Language, INTERSPEECH 2019, Graz. Ed. Gernot Kubin, Thomas Hain, Bjorn Schuller, Dina El Zarka, Petra Hodl. International Speech Communication Association, 2019. 1881-1885.
