Yang SH, Chung M (2019)
Publication Type: Conference contribution
Publication year: 2019
Publisher: International Speech Communication Association
Book Volume: 2019-September
Pages Range: 1881-1885
Conference Proceedings Title: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
DOI: 10.21437/Interspeech.2019-1478
Self-imitating feedback is an effective and learner-friendly method for non-native learners in Computer-Assisted Pronunciation Training. Acoustic characteristics of native utterances are extracted, transplanted onto the learner's own speech input, and given back to the learner as corrective feedback. Previous work focused on speech conversion using prosodic transplantation techniques based on the PSOLA algorithm. Motivated by the visual differences found in spectrograms of native and non-native speech, we investigated applying a GAN to generate self-imitating feedback by utilizing the generator's ability acquired through adversarial training. Because this mapping is highly under-constrained, we also adopt a cycle consistency loss to encourage the output to preserve the global structure shared by native and non-native utterances. Trained on 97,200 spectrogram images of short utterances produced by native and non-native speakers of Korean, the generator successfully transforms a non-native spectrogram input into a spectrogram with the properties of self-imitating feedback. Furthermore, the transformed spectrogram shows segmental corrections that cannot be obtained by prosodic transplantation. A perceptual test comparing the self-imitating and correcting abilities of our method with the baseline PSOLA method shows that the generative approach with cycle consistency loss is promising.
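The cycle consistency idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so this assumes the commonly used L1 reconstruction form: a non-native spectrogram mapped to the native domain and back should match the original, and likewise in the other direction. The names `l1_distance`, `cycle_consistency_loss`, `to_native`, `to_nonnative`, and `scale` are hypothetical, and the toy scaling functions stand in for trained generator networks.

```python
def l1_distance(a, b):
    """Mean absolute difference between two equally sized 2-D arrays (lists of rows)."""
    total = sum(abs(p - q) for row_a, row_b in zip(a, b) for p, q in zip(row_a, row_b))
    count = sum(len(row) for row in a)
    return total / count


def cycle_consistency_loss(x_nonnative, y_native, to_native, to_nonnative):
    """Sum of reconstruction errors for both cycle directions (assumed L1 form).

    to_native:    hypothetical generator, non-native -> native spectrogram
    to_nonnative: hypothetical generator, native -> non-native spectrogram
    """
    forward = l1_distance(to_nonnative(to_native(x_nonnative)), x_nonnative)
    backward = l1_distance(to_native(to_nonnative(y_native)), y_native)
    return forward + backward


def scale(spec, factor):
    """Toy stand-in for a generator: multiplies every spectrogram bin by a constant."""
    return [[v * factor for v in row] for row in spec]


# Tiny 2x2 "spectrograms" used only for illustration.
x = [[0.1, 0.4], [0.8, 0.2]]
y = [[0.3, 0.9], [0.5, 0.7]]

# Generators that invert each other exactly reconstruct their inputs.
perfect = cycle_consistency_loss(x, y, lambda s: scale(s, 2.0), lambda s: scale(s, 0.5))

# Generators that do not invert each other are penalized.
lossy = cycle_consistency_loss(x, y, lambda s: scale(s, 2.0), lambda s: scale(s, 0.25))
```

In training, a term of this kind would typically be added to the adversarial loss, so that the generator cannot satisfy the discriminator by discarding the structure of its input.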
APA:
Yang, S.H., & Chung, M. (2019). Self-imitating feedback generation using GAN for computer-assisted pronunciation training. In Gernot Kubin, Thomas Hain, Bjorn Schuller, Dina El Zarka, Petra Hodl (Eds.), Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH (pp. 1881-1885). Graz, AT: International Speech Communication Association.
MLA:
Yang, Seung Hee, and Minhwa Chung. "Self-imitating feedback generation using GAN for computer-assisted pronunciation training." Proceedings of the 20th Annual Conference of the International Speech Communication Association: Crossroads of Speech and Language, INTERSPEECH 2019, Graz. Ed. Gernot Kubin, Thomas Hain, Bjorn Schuller, Dina El Zarka, Petra Hodl, International Speech Communication Association, 2019. 1881-1885.