Das D, Khan Q, Cremers D (2022). Ventriloquist-Net: Leveraging Speech Cues for Emotive Talking Head Generation
Publication Type: Conference contribution
Publication year: 2022
Publisher: IEEE Computer Society
Pages Range: 1716-1720
Conference Proceedings Title: Proceedings - International Conference on Image Processing, ICIP
Event location: Bordeaux, FRA
ISBN: 9781665496209
DOI: 10.1109/ICIP46576.2022.9897657
In this paper, we propose Ventriloquist-Net: a Talking Head Generation model that uses only a speech segment and a single source face image. It places emphasis on emotive expressions, with cues for generating these expressions inferred implicitly from the speech clip alone. We formulate our framework to comprise independently trained modules in order to expedite convergence. This not only allows extension to datasets in a semi-supervised manner but also facilitates handling in-the-wild source images. Quantitative and qualitative evaluations of generated videos demonstrate state-of-the-art performance, even on unseen input data. Implementation and supplementary videos are available at https://github.com/dipnds/VentriloquistNet.
APA:
Das, D., Khan, Q., & Cremers, D. (2022). Ventriloquist-Net: Leveraging speech cues for emotive talking head generation. In Proceedings - International Conference on Image Processing, ICIP (pp. 1716-1720). Bordeaux, FRA: IEEE Computer Society.
MLA:
Das, Deepan, Qadeer Khan, and Daniel Cremers. "Ventriloquist-Net: Leveraging Speech Cues for Emotive Talking Head Generation." Proceedings of the 29th IEEE International Conference on Image Processing, ICIP 2022, Bordeaux, FRA, IEEE Computer Society, 2022, pp. 1716-1720.
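A BibTeX entry assembled from the metadata fields above (the citation key is an assumed placeholder, not taken from the original record):

```bibtex
@inproceedings{das2022ventriloquist,
  author    = {Das, Deepan and Khan, Qadeer and Cremers, Daniel},
  title     = {Ventriloquist-Net: Leveraging Speech Cues for Emotive Talking Head Generation},
  booktitle = {Proceedings - International Conference on Image Processing, ICIP},
  publisher = {IEEE Computer Society},
  address   = {Bordeaux, FRA},
  year      = {2022},
  pages     = {1716--1720},
  doi       = {10.1109/ICIP46576.2022.9897657},
  isbn      = {9781665496209}
}
```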