SEFGAN: Harvesting the Power of Normalizing Flows and GANs for Efficient High-Quality Speech Enhancement

Strauß M, Pia N, K. S. Rao N, Edler B (2023)


Publication Type: Conference contribution

Publication year: 2023

Publisher: IEEE

City/Town: New Paltz, NY, USA

Conference Proceedings Title: 2023 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)

ISBN: 979-8-3503-2373-3

DOI: 10.1109/WASPAA58266.2023.10248144

Abstract

This paper proposes SEFGAN, a Deep Neural Network (DNN) combining maximum likelihood training and Generative Adversarial Networks (GANs) for efficient speech enhancement (SE). For this, a DNN is trained to synthesize the enhanced speech conditioned on noisy speech using a Normalizing Flow (NF) as generator in a GAN framework. While the combination of likelihood models and GANs is not trivial, SEFGAN demonstrates that a hybrid adversarial and maximum likelihood training approach enables the model to maintain high quality audio generation and log-likelihood estimation. Our experiments indicate that this approach strongly outperforms the baseline NF-based model without introducing additional complexity to the enhancement network. A comparison using computational metrics and a listening experiment reveals that SEFGAN is competitive with other state-of-the-art models.

How to cite

APA:

Strauß, M., Pia, N., K. S. Rao, N., & Edler, B. (2023). SEFGAN: Harvesting the Power of Normalizing Flows and GANs for Efficient High-Quality Speech Enhancement. In 2023 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA). New Paltz, NY, USA: IEEE.

MLA:

Strauß, Martin, et al. "SEFGAN: Harvesting the Power of Normalizing Flows and GANs for Efficient High-Quality Speech Enhancement." Proceedings of the 2023 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA). New Paltz, NY, USA: IEEE, 2023.
