AmbiSep: Joint Ambisonic-to-Ambisonic Speech Separation and Noise Reduction

Herzog A, Chetupalli SR, Habets E (2023)


Publication Type: Journal article

Publication year: 2023

Journal: IEEE/ACM Transactions on Audio, Speech and Language Processing

Pages Range: 1-13

DOI: 10.1109/TASLP.2023.3297954

Abstract

Blind separation of the sounds in an Ambisonic sound scene is a challenging problem, especially when the spatial impression of these sounds needs to be preserved. In this work, we consider Ambisonic-to-Ambisonic separation of reverberant speech mixtures, optionally containing noise. A supervised learning approach is adopted, utilizing a transformer-based deep neural network denoted AmbiSep. AmbiSep takes multichannel Ambisonic signals as input and estimates separate multichannel Ambisonic signals for each speaker while preserving their spatial images, including reverberation. The GPU memory requirement of AmbiSep during training increases with the number of Ambisonic channels. To overcome this issue, we propose different aggregation methods. The model is trained and evaluated for first-order and second-order Ambisonics using simulated speech mixtures. Experimental results show that the model performs well on clean and noisy reverberant speech mixtures and also generalizes to mixtures generated with measured Ambisonic impulse responses.
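The abstract notes that AmbiSep's training memory grows with the number of Ambisonic channels. In standard (full 3D) Ambisonics, an order-N signal has (N + 1)² channels, so the channel count grows quadratically with the order. The following minimal sketch (not from the paper) illustrates why second-order input is substantially heavier than first-order input:

```python
def ambisonic_channels(order: int) -> int:
    """Channel count of a full 3D Ambisonic signal of the given order.

    This is the standard (N + 1)^2 relation; it is illustrative context,
    not code from the AmbiSep paper.
    """
    return (order + 1) ** 2


if __name__ == "__main__":
    for n in (1, 2):
        print(f"order {n}: {ambisonic_channels(n)} channels")
    # First-order Ambisonics (FOA) has 4 channels; second-order has 9,
    # more than doubling the per-signal data the network must process.
```

Since AmbiSep both consumes and produces one such multichannel signal per speaker, the input/output dimensionality (and hence GPU memory) scales with this channel count, which motivates the aggregation methods mentioned in the abstract.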


How to cite

APA:

Herzog, A., Chetupalli, S.R., & Habets, E. (2023). AmbiSep: Joint Ambisonic-to-Ambisonic Speech Separation and Noise Reduction. IEEE/ACM Transactions on Audio, Speech and Language Processing, 1-13. https://doi.org/10.1109/TASLP.2023.3297954

MLA:

Herzog, Adrian, Srikanth Raj Chetupalli, and Emanuël Habets. "AmbiSep: Joint Ambisonic-to-Ambisonic Speech Separation and Noise Reduction." IEEE/ACM Transactions on Audio, Speech and Language Processing (2023): 1-13.
