Chetupalli SR, Habets E (2022)
Publication Type: Conference contribution
Publication year: 2022
Publisher: International Speech Communication Association
Book Volume: 2022-September
Page Range: 5393-5397
Conference Proceedings Title: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Event location: Incheon, KOR
DOI: 10.21437/Interspeech.2022-10849
This paper considers speaker-independent separation of single-channel speech mixtures with an unknown number of speakers in the waveform domain. To deal with the unknown number of sources, we incorporate an encoder-decoder attractor (EDA) module into a speech separation network. The neural network architecture consists of a trainable encoder-decoder pair and a masking network. The masking network in the proposed approach is inspired by the transformer-based SepFormer separation system. It contains a dual-path block and a triple-path block, each modeling both short-time and long-time dependencies in the signal. The EDA module first summarises the dual-path block output using an LSTM encoder and then generates one attractor vector per speaker in the mixture using an LSTM decoder. The attractors are combined with the dual-path block output to generate speaker channels, which are processed jointly by the triple-path block to predict the mask. Further, a linear-sigmoid layer, with the attractors as input, predicts a binary output that serves as the stopping criterion for attractor generation. The proposed approach is evaluated on the WSJ0-mix dataset with mixtures of up to five speakers. State-of-the-art results are obtained in both speech separation quality and speaker counting for all the mixtures.
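The stopping criterion described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the LSTM decoder is replaced by a hypothetical stand-in (`make_toy_decoder`) that returns precomputed vectors, and the 0.5 threshold, the `max_speakers` cap, and the toy weights are all assumptions chosen for illustration. Only the general idea is taken from the abstract, namely that a linear-sigmoid layer scores each attractor and generation stops once the score indicates that no further speaker is present.

```python
import math
from typing import Callable, List, Sequence


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def existence_probability(attractor: Sequence[float],
                          weights: Sequence[float],
                          bias: float) -> float:
    """Linear layer followed by a sigmoid: one scalar score per attractor."""
    return sigmoid(sum(w * a for w, a in zip(weights, attractor)) + bias)


def generate_attractors(decoder_step: Callable[[], Sequence[float]],
                        weights: Sequence[float],
                        bias: float,
                        threshold: float = 0.5,   # assumed decision threshold
                        max_speakers: int = 8     # assumed safety cap
                        ) -> List[Sequence[float]]:
    """Keep requesting attractors until the score drops below the threshold."""
    attractors: List[Sequence[float]] = []
    for _ in range(max_speakers):
        candidate = decoder_step()  # stand-in for one LSTM decoder step
        if existence_probability(candidate, weights, bias) < threshold:
            break  # the layer signals that no further speaker exists
        attractors.append(candidate)
    return attractors


def make_toy_decoder(vectors: List[Sequence[float]]) -> Callable[[], Sequence[float]]:
    """Hypothetical decoder stand-in that replays fixed vectors in order."""
    iterator = iter(vectors)
    return iterator.__next__


# Toy data: the third vector scores far below 0.5, so generation stops there.
toy_vectors = [[2.0, 1.0], [1.5, 0.5], [-3.0, -1.0], [2.0, 2.0]]
toy_weights = [1.0, 1.0]
toy_bias = 0.0

attractors = generate_attractors(make_toy_decoder(toy_vectors), toy_weights, toy_bias)
estimated_speaker_count = len(attractors)
```

In this sketch the estimated speaker count is simply the number of attractors accepted before the first rejection, which is how attractor generation and speaker counting can share a single mechanism.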
APA:
Chetupalli, S.R., & Habets, E. (2022). Speech Separation for an Unknown Number of Speakers Using Transformers With Encoder-Decoder Attractors. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH (pp. 5393-5397). Incheon, KOR: International Speech Communication Association.
MLA:
Chetupalli, Srikanth Raj, and Emanuël Habets. "Speech Separation for an Unknown Number of Speakers Using Transformers With Encoder-Decoder Attractors." Proceedings of the 23rd Annual Conference of the International Speech Communication Association, INTERSPEECH 2022, Incheon, KOR: International Speech Communication Association, 2022. 5393-5397.