Elminshawi M, Chetupalli SR, Habets E (2024)
Publication Type: Journal article
Publication year: 2024
Volume: 31
Pages Range: 2205-2209
Neural networks for speech separation generally exhibit high computational costs and large memory footprints. Moreover, typical separation networks have a fixed computational graph that processes all input frames at a uniform computational cost, even though intensive processing may not be necessary for frames containing silence or a single active speaker. Addressing this computational inefficiency is especially crucial when these networks are deployed on resource-constrained devices. In this letter, we propose a dynamic slimmable network for speech separation that mitigates the computational inefficiency of existing networks. We introduce slimmable layers with a gating mechanism that can adapt their computational complexity based on the input characteristics. As an example, we propose to use the slimmable layers in the intra-chunk blocks of a dual-path structure-based network to facilitate adaptation based on the local characteristics of the input signal. Experimental evaluation on simulated two-speaker mixtures from the WSJ0-2mix dataset demonstrates that the proposed method substantially reduces the computational cost while maintaining performance comparable to that of fully utilized static networks.
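The gating idea described in the abstract can be illustrated with a minimal, hypothetical sketch. It is not the authors' implementation: the class `SlimmableLinear`, the helpers `chunk_energy_gate` and `process_chunks`, the candidate widths, and the energy thresholds are all assumptions made for illustration, and a fixed energy rule stands in for the gating mechanism described in the letter. All widths share one weight matrix (a narrow pass uses only its leading rows), and one width decision is made per chunk to loosely mirror the idea of adapting to local signal characteristics.

```python
import math
import random

# Hypothetical candidate widths, as fractions of the full layer width.
WIDTH_FRACTIONS = (0.25, 0.5, 0.75, 1.0)


class SlimmableLinear:
    """Linear layer that can run using only its first k output units."""

    def __init__(self, in_dim, out_dim, seed=0):
        rng = random.Random(seed)
        self.in_dim = in_dim
        self.out_dim = out_dim
        scale = 1.0 / math.sqrt(in_dim)
        self.weight = [[rng.gauss(0.0, scale) for _ in range(in_dim)]
                       for _ in range(out_dim)]
        self.bias = [0.0] * out_dim

    def forward(self, x, width):
        """Return (output, multiply-accumulate count) at the given width fraction."""
        k = max(1, int(round(width * self.out_dim)))
        # All widths share weights: a narrow pass uses the leading k rows only.
        y = [sum(w * xi for w, xi in zip(self.weight[j], x)) + self.bias[j]
             for j in range(k)]
        return y, k * self.in_dim


def chunk_energy_gate(chunk, thresholds=(1e-4, 1e-2, 1e-1)):
    """Hypothetical stand-in for a gate: map mean chunk energy to a width fraction."""
    values = [v for frame in chunk for v in frame]
    energy = sum(v * v for v in values) / len(values)
    index = sum(energy > t for t in thresholds)  # 0 .. len(thresholds)
    return WIDTH_FRACTIONS[index]


def process_chunks(chunks, layer):
    """Apply the layer frame by frame, with one width decision per chunk."""
    outputs, total_macs = [], 0
    for chunk in chunks:
        width = chunk_energy_gate(chunk)
        for frame in chunk:
            y, macs = layer.forward(frame, width)
            outputs.append(y)
            total_macs += macs
    return outputs, total_macs


if __name__ == "__main__":
    layer = SlimmableLinear(in_dim=16, out_dim=64)
    silence = [0.0] * 16
    quiet = [0.05] * 16
    speech = [0.5 if n % 2 == 0 else -0.5 for n in range(16)]
    chunks = [[silence, silence], [quiet, quiet], [speech, speech]]
    outputs, macs = process_chunks(chunks, layer)
    static_macs = sum(len(c) for c in chunks) * layer.out_dim * layer.in_dim
    print([len(y) for y in outputs])  # [16, 16, 32, 32, 64, 64]
    print(macs, static_macs)          # 3584 6144
```

In this toy run, the silent and quiet chunks are processed at reduced width, so the gated pass costs 3584 multiply-accumulates against 6144 for a static full-width pass. Sharing one weight matrix across widths is what lets a single set of parameters serve every complexity level; the real gate, its inputs, and where the slimmable layers sit in the dual-path network are as described in the letter, not as modeled here.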
APA:
Elminshawi, M., Chetupalli, S. R., & Habets, E. (2024). Dynamic Slimmable Network for Speech Separation. IEEE Signal Processing Letters, 31, 2205-2209. https://doi.org/10.1109/LSP.2024.3445304
MLA:
Elminshawi, Mohamed, Srikanth Raj Chetupalli, and Emanuël Habets. "Dynamic Slimmable Network for Speech Separation." IEEE Signal Processing Letters 31 (2024): 2205-2209.