Schwär S, Krause M, Fast M, Rosenzweig S, Scherbaum F, Müller M (2024)
Publication Type: Journal article
Publication year: 2024
Volume: 7
Page Range: 30-43
Issue: 1
DOI: 10.5334/tismir.166
Dataset articles
A Dataset of Larynx Microphone Recordings for Singing Voice Reconstruction
Authors:
Simon Schwär
Michael Krause
Michael Fast
Sebastian Rosenzweig
Frank Scherbaum
Meinard Müller
Abstract
Larynx microphones (LMs) make it possible to obtain practically crosstalk-free recordings of the human voice by picking up vibrations directly from the throat. This can be useful in a multitude of music information retrieval scenarios related to singing, e.g., the analysis of individual voices recorded in environments with lots of interfering noise. However, LMs have a limited frequency range and barely capture the effects of the vocal tract, which makes the recorded signal unsuitable for downstream tasks that require high-quality recordings. In this paper, we introduce the task of reconstructing a natural-sounding, high-quality singing voice recording from an LM signal. With an explicit focus on the singing voice, the problem lies at the intersection of speech enhancement and singing voice synthesis with the additional requirement of faithful reproduction of expressive parameters like intonation. In this context, we make three main contributions. First, we publish a dataset with over 4 hours of popular music we recorded with four amateur singers accompanied by a guitar, where both LM and clean close-up microphone signals are available. Second, we propose a data-driven baseline approach for singing voice reconstruction from LM signals using differentiable signal processing, inspired by a source-filter model that emulates the missing vocal tract effects. Third, we evaluate the baseline with a listening test and further show that it can improve the accuracy of lyrics transcription as an exemplary downstream task.
APA:
Schwär, S., Krause, M., Fast, M., Rosenzweig, S., Scherbaum, F., & Müller, M. (2024). A Dataset of Larynx Microphone Recordings for Singing Voice Reconstruction. Transactions of the International Society for Music Information Retrieval, 7(1), 30-43. https://doi.org/10.5334/tismir.166
MLA:
Schwär, Simon, et al. "A Dataset of Larynx Microphone Recordings for Singing Voice Reconstruction." Transactions of the International Society for Music Information Retrieval 7.1 (2024): 30-43.