A Dataset of Larynx Microphone Recordings for Singing Voice Reconstruction

Schwär S, Krause M, Fast M, Rosenzweig S, Scherbaum F, Müller M (2024)

Publication Type: Journal article

Publication year: 2024

Journal: Transactions of the International Society for Music Information Retrieval (TISMIR)

Volume: 7

Issue: 1

Pages: 30-43

DOI: 10.5334/tismir.166

Abstract


Article type: Dataset article


Larynx microphones (LMs) make it possible to obtain practically crosstalk-free recordings of the human voice by picking up vibrations directly from the throat. This can be useful in a multitude of music information retrieval scenarios related to singing, e.g., the analysis of individual voices recorded in environments with lots of interfering noise. However, LMs have a limited frequency range and barely capture the effects of the vocal tract, which makes the recorded signal unsuitable for downstream tasks that require high-quality recordings. In this paper, we introduce the task of reconstructing a natural-sounding, high-quality singing voice recording from an LM signal. With an explicit focus on the singing voice, the problem lies at the intersection of speech enhancement and singing voice synthesis with the additional requirement of faithful reproduction of expressive parameters like intonation. In this context, we make three main contributions. First, we publish a dataset with over 4 hours of popular music that we recorded with four amateur singers accompanied by a guitar, where both LM and clean close-up microphone signals are available. Second, we propose a data-driven baseline approach for singing voice reconstruction from LM signals using differentiable signal processing, inspired by a source-filter model that emulates the missing vocal tract effects. Third, we evaluate the baseline with a listening test and further show that it can improve the accuracy of lyrics transcription as an exemplary downstream task.
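The source-filter idea mentioned above can be illustrated with a minimal, non-learned sketch: treat the LM signal as the excitation (the source) and impose vocal-tract resonances (formants) on it with a cascade of two-pole filters. This is not the authors' baseline, which is data-driven and uses differentiable signal processing. The fixed formant values, the impulse-train stand-in for an LM recording, and all function names (`resonator_coeffs`, `all_pole_filter`, `apply_vocal_tract`) are hypothetical and chosen only for illustration.

```python
import numpy as np


def resonator_coeffs(freq_hz: float, bandwidth_hz: float, sr: int) -> np.ndarray:
    """Denominator [1, a1, a2] of a two-pole resonator modelling one formant."""
    r = np.exp(-np.pi * bandwidth_hz / sr)   # pole radius set by the bandwidth
    theta = 2.0 * np.pi * freq_hz / sr       # pole angle set by the centre frequency
    return np.array([1.0, -2.0 * r * np.cos(theta), r * r])


def all_pole_filter(x: np.ndarray, a: np.ndarray) -> np.ndarray:
    """Direct-form all-pole filter: y[n] = x[n] - sum_k a[k] * y[n - k], with a[0] == 1."""
    y = np.zeros(len(x), dtype=float)
    for n in range(len(x)):
        acc = x[n]
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y[n] = acc
    return y


def apply_vocal_tract(excitation, formants, sr: int) -> np.ndarray:
    """Pass an excitation signal through a cascade of formant resonators (hypothetical filter)."""
    y = np.asarray(excitation, dtype=float)
    for freq_hz, bw_hz in formants:
        a = resonator_coeffs(freq_hz, bw_hz, sr)
        # Scale each resonator to unit gain at DC, as in Klatt-style formant synthesis.
        y = a.sum() * all_pole_filter(y, a)
    return y


sr = 16000
t = np.arange(sr)
# Hypothetical stand-in for one second of LM signal: an impulse train at roughly 220 Hz.
lm_like = (t % (sr // 220) == 0).astype(float)
# Hypothetical (frequency, bandwidth) pairs in Hz, roughly those of an open vowel.
formants = [(800, 80), (1150, 90), (2900, 120)]
reconstructed = apply_vocal_tract(lm_like, formants, sr)
print(reconstructed.shape)  # (16000,)
```

In a data-driven system such as the one described in the abstract, the filter would be time-varying and its parameters would be predicted from data rather than fixed by hand; the sketch only shows where the "missing vocal tract effects" enter the signal chain.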

Authors with CRIS profile

Simon Schwär, International Audio Laboratories Erlangen (AudioLabs)

Michael Krause, International Audio Laboratories Erlangen (AudioLabs)

Sebastian Rosenzweig, International Audio Laboratories Erlangen (AudioLabs)

Meinard Müller, Lehrstuhl für Semantische Audiosignalverarbeitung (AudioLabs)

Involved external institutions

Universität Potsdam, Germany (DE)

How to cite

APA:

Schwär, S., Krause, M., Fast, M., Rosenzweig, S., Scherbaum, F., & Müller, M. (2024). A Dataset of Larynx Microphone Recordings for Singing Voice Reconstruction. Transactions of the International Society for Music Information Retrieval, 7(1), 30-43. https://doi.org/10.5334/tismir.166

MLA:

Schwär, Simon, et al. "A Dataset of Larynx Microphone Recordings for Singing Voice Reconstruction." Transactions of the International Society for Music Information Retrieval 7.1 (2024): 30-43.
