About the HDF5 and the audio generation. #31

ghost · 2021-01-14T09:01:17Z

Hi, to generate a singing voice, the code expects a .hdf5 file from the dataset. Generating that .hdf5 file requires a wave file. Can it work without wave files?

Originally posted by @Kerry0123 in #29 (comment)

ghost · 2021-01-14T09:27:10Z

I will attempt to answer you, but I'm not the author of the project.

HDF5 is a container format. The .hdf5 files in this project do not contain the audio (wav) itself, but (vocoder) decompositions of it (F0 and other spectral features), labelled with phonemes. This is a design choice of the project, not a requirement. Take a look at the code to understand where and how the content is used.
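To check what a given file actually holds, a minimal sketch like the following may help. It assumes the h5py package is installed; `list_datasets` is a hypothetical helper written for this comment, not part of the project, and the dataset names it prints depend entirely on the file you open.

```python
import h5py


def list_datasets(path):
    """Return {dataset_name: (shape, dtype)} for every dataset in an HDF5 file."""
    found = {}

    def visit(name, obj):
        # visititems walks both groups and datasets; keep only the datasets.
        if isinstance(obj, h5py.Dataset):
            found[name] = (obj.shape, str(obj.dtype))

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return found
```

Calling `list_datasets("some_file.hdf5")` (the file name here is a placeholder) and printing the result shows the names, shapes and dtypes of the stored arrays, which is usually enough to tell features, F0 and phoneme labels apart.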

For the inference demo, the .hdf5 files are used only to get the F0 and the phonetic labels of a singer; these are fed to the AI model to get the audio (spectral) features, from which the audio output is generated. In test_file_hdf5 in models.py, you can see:

feats, f0_nor, pho_target = self.read_hdf5_file(file_name)
out_feats = self.process_file(f0_nor, pho_target, singer_index, sess)

This means that only the normalized F0 and the phonetic targets (not the features) are used as inputs to generate the (overlapped) output features from the AI model. These features are then post-processed (SPTK) and vocoded back to audio (WORLD).

So, if you want to use the (current) trained model with your own melody/words, you need to pass your own normalized F0 and phonetic labels to the process_file method (do not use the read_hdf5_file method).
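As a rough illustration of what "your own normalized F0 and phonetic labels" could look like, here is a sketch that expands a list of notes into frame-level arrays. Everything in it is an assumption on my side: `build_inputs` is a hypothetical helper, the min-max normalization and the `f0_min`/`f0_max` bounds are placeholders, and the phoneme ids are arbitrary integers. The real normalization, pitch unit, phoneme inventory and array shapes must be taken from the project code.

```python
import numpy as np


def build_inputs(notes, f0_min, f0_max):
    """Expand (pitch, phoneme_id, n_frames) notes into frame-level arrays.

    Returns (f0_nor, pho_target): pitch min-max normalised to [0, 1] and
    integer phoneme ids, one value per frame. The normalisation scheme is
    an assumption, not necessarily the one used by the project.
    """
    if f0_max <= f0_min:
        raise ValueError("f0_max must be greater than f0_min")
    # Repeat each note's pitch and phoneme id for its duration in frames.
    pitch = np.concatenate([np.full(n, p, dtype=np.float32) for p, _, n in notes])
    pho_target = np.concatenate([np.full(n, ph, dtype=np.int64) for _, ph, n in notes])
    # Scale pitch into [0, 1] and clip values outside the assumed range.
    f0_nor = np.clip((pitch - f0_min) / (f0_max - f0_min), 0.0, 1.0)
    return f0_nor, pho_target
```

The resulting arrays would then take the place of the values returned by read_hdf5_file in the call to process_file(f0_nor, pho_target, singer_index, sess), once they match the format the model expects.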

Kerry0123 · 2021-01-15T02:21:41Z

Thank you very much. I understand that WGANSing is just a vocoder. A song synthesis system needs a synthesizer (acoustic model) to generate f0_nor.

ghost · 2021-01-15T09:19:43Z

No, this is not a vocoder. This is a singing synthesizer based on an AI model that generates the audio features needed by the (third-party) vocoder. WGANSing does not generate f0_nor; WGANSing needs f0_nor to generate audio.

ghost changed the title ~~hi, to generate singing voice, it expects a .hdf5 file from the dataset. Generated .hdf5 needs wave file, Can it not use wave files?~~ About the HDF5 and the audio generation. Jan 14, 2021

ghost mentioned this issue Jan 14, 2021

What should the input file look like when using in test mode? #30

Open
