About the HDF5 and the audio generation. #31

Open
ghost opened this issue Jan 14, 2021 · 3 comments

ghost commented Jan 14, 2021

Hi, to generate a singing voice, the model expects a .hdf5 file from the dataset, and generating that .hdf5 file requires a wave file. Can it not use wave files directly?

Originally posted by @Kerry0123 in #29 (comment)

ghost (issue author) commented Jan 14, 2021

I will attempt to answer, but I'm not the author of the project.

HDF5 is a container format. In this project, the .hdf5 files do not contain audio (wav) but vocoder decompositions (F0 and other spectral features) labelled with phonetic information. This is a design choice of the project, not a requirement. Take a look at the code to understand where and how the content is used.

For the inference demo, the .hdf5 files are only used to get a singer's F0 and phonetic labels, from which the AI model produces the audio (spectral) features used to generate the audio output. In the test_file_hdf5 method in models.py, you can see:

```python
feats, f0_nor, pho_target = self.read_hdf5_file(file_name)
out_feats = self.process_file(f0_nor, pho_target, singer_index, sess)
```

That means only the normalized F0 and the phonetic target (not the stored features) are used as inputs to generate the (overlapped) output features from the AI model. These features are then post-processed with SPTK and vocoded back to audio with WORLD.
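"Overlapped" here means the model works on fixed-length blocks of frames that overlap, and the per-block outputs are merged back into one sequence. The sketch below shows a generic overlap-add by averaging; the function names, block length and hop are hypothetical, and WGANSing's own helpers may use a different window or hop, so check models.py and the project's utilities for the real behaviour.

```python
import numpy as np

# Generic sketch: names and parameters are hypothetical, not WGANSing's API.

def split_into_blocks(x, block_len, hop):
    """Split a (frames, dims) array into overlapping blocks of block_len frames.
    The tail is zero-padded so every frame falls inside at least one block."""
    assert 0 < hop <= block_len, "hop larger than block_len would leave gaps"
    n_frames = x.shape[0]
    n_blocks = max(1, int(np.ceil((n_frames - block_len) / hop)) + 1)
    padded_len = (n_blocks - 1) * hop + block_len
    padded = np.zeros((padded_len,) + x.shape[1:], dtype=x.dtype)
    padded[:n_frames] = x
    return np.stack([padded[i * hop:i * hop + block_len] for i in range(n_blocks)])

def overlap_add(blocks, hop, n_frames):
    """Merge overlapping blocks back into one sequence, averaging frames that
    are covered by more than one block, then trim the zero padding."""
    n_blocks, block_len = blocks.shape[:2]
    total = (n_blocks - 1) * hop + block_len
    out = np.zeros((total,) + blocks.shape[2:], dtype=np.float64)
    counts = np.zeros(total)
    for i, block in enumerate(blocks):
        out[i * hop:i * hop + block_len] += block
        counts[i * hop:i * hop + block_len] += 1
    out /= counts.reshape((-1,) + (1,) * (out.ndim - 1))
    return out[:n_frames]

# Round trip with an identity "model": the merged output equals the input.
x = np.arange(20, dtype=float).reshape(10, 2)
blocks = split_into_blocks(x, block_len=4, hop=2)
print(np.allclose(overlap_add(blocks, hop=2, n_frames=10), x))
```

In the real pipeline each block would be replaced by the network's output features before merging; averaging is the simplest way to smooth block boundaries, while a tapered window gives softer transitions.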

So, if you want to use the current trained model with your own melody and lyrics, you need to pass your own normalized F0 and phonetic labels to the process_file method instead of calling read_hdf5_file.
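Preparing those two inputs could look roughly like the sketch below. Everything here is an assumption for illustration: the helper names, the semitone-based min-max normalization, the 5 ms frame period and the "index 0 = silence" convention are not taken from WGANSing. The real normalization range and phoneme indexing must match what the dataset preparation and read_hdf5_file produce, and the exact array shapes process_file expects are unverified.

```python
import numpy as np

# Hypothetical helpers: names, ranges and hop size are illustrative assumptions,
# not WGANSing code. Check read_hdf5_file in models.py for the real scheme.
HOP_SECONDS = 0.005  # assumed 5 ms frame period (a common WORLD setting)

def normalize_f0(f0_hz, midi_min, midi_max):
    """Map F0 in Hz to [0, 1] on a semitone (MIDI) scale.
    Unvoiced frames (F0 == 0) stay at 0; voiced frames outside the range are
    clipped, so a note below midi_min also ends up at 0."""
    f0_hz = np.asarray(f0_hz, dtype=np.float64)
    voiced = f0_hz > 0
    f0_nor = np.zeros_like(f0_hz)
    midi = 69.0 + 12.0 * np.log2(f0_hz[voiced] / 440.0)
    f0_nor[voiced] = (midi - midi_min) / (midi_max - midi_min)
    return np.clip(f0_nor, 0.0, 1.0)

def phonemes_to_frames(segments, n_frames, hop=HOP_SECONDS):
    """Turn (phoneme_index, start_s, end_s) segments into one label per frame.
    Frames not covered by any segment keep index 0 (assumed to be silence)."""
    labels = np.zeros(n_frames, dtype=np.int64)
    for pho, start, end in segments:
        first = int(round(start / hop))
        last = min(n_frames, int(round(end / hop)))
        labels[first:last] = pho
    return labels

f0_nor = normalize_f0([0.0, 440.0, 880.0, 0.0], midi_min=57, midi_max=81)
pho_target = phonemes_to_frames([(3, 0.0, 0.01), (7, 0.01, 0.02)], n_frames=4)
print(f0_nor, pho_target)
```

Both arrays must have the same number of frames; they would take the place of the values returned by read_hdf5_file in the call to process_file.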

ghost changed the title from "hi, to generate singing voice, it expects a .hdf5 file from the dataset. Generated .hdf5 needs wave file, Can it not use wave files?" to "About the HDF5 and the audio generation." on Jan 14, 2021

Kerry0123 commented

Thank you very much. So I understand that WGANSing is just a vocoder, and that a singing synthesis system needs a synthesizer (acoustic model) to generate f0_nor.

ghost (issue author) commented Jan 15, 2021

No, this is not a vocoder. It is a singing synthesizer based on an AI model that generates the audio features needed by the (third-party) vocoder. WGANSing does not generate f0_nor; WGANSing needs f0_nor to generate audio.
