About the HDF5 and the audio generation. #31
I will attempt to answer you, but I'm not the author of the project. HDF5 is a container format; the .hdf5 files in this project contain not audio files (wav) but vocoder decompositions (F0 and other spectral features) labelled with phonetics. This is a design choice of the project, not a requirement. Take a look at the code to understand where and how this content is used. For the inference demo, the .hdf5 files are used only to get the F0 and phonetic labels of a singer, in order to get the audio (spectral) features from the AI model and generate the audio output.
That means that only the normalized F0 and the phonetic targets (not the audio features) are used as inputs to generate the (overlapped) output features from the AI model. These features are then post-processed (SPTK) and vocoded back to audio (WORLD). So, if you want to use the (currently) trained model with your own melody/words, you need to pass your own normalized F0 and phonetic labels to the model.
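As a concrete illustration of what a "normalized F0" input could look like, here is a minimal sketch. It assumes a simple per-utterance min-max normalization of log-F0 with unvoiced frames kept at zero; this is a hypothetical scheme for illustration, so check the project's own preprocessing code for the exact statistics it uses.

```python
import numpy as np

def normalize_f0(f0, eps=1e-8):
    """Min-max normalize log-F0 to [0, 1], leaving unvoiced frames (f0 == 0) at 0.

    Hypothetical normalization for illustration only; the project's
    preprocessing may use different statistics (e.g. per-singer mean/std).
    """
    f0 = np.asarray(f0, dtype=np.float64)
    voiced = f0 > 0                      # unvoiced frames are conventionally 0 Hz
    log_f0 = np.zeros_like(f0)
    log_f0[voiced] = np.log(f0[voiced])  # pitch perception is roughly logarithmic
    lo, hi = log_f0[voiced].min(), log_f0[voiced].max()
    f0_nor = np.zeros_like(f0)
    f0_nor[voiced] = (log_f0[voiced] - lo) / (hi - lo + eps)
    return f0_nor

# Toy F0 contour in Hz with unvoiced (0 Hz) frames.
f0 = [0.0, 220.0, 246.9, 261.6, 0.0, 329.6]
print(normalize_f0(f0))
```

The key point is that whatever the exact scheme, the model consumes this normalized contour plus the phonetic labels, never the waveform itself.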
Thank you very much. I understand that WGANSing is just a vocoder. A song synthesis system needs a synthesizer (acoustic model) to generate f0_nor.
No, this is not a vocoder. This is a singing synthesizer: an AI model that generates the audio features needed by the (third-party) vocoder. WGANSing does not generate f0_nor; WGANSing needs f0_nor to generate audio.
Hi, to generate a singing voice it expects an .hdf5 file from the dataset, and generating that .hdf5 needs a wave file. Can it not use wave files directly?
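To make that practical: the model cannot consume a wav directly, but you can package features extracted from your own wav into an .hdf5 with h5py. The dataset names below (`f0`, `phonemes`) and shapes are assumptions for illustration only; match whatever names and layout the project's data-preparation script actually writes.

```python
import h5py
import numpy as np

# Hypothetical example: package precomputed per-frame F0 and phoneme labels
# (extracted from your own wav, e.g. via the WORLD vocoder and a forced
# aligner) into an .hdf5 file. Dataset names are illustrative, not the
# project's actual schema.
n_frames = 500
f0 = np.random.uniform(100.0, 300.0, size=n_frames)  # one F0 value (Hz) per frame
phonemes = np.random.randint(0, 40, size=n_frames)   # one phoneme id per frame

with h5py.File("my_song.hdf5", "w") as f:
    f.create_dataset("f0", data=f0)
    f.create_dataset("phonemes", data=phonemes)

# Read it back the way an inference script might.
with h5py.File("my_song.hdf5", "r") as f:
    print(f["f0"].shape, f["phonemes"].shape)
```

In other words, the wav is only needed offline, to derive the F0 contour and phoneme alignment; once those are in the .hdf5, inference never touches the waveform.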
Originally posted by @Kerry0123 in #29 (comment)