I looked at the MERT example and noticed that it's actually preprocessing the input. The code looks really convoluted but in the case of batch size 1, the net effect is that it's normalizing things to have zero mean and unit variance instead of passing it in directly.
Note that you can use the processor provided in the example if you want, but I decided not to for my case because my wav_input is already in CUDA and transformers forces everything into numpy and therefore CPU 😞 , resulting in expensive copies as you move data back and forth. Wasn't sure which one you've got here :))
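For reference, the net effect described above can be reproduced directly on a GPU tensor without going through the processor. This is a minimal sketch, not the processor's actual code path; the name `normalize_wav` is made up here, and it mirrors the zero-mean/unit-variance normalization (population variance plus a small epsilon, as the HF feature extractors do) while keeping the tensor on whatever device it already lives on:

```python
import torch

def normalize_wav(wav: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    """Zero-mean, unit-variance normalization along the time axis.

    Stays on the tensor's current device (e.g. CUDA), avoiding the
    numpy round-trip the transformers processor would force.
    """
    mean = wav.mean(dim=-1, keepdim=True)
    # unbiased=False matches numpy's default (population) variance
    var = wav.var(dim=-1, keepdim=True, unbiased=False)
    return (wav - mean) / torch.sqrt(var + eps)
```

A normalized batch can then be fed to the model in place of the processor's output, e.g. `model(normalize_wav(wav_input))`, assuming `wav_input` is shaped `(batch, samples)`.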
hey @LWprogramming, thanks for taking a look at the code! The input is normalized to zero mean and unit variance when loading the data here. The normalize argument is set when initializing the datasets in the trainer, so the data is normalized before it is cropped.
Alternatively we could normalize the cropped input right before passing it into MERT. Not sure which would be better, but normalizing it in the beginning made more sense to me.
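To make the trade-off concrete, the two orders aren't equivalent: normalizing the full clip at load time gives each crop the clip-level statistics, while normalizing after cropping makes each crop exactly zero-mean/unit-variance. This hypothetical sketch (the `crop` helper and the statistics are illustrative, not the trainer's actual code) shows the difference:

```python
import torch

def crop(wav: torch.Tensor, start: int, length: int) -> torch.Tensor:
    """Take a fixed-length segment along the time axis."""
    return wav[..., start:start + length]

# Illustrative non-normalized audio (random stand-in for a real clip).
wav = torch.randn(1, 48000) * 3.0 + 0.5

# Option A (normalize at load, then crop): the crop inherits the
# whole clip's mean/variance, so it is only approximately normalized.
full_norm = (wav - wav.mean(dim=-1, keepdim=True)) / wav.std(dim=-1, keepdim=True)
crop_a = crop(full_norm, 8000, 16000)

# Option B (crop first, then normalize): the crop itself is exactly
# zero-mean and unit-variance.
crop_raw = crop(wav, 8000, 16000)
crop_b = (crop_raw - crop_raw.mean(dim=-1, keepdim=True)) / crop_raw.std(dim=-1, keepdim=True)
```

In practice the two should be close for long crops of stationary audio, which is presumably why normalizing once at load time is the simpler choice.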
also your comment made me realize there was an issue in the infer coarse script where I forgot to normalize the audio before passing it in. fixed in c9a167e!