add AudioDiffusionPipeline and LatentAudioDiffusionPipeline #1334

teticio · 2022-11-17T17:45:47Z

I have added AudioDiffusionPipeline and LatentAudioDiffusionPipeline which I intend to migrate from https://github.com/teticio/audio-diffusion. I have added them to the main src as opposed to the community pipelines due to the inheritance of LatentAudioDiffusionPipeline from AudioDiffusionPipeline, which cannot be done in a single pipeline file, as well as the fact that the Mel class is needed to convert from audio to images and vice versa. It might make sense to move the Mel class somewhere more central, as it could be used by other pipelines.

teticio · 2022-11-17T17:48:43Z

@patrickvonplaten @Vaibhavs10 I'd be very grateful if you could have a look at this. I'll fix the failing tests tomorrow.

HuggingFaceDocBuilderDev · 2022-11-17T19:18:44Z

The documentation is not available anymore as the PR was closed or merged.

patrickvonplaten · 2022-11-20T19:30:41Z

src/diffusers/__init__.py

        LDMPipeline,
        LDMSuperResolutionPipeline,
+        Mel,


Could we remove this from the general __init__.py file? -> I don't think one would use "Mel" without the pipelines, no? :-)

A mel instance is a parameter to the pipeline and it is useful for creating the dataset and training the model. I agree that it will probably only be used in conjunction with the pipelines, so as long as you can import it from diffusers.pipelines that should be Ok. Is that what you mean? Thanks!

patrickvonplaten · 2022-11-20T19:30:52Z

docs/source/using-diffusers/audio.mdx

patrickvonplaten · 2022-11-20T19:31:11Z

docs/source/api/pipelines/audio_diffusion.mdx

+specific language governing permissions and limitations under the License.
+-->
+
+# Audio Diffusion


patrickvonplaten · 2022-11-20T19:31:34Z

docker/diffusers-pytorch-cpu/Dockerfile

@@ -36,6 +37,7 @@ RUN python3 -m pip install --no-cache-dir --upgrade pip && \
        numpy \
        scipy \
        tensorboard \
-        transformers
+        transformers \
+        librosa


Ok for me @anton-l what do you think?

Thanks for reviewing @patrickvonplaten !

patrickvonplaten · 2022-11-20T19:32:40Z

tests/pipelines/audio_diffusion/test_audio_diffusion.py

+    def test_audio_diffusion(self):
+        device = torch_device
+
+        mel = Mel()


Should we maybe create this automatically directly in the pipeline? It might be a bit more user-friendly?

patrickvonplaten · 2022-11-20T19:32:45Z

tests/pipelines/audio_diffusion/test_audio_diffusion.py

+import numpy as np
+import torch
+
+from diffusers import (


Very nice tests!

patrickvonplaten · 2022-11-20T19:33:25Z

src/diffusers/pipelines/audio_diffusion/pipeline_audio_diffusion.py

+    @torch.no_grad()
+    def __call__(
+        self,
+        mel: Mel,


It's a bit weird to me that one has to pass a class here for __call__ -> wouldn't it be better to just do this inside the pipeline?

patrickvonplaten

Hey @teticio,

This looks generally very nice to me and I think we can merge this soon :-)
One big thing we need to change is to make librosa an optional dependency.
You can do this as follows:

Add some "is_librosa_available()" logic here:

diffusers/src/diffusers/utils/import_utils.py

Line 129 in ab1f01e

_unidecode_available = importlib.util.find_spec("unidecode") is not None

And then import the pipeline only if librosa is available. Maybe similar to how we do it for the LMSScheduler here:

diffusers/src/diffusers/__init__.py

Line 61 in ab1f01e

if is_torch_available() and is_scipy_available():

=> e.g. only import your pipelines if librosa is available :-)
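As a rough illustration of the optional-dependency pattern described above, the availability check could look something like the sketch below. It follows the `_unidecode_available` line quoted from import_utils.py; the `require_librosa` helper and its error message are hypothetical and are not taken from the diffusers code base.

```python
import importlib.util

# Same pattern as the `_unidecode_available` line referenced above:
# find_spec returns None when the package cannot be found.
_librosa_available = importlib.util.find_spec("librosa") is not None


def is_librosa_available() -> bool:
    """Report whether librosa can be imported in this environment."""
    return _librosa_available


def require_librosa():
    """Hypothetical guard: import librosa lazily or fail with a clear message."""
    if not is_librosa_available():
        raise ImportError("This pipeline needs librosa: pip install librosa")
    import librosa  # imported only once it is known to be installed

    return librosa
```

With a check like this, the package `__init__.py` can wrap the pipeline imports in `if is_torch_available() and is_librosa_available():`, in the same way as the scipy check quoted above.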

Also I would maybe not accept an "empty" Mel() class as an input to the call function; IMO that's a bit unintuitive design-wise. Could we maybe just create this inside the call method? Wdyt?

Finally, let's maybe not add Mel to the public init as I don't think anybody would import just the Mel class no?

Overall, great work though! Very happy to soon have a second audio diffusion model 😍

patrickvonplaten · 2022-11-20T19:37:43Z

setup.py

@@ -187,7 +188,8 @@ def run(self):
    "sentencepiece",
    "scipy",
    "torchvision",
-    "transformers"
+    "transformers",
+    "librosa"


This is fine for me @anton-l wdyt?

patrickvonplaten · 2022-11-20T19:38:35Z

@anton-l could you quickly check the docker file changes here regarding the new librosa dependency? No need to review the whole PR :-)

teticio · 2022-11-20T19:47:01Z

> Hey @teticio,
>
> This looks generally very nice to me and I think we can merge this soon :-)
>
> One big thing we need to change is to make librosa an optional dependency.
>
> You can do this as follows:
>
> Add some "is_librosa_available()" logic here:
>
> diffusers/src/diffusers/utils/import_utils.py
>
> Line 129 in ab1f01e
>
> _unidecode_available = importlib.util.find_spec("unidecode") is not None
>
> And then import the pipeline only if librosa is available. Maybe similar to how we do it for the LMSScheduler here:
>
> diffusers/src/diffusers/__init__.py
>
> Line 61 in ab1f01e
>
> if is_torch_available() and is_scipy_available():
>
> => e.g. only import your pipelines if librosa is available :-)

Great, thanks for the tip. I'll do that tomorrow.

> Also I would maybe not accept an "empty" Mel() class as an input to the call function; IMO that's a bit unintuitive design-wise. Could we maybe just create this inside the call method? Wdyt?

The Mel class encapsulates a few parameters (like hop length and so on) which I decided against adding to the model configs, so as not to pollute things. However, I can default the parameter so that it creates a Mel with the default parameters. I initially decided against this because I wanted to make it explicit that these parameters need setting. Do you think the best thing is to add the parameters like hop length etc. to the pipeline call with suitable defaults that are then used to create the mel object? Happy to follow your guidance here.

> Finally, let's maybe not add Mel to the public init as I don't think anybody would import just the Mel class, no?

So I think the Mel class will be imported separately from the pipeline (for dataset creation and training). Shall I just make it importable from diffusers.pipelines and not from diffusers?

> Overall, great work though! Very happy to soon have a second audio diffusion model 😍

Thank you! Very excited to add my 2 cents to this excellent repo!

teticio · 2022-11-20T20:01:32Z

...or maybe it makes more sense to pass the Mel object (or the parameters needed to instantiate it) as kwargs in the constructor instead of the call method. Let me know what you think.

patrickvonplaten · 2022-11-21T11:09:12Z

Hey @teticio,

Sorry I think the commit history got messed up :-/
Could you maybe try to fix it or just open a new PR if it's easier 😅

Yes, definitely fine for me to pass Mel() to the init method of the model
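A minimal sketch of the design agreed on here, with the Mel object handed to the pipeline constructor (defaulting when omitted) rather than to `__call__`. `MelSketch` and `AudioPipelineSketch` are hypothetical stand-ins, not the classes from this PR; only the hop length is mentioned in the thread, so the other field names and all default values are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MelSketch:
    """Stand-in for the PR's Mel class; field names and defaults are assumptions."""

    x_res: int = 256          # spectrogram width in pixels (time axis)
    y_res: int = 256          # spectrogram height in pixels (mel bins)
    sample_rate: int = 22050  # audio samples per second
    hop_length: int = 512     # audio samples between spectrogram columns

    def slice_seconds(self) -> float:
        # Each spectrogram column advances by hop_length audio samples.
        return self.x_res * self.hop_length / self.sample_rate


class AudioPipelineSketch:
    def __init__(self, mel: Optional[MelSketch] = None):
        # The audio<->image settings are fixed at construction time.
        self.mel = mel if mel is not None else MelSketch()

    def __call__(self, batch_size: int = 1) -> dict:
        # A real pipeline would run the denoising loop here; this sketch only
        # reports the image shape and audio length implied by the Mel settings.
        return {
            "image_shape": (batch_size, self.mel.y_res, self.mel.x_res),
            "seconds": self.mel.slice_seconds(),
        }
```

Calling `AudioPipelineSketch()(batch_size=2)` then needs no Mel argument, while `AudioPipelineSketch(MelSketch(hop_length=256))` keeps the parameters explicit for users who want to set them.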

HuggingFaceDocBuilderDev · 2022-11-21T14:36:19Z

The documentation is not available anymore as the PR was closed or merged.

author teticio <teticio@gmail.com> 1668765652 +0000
committer teticio <teticio@gmail.com> 1669041721 +0000
parent 499ff34
author teticio <teticio@gmail.com> 1668765652 +0000
committer teticio <teticio@gmail.com> 1669041704 +0000

add colab notebook

[Flax] Fix loading scheduler from subfolder (#1319)
[FLAX] Fix loading scheduler from subfolder

Fix/Enable all schedulers for in-painting (#1331)
* inpaint fix k lms
* onnox as well
* up

Correct path to schedlure (#1322)
* [Examples] Correct path
* uP

Avoid nested fix-copies (#1332)
* Avoid nested `# Copied from` statements during `make fix-copies`
* style

Fix img2img speed with LMS-Discrete Scheduler (#896)
Casting `self.sigmas` into a different dtype (the one of original_samples) is not advisable. In my img2img pipeline this leads to a long running time in the `integrate.quad` call later on- by long I mean more than 10x slower.
Co-authored-by: Anton Lozhkov <anton@huggingface.co>

Fix the order of casts for onnx inpainting (#1338)

Legacy Inpainting Pipeline for Onnx Models (#1237)
* Add legacy inpainting pipeline compatibility for onnx
* remove commented out line
* Add onnx legacy inpainting test
* Fix slow decorators
* pep8 styling
* isort styling
* dummy object
* ordering consistency
* style
* docstring styles
* Refactor common prompt encoding pattern
* Update tests to permanent repository home
* support all available schedulers until ONNX IO binding is available
Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com>
* updated styling from PR suggested feedback
Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com>

Jax infer support negative prompt (#1337)
* support negative prompts in sd jax pipeline
* pass batched neg_prompt
* only encode when negative prompt is None
Co-authored-by: Juan Acevedo <jfacevedo@google.com>

Update README.md: Minor change to Imagic code snippet, missing dir error (#1347)
Minor change to Imagic Readme
Missing dir causes an error when running the example code.

make style

change the sample model (#1352)
* Update alt_diffusion.mdx
* Update alt_diffusion.mdx

Add bit diffusion [WIP] (#971)
* Create bit_diffusion.py
Bit diffusion based on the paper, arXiv:2208.04202, Chen2022AnalogBG
* adding bit diffusion to new branch
ran tests
* tests
* tests
* tests
* tests
* removed test folders + added to README
* Update README.md
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

teticio · 2022-11-21T14:55:40Z

Hi @patrickvonplaten

> Hey @teticio,
>
> Sorry I think the commit history got messed up :-/ Could you maybe try to fix it or just open a new PR if it's easier 😅

Yeah, sorry about that - not sure what I did wrong there, but it should be OK now.

> Yes, definitely fine for me to pass Mel() to the init method of the model

So I took a different tack. I thought it would make more sense for Mel to be set up in the constructor of the pipeline. For that to be possible, it needs to be a module, so I moved it to models and derived it from ConfigMixin. So that it can be loaded from a pretrained model, I added Mel to the LOADABLE_CLASSES. The thinking is that this module could be replaced by some other audio<->image transformation in a composable way, for example a neural one instead of the mel spectrogram. Arguably, there should be a base class (e.g., Audio2Image or something) from which Mel is derived, but I thought it might make more sense to refactor that if and when an alternative is implemented.

The great advantage of doing it this way is that the Mel object is guaranteed to be consistent with the rest of the model; before, the user had to make sure that they were using the configuration used to train the model in the first place.

For this to work, I have to update the models uploaded to HF hub to include the Mel config. So currently the 'slow' test will fail. However, the colab notebook linked in the documentation is currently pointing to a modified version of the models in HF hub, so you can try it out with a pre-trained model there.

[A consequence of this is that Mel is still importable from diffusers...]
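To illustrate the consistency argument above, here is a small standard-library sketch in which the Mel settings are written into the model directory and read back at load time, so the user never supplies them by hand. It does not use ConfigMixin or any diffusers API; `MelConfigSketch`, `save_mel_config`, `load_mel_config`, and the field names are hypothetical, and the `mel/mel_config.json` location is borrowed from the model repo linked later in this thread.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class MelConfigSketch:
    """Hypothetical subset of the Mel parameters; names and defaults are assumed."""

    sample_rate: int = 22050
    n_fft: int = 2048
    hop_length: int = 512


def save_mel_config(cfg: MelConfigSketch, model_dir: str) -> Path:
    # Store the audio<->image settings next to the model weights.
    path = Path(model_dir) / "mel" / "mel_config.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(asdict(cfg), indent=2))
    return path


def load_mel_config(model_dir: str) -> MelConfigSketch:
    # Rebuild the settings that were saved with the model.
    path = Path(model_dir) / "mel" / "mel_config.json"
    return MelConfigSketch(**json.loads(path.read_text()))


# Round trip: whatever was saved at training time is what inference gets back.
with tempfile.TemporaryDirectory() as model_dir:
    save_mel_config(MelConfigSketch(hop_length=256), model_dir)
    restored = load_mel_config(model_dir)
```

Because the values travel with the model, a mismatch between training-time and inference-time spectrogram settings can only happen if someone edits the file by hand.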

teticio · 2022-11-21T14:58:12Z

> => e.g. only import your pipelines if librosa is available :-)

Done!

teticio · 2022-11-21T18:33:25Z

@patrickvonplaten As soon as you confirm that this approach is acceptable (making Mel a module in the pipeline), I will make the corresponding changes in my current repo (from which I am migrating) and update the model repos accordingly, so that there can be a seamless switch over when the new version of diffusers comes out. Here is an example of how it will look in the model config: https://huggingface.co/teticio/latent-audio-diffusion-ddim-256-new/blob/main/mel/mel_config.json

teticio · 2022-11-21T18:59:41Z

Ah, one last thing. If you don't like Mel being a LOADABLE_CLASS, maybe we could make ConfigMixin one instead. This would allow config of classes without model weights other than Schedulers to be loaded and saved.

teticio · 2022-11-23T09:49:20Z

@patrickvonplaten I put you back as reviewer to check the changes I made to Mel ^. I didn't realize it would remove anton-l.

teticio · 2022-11-25T18:31:23Z

Sorry. I dk wtf happened with my repo. I am going to start a new PR from scratch

* add AudioDiffusionPipeline and LatentAudioDiffusionPipeline
* add docs to toc
* fix tests
* fix tests
* fix tests
* fix tests
* fix tests
* Update pr_tests.yml
Fix tests
* move Mel to module in pipeline construction, make librosa optional
* fix imports
* fix copy & paste error in comment
* fix style
* add missing register_to_config
* fix class docstrings
* fix class docstrings
* tweak docstrings
* tweak docstrings
* update slow test
* put trailing commas back
* respect alphabetical order
* remove LatentAudioDiffusion, make vqvae optional
* move Mel from models back to pipelines :-)
* allow loading of pretrained audiodiffusion models
* fix tests
* fix dummies
* remove reference to latent_audio_diffusion in docs
* unused import
* inherit from SchedulerMixin to make loadable
* Apply suggestions from code review
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

teticio mentioned this pull request Nov 18, 2022

[Community Pipelines] #841

Open

6 tasks

patrickvonplaten self-assigned this Nov 20, 2022

patrickvonplaten reviewed Nov 20, 2022

patrickvonplaten requested a review from anton-l November 20, 2022 19:37

patrickvonplaten assigned anton-l Nov 20, 2022

teticio changed the base branch from main to 1d_blocks November 21, 2022 14:02

teticio changed the base branch from 1d_blocks to main November 21, 2022 14:02

teticio closed this Nov 21, 2022

teticio reopened this Nov 21, 2022

teticio added 6 commits November 21, 2022 14:41

add AudioDiffusionPipeline and LatentAudioDiffusionPipeline

f44a57a

add docs to toc

2ad4d9e

fix tests

2e25d89

fix tests

54d6625

fix tests

0700c19

fix tests

13efb5b

teticio and others added 3 commits November 21, 2022 14:42

move Mel to module in pipeline construction, make librosa optional

6957b5a

fix imports

95e0908

teticio changed the base branch from main to 1d_blocks November 21, 2022 14:44

teticio changed the base branch from 1d_blocks to main November 21, 2022 14:44

teticio added 5 commits November 21, 2022 15:28

fix copy & paste error in comment

6a3d02f

fix style

c15504c

add missing register_to_config

2045310

fix class docstrings

f0c0a62

fix class docstrings

3971462

teticio added 3 commits November 21, 2022 19:10

tweak docstrings

9e5ea5c

tweak docstrings

e7be31c

update slow test

e9d4e43

teticio requested review from patrickvonplaten and removed request for anton-l November 23, 2022 07:42

merge with upstream

b873540

teticio closed this Nov 25, 2022

teticio reopened this Nov 25, 2022

teticio closed this Nov 25, 2022

teticio mentioned this pull request Nov 27, 2022

add AudioDiffusionPipeline and LatentAudioDiffusionPipeline #1334 #1426

Merged
