
Extension: Stable Diffusion Api integration #309

Merged
merged 3 commits into from
Mar 19, 2023

Conversation

Brawlence
Contributor

@Brawlence Brawlence commented Mar 14, 2023

Description:

Lets the bot answer you with a picture!

Load it in --cai-chat mode with --extension sd_api_pictures, ideally alongside send_pictures (the latter isn't strictly required, but it completes the picture).

If enabled, the image generation is triggered either:

  • manually through the extension buttons OR
  • IF the words 'send | mail | me' are detected simultaneously with 'image | pic | picture | photo'
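Roughly, the trigger check can be pictured like this (an illustrative sketch; the extension's actual word lists and matching logic may differ):

```python
import re

# Hypothetical sketch of the trigger: a "request" word and a "picture" word
# must both appear in the user's message.
REQUEST_WORDS = r"\b(send|mail|me)\b"
PICTURE_WORDS = r"\b(image|pic|picture|photo)\b"

def triggers_image(message: str) -> bool:
    text = message.lower()
    return bool(re.search(REQUEST_WORDS, text) and re.search(PICTURE_WORDS, text))
```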

You need an available instance of Automatic1111's webui running with the --api flag. It hasn't been tested with a notebook / cloud-hosted instance, but it should be possible. I'm running it locally, in parallel with textgen-webui on the same machine. You also need to specify a custom --listen-port if you're going to run everything locally.

For the record, 12 GB VRAM is barely enough to run NeverEndingDream 512×512 fp16 and LLaMA-7b in 4-bit precision.
TODO: We should really think about a way to juggle models around RAM and VRAM for this project to work on lower VRAM cards.


Extension interface

Interface
Don't mind the Windranger Arcana key in the Prompt Prefix, that's just the name of an embedding I trained beforehand.

Demonstrations:

Conversation 1

EXA1
EXA2
EXA3
EXA4

Conversation 2

Hist1
Hist2
Hist3

Lets the bot answer you with a picture!
@djkacevedo

djkacevedo commented Mar 14, 2023

Very nice.

Idea: Can you run img2img on the profile picture for any detected expression changes?

something like:

image

@oobabooga
Owner

I couldn't test the extension so far, probably because I don't have the 'NeverEndingDream' model installed. I will try again later.

@Brawlence
Contributor Author

Brawlence commented Mar 17, 2023 via email

@oobabooga
Owner

oobabooga commented Mar 19, 2023

This is extremely amusing. It just worked in the end, all I had to do was tick "Activate SD Api integration" and change the host address to http://192.168.0.32:7861 where 192.168.0.32 is the IP of the machine where I am running stable diffusion.

img1

img2

@oobabooga oobabooga merged commit 4bafe45 into oobabooga:main Mar 19, 2023
@RandomInternetPreson
Contributor

Yeass!! I was able to get this to work, but I had to remove the "modules.py" file and the "modules-1.0.0.dist-info" folder from my textgen environment for it to work. I'm running on Windows without WSL.

@0xbitches

Yeah, modules is listed in the extension's requirements, but it conflicts with the modules/ folder in the textgen webui directory. Please consider removing it in a commit.

oobabooga added a commit that referenced this pull request Mar 20, 2023
@St33lMouse

This sounds great! We just need a little bit more information to avoid guessing at how to get them to communicate.

Here's my Stable Diffusion launch line:
./webui.sh --no-half-vae --listen --port 7032 --api

And here's my Ooba text gen launch:
python server.py --model opt-1.3b --cai-chat

I don't think this will make them talk. Both programs are running on the same machine in the same browser in two different tabs. How should those lines read to allow textgen to utilize Stable Diffusion?

And if I need to know my local machine's IP, how do I do that? If you can answer those questions, maybe we could put the answers in the wiki so people don't bug you about it.

@karlwancl

karlwancl commented Mar 20, 2023

Yeah, VRAM is probably the problem; you can't really host two VRAM eaters on the same consumer machine. That's another reason for moving the chat AI to CPU (with AVX2 support), like the llama.cpp (https://github.com/ggerganov/llama.cpp) / alpaca.cpp (https://github.com/antimatter15/alpaca.cpp) projects do, so we consume RAM instead of VRAM.

But text-generation-webui doesn't seem to support that yet; some people are working on the integration: #447

If that's done, I guess this extension would be more usable in average consumer machines.

@Brawlence
Contributor Author

@St33lMouse

And if I need to know my local machine's IP, how do I do that?

You don't; if you're running them on the same machine, you can use the special address 127.0.0.1, which basically means 'this machine' on any network. So in your case, just go to ooba's extension tab, tick the API checkbox to enable it, and change the address to 127.0.0.1:7032; it should work out of the box.

@JohnWJarrett

Ok, so my problem seems to be with Auto having SSL enabled, as I am getting a "[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate" error. Any suggestions?

@Brawlence
Contributor Author

@JohnWJarrett
What are your launch parameters for both repos and how do you usually open the Auto1111's webUI? Through the https://, I presume?

@JohnWJarrett

My TGWUI params (when trying to use SD alongside) are

python server.py --auto-devices --gpu-memory 5 --cai-chat --listen --listen-port 8888 --extension sd_api_pictures send_pictures

and my WUI params are

--xformers --deepdanbooru --api --listen --listen-port 8880

And yes, I use the SSL addon for WUI, so yeah, through https.

@Brawlence
Contributor Author

Thanks! I haven't yet tested if the API works correctly when used through https, and that probably is the root cause of the issue.
You could try temporarily disabling SSL for WUI; please report if it works in that state.

I'll try to look for the fix for https in the meantime

@ewof
Contributor

ewof commented Mar 20, 2023

Are we able to use this when we're not in cai mode?

@Simon1V

Simon1V commented Mar 20, 2023

I am currently working on integrating some multimodal models like mm-cot and NVIDIA Prismer. Maybe it would be possible to have a common interface for picture handling, both receiving and sending?

@JohnWJarrett

JohnWJarrett commented Mar 20, 2023

@Brawlence, yeah, it gets past the cert error if I disable SSL, but then I got a different error, one that I actually have a solution for. Since I am using a different port than WUI's default, I just copied and pasted the new URL (http://127.0.0.1:8880/) into the settings on TGW. You might see the issue already, or you might not; I didn't for about an hour, until I looked into the log and tried, on a whim, "localhost:8880", which gave me this error:

requests.exceptions.InvalidSchema: No connection adapters were found for 'localhost:8880//sdapi/v1/txt2img'

which is when I noticed the "8880//sdapi". So I think you should truncate the trailing "/" in the IP if the user accidentally leaves it there. It was a thing I overlooked, and I'm sure I won't be the only one; it's a stupid user error, sure, but I'd imagine it'd be an easy fix? I don't know, I hate Python with a passion, so I never bothered learning it that much.

But other than that, yeah, it works fine, even on my 8 GB GPU. I'm not gonna push it for anything over 256 images, but then again, I don't really need to; it's more just for the extra fun than anything.

EDIT: Also, while playing around (just some general info for anyone who was wondering): you can put a LoRA into the "Prompt Prefix" and it will work, which is good for getting a very consistent character.

@ItsOkayItsOfficial

@Brawlence I've made some updates that I'd be happy to share!

Now one can optionally use 'subject' and 'pronoun' fields that replace "I have" and "My" in the prompt sent to SD. This produces way better results on a wider array of SD models, and lets users with embeddings or Dreambooth models specify their unique token.

Also added a Suffix field so that someone can better dial in other details of the scene if they want.

What I'd like to do next is actually read this information out of the Character JSON schema, so that all a person has to do is load up their Character and the correct class and subject tokens are set. Heck, we could even provide for a model hash in there too...

I'm also able to get SD models working, but unfortunately I can't find where in the SD API you can set the model.

image

@RandomInternetPreson
Contributor

Ooh yes please I'd like to try your updates out 😁

@Brawlence
Contributor Author

Brawlence commented Mar 22, 2023

@JohnWJarrett

I think you should truncate the trailing "/" in the IP if the user accidentally leaves it there, it was a thing I overlooked and I'm sure I wont be the only one, it's a stupid user error, sure, but I'd imagine it'd be an easy fix?

Thanks for your feedback! Here's a preview for the upcoming change:

Connection feature demo

It's gonna strip the http(s):// part and the trailing / if present, and also return the connection status when pressing Enter in that field.
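Roughly, the normalization will behave like this sketch (illustrative only, not the actual diff):

```python
def normalize_address(addr: str) -> str:
    """Strip the scheme and any trailing slash from a user-entered host
    address, so 'http://127.0.0.1:8880/' becomes '127.0.0.1:8880'.
    A sketch of the described behavior; the extension's code may differ.
    """
    addr = addr.strip()
    for scheme in ("http://", "https://"):
        if addr.startswith(scheme):
            addr = addr[len(scheme):]
    return addr.rstrip("/")
```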

@karlwancl

you cant really host 2 VRAM eaters in the same consumer machine

But yes, one can! I have already tested the memory-juggling feature (see #471 and AUTOMATIC1111/stable-diffusion-webui/pull/8780), and if both of those patches are accepted, it will be possible to:

  1. unload LLM to RAM,
  2. load Stable Diffusion checkpoint,
  3. generate the image and pass it to oobabooga UI,
  4. unload the SD checkpoint,
  5. load LLM back into VRAM

— all at the cost of ~20 additional seconds spent on shuffling models around. I've already tested it on my machine and it works.
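The five steps above can be sketched roughly as follows (hypothetical helper names; `post` is a requests.post-compatible callable injected so the flow is testable, and the unload endpoint name is an assumption based on the A1111 PR linked above):

```python
def generate_with_vram_juggling(llm, prompt, post,
                                sd_host="http://127.0.0.1:7861"):
    """Sketch of the unload/generate/reload cycle. `llm` is any object with
    a .to(device) method (e.g. a torch module)."""
    llm.to("cpu")                                    # 1. LLM weights -> RAM
    r = post(f"{sd_host}/sdapi/v1/txt2img",          # 2./3. SD loads its
             json={"prompt": prompt})                #   checkpoint and renders
    images = r.json().get("images", [])              #   the image(s)
    post(f"{sd_host}/sdapi/v1/unload-checkpoint")    # 4. free SD's VRAM
    llm.to("cuda")                                   # 5. LLM back into VRAM
    return images
```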

Demo

Testing VRAM conservation
As you can see, it successfully performs all the above steps, at least on my local rig with all the fixes implemented.

Of course, I'd be more than happy to have llama.cpp implemented as well, more options are always better

@zyxpixel

My current opinion: llama.cpp uses the CPU and RAM, while SD uses the GPU and VRAM, so the two will not conflict with each other. For now, llama cares more about the size of RAM/VRAM, GPU acceleration is not a clear win, and in most PCs RAM is much larger than VRAM.

@Andy-Goodheart

Hi, I am having some trouble in getting this extension to work. I always get the same error.

File "C:\Users\user\text-generation-webui\extensions\sd_api_pictures\script.py", line 85, in get_SD_pictures
for img_str in r['images']:
KeyError: 'images'

Now, it seems that the key "images" does not exist in the dictionary r. How can I fix this? (I am new to GitHub, sorry if I posted this in the wrong place. I couldn't find the same issue under Issues.)

Thank you for your answer.

@oobabooga
Owner

@Andy-Goodheart it looks like the SD API is not responding to your requests. Make sure that the IP and port under "Stable Diffusion host address" are correct and that SD is started with the --api flag.
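For anyone else hitting this, the failure mode can be made more obvious by checking the response before indexing it, along these lines (a defensive sketch with a hypothetical helper name, not the extension's actual code):

```python
def extract_images(response_json):
    """Return the base64 images from a txt2img response, or raise a clearer
    error than a bare KeyError when the API didn't answer as expected."""
    if "images" not in response_json:
        # A1111 error responses often carry a 'detail' field instead.
        detail = response_json.get("detail") or response_json.get("error", "unknown error")
        raise RuntimeError(
            f"SD API returned no images ({detail}); check the host address "
            "and that the webui was started with --api"
        )
    return response_json["images"]
```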

@Andy-Goodheart

@oobabooga Thanks a lot! =) That solved it for me. I didn't have the --api argument in the webui-user.bat file.

@DerAlo

DerAlo commented Apr 3, 2023

image
Any idea why I always receive such creepy pics? ^^

@Brawlence
Contributor Author

@DerAlo Hmmmmm. What SD model do you use? Try generating the description verbatim in Auto1111's interface, what do you get there?

For me, such pictures are usually generated either when the model tries to do something it was not trained on OR when CFG_scale is set too high

@DerAlo

DerAlo commented Apr 3, 2023

@DerAlo Hmmmmm. What SD model do you use? Try generating the description verbatim in Auto1111's interface, what do you get there?

For me, such pictures are usually generated either when the model tries to do something it was not trained on OR when CFG_scale is set too high

It's strange; in 1111's interface everything is fine. The model is 'sd-v1-4' and CFG is at 7... I really don't get it ^^ But thanks for your reply :)

@francoisatt

Could anyone write a tutorial for this extension? I can't start it without an error (RTX 3070) :(

@Brawlence
Contributor Author

@francoisatt what's the error, what are your launch parameters, what models do you use, and how much VRAM do you have?

@altoiddealer
Contributor

I have this extension running and it seems like it is working as intended:

  • The bot types something
  • SD uses that as a prompt to generate an image
  • The image appears in the chat, and also in the /sd_api_pictures/outputs directory.

However, the output .PNG images do not have any Stable Diffusion metadata, which is very unfortunate.

@francoisatt

Hello,
I use the one-click installer; my .bat:
image

On the web interface, when I activate SD Api integration and click "generate an image response", I obtain this error:

image

My configuration is: i7-11800H, 16 GB RAM, and an RTX 3070 with 8 GB VRAM.

thanks for your help.

@ClayShoaf
Contributor

@ItsOkayItsOfficial
Any updates?

@Brawlence
Contributor Author

@francoisatt

Try reordering the Google Translate extension after sd-api, or if that does not help, removing it (even just for a while, to test).

I have a hunch that it's messing with the output text, which sd_api_pictures needs to work.

@Brawlence
Contributor Author

Brawlence commented Apr 8, 2023

@altoiddealer

I have this extension running and it seems like it is working as intended- -The bot types something -SD using that as a prompt to generate an image -The image appears in the chat, and also in the /sd_api_pictures/outputs directory.

However, the output .PNG images do not have any Stable Diffusion metadata., which is very unfortunate.

Could you open this as an issue? I'll look into whether it's possible to get the metadata sent via the API too.
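For reference, the txt2img API response carries (as far as I can tell) an `info` field with the generation parameters, and Pillow can write them into the `parameters` PNG text chunk that Auto1111's PNG Info tab reads. A sketch of what the extension could do when saving (hypothetical helper, not current behavior):

```python
import base64
import io

from PIL import Image
from PIL.PngImagePlugin import PngInfo

def save_with_parameters(b64_image: str, parameters: str, path: str) -> None:
    """Decode one base64 image from the API response and save it with the
    generation parameters embedded in the PNG 'parameters' text chunk."""
    image = Image.open(io.BytesIO(base64.b64decode(b64_image)))
    meta = PngInfo()
    meta.add_text("parameters", parameters)
    image.save(path, pnginfo=meta)
```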

@francoisatt

without googletranslate:
image

I don't understand the problem...

@tandpastatester

tandpastatester commented Apr 9, 2023

@Brawlence
Awesome work so far. I successfully got your extension running with my Ooba and SD installations. The VRAM loading/offloading works well, which is very useful to run everything on the same machine.

I would like to understand a little better how the prompting works, though. I can't seem to get the extension to output images that correspond to the context/theme of the conversation. E.g.:
{0131F241-126F-4F99-ABE1-BC94610166FD}

I assume it just sent “Alright. Let's see if you are impressed.” as a prompt to SD, without using any of the prompt prefixes or negative prompts that I configured in the settings below the chat interface. Is there any way I can actually see the full prompts that are being communicated? Is it logged somewhere (either in SD or in text-generation-webui) where I can see what it does (and does not do), in order to understand what it's trying to do and how to get a better result?

@Brawlence
Contributor Author

@tandpastatester the easiest way right now is to open Auto1111's WebUI (even with the model currently unloaded) and click the 'show previous generation parameters' button (the leftmost one under 'Generate/Stop' on the txt2img tab).

I'm currently figuring out how to solve for #920 which would shed light on this issue as well.

In the meantime, try changing the prefix to include more tags for the character OR forcing the generation on something you predict to be very descriptive, it usually does way better on long outputs than on short ones

@ClayShoaf
Contributor

ClayShoaf commented Apr 10, 2023

@tandpastatester That looks like you are not using the correct VAE in Stable Diffusion. You can see what is being sent to SD underneath the picture that is returned in oobabooga. It is a combination of:

  • the generation parameters in the sd_api_pictures block in the text-generation-webui, and
  • the text that you see underneath the image that is returned.

Unfortunately, you can't really get detailed generations like what you're looking for. I was trying to work on something that could make them a little more customizable, but, like @Brawlence, I cannot seem to figure out how to get the name of the currently used bot anywhere from within this project.

EDIT: Wow... I just checked ffd102e and everything I was doing will have to be reworked. I can't keep up with this stuff.

Ph0rk0z pushed a commit to Ph0rk0z/text-generation-webui-testing that referenced this pull request Apr 17, 2023
 Extension: Stable Diffusion Api integration
@St33lMouse

Here's a trick: you can use a LoRA of your character, if you have one, and put it in the prompt. That will give you a stable character when you ask it to send photos of itself. You can also use the PNG info to recover the prompt that generated the character, so you'll have a good start on what your character should look like.

@Doomed1986

This is awesome... wish I could use it, though :-( I'm using a 7.16 GB model on an 8 GB GPU.
Any word on implementing this extension for poor people like me? Something like shifting resources if possible, or unloading the model, freeing the VRAM, producing the image, then reloading the model, resuming the chat, and receiving the picture. I'd wait a few minutes for this to happen as a trade-off to get low-VRAM card users onboard the picture-gen train!

@Brawlence
Contributor Author

Something like shifting recourses if possible or unloading model shifting vram producing image then reloading model and resuming chat and receive picture

That's literally already implemented

@Doomed1986

It is? Great news!
I saw "TODO: We should really think about a way to juggle models around RAM and VRAM for this project to work on lower VRAM cards." in the description so.

@Xyem

Xyem commented Nov 8, 2023

Is anyone aware of an equivalent that uses ComfyUI for the backend?

@PRCbubu

PRCbubu commented Dec 25, 2023

Can anyone tell me how to disable SSL verification so that I don't get the CERTIFICATE_VERIFY_FAILED error?

image
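For what it's worth, the general requests-level pattern for skipping certificate verification looks like this; it's insecure, and only a sketch (the extension doesn't expose such a toggle as far as I know), so use it only for a self-signed cert on a trusted local network:

```python
import requests
import urllib3

def make_insecure_session() -> requests.Session:
    """Build a requests session that skips TLS certificate verification.

    INSECURE: anyone on the network can impersonate the server; only use
    this for a self-signed certificate on a machine you control.
    """
    session = requests.Session()
    session.verify = False
    # silence the InsecureRequestWarning that requests would emit otherwise
    urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
    return session
```

A request made through such a session, e.g. `make_insecure_session().post("https://127.0.0.1:7861/sdapi/v1/txt2img", json=payload)` (address assumed), would no longer raise CERTIFICATE_VERIFY_FAILED.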

@RandomInternetPreson
Contributor

If you do this, don't let your computer access the internet; I don't think this is very secure, idk... But I like to use the webui on my mobile devices on my network, and I need to run the site as https://ip:port so I can use things like the microphone on the mobile device. The best web browser I've found to work is Opera. You need to use the --ssl-keyfile and --ssl-certfile flags in the CMD_FLAGS.txt file: https://github.com/oobabooga/text-generation-webui?tab=readme-ov-file#gradio

I used this website: https://regery.com/en/security/ssl-tools/self-signed-certificate-generator to create the keys and certs, downloaded them, and put the locations of the files after the appropriate flags.

These keys and certs are self-signed; they normally come from a trusted external source, so when you try to access the site via a web browser you will get a warning that the certs are not recognized.

@guispfilho

Can anyone tell me how to disable SSL verification so that I don't get CERTIFICATE_VERIFY_FAILED error

Hi. Did you end up figuring out how to fix this issue?
