Conversation

@Oceanswave (Contributor) commented Aug 26, 2022

Resolves #87

@lstein lstein self-assigned this Aug 26, 2022
ldm/simplet2i.py (Outdated)

```python
batch_size,
steps,cfg_scale,ddim_eta,
skip_normalize,
gfpgan_strength,
```
@bakkot (Contributor) commented Aug 26, 2022

Nit: as a matter of style, rather than threading this through into _txt2img and _img2img, it would make more sense for prompt2image to be responsible for calling this, by putting the call to _run_gfpgan just before this line. That way the underlying _txt2img and _img2img can continue to only have a single responsibility.
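For illustration, a minimal sketch of that shape - the signatures and keyword arguments below are assumptions, not the actual simplet2i.py code; only the names prompt2image, _txt2img, _img2img, and _run_gfpgan come from the discussion:

```python
# Sketch only: prompt2image owns the GFPGAN call, so _txt2img/_img2img
# keep a single responsibility. Signatures are illustrative assumptions.
def prompt2image(self, prompt, init_img=None, gfpgan_strength=0.0, **kwargs):
    if init_img:
        image = self._img2img(prompt, init_img, **kwargs)  # no GFPGAN args threaded in
    else:
        image = self._txt2img(prompt, **kwargs)
    if gfpgan_strength > 0:
        image = self._run_gfpgan(image, gfpgan_strength)   # post-process at the top level
    return image
```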

@Oceanswave (Contributor, Author)

Yeah, that makes sense. I had already pulled when main was updated, so I'll go back.

@Oceanswave (Contributor, Author)

Well, actually, the _samples_to_images generator is being run in each of those two methods, which means they still have the side effect of generating the image and writing it to disk - which means we have to pass the information down.

If we have both methods yield the 'samples' instead, we can then do it at the top level.
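A rough sketch of that generator idea - everything here is a placeholder except _samples_to_images, _run_gfpgan, and prompt2image, which are named in the discussion:

```python
# Hypothetical sketch, not the merged code.
def _txt2img(self, prompt, **kwargs):
    for samples in self._sample_loop(prompt, **kwargs):  # placeholder sampling loop
        yield samples                                     # raw samples, no disk I/O here

def prompt2image(self, prompt, gfpgan_strength=0.0, **kwargs):
    results = []
    for samples in self._txt2img(prompt, **kwargs):
        for image in self._samples_to_images(samples):    # decode once, at the top level
            if gfpgan_strength > 0:
                image = self._run_gfpgan(image, gfpgan_strength)
            results.append(image)                         # saving to disk would also live here
    return results
```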

@Oceanswave (Contributor, Author)

Oh, I see. Disregard!

@Oceanswave (Contributor, Author)

Resolved, thank you!

@lstein lstein merged commit dd67020 into invoke-ai:main Aug 26, 2022
@lstein (Collaborator) commented Aug 26, 2022

Wow! This feature is really a feat. I've just spent a couple of hours playing with it. Very, very useful. Thank you!

BTW @Oceanswave, I have temporarily disabled the variant generation code because I think we can do something more generic by providing a routine that generates multiple prompts and parameters according to a templating system. In the meantime, I've let people select the nth previous image's seed using -S <-n>, as in -S -1 to get the previous image's seed. And it just now dawned on me that the right thing to do is to create a listable queue of images and seeds so that the user can select them for img2img variations:

```
dream> my prompt -S -2 -I -3
reusing image outputs/image-samples/000004.45632957.png
reusing seed 83184575
dream> /queue
[-10] outputs/image-samples/000004.533432952.png
[-9] outputs/image-samples/000005.83721917.png
...
```
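For what it's worth, a minimal sketch of how such a negative-index history could work - nothing like this exists in the repo yet, and the names, paths, and seeds below are placeholders:

```python
# Illustrative only: negative indices count back from the most recent result,
# mirroring the proposed -S -1 / -I -n behavior.
session_history = []   # appended as (filename, seed) after each generation

def resolve(n):
    """Resolve a negative index such as -S -2 or -I -3 against the history."""
    return session_history[n]           # n = -1 is the most recent image

# hypothetical entries
session_history.append(('outputs/image-samples/000004.png', 533432952))
session_history.append(('outputs/image-samples/000005.png', 83721917))

filename, seed = resolve(-1)            # reuse the previous image and its seed
```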

@blessedcoolant (Collaborator) commented Aug 26, 2022

Thank you for this PR, but there are a couple of issues I want to bring up here.

Preloading the GFPGAN module has the downsides you would expect:

  1. It takes some VRAM away from Stable Diffusion, which I feel should be avoided at all costs. The goal of this repo should be to maximize the performance of the base SD code.
  2. If GFPGAN is preloaded with the initial script, it is performed on every image irrespective of whether a prompt needs it or not, which makes the interactive tool a bit counterproductive.

A proposed solution here would be to load the GFPGAN model ONLY if the prompt has the -G argument. Even in this case, I'd recommend loading the GFPGAN content only after the image has been generated. This might take a few seconds to load, upscale, and offload, but from a user experience point of view, I feel this is far superior. This way we'd retain the best of both worlds.

In simpler terms, GFPGAN and any other module that is added to enhance the result of the SD code should be implemented and treated as post-processing, unless a process explicitly demands interfering with the image generation process of SD.
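For concreteness, a minimal sketch of that kind of on-demand post-processing step. It assumes the gfpgan package's GFPGANer class and a locally downloaded checkpoint; the model path, the strength blend, and the color-channel handling are simplifications for illustration, not the eventual implementation:

```python
import numpy as np
import torch
from PIL import Image

def gfpgan_postprocess(image: Image.Image, strength: float) -> Image.Image:
    if strength <= 0:
        return image                              # no -G: GFPGAN never touches VRAM
    from gfpgan import GFPGANer                   # lazy import so nothing loads at startup
    restorer = GFPGANer(model_path='GFPGANv1.3.pth',  # assumed local checkpoint path
                        upscale=1, arch='clean',
                        channel_multiplier=2, bg_upsampler=None)
    _, _, restored = restorer.enhance(np.array(image, dtype=np.uint8),
                                      has_aligned=False,
                                      only_center_face=False,
                                      paste_back=True)
    result = Image.blend(image, Image.fromarray(restored), strength)
    del restorer                                  # drop the model...
    torch.cuda.empty_cache()                      # ...and hand the VRAM back to SD
    return result
```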

I've done a similar implementation locally with the GoBIG module and it works a whole lot better in the bigger picture.

@Oceanswave (Contributor, Author) commented Aug 26, 2022

Thoughts:

  1. This is totally optional and opt-in; you don't need any GFPGAN dependencies if you don't want them. I don't think the goal should be performance; rather, ease of use.
  2. Again, if you don't add the --gfpgan flag at startup, it doesn't load the GFPGAN model at all. Additionally, after you've specified --gfpgan at startup, -G is only run if it's specified with a positive float along with the prompt; it totally skips it on each image if you don't. Your suggestion is already part of the repo. As for dynamically loading, it actually takes longer to load the model than to process the image, by a factor of about 10. So if the focus is on performance, and one has opted in, then by those terms it's doing the most performant thing ;)

GFPGAN v1.3 is 340MB

@blessedcoolant (Collaborator) commented Aug 26, 2022

> Thoughts:
>
>   1. This is totally optional and opt-in; you don't need any GFPGAN dependencies if you don't want them. I don't think the goal should be performance; rather, ease of use.
>   2. Again, if you don't add the --gfpgan flag at startup, it doesn't load the GFPGAN model at all. Additionally, after you've specified --gfpgan at startup, -G is only run if it's specified with a positive float along with the prompt; it totally skips it on each image if you don't. Your suggestion is already part of the repo. As for dynamically loading, it actually takes longer to load the model than to process the image, by a factor of about 10. So if the focus is on performance, and one has opted in, then by those terms it's doing the most performant thing ;)
>
> GFPGAN v1.3 is 340KB

  1. I have to disagree. Performance is critical. By giving the user total control over when to load GFPGAN, you allow a better experience not only with SD itself but also with the GFPGAN post-processing. That would contribute to what you term 'ease of use'. As you can see, a similar concern has already been raised by someone else: "GFPGAN tries to run even when I don't specify --gfpgan" (#100).
  2. As for your second point, there is zero reason for GFPGAN to be loaded at initialization. Not only does this slow the entire process, it also applies GFPGAN to every prompt irrespective of whether it needs it or not. GFPGAN should be loaded and offloaded only when a prompt calls for it. There is no reason for it to be hogging memory when it is not needed. I just moved around some of the code you wrote to load GFPGAN with a prompt, and I can already see the performance and user experience boost. I hope you try it yourself and see how it feels.
  3. GFPGAN is 340MB, not 340KB, which is a considerable size.

@Oceanswave (Contributor, Author) commented Aug 26, 2022

Yep, that's a bug - thanks for the co-report, #101

And that's not loaded. Hence the error.

And of course you're right, it's 340MB.

@blessedcoolant (Collaborator)

Fixing this bug does not necessarily solve the memory issue. You're just bypassing it.

By booting GFPGAN at load, you are blocking a good chunk of VRAM for GFPGAN whether it is called into play or not. This limits what SD itself can do, and for no reason.

GFPGAN can be called and loaded AFTER the image has been generated with SD and then immediately offloaded after the upscaling is done. This solves a bunch of issues.

  1. You will no longer be blocking VRAM that could be used by SD. In my case, that VRAM is worth nearly 100px of resolution that has to come out of SD.
  2. You will not be loading the heavy torch modules twice: first in dream.py to load GFPGAN and then again in the T2I module. By loading GFPGAN afterwards, you use torch directly from memory.
  3. All you need to do to check this out is set the GFPGAN default strength to 0 and let a -G argument with any value above 0 trigger the entire GFPGAN module. Instead of loading GFPGAN in dream.py, load it ONLY when the user sets -G to a value greater than 0. If they do, load GFPGAN, do the upscaling, offload GFPGAN. Done. This is the only time it needs to take memory; there is no reason to have it running all the time. (A rough sketch of this flow follows below.)
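A rough sketch of that load, enhance, offload sequencing - all of the names below except prompt2image are placeholders, not the actual dream.py / T2I code:

```python
import gc
import torch

def load_gfpgan():
    """Placeholder: build and return a GFPGAN restorer (e.g. gfpgan.GFPGANer)."""
    raise NotImplementedError

def enhance(restorer, image, strength):
    """Placeholder: run face restoration and blend by `strength`."""
    raise NotImplementedError

def generate(t2i, prompt, gfpgan_strength=0.0, **kwargs):
    image = t2i.prompt2image(prompt, **kwargs)   # SD generates with all of its VRAM
    if gfpgan_strength > 0:                      # only -G prompts pay the GFPGAN cost
        restorer = load_gfpgan()                 # load on demand, after generation
        image = enhance(restorer, image, gfpgan_strength)
        del restorer                             # drop the model right away...
        gc.collect()
        torch.cuda.empty_cache()                 # ...and hand the VRAM back to SD
    return image
```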

@Oceanswave (Contributor, Author) commented Aug 26, 2022

Feel free to create a PR to do that. I'll be forking, because I don't want to wait 20 seconds for a single GFPGAN run that literally takes 2 seconds.

@blessedcoolant (Collaborator)

It will not take 20 seconds. torch is already loaded. GFPGAN loads up in just a couple of seconds, maybe one or two more when reinitialized. Once that is done, you just offload the GFPGANer and not the entire torch library.

I'll try and make a PR when I get the chance.

@Oceanswave (Contributor, Author) commented Aug 26, 2022

Yeah, I'm not going to wait extra seconds for 340MB to be transferred from disk to RAM to VRAM on every image. Sorry, have fun!

Sorry, I think I see where it can just be loaded on each batch, not every image. It's 5am where I'm at and I'm a bit grouchy, having been up all night messing with this.

@blessedcoolant (Collaborator)

> Yeah, I'm not going to wait extra seconds for 340MB to be transferred from disk to RAM to VRAM on every image. Sorry, have fun!
>
> Sorry, I think I see where it can just be loaded on each batch, not every image. It's 5am where I'm at and I'm a bit grouchy, having been up all night messing with this.

All good. Just trying to find the ideal solution for this. I'm not a fan of memory overload, especially when it takes away from the original script itself.

Because as good as GFPGAN is, its usage is situational. I'm just trying to find the best way to make it a prompt-line argument rather than something that is preloaded.

@Oceanswave (Contributor, Author)

Alright, being more objective about this: it sounds like you want it on demand, I want it always, and others want it never.

Can we make the initial startup command indicate the behavior? --gfpgan always or --gfpgan ondemand, with the default being never, obviously.
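Purely as illustration, an argparse sketch of that startup switch - the flag name, choices, and default come from the comment above; everything else is hypothetical and not what the repo does:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--gfpgan',
                    choices=['never', 'ondemand', 'always'],
                    default='never',
                    help='never: skip GFPGAN; ondemand: load/offload per -G prompt; '
                         'always: keep the model resident in VRAM')

opt = parser.parse_args(['--gfpgan', 'ondemand'])   # e.g. "python dream.py --gfpgan ondemand"
print(opt.gfpgan)                                   # -> 'ondemand'
```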

@blessedcoolant (Collaborator)

> Alright, being more objective about this: it sounds like you want it on demand, I want it always, and others want it never.
>
> Can we make the initial startup command indicate the behavior? --gfpgan always or --gfpgan ondemand, with the default being never, obviously.

I'm refactoring it so that the gfpgan_strength (-G) argument will control it. It will default to 0, meaning it is turned off by default. The other gfpgan boot arguments that you added will remain as they are. We can move them later, but I think I'd rather keep them here; that way the user prompt will be super clean without having to worry about these.

When a user types -G0.5 with their prompt, GFPGAN will be enabled for that generation with a strength of 0.5 -- GFPGAN will do its job and offload.

It's taking me 2 seconds for the entire process at the moment and the memory for SD remains intact.

I'll push a PR in a short while.

Also, RealESRGAN has an x4 model. We should look at implementing that. The inference script offers the code for it, but I'm not sure whether the utils script we're relying on at the moment supports it. From my few tries with it, I noticed it might be better. Even if we don't offer x4 scaling, we can use it to scale x4 and then downscale x2 for a much sharper image. Have a look and see what you think about it.
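If it helps, a tiny sketch of the "scale x4, then come back down to x2" idea - upscale_x4 is a stand-in for whatever RealESRGAN x4 call gets wired in, not an existing helper:

```python
from PIL import Image

def sharper_x2(image: Image.Image, upscale_x4) -> Image.Image:
    # upscale_x4 is a placeholder callable wrapping the RealESRGAN x4 model
    big = upscale_x4(image)                                   # 4x the original resolution
    return big.resize((image.width * 2, image.height * 2),    # downscale back to 2x
                      resample=Image.LANCZOS)
```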

@blessedcoolant (Collaborator)

@Oceanswave #102 Made a PR with the changes. Please feel free to have a look and let me know if anything needs changing or can be improved.

From the little tests I did, there's virtually no difference in inference time while VRAM is being saved for SD.



Development

Successfully merging this pull request may close these issues: Include GFPGAN.

4 participants