FEAT: Adds ability to integrate with GFPGAN to enhance faces and upscale images #98
Conversation
…n-dream into add-gfpgan-option
ldm/simplet2i.py (outdated)

    batch_size,
    steps, cfg_scale, ddim_eta,
    skip_normalize,
    gfpgan_strength,
Nit: as a matter of style, rather than threading this through into _txt2img and _img2img, it would make more sense for prompt2image to be responsible for calling this, by putting the call to _run_gfpgan just before this line. That way the underlying _txt2img and _img2img can continue to only have a single responsibility.
Yeah, that makes sense. I had already pulled when main was updated, so I'll go back and rework it.
Well actually, the _samples_to_images generator is run inside each of those two methods, which means they still have the side effect of generating the image and writing it to disk - which means we have to pass the information down.
If we have both methods yield the samples instead, we can then do this at the top level.
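A minimal sketch of that yield-based restructuring, with placeholder names standing in for the real simplet2i.py internals: the sampler methods become pure generators, and prompt2image alone decodes, optionally enhances, and writes.

```python
from typing import Iterator, List

def txt2img(prompt: str, batches: int) -> Iterator[list]:
    """Yield raw sample batches only - no decoding or disk I/O here."""
    for i in range(batches):
        yield [f"latents-{i}"]  # placeholder for a batch of latent samples

def samples_to_image(samples: list) -> str:
    return f"image({samples[0]})"

def run_gfpgan(image: str, strength: float) -> str:
    return f"gfpgan({image}, strength={strength})"

def prompt2image(prompt: str, batches: int = 2,
                 gfpgan_strength: float = 0.0) -> List[str]:
    """The single place that decodes, optionally enhances, and saves images."""
    images = []
    for samples in txt2img(prompt, batches):
        image = samples_to_image(samples)
        if gfpgan_strength > 0:
            image = run_gfpgan(image, gfpgan_strength)
        images.append(image)  # writing to disk would happen here
    return images

print(prompt2image("a portrait", gfpgan_strength=0.5))
```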
oh I see. disregard!
Resolved, thank you!
Wow! This Feature is really a Feat. I've just spent a couple of hours playing with it. Very, very useful. Thank you!

BTW @Oceanswave, I have temporarily disabled the variant generation code, because I think we can do something more generic: a routine that generates multiple prompts and parameters according to a templating system. In the meantime, I've let people select the seed of the nth previous image using -S <-n>, as in -S -1 to get the previous image's seed. And it just now dawned on me that the right thing to do is to create a listable queue of images and seeds so that the user can select them for img2img variations: dream> my prompt -S -2 -I -3
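For illustration, a tiny sketch of how such a seed history could back the -S <-n> syntax (the names here are hypothetical, not the actual dream script internals):

```python
# Each generation appends (seed, filename); negative -S values index
# back into this history, so -S -1 means "the previous image's seed".
history: list[tuple[int, str]] = []

def resolve_seed(seed_arg: int) -> int:
    if seed_arg < 0:
        return history[seed_arg][0]  # Python's negative indexing does the work
    return seed_arg

history.append((42, "outputs/000001.png"))
history.append((1337, "outputs/000002.png"))
print(resolve_seed(-1))  # 1337, the seed of the previous image
```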
Thank you for this PR, but there are a couple of issues I want to bring up here. Preloading the GFPGAN module has the downsides you'd expect.

A proposed solution: load the GFPGAN model ONLY if the prompt has the -G argument, and even then, load it only after the image has been generated. This might take a few seconds to load, upscale, and offload, but from a user-experience point of view I feel this is far superior. That way we'd retain the best of both worlds. In simpler terms, GFPGAN - and any other module added to enhance the result of the SD code - should be implemented and treated as post-processing, unless a process explicitly demands interfering with SD's image generation. I've done a similar implementation locally with the GoBIG module, and it works a whole lot better in the bigger picture.
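A sketch of that lazy post-processing pattern (the GFPGANer constructor and enhance() signature follow the upstream GFPGAN project; the model path is an assumption):

```python
import torch

def enhance_with_gfpgan(image_bgr,
                        model_path="experiments/pretrained_models/GFPGANv1.3.pth"):
    # Import and construct lazily, so startup pays nothing when -G is unused.
    from gfpgan import GFPGANer

    restorer = GFPGANer(model_path=model_path, upscale=2)
    _, _, restored = restorer.enhance(
        image_bgr, has_aligned=False, only_center_face=False, paste_back=True
    )

    # Offload immediately: drop the model and hand the VRAM back to SD.
    del restorer
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
    return restored
```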
Thoughts:
GFPGAN v1.3 is 340MB
Yep, that's a bug - thanks for the co-report, #101. And that's not loaded, hence the error. And of course you're right, it's 340MB.
Fixing this bug does not necessarily solve the memory issue; you're just bypassing it. By booting GFPGAN at load time, you block a good chunk of VRAM for GFPGAN whether it is called into play or not, which reduces what's available to SD itself for no reason. GFPGAN can be loaded AFTER the image has been generated with SD and then immediately offloaded once the upscaling is done. This solves a bunch of issues.
Feel free to create a PR to do that. I'll be forking, because I don't want to wait 20 seconds of loading for a single GFPGAN run that literally takes 2 seconds.
It will not take 20 seconds. torch is already loaded, and GFPGAN loads up in just a couple of seconds - maybe one or two more when reinitialized. Once that is done, you just offload the GFPGANer, not the entire torch library. I'll try and make a PR when I get the chance.
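One possible middle ground between "reload every time" and "preload forever", sketched here with `model` standing in for GFPGAN's network: read the weights from disk once, keep them resident in system RAM, and move them to the GPU only around each enhance call, so neither the 340MB disk read nor the standing VRAM reservation recurs.

```python
import torch

def enhance(model: torch.nn.Module, image: torch.Tensor) -> torch.Tensor:
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device)                       # claim VRAM only for this call
    with torch.no_grad():
        result = model(image.to(device))
    model.to("cpu")                        # offload; SD gets its VRAM back
    if device == "cuda":
        torch.cuda.empty_cache()
    return result.cpu()
```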
Yeah, I'm not going to wait extra seconds for 340MB to be transferred from disk to RAM to VRAM on every image. Sorry, have fun!

Sorry, I think I see now that it can just be loaded on each batch, not on every image. It's 5am where I'm at and I'm a bit grouchy, having been up all night messing with this.
All good. Just trying to find the ideal solution for this. I'm not a fan of memory overhead, especially when it takes away from the original script itself. Because as good as GFPGAN is, its usage is situational. I'm just trying to find the best way to make it a prompt-line argument rather than something that is preloaded.
Alright, being more objective about this: sounds like you want it on demand, I want it always, and others want it never. Can we make the initial startup command indicate the behavior? --gfpgan always or --gfpgan ondemand, with never as the obvious default.
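A minimal sketch of that startup switch (flag name and choices as floated above, not a merged interface):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--gfpgan",
    choices=["never", "ondemand", "always"],
    default="never",
    help="never: don't load GFPGAN; ondemand: load per -G request, then "
         "offload; always: preload at startup",
)
opt = parser.parse_args()

preload_gfpgan = opt.gfpgan == "always"
allow_gfpgan = opt.gfpgan != "never"
```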
I'm refactoring it so that the gfpgan_strength (-G) argument controls it. It will default to 0, meaning it is turned off by default. The other gfpgan boot arguments that you added will remain as they are; we can rework them later, but I think I'd rather keep them here. That way the user prompt stays super clean, without having to worry about these. When a user types -G0.5 with their prompt, GFPGAN will be enabled for that generation with a strength of 0.5 - GFPGAN will do its job and offload. It's taking me 2 seconds for the entire process at the moment, and the memory for SD remains intact. I'll push a PR in a short while.

Also, RealESRGAN has a x4 model. We should look at implementing that. The inference code offers it, but I'm not sure if the utils script we're supporting at the moment supports it. From my few tries with it, I noticed it might be better. Even if we don't offer x4 scaling, we can upscale x4 and then downscale x2 for a much sharper image. Have a look and see what you think about it.
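A sketch of how the -G gating could look; the blend step is an assumption about how a fractional strength might be applied, not necessarily what the PR does:

```python
from PIL import Image

def apply_gfpgan_strength(original: Image.Image, restored: Image.Image,
                          strength: float) -> Image.Image:
    # Assumes original and restored have the same size and mode.
    if strength <= 0:
        return original  # -G defaults to 0, i.e. GFPGAN stays off
    # Blend the restored result over the original in proportion to the
    # requested strength (1.0 = fully restored image).
    return Image.blend(original, restored, min(strength, 1.0))
```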
@Oceanswave #102 Made a PR with the changes. Please feel free to have a look and let me know if anything needs changing or can be improved. From the little tests I did, there's virtually no difference in inference time, while VRAM is saved for SD.
Resolves #87