FEAT: Adds ability to integrate with GFPGAN to enhance faces and upscale images #98
Conversation
…n-dream into add-gfpgan-option
ldm/simplet2i.py (outdated)

    batch_size,
    steps, cfg_scale, ddim_eta,
    skip_normalize,
    gfpgan_strength,
Nit: as a matter of style, rather than threading this through into _txt2img and _img2img, it would make more sense for prompt2image to be responsible for calling this, by putting the call to _run_gfpgan just before this line. That way the underlying _txt2img and _img2img can continue to only have a single responsibility.
Yeah, that makes sense. I had already pulled when main was updated, so I'll go back and rework it.
Well actually, the _samples_to_images generator is run inside each of those two methods, which means they still have the side effect of generating the image and writing it to disk - which means we have to pass the information down.
If we have both methods yield the samples instead, we can then do this at the top level.
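A minimal sketch of that yield-based restructuring, with placeholder names standing in for the real simplet2i.py internals: the sampler methods become pure generators, and prompt2image alone decodes, optionally enhances, and writes.

```python
from typing import Iterator, List

def txt2img(prompt: str, batches: int) -> Iterator[list]:
    """Yield raw sample batches only - no decoding or disk I/O here."""
    for i in range(batches):
        yield [f"latents-{i}"]  # placeholder for a batch of latent samples

def samples_to_image(samples: list) -> str:
    return f"image({samples[0]})"

def run_gfpgan(image: str, strength: float) -> str:
    return f"gfpgan({image}, strength={strength})"

def prompt2image(prompt: str, batches: int = 2,
                 gfpgan_strength: float = 0.0) -> List[str]:
    """The single place that decodes, optionally enhances, and saves images."""
    images = []
    for samples in txt2img(prompt, batches):
        image = samples_to_image(samples)
        if gfpgan_strength > 0:
            image = run_gfpgan(image, gfpgan_strength)
        images.append(image)  # writing to disk would happen here
    return images

print(prompt2image("a portrait", gfpgan_strength=0.5))
```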
oh I see. disregard!
Resolved, thank you!
Wow! This Feature is really a Feat. I've just spent a couple of hours playing with it. Very, very useful. Thank you!

BTW @Oceanswave, I have temporarily disabled the variant generation code, because I think we can do something more generic: a routine that generates multiple prompts and parameters according to a templating system. In the meantime, I've let people select the seed of the nth previous image using -S <-n>, as in -S -1 to get the previous image's seed. And it just now dawned on me that the right thing to do is to create a listable queue of images and seeds so that the user can select them for img2img variations: dream> my prompt -S -2 -I -3
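For illustration, a tiny sketch of how such a seed history could back the -S <-n> syntax (the names here are hypothetical, not the actual dream script internals):

```python
# Each generation appends (seed, filename); negative -S values index
# back into this history, so -S -1 means "the previous image's seed".
history: list[tuple[int, str]] = []

def resolve_seed(seed_arg: int) -> int:
    if seed_arg < 0:
        return history[seed_arg][0]  # Python's negative indexing does the work
    return seed_arg

history.append((42, "outputs/000001.png"))
history.append((1337, "outputs/000002.png"))
print(resolve_seed(-1))  # 1337, the seed of the previous image
```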
Thank you for this PR, but there are a couple of issues I want to bring up here. Preloading the GFPGAN module has the downsides you'd expect.

A proposed solution: load the GFPGAN model ONLY if the prompt has the -G argument, and even then, load it only after the image has been generated. This might take a few seconds to load, upscale, and offload, but from a user-experience point of view I feel this is far superior. That way we'd retain the best of both worlds. In simpler terms, GFPGAN - and any other module added to enhance the result of the SD code - should be implemented and treated as post-processing, unless a process explicitly demands interfering with SD's image generation. I've done a similar implementation locally with the GoBIG module, and it works a whole lot better in the bigger picture.
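A sketch of that lazy post-processing pattern (the GFPGANer constructor and enhance() signature follow the upstream GFPGAN project; the model path is an assumption):

```python
import torch

def enhance_with_gfpgan(image_bgr,
                        model_path="experiments/pretrained_models/GFPGANv1.3.pth"):
    # Import and construct lazily, so startup pays nothing when -G is unused.
    from gfpgan import GFPGANer

    restorer = GFPGANer(model_path=model_path, upscale=2)
    _, _, restored = restorer.enhance(
        image_bgr, has_aligned=False, only_center_face=False, paste_back=True
    )

    # Offload immediately: drop the model and hand the VRAM back to SD.
    del restorer
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
    return restored
```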
Thoughts:
GFPGAN v1.3 is 340MB
Yep, that's a bug - thanks for the co-report, #101. And that's not loaded, hence the error. And of course you're right, it's 340MB.
Fixing this bug does not necessarily solve the memory issue; you're just bypassing it. By booting GFPGAN at load time, you block a good chunk of VRAM for GFPGAN whether it is called into play or not, which reduces what's available to SD itself for no reason. GFPGAN can be loaded AFTER the image has been generated with SD and then immediately offloaded once the upscaling is done. This solves a bunch of issues.
Feel free to create a PR to do that. I'll be forking, because I don't want to wait 20 seconds of loading for a single GFPGAN run that literally takes 2 seconds.
It will not take 20 seconds. torch is already loaded, and GFPGAN loads up in just a couple of seconds - maybe one or two more when reinitialized. Once that is done, you just offload the GFPGANer, not the entire torch library. I'll try and make a PR when I get the chance.
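One possible middle ground between "reload every time" and "preload forever", sketched here with `model` standing in for GFPGAN's network: read the weights from disk once, keep them resident in system RAM, and move them to the GPU only around each enhance call, so neither the 340MB disk read nor the standing VRAM reservation recurs.

```python
import torch

def enhance(model: torch.nn.Module, image: torch.Tensor) -> torch.Tensor:
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device)                       # claim VRAM only for this call
    with torch.no_grad():
        result = model(image.to(device))
    model.to("cpu")                        # offload; SD gets its VRAM back
    if device == "cuda":
        torch.cuda.empty_cache()
    return result.cpu()
```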
Yeah, I'm not going to wait extra seconds for 340MB to be transferred from disk to RAM to VRAM on every image. Sorry, have fun!

Sorry, I think I see now that it can just be loaded on each batch, not on every image. It's 5am where I'm at and I'm a bit grouchy, having been up all night messing with this.
All good. Just trying to find the ideal solution for this. I'm not a fan of memory overhead, especially when it takes away from the original script itself. Because as good as GFPGAN is, its usage is situational. I'm just trying to find the best way to make it a prompt-line argument rather than something that is preloaded.
Alright, being more objective about this: sounds like you want it on demand, I want it always, and others want it never. Can we make the initial startup command indicate the behavior? --gfpgan always or --gfpgan ondemand, with never as the obvious default.
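A minimal sketch of that startup switch (flag name and choices as floated above, not a merged interface):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--gfpgan",
    choices=["never", "ondemand", "always"],
    default="never",
    help="never: don't load GFPGAN; ondemand: load per -G request, then "
         "offload; always: preload at startup",
)
opt = parser.parse_args()

preload_gfpgan = opt.gfpgan == "always"
allow_gfpgan = opt.gfpgan != "never"
```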
I'm refactoring it so that the gfpgan_strength (-G) argument controls it. It will default to 0, meaning it is turned off by default. The other gfpgan boot arguments that you added will remain as they are; we can rework them later, but I think I'd rather keep them here. That way the user prompt stays super clean, without having to worry about these. When a user types -G0.5 with their prompt, GFPGAN will be enabled for that generation with a strength of 0.5 - GFPGAN will do its job and offload. It's taking me 2 seconds for the entire process at the moment, and the memory for SD remains intact. I'll push a PR in a short while.

Also, RealESRGAN has a x4 model. We should look at implementing that. The inference code offers it, but I'm not sure if the utils script we're supporting at the moment supports it. From my few tries with it, I noticed it might be better. Even if we don't offer x4 scaling, we can upscale x4 and then downscale x2 for a much sharper image. Have a look and see what you think about it.
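A sketch of how the -G gating could look; the blend step is an assumption about how a fractional strength might be applied, not necessarily what the PR does:

```python
from PIL import Image

def apply_gfpgan_strength(original: Image.Image, restored: Image.Image,
                          strength: float) -> Image.Image:
    # Assumes original and restored have the same size and mode.
    if strength <= 0:
        return original  # -G defaults to 0, i.e. GFPGAN stays off
    # Blend the restored result over the original in proportion to the
    # requested strength (1.0 = fully restored image).
    return Image.blend(original, restored, min(strength, 1.0))
```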
@Oceanswave #102 Made a PR with the changes. Please feel free to have a look and let me know if anything needs changing or can be improved. From the little tests I did, there's virtually no difference in inference time, while VRAM is saved for SD.
Resolves #87