Optimize GFPGAN and Improve User Experience #102
Additionally, I'd look into separating GFPGAN and Real-ESRGAN. GFPGAN is for blind face restoration and Real-ESRGAN is for upscaling. It would be better for users to have access to these individually, so they can combine GFPGAN with Real-ESRGAN on images with faces, and use Real-ESRGAN alone for images that just need upscaling.
The script now generates the low-res images first and then performs upscaling as a separate step. This is good because you are far less likely to hit memory crashes with a large batch and many iterations; this way the memory is not shared between SD and GFPGAN. Probably the best solution to this particular issue.
I've made some further improvements. Instead of running GFPGAN while SD is still processing, which leads to a ton of memory issues, the app now runs GFPGAN after SD's images have been generated. While SD is running, the app generates images and previews so the user can see their progress. When processing ends, the app rescales all of the generated images to an upscaled version using GFPGAN and Real-ESRGAN. This way, no matter how big the batch size or sample count gets, there will likely not be any memory or CUDA crashes. I've tested it out quite a bit and it seems to be working great. Let me know if you want any changes. @lstein It should be good to pull now if you wish.
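For illustration, here is a minimal sketch of the two-pass flow described above. The helper names are hypothetical stand-ins, not the actual ldm code:

```python
import torch

def generate_then_upscale(prompt, iterations, image_callback):
    """Sketch of the deferred pipeline: sample everything with SD first,
    then post-process, so SD and GFPGAN never hold VRAM at the same time."""
    results = []

    # Pass 1: Stable Diffusion only. Each image is handed to the callback
    # immediately, so the user gets live previews while the batch runs.
    for _ in range(iterations):
        image, seed = sample_stable_diffusion(prompt)  # hypothetical helper
        image_callback(image, seed)
        results.append((image, seed))

    # Release SD's cached tensors before the upscalers load.
    torch.cuda.empty_cache()

    # Pass 2: face restoration / upscaling on the finished images.
    for image, seed in results:
        restored = run_gfpgan_and_esrgan(image)        # hypothetical helper
        image_callback(restored, seed)
```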
Hi folks, I will inspect and merge this shortly. This project has grown quite a bit, so I'm going to start doing formal feature branches, bugfix branches, etc. I don't want users to get frustrated.
Thank you. I'll keep an eye out for it. No issues; it's the right thing to do if this repo is going to grow larger.
Unable to get this to run. Each iteration says "GFPGAN not initialized, it must be loaded via the --gfpgan argument" when I specify the -G argument, but that option has been removed.
Also, why iterate through the images twice when it's already happening after each image is generated on line 321? |
Review comment on ldm/simplet2i.py (outdated diff):

```python
image_callback(image, seed)
seed = self._new_seed()
# ...
if gfpgan_strength > 0:
```
This never gets run
Are you passing GFPGAN as a prompt argument?

`<prompt> -G 0.5`
I noticed another PR changed GFPGAN's default value to null instead of 0. Maybe that's causing an overlap issue? Wait, I'll do a fresh pull and see, because it's working flawlessly on my side.
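One concrete way a None default would bite, independent of this PR's actual code: in Python 3, ordering comparisons against None raise, so a strength guard written as a bare `> 0` check fails hard:

```python
gfpgan_strength = None  # what a null default would produce

# Python 3 raises: TypeError: '>' not supported between
# instances of 'NoneType' and 'int'
try:
    if gfpgan_strength > 0:
        print("would run GFPGAN")
except TypeError as err:
    print(f"guard blew up: {err}")

# A defensive guard treats None and 0 identically:
if gfpgan_strength is not None and gfpgan_strength > 0:
    print("run GFPGAN")
```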
Not sure. It's working fine on my end with the current live code; you no longer use the --gfpgan launch command. Does it say where the error originates? Also, the reason I'm iterating twice is to handle both single and multiple iterations. If I run GFPGAN for a single iteration in the same loop as SD, it's fine, but the cache does not get flushed when the user does multiple iterations, and generating a completely new tensor on every iteration for both GFPGAN and SD caused a memory crash on the first iteration. So instead of doing that, I chose to save the resulting data and upscale it after SD has done its job. I could loop just once, but I did it twice so that I can save images while SD is working, giving the user feedback, and still have upscaled images when SD is done. Basically a workaround that gets the memory optimization while retaining the functionality that exists right now.
It doesn't report any error, just the message that it can't be initialized, and then it fails; I get the non-GFPGAN'd image in my output.
Unsure. It's hard to tell without more info. I just made a clean pull from GitHub and merged it with mine; it seems to be working fine. Can you check whether the GFPGANer class init is correctly assigned to self.gfpgan? That's the only reason I can think of that would stop GFPGAN from starting. Also check whether gfpgan_strength is defaulted to None instead of 0 in dream.py.
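For anyone verifying that, this is roughly what a correct initialization looks like with the upstream gfpgan package; the model path and wrapper class here are placeholders, not this PR's exact code:

```python
from gfpgan import GFPGANer

class FaceRestorer:
    def __init__(self, model_path='gfpgan/GFPGANv1.3.pth'):  # placeholder path
        # The restorer must end up on self.gfpgan; if this assignment is
        # missing or misnamed, the -G path finds nothing to run.
        self.gfpgan = GFPGANer(
            model_path=model_path,
            upscale=1,              # scaling is Real-ESRGAN's job
            arch='clean',
            channel_multiplier=2,
            bg_upsampler=None,
        )

    def restore(self, image_bgr):
        # enhance() takes a BGR numpy array and returns
        # (cropped_faces, restored_faces, restored_img).
        _, _, restored = self.gfpgan.enhance(
            image_bgr,
            has_aligned=False,
            only_center_face=False,
            paste_back=True,
        )
        return restored
```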
OK, I created a new conda environment and activated it; the earlier failure was probably a me issue from testing the installation yesterday. My run time is now twice as long, and it reports that it processed 12 files instead of 6, which seems bad. Also, my VRAM usage is now worse after the first run because it seems to keep GFPGAN around until the GC gets to it, so I have to wait for it to clear. This seems like an extremely bad user experience. It also gives me a TON of errors if I didn't set up my GFPGAN correctly the first time (which I didn't, since I switched conda environments).
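If the lingering VRAM is just the dropped model waiting on garbage collection, releasing it explicitly is cheap. A generic pattern, not this PR's code:

```python
import gc
import torch

# Assuming `gfpgan` holds the loaded model: after its last use, drop the
# reference and flush the CUDA allocator cache so the VRAM is returned
# immediately instead of whenever the GC next runs.
gfpgan = None
gc.collect()
torch.cuda.empty_cache()
```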
I haven't even gotten that far. The problem I'm having with this PR is that the --gfpgan command-line switch is missing from the argv parsing list. Was it intended that GFPGAN be activated by specifying --gfpgan_strength when starting the script?
OK, I think this is still a work in progress, so I will convert it into a draft. I'm going to branch the current version (with Oceanswave's second PR) into a feature branch so that people can continue to work with it.
The GFPGAN implementation is yours; I did nothing there. If anything, I got rid of extra user warnings from the obsolete GFPGAN code. Even in your code, you were generating 12 images: 6 from SD and 6 upscaled. The only difference is that I am materializing them to disk so the user has a preview. I am not sure what issues you are facing with VRAM usage. For me the load has been far better, because you are no longer running GFPGAN and SD at the same time. In your implementation, 512x768, which easily renders on my system, kept crashing because GFPGAN started to hog the memory while the SD tensors were still in the loop. The GC behavior remains the same. To be very honest, I think this is a more ideal and user-friendly way to do it, because instead of going through the hoops of setting up a bunch of startup commands, a user can just trigger GFPGAN from the prompt, which is the most natural way to do it. If you hate seeing the log of images being saved, that's a one-line fix. Either way, I'll let @lstein decide which one he prefers. Personally I am staying with mine, because I truly believe it consumes far less memory and has a much nicer way of triggering upscaling through prompts.
You no longer need to use --gfpgan with this method. You can use the other commands to change the locations of the GFPGAN folders, but otherwise the entire process now works directly from the prompt, depending on the value of gfpgan_strength. It's intended. Why have two different startup modes for the script when the base script can handle the same thing through prompts? And I feel a user would much rather just type -G 0.7 and have the upscaling trigger than restart the script with and without --gfpgan. Not to mention, the initial boot time will be much faster because you're not preloading GFPGAN, and the eventual GFPGAN load will be faster too because the torch modules are already in play by then, thanks to Stable Diffusion.
Everyone, thank you for your efforts on this. Further work on GFPGAN should be done on a new branch named gfpgan-development. From a maintenance point of view, I'd like to see the majority of the GFPGAN code moved into its own module under ldm/dream/ so that it isn't cluttering the already busy dream.py and simplet2i.py modules.
But we talked about this: another loop has been introduced, and it's running _run_gfpgan for each image twice.

But it's not, since for every image it loads and unloads a new instance of GFPGAN.

I'm a big fan of failing fast: if a user started with --gfpgan, we know a priori that we should test for the GFPGAN dependencies instead of failing at runtime.
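Something along these lines at startup would give that fail-fast behavior; this is a sketch of the idea, with an assumed model-path argument:

```python
import os
import sys

def check_gfpgan_available(model_path):
    """Fail fast at launch instead of erroring on the first -G prompt."""
    try:
        import gfpgan  # noqa: F401 -- import check only
    except ImportError:
        sys.exit('GFPGAN was requested but the gfpgan package is not installed.')
    if not os.path.exists(model_path):
        sys.exit(f'GFPGAN model weights not found at {model_path}.')
```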
Slightly off topic, but is there a way to get both GPUs going during an iteration run on a multi-GPU system? That would be a major benefit for a few lucky people. I'm taking a break from code review and testing tonight, and will weigh in on the GFPGAN controversy tomorrow.

Lincoln

On Fri, Aug 26, 2022 at 1:37 PM Sean McLellan wrote:

> Agreed. The current implementation has all the images logged at break time, whether it's the regular SD version or the upscaled one, depending on what part of the process we're at. But if I say -n5 -G1 and I ^C after the 3rd, because I saw the first and got what I wanted in terms of content, I don't have the upscaled version at that point, so the operation isn't really 'complete'. On the main branch, if I run and see an image on the file system, it's done done. No takebacksies.
>
> There's virtually no difference for GFPGAN inference whether we load it initially or during the run. The only difference is the memory, and having GFPGAN in memory when it's not being used is not worth it, imo.
>
> Say I have a slow network share that my code is mounted on. For every image, it's making a call to see if GFPGAN exists, loading the model (it could have changed) across the network, getting it into VRAM, processing the image, then setting gfpgan to None, which lets the GC free it on the next cycle. Probably what I'm hitting on this PR is that things are getting loaded faster than the GC is able to get to them, and I eventually get an OOM.
So I made some changes on my end. Here's what I settled on.

**Patch 1**

Both GFPGAN and Real-ESRGAN are now independent of each other. Rather than triggering both every time, users can now control them independently depending on their prompt. To support this, I added two new prompt arguments (a sketch of the wiring follows below):

- Supplying the -U prompt argument triggers Real-ESRGAN image scaling (which previously ran together with GFPGAN). This will NOT perform face restoration; that is GFPGAN's job, so we should let it do that.
- Supplying the -G prompt argument triggers GFPGAN. This exclusively performs face restoration; it no longer performs scaling.
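A hedged sketch of how those two switches might be wired up; the exact flag shapes and defaults below are assumptions for illustration, not the PR's parser:

```python
import argparse

parser = argparse.ArgumentParser(description='per-prompt post-processing switches')
parser.add_argument('-U', '--upscale', type=float, default=None,
                    help='Real-ESRGAN scaling factor; does no face restoration')
parser.add_argument('-G', '--gfpgan_strength', type=float, default=0.0,
                    help='GFPGAN face restoration strength; does no scaling')

opts = parser.parse_args(['-U', '2', '-G', '0.7'])  # e.g. "<prompt> -U 2 -G 0.7"

# Each tool runs only when its own switch was supplied.
if opts.upscale:
    print(f'Real-ESRGAN upscaling x{opts.upscale}')
if opts.gfpgan_strength > 0:  # a 0.0 default avoids the None comparison bug
    print(f'GFPGAN at strength {opts.gfpgan_strength}')
```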
**Memory Issues**

After trying out a ton of options, I have to come back to the point that my implementation of the pipeline is the most ideal one for GFPGAN and Real-ESRGAN at this point in time. The main goal is to create beautiful images and hopefully cater to systems with lower VRAM so the maximum number of people can use it. With that in mind, I have to reaffirm my position that, to avoid memory issues, GFPGAN and Real-ESRGAN should run after SD has completed image generation. Performing them while SD is still in memory will inevitably crash the generation sooner or later. Since we last spoke I have generated over 1500 images with the new system, scaling all of them by 2x and performing face restoration, and I have not had a single OOM crash so far.

The big issue of a user canceling a generation and not having an upscaled result is no longer a problem. If a user cancels a generation, they will have all images generated up to that point. If they like any of those images, they can just take the seed and pass it back to the generator along with -U and -G to perform scaling and face restoration if they want. Problem solved.

This is what I have so far. I am trying to make the whole process even better; if I come up with any more ideas, I'll post them here. But if you agree with this, I think I'll make a new PR in a day or two with all the changes mentioned above and any others I might add.

@lstein I will have a look into the multi-GPU idea. Maybe there are ways to defer GFPGAN and Real-ESRGAN to a secondary GPU. Like you said, it'll be handy only for a lucky few, but there's no reason not to implement it if we can.
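On the secondary-GPU idea, the simplest version is just pinning the post-processing models to a second device so they never compete with SD for VRAM. A sketch, with hypothetical loader names:

```python
import torch

sd_device = torch.device('cuda:0')
post_device = torch.device('cuda:1' if torch.cuda.device_count() > 1
                           else 'cuda:0')  # fall back to a single GPU

sd_model = load_stable_diffusion().to(sd_device)  # hypothetical loader
gfpgan_model = load_gfpgan().to(post_device)      # hypothetical loader
```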
@lstein @Oceanswave I've pushed Patch 1 as detailed above, along with all the changes that came with it, and completed the documentation too. I generated 1500 images overnight, upscaled to 2x and face-restored; it did not crash at all on an RTX 3080 with 8GB. The base images are inferred and ready for viewing at the same time as they are now, and the upscaling is performed after image generation without affecting usage or viewing of the results in any way. Overall, at this time I strongly believe there is no better way of implementing this than what is currently done. It is almost foolproof while providing great results, ease of use, and no memory compromise. For detailed documentation on how GFPGAN and Real-ESRGAN work now, please read the documentation update I made in the README: https://github.com/blessedcoolant/stable-diffusion#gfpgan-and-real-esrgan-support Thank you.

NOTE: The output console shows two images being generated whenever an image is face-restored or upscaled. This is technically true, because two images are generated while only one is saved. I can suppress this if needed, but I let it stay so the user knows they have the option to use these features.
Thanks for all the work on this! I am testing now, after having taken a break for a family event.
No rush, mate. :) Take your time. Thank you for your time and effort too.
@blessedcoolant Is there any particular reason you changed the default image height to 768?
Oh crap. Those are just the defaults for my own system so I don't have to keep passing them. I forgot to reset it before I pushed. I've changed it back now. Sorry about that.
@blessedcoolant I've run into another problem. For benchmarking I'm running 5 iterations with the same starting seed each time. What I see is five different images being generated during the Stable Diffusion pass, but then during the GFPGAN pass only the second image is processed, five times! So I get five copies of the same image. The GFPGAN logging messages do refer to the correct seeds (the fixed starting seed followed by 4 randomly generated ones), so the logic is off somewhere.
@lstein Please check now. Fixed it.
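For illustration: that symptom (correct seeds in the logs, one repeated image on disk) is the classic result of tracking seeds and images in containers that drift out of sync. A hypothetical, self-contained sketch of the safer pattern, not the actual fix:

```python
# If images and their seeds live in separate lists, one missed append or a
# reused variable makes seed i point at the wrong image. Storing each pair
# as a single tuple rules that mismatch out.
results = []  # list of (image, seed) tuples, always appended together

for seed in (42, 7, 1234):
    image = f"image-for-seed-{seed}"   # stand-in for a generated image
    results.append((image, seed))

# The post-processing pass can then never mismatch them:
for image, seed in results:
    print(f"restoring {image} (seed {seed})")
```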
Hi folks. I've done a series of benchmarks on my 32G GPU; the summary is here: https://docs.google.com/spreadsheets/d/1YXRlzp9TT9gRQMvbzKXW0xRjUI4bllHgfVcil0kS4oI/edit?usp=sharing Take-home messages:
I think that users who want high-quality faces and upscaling would rather trade a 10% runtime increase for a 10% VRAM utilisation decrease. So my inclination is to bring in @blessedcoolant's implementation and acknowledge both contributors for the code in this work. I am so impressed with the improvements one can achieve with the combo of GFPGAN and Real-ESRGAN! I'll wait until the afternoon (~4h) for any comments on the benchmarking and conclusions before hitting the merge button.
Thank you so much for taking the time to do this testing. Looking forward to the merge. As for point 4 you mentioned, you will be able to repro that issue when you hit a resolution that sits on the very edge of what your system can handle; for a lot of people that is 512x512 on 6GB cards and 512x768 on 8GB cards. Either way, this PR solves it, even with img2img. I am currently looking at ways to decrease the memory footprint of SD at its very core, but trying to understand that code is a bit above my head. I'm learning, though; if I can figure something out, I'll make a new PR.
@blessedcoolant: I apologize profusely for coming into this so late, but this PR really needs a cleaner commit history before it goes in. OTOH, the final merge decision is, as always, up to @lstein.
I agree; it had a lot of changes. I should have kept it cleaner, but I was focused on the optimization. Sorry. If @lstein wishes, he can rebase this and pull it; if not, I can do a fresh PR with a cleaner commit history. Either works fine with me. You could also rebase it and make a PR that I can pull, if you have the time. I've just popped out and won't be back for a few hours, and I don't want to hold up @lstein if he wishes to merge before that.
I really did consider this, but I'd be starting from zero in understanding what's going on here, which is, eh, "sub-optimal" at best when doing aggressive rebasing 😆
Here's the list of changed files: ldm\dream\pngwriter.py
Yes, I'm definitely going to rebase.
@blessedcoolant I'm actually having a difficult time getting this to rebase cleanly. I wonder if you could rebase your commits and generate a clean PR against main? Thanks.
Yep. I'm closing this PR. I'll raise a new clean one.
Thanks so much.
Picked up in #162 |
This PR addresses the following issues.
NOTE: I am not on my primary system, so I manually updated the code on GitHub. Someone should double-check that the formatting is correct before this is merged.
Thank you.