Conversation

@yunsaki (Contributor) commented Aug 25, 2022

First of all I want to apologise for this pull request. I really like this project and it's currently the best way to run stable diffusion locally imho. But I wanted to change some things...

What the hell I did

Well, I basically rewrote the entire dream.py and added some new things.

Why I did it

My idea was to add the ability to change parameters over time based on cmd settings, without having to copy, edit and re-enter every command. I had issues working with the original code, so I decided to rewrite it. However, I do not think that my code is "better" or anything. I just did it so I could find out how everything works and structure everything according to my idea.

How it works

Every argument that I thought was worth modulating now takes a string that can be either <value> or <init_value>:<increment>. Values are interpreted as int by default and as float for cfg_scale and strength. Using <value> results in a static value, while <init_value>:<increment> starts at init_value, which is then incremented by increment every repetition. The number of repetitions is set via -r or --repeats. Supported values are: steps, seed, width, height, cfg_scale, strength. Additionally, I have added the -B or --feedback argument, which passes the first image of every repetition's results to the next repetition. This, however, supersedes the recent addition of the -v option, which I am sorry for.
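
Roughly, the expansion behaves like this sketch (simplified, with illustrative helper names; it is not the actual code in this PR):

```python
from typing import List, Union

def expand_modulated_arg(spec: str, repeats: int, cast=int) -> List[Union[int, float]]:
    """Expand "<value>" or "<init_value>:<increment>" into one value per image.

    Illustrative helper only; the real implementation in dream.py differs.
    """
    if ":" in spec:
        init_str, inc_str = spec.split(":", 1)
        init, inc = cast(init_str), cast(inc_str)
        # repeats=5 means 6 images in total: the initial one plus 5 repetitions
        return [init + i * inc for i in range(repeats + 1)]
    # A plain value stays constant for every repetition
    return [cast(spec)] * (repeats + 1)

print(expand_modulated_arg("10:1", 5))            # seeds:     [10, 11, 12, 13, 14, 15]
print(expand_modulated_arg("25:5", 5))            # steps:     [25, 30, 35, 40, 45, 50]
print(expand_modulated_arg("7:0.5", 5, float))    # cfg_scale: [7.0, 7.5, 8.0, 8.5, 9.0, 9.5]
```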

An example

a cyberpunk cityscape in the style of wadim kashin -S 10:1 -s 25:5 -C 7:0.5 -r 5 -B
The repeats are set to 5 (default 0), so 6 images will be generated. The seed starts at 10 and is incremented by 1 every repetition. The steps start at 25 and are incremented by 5. The cfg_scale has an initial value of 7 and increases by 0.5 every repetition. -B is enabled, so the 5 images following the first one get the most recently generated image as an init_img (since n is not set to something higher than 1). This results in the following log output:

"a cyberpunk cityscape in the style of wadim kashin" -s 25:5 -b 1 -W 512 -H 512 -C 7:0.5 -r 5 -B
# outputs/img-samples/000131.10.png: "a cyberpunk cityscape in the style of wadim kashin" -s 25 -b 1 -W 512 -H 512 -C 7.0 -S 10
# outputs/img-samples/000132.11.png: "a cyberpunk cityscape in the style of wadim kashin" -s 30 -b 1 -W 512 -H 512 -C 7.5 -I outputs/img-samples/000131.10.png -f 0.75 -S 11
# outputs/img-samples/000133.12.png: "a cyberpunk cityscape in the style of wadim kashin" -s 35 -b 1 -W 512 -H 512 -C 8.0 -I outputs/img-samples/000132.11.png -f 0.75 -S 12
# outputs/img-samples/000134.13.png: "a cyberpunk cityscape in the style of wadim kashin" -s 40 -b 1 -W 512 -H 512 -C 8.5 -I outputs/img-samples/000133.12.png -f 0.75 -S 13
# outputs/img-samples/000135.14.png: "a cyberpunk cityscape in the style of wadim kashin" -s 45 -b 1 -W 512 -H 512 -C 9.0 -I outputs/img-samples/000134.13.png -f 0.75 -S 14
# outputs/img-samples/000136.15.png: "a cyberpunk cityscape in the style of wadim kashin" -s 50 -b 1 -W 512 -H 512 -C 9.5 -I outputs/img-samples/000135.14.png -f 0.75 -S 15

And the following images:
[generated images 000131.10.png through 000136.15.png]

Granted, those aren't amazing, but I think with some experimentation you can do some nice things with those extra options.

What I have planned

I also want to add prompt modulation. Let's assume we have the prompt oil painting of a landscape and a list of things we want to add over time while increasing their weighting: "in spring", "in summer", "in autumn", "in winter". The base prompt and the first item from the list get a fixed weighting of, let's say, 50: oil painting of a landscape:50 in spring:50. In the next step summer is added: oil painting of a landscape:50 in spring:49 in summer:1, and so on. Giving a list of prompts as arguments could also be nice. For some modifications it might make more sense to adjust ldm/simplet2i.py directly, though.
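
A rough sketch of that idea (nothing here is implemented yet; the helper name and the single-step weight shift just mirror the example above):

```python
def modulate_prompt(base: str, additions: list, base_weight: int = 50) -> list:
    """Build a sequence of weighted prompts that gradually introduces each
    addition, shifting weight away from the previous one. Sketch only."""
    prompts = []
    for step, addition in enumerate(additions):
        if step == 0:
            prompts.append(f"{base}:{base_weight} {addition}:{base_weight}")
        else:
            previous = additions[step - 1]
            # Shift one unit of weight from the previous addition to the new one
            prompts.append(f"{base}:{base_weight} {previous}:{base_weight - 1} {addition}:1")
    return prompts

for p in modulate_prompt("oil painting of a landscape",
                         ["in spring", "in summer", "in autumn", "in winter"]):
    print(p)
# oil painting of a landscape:50 in spring:50
# oil painting of a landscape:50 in spring:49 in summer:1
# oil painting of a landscape:50 in summer:49 in autumn:1
# oil painting of a landscape:50 in autumn:49 in winter:1
```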

What I didn't do

Extended user/usage documentation.

Thanks for reading this abomination.

@yunsaki (Contributor, Author) commented Aug 25, 2022

Some of the changes I made are quite arbitrary, btw, and I don't intend to force them.

@tildebyte (Contributor) commented:

  1. Amazing ideas
  2. From a sheer engineering/architecture standpoint, this will probably be very difficult to rebase into this repo. Have you considered adding a completely separate script with a new name? Obviously, it's up to @lstein whether or not he wants an expanding collection of generation scripts, but we do already have an incoming 'dream_web.py' - maybe you could make a 'dream_variations.py' or something?

@yunsaki (Contributor, Author) commented Aug 25, 2022

@tildebyte

Thanks! I did consider making this a separate file; it actually was separate while I was working on it. If that is a better solution, we can go with that too!

@lstein (Collaborator) commented Aug 25, 2022

@yunsaki I really appreciate the vision and engineering that went into this work. As you probably can tell I am a beginning python scripter (wrote my first script 2 weeks ago) and I can learn a lot from the idioms you used. However, I'm in the process of refactoring dream.py and simplet2i to make the whole system more flexible and easier to maintain, and at this point it will be very difficult for me to merge your changes into the repo. Also, as @tildebyte mentioned, there is now a dream_web.py script, and your syntax for introducing variations would be great to have there too.

So how about this? For now I can put your PR into a public branch and point people at it from the README because I think it will be very popular. Then, after I finish refactoring I will go through your code carefully and figure out how we can separate the prompt morphing code from the command-line processing and web processing code. I think there should be a module that takes a text prompt containing your variation syntax and returns a list of prompts that can be passed to the generation routines. This will preserve the basic architecture and separate the fancy bits from the UI bits. It will also support the web server well.

I also agree that this work supersedes the more limited variant generation that was brought in by an earlier PR.

Let me know what you prefer.
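
The module boundary described above might look roughly like this (hypothetical file and function names, not anything that exists in the repo):

```python
# prompt_variations.py: hypothetical module separating the variation/morphing
# logic from the CLI and web UI code.
from dataclasses import dataclass
from typing import List

@dataclass
class GenerationRequest:
    """One plain, fully resolved request for the generation routines."""
    prompt: str
    seed: int
    steps: int
    cfg_scale: float

def _expand(spec: str, count: int, cast):
    init, _, inc = spec.partition(":")
    init, inc = cast(init), (cast(inc) if inc else 0)
    return [init + i * inc for i in range(count)]

def expand_variations(prompt: str, seed: str, steps: str, cfg_scale: str,
                      repeats: int) -> List[GenerationRequest]:
    """Turn one command's settings (using the <init>:<increment> syntax) into
    plain requests that dream.py and dream_web.py could both hand to simplet2i."""
    count = repeats + 1
    return [GenerationRequest(prompt, s, st, c)
            for s, st, c in zip(_expand(seed, count, int),
                                _expand(steps, count, int),
                                _expand(cfg_scale, count, float))]
```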

@bakkot (Contributor) commented Aug 25, 2022

Incrementing the seed will produce completely different results - unlike parameters like step count or cfg_scale, seed x and x + 1 are basically unrelated to each other. So I don't think it makes sense to try to increment seed the way you increment other parameters.

But, you can pick two seeds and then interpolate between the two of them by running the noise generation step (which is where the seed is used) and then interpolating between those two arrays. That's what #81 does.

So I think it might make sense to handle this morphing a different way: instead of specifying a base value, a step size, and a number of steps for each parameter (which is what you currently do), you could instead specify a base value, a target value, and a number of steps, and then interpolate between those values as appropriate for each parameter. For simple parameters like step count there's no meaningful difference between the two, but that design will allow you to interpolate between seeds (and prompts!) as well as simpler parameters.
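
A sketch of what that could look like (hypothetical names; the noise part interpolates between the noise tensors derived from two seeds, roughly in the spirit of #81, and has not been tested):

```python
import torch

def lerp_values(start: float, end: float, count: int) -> list:
    """Linearly interpolate a scalar parameter (steps, cfg_scale, ...)."""
    denom = max(count - 1, 1)
    return [start + (end - start) * i / denom for i in range(count)]

def slerp(t: float, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Spherical interpolation between two noise tensors of the same shape."""
    a_flat, b_flat = a.flatten(), b.flatten()
    omega = torch.acos(torch.clamp(
        torch.dot(a_flat / a_flat.norm(), b_flat / b_flat.norm()), -1.0, 1.0))
    so = torch.sin(omega)
    if so.abs() < 1e-6:            # nearly parallel: fall back to a plain lerp
        return (1.0 - t) * a + t * b
    return (torch.sin((1.0 - t) * omega) / so) * a + (torch.sin(t * omega) / so) * b

def interpolate_noise(seed_a: int, seed_b: int, count: int, shape=(1, 4, 64, 64)):
    """Yield noise tensors morphing from seed_a's noise to seed_b's noise."""
    noise_a = torch.randn(shape, generator=torch.Generator().manual_seed(seed_a))
    noise_b = torch.randn(shape, generator=torch.Generator().manual_seed(seed_b))
    denom = max(count - 1, 1)
    for i in range(count):
        yield slerp(i / denom, noise_a, noise_b)

print(lerp_values(7.0, 9.5, 6))    # [7.0, 7.5, 8.0, 8.5, 9.0, 9.5]
```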

@yunsaki (Contributor, Author) commented Aug 25, 2022

@lstein Thank you for the nice response! Great to hear that you can learn something from this. I'm not a python expert by any means though, so maybe just keep that in mind. :)

Putting this implementation into a public branch sounds good to me, feel free to do that! I will probably try to hack on your refactored version as well in the next few days, if I find the time to do so.

Great work, keep it up!

@bakkot I agree. I basically added the seed parameter to have an easy way to control/reproduce a changing seed. Honestly though, I would argue that you could keep my simpler implementation and add yours as well. Would it do any harm to do both things?

@lstein changed the base branch from main to yunsaki-morphing-dream August 26, 2022 03:31
@lstein marked this pull request as ready for review August 26, 2022 03:31
@lstein deleted the branch invoke-ai:yunsaki-morphing-dream August 26, 2022 03:33
@lstein closed this Aug 26, 2022
@lstein (Collaborator) commented Aug 26, 2022

@yunsaki I've never done a merge into a non-main branch before and I screwed it up. I'm trying to rectify it now.

@lstein reopened this Aug 26, 2022
@lstein merged commit cc0520a into invoke-ai:yunsaki-morphing-dream Aug 26, 2022
@TingTingin commented:

If steps are the only thing being interpolated, then each image doesn't need its own full run; it can instead be output as soon as its step count is reached, similar to what's asked in #99.

@TingTingin commented Aug 26, 2022

also a sort of prompt matrix like this

copied from https://github.com/hlky/stable-diffusion-webui
Prompt matrix
Separate multiple prompts using the | character, and the system will produce an image for every combination of them. For example, if you use a busy city street in a modern city|illustration|cinematic lighting prompt, there are four combinations possible (first part of prompt is always kept):

a busy city street in a modern city
a busy city street in a modern city, illustration
a busy city street in a modern city, cinematic lighting
a busy city street in a modern city, illustration, cinematic lighting

would be good to add too
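
A minimal sketch of that expansion (a hypothetical helper, not part of this PR), following the rule that the first part is always kept:

```python
from itertools import combinations

def prompt_matrix(prompt: str) -> list:
    """Expand 'base|mod1|mod2|...' into every combination of the optional parts."""
    base, *modifiers = [part.strip() for part in prompt.split("|")]
    prompts = []
    for r in range(len(modifiers) + 1):
        for combo in combinations(modifiers, r):
            prompts.append(", ".join([base, *combo]))
    return prompts

for p in prompt_matrix("a busy city street in a modern city|illustration|cinematic lighting"):
    print(p)
# a busy city street in a modern city
# a busy city street in a modern city, illustration
# a busy city street in a modern city, cinematic lighting
# a busy city street in a modern city, illustration, cinematic lighting
```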

@bakkot (Contributor) commented Aug 27, 2022

> If steps are the only thing being interpolated, then each image doesn't need its own full run; it can instead be output as soon as its step count is reached

I don't think that's true. Running with 50 steps vs running with 100 steps but stopping early after 50 produces different results.
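
The reason is that the noise schedule is stretched over the total step count, so step 50 of a 100-step run is still only partway through denoising. A self-contained sketch (using a Karras-style schedule purely for illustration; the samplers in this repo build their schedules differently, but the effect is the same):

```python
import torch

def karras_sigmas(n: int, sigma_min: float = 0.03, sigma_max: float = 14.6, rho: float = 7.0):
    """Noise levels spread over n sampling steps (Karras et al. schedule)."""
    ramp = torch.linspace(0, 1, n)
    min_inv, max_inv = sigma_min ** (1 / rho), sigma_max ** (1 / rho)
    return (max_inv + ramp * (min_inv - max_inv)) ** rho

print(karras_sigmas(50)[-1].item())    # ~0.03: a complete 50-step run ends almost noise-free
print(karras_sigmas(100)[49].item())   # ~1.3:  step 50 of a 100-step run is still quite noisy
```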

@yunsaki (Contributor, Author) commented Aug 27, 2022

> example, if you use a busy city street in a modern city|illustration|cinematic lighting prompt, there are four combinations possible

How would you handle permutations? Because it could get out of hand pretty quickly, if you use too many words. Also what if you want to use pipes in the prompt itself? (I've seen people do that)

And how would this work with the previous modifiers? Let's take the prompt a busy street in a modern city -r 9 -C 7:1. Should it generate 4 images for every repetition? This would mean (without a more low level implementation) that the seeds wouldn't stay the same unless you set your seed right in the beginning.

Yeah, I think that could be useful. But I would prefer using an additional argument for that, like -p/--permutations or -c/--combinations (not sure if -c is already taken). Might also do another complete rewrite to make it less messy, not sure.

@TingTingin commented Aug 27, 2022

> example, if you use a busy city street in a modern city|illustration|cinematic lighting prompt, there are four combinations possible
>
> How would you handle permutations? Because it could get out of hand pretty quickly, if you use too many words. Also what if you want to use pipes in the prompt itself? (I've seen people do that)
>
> And how would this work with the previous modifiers? Let's take the prompt a busy street in a modern city -r 9 -C 7:1. Should it generate 4 images for every repetition? This would mean (without a more low level implementation) that the seeds wouldn't stay the same unless you set your seed right in the beginning.
>
> Yeah, I think that could be useful. But I would prefer using an additional argument for that, like -p/--permutations or -c/--combinations (not sure if -c is already taken). Might also do another complete rewrite to make it less messy, not sure.

They probably shouldn't be called permutations and combinations, since laypeople might get confused by the difference. Also, I think showing a preview of exactly how many generations are going to happen would be a big help, i.e. something like:

a busy street in | a modern city | illustration | cinematic lighting -r 9 -C 7:1 -combinations
Generating 4 images for each repetition (9) Total : 36 images

a busy street in | a modern city | illustration | cinematic lighting -r 9 -C 7:1 -permutations
Generating 24 images for each repetition (9) Total : 216 images

Obviously if people add too many words it will get out of hand but at least this will warn them before generation starts
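
Something along these lines could print that preview (and ask for confirmation) before any sampling starts; a hypothetical helper, not part of the PR:

```python
def confirm_generation(images_per_repetition: int, repetitions: int) -> bool:
    """Show how many images a command will produce and let the user bail out."""
    total = images_per_repetition * repetitions
    print(f"Generating {images_per_repetition} images for each repetition "
          f"({repetitions}) Total : {total} images")
    return input("Proceed? [y/N] ").strip().lower() == "y"

# confirm_generation(4, 9) -> "Generating 4 images for each repetition (9) Total : 36 images"
```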

@TingTingin commented Aug 27, 2022

I also think another nice feature would be an iteration mode for -r, i.e. if you have busy street in a modern city -r 9 -C 7:1 -s 25:1, then instead of increasing both by one per repetition it would do so iteratively, i.e.

busy street in a modern city -r 9 -C 7:1 -s 25:1 -it
Generating 9 images for each repetition (9) Total : 81 images

Though at this point you probably want individual repetition settings, so -r could be a global setting and something like this, maybe?

busy street in a modern city -r 9 -C 7:1 -s 25:1:5 -it
Generating 5 images for each repetition (9) Total : 45 images

If there's no -r specified, generation can just take whatever the largest number is

busy street in a modern city -C 7:1:6 -s 25:1:3 -it
Generating 3 images for each repetition (6) Total : 18 images

a busy street in | a modern city | illustration | cinematic lighting -C 7:1:6 -s 25:1:3 -it -combinations
Generating 12 images for each repetition (6) Total : 72 images

a busy street in | a modern city | illustration | cinematic lighting -C 7:1:6 -s 25:1:3 -it -permutations
Generating 72 images for each repetition (6) Total : 432 images

hopefully my math is correct

@TingTingin commented Aug 27, 2022

> If steps are the only thing being interpolated, then each image doesn't need its own full run; it can instead be output as soon as its step count is reached

> I don't think that's true. Running with 50 steps vs running with 100 steps but stopping early after 50 produces different results.

It would still be good for showing in-progress images if you wanted to integrate this into another program.

@SMUsamaShah (Contributor) commented:

> If steps are the only thing being interpolated, then each image doesn't need its own full run; it can instead be output as soon as its step count is reached

> I don't think that's true. Running with 50 steps vs running with 100 steps but stopping early after 50 produces different results.

Is it even possible? Can we produce an image at each step? If you know it can be done, can you please point out which part of the code I should be looking at? I am new to ML stuff and terminology and have almost no idea what is going on. I recently found that k_euler_a never converges, and even at 10000 steps it will produce a different image. Now I want to produce an image at each step. Running a loop that increases the step count on each iteration, as proposed in this PR, is too much work for such a simple thing.

@bakkot (Contributor) commented Aug 28, 2022

@SMUsamaShah It's easy for the things built into this repo, but a little trickier for k_euler_a, which comes from the k_diffusion library. From looking at the code a little, I am guessing that by passing a callback=something parameter here (probably threaded through from here) you might be able to get a callback invoked at each step with this argument, in which x is the image data (which needs to be translated into something useful to be rendered, probably by calling _samples_to_images).

I haven't tried this but it's somewhere you can start looking.

EDIT: actually it looks like this PR is implementing that already (for the web ui); you might try that branch.
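
For anyone who wants to experiment, here is an untested sketch of what that callback might look like, assuming a k_diffusion-style sampler that passes its callback a dict with the step index and the current latents; decode_latents stands in for whatever turns samples into PIL images (e.g. something like _samples_to_images):

```python
from pathlib import Path

def make_step_callback(decode_latents, outdir="outputs/intermediates", every=1):
    """Return a callback that saves the partially denoised image at each step.

    decode_latents: a function mapping latent samples to a list of PIL images
    (a stand-in for whatever this repo uses internally).
    """
    out = Path(outdir)
    out.mkdir(parents=True, exist_ok=True)

    def callback(info):
        # k_diffusion samplers pass their callback a dict along the lines of
        # {'i': step, 'x': latents, 'sigma': ..., 'denoised': ...}
        step = info["i"]
        if step % every != 0:
            return
        for n, image in enumerate(decode_latents(info["denoised"])):
            image.save(out / f"step_{step:04d}_{n}.png")

    return callback

# Hypothetical usage, if the parameter were threaded through to the sampler:
# sampler.sample(..., callback=make_step_callback(t2i._samples_to_images))
```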

@yunsaki mentioned this pull request Aug 28, 2022