
Conversation

@ghost commented Aug 29, 2022

adds 2 parameters for generating variations of a seed:
-z: optional 0-1 value to slerp from the -S noise to random noise (allows variations on an image)
-Z: optional target seed that the -S noise is slerped to (interpolates one image into another)

based on https://github.com/bakkot/stable-diffusion/tree/noise
this is an updated version of #81

I tried to keep it as simple as possible; this is a really powerful feature IMO.

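The slerp (spherical linear interpolation) being described can be sketched like this. This is a minimal NumPy illustration; the PR itself operates on torch latent tensors, and the array size and variable names here are assumptions:

```python
import numpy as np

def slerp(t, a, b):
    """Spherical linear interpolation between two flattened noise arrays.
    t=0.0 returns a (the -S noise); t=1.0 returns b (the random/-Z noise)."""
    a_unit = a / np.linalg.norm(a)
    b_unit = b / np.linalg.norm(b)
    omega = np.arccos(np.clip(np.dot(a_unit, b_unit), -1.0, 1.0))
    if np.isclose(np.sin(omega), 0.0):
        return (1.0 - t) * a + t * b  # fall back to lerp for (anti)parallel inputs
    return (np.sin((1.0 - t) * omega) * a + np.sin(t * omega) * b) / np.sin(omega)

noise_s = np.random.default_rng(2565070229).standard_normal(64)    # -S noise
noise_z = np.random.default_rng(578693498349).standard_normal(64)  # target noise
fuzzed = slerp(0.1, noise_s, noise_z)  # -z0.1: mostly the original noise
```

At -z0.0 you get exactly the -S image's starting noise; at -z1.0 you get exactly the target's.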
@ghost (Author) commented Aug 29, 2022

One thing I forgot to mention: you can use -n with this. Say you have a great prompt and seed and want some variations on it; you'd tack on -z with some fuzz value and then -n with some number of iterations.

"my great prompt" -s32 -b1 -W576 -H512 -C9.0 -A k_euler -S2565070229 -z0.1 -n10

@ghost (Author) commented Aug 29, 2022

Another important note! If you really like one of the variations, you can regenerate it by using the variation's seed as the target fuzz seed -Z.

so say you did "my great prompt" -s32 -b1 -W576 -H512 -C9.0 -A k_euler -S2565070229 -z0.1 -n10 and really liked variation 4... and the seed in the filename & png info is 2586053648

You can just do
"my great prompt" -s32 -b1 -W576 -H512 -C9.0 -A k_euler -S2565070229 -z0.1 -Z2586053648
to regenerate it (for upscaling etc)

@undrash commented Aug 29, 2022

I like the idea, gj

@blessedcoolant (Collaborator) commented Aug 29, 2022

Thank you for the PR. I concur that this is indeed a very powerful feature.

I did some testing. And here are my notes.

Seed Fuzzing

  1. Works as intended and generates great variations.
  2. I would recommend adding assertions for -z so that any value below 0 or above 1 triggers an assertion error. The error case is already handled, but I'd also like users to know that values below 0 and above 1 do nothing.
  3. Save fuzzed images with their fuzzed seed rather than the original seed. This will allow regeneration of the fuzzed results through that seed later on.

Seed Fuzz Target

This one worked a little differently from what I thought it would. I expected -Z to create intermediary images between the source and the target, with -z being the strength of that.

However, I noticed that -z actually creates images off the fuzzed seed rather than the original: lower values of -z create an image closer to the target than the source, and higher values are closer to the fuzzed version of the original seed.

Not sure if that's intended, but I'd expect it to work the other way around, with -z controlling the strength of the interpolation rather than the fuzzing of the original seed, when provided in combination with -Z.

Hope to get some clarification on that. Works great overall. Brilliant PR. Good job. I'll do a review on this once you get back to me.

@morganavr

With -z being the controller for the strength of the interpolation rather than fuzzing of the original seed? This is when provided in combination with -Z.

I also thought this is how it works. I expected the command below to transform the original image S2565070229 75% toward the target image 578693498349.

Example:
"my great prompt" -s32 -b1 -W576 -H512 -C9.0 -A k_euler -S2565070229 -z0.75 -Z 578693498349 -n10

@lstein (Collaborator) commented Aug 29, 2022

I like this and I know that @bakkot will be happy to see his PR #81 vision get back into main. I'm going to wait until there's a response to @blessedcoolant and @morganavr 's comments, and then will do my review.

@bakkot (Contributor) commented Aug 29, 2022

For the "target seed" thing, I think there's a more general feature which interpolates between arbitrary settings (for the features for which it makes sense to do so, which includes seeds). I'd like that feature to exist (and will implement it if I have time...), which would subsume the "target seed" part of this PR. So maybe just do the "fuzzing" part of this for now?

@blessedcoolant (Collaborator)

For the "target seed" thing, I think there's a more general feature which interpolates between arbitrary settings (for the features for which it makes sense to do so, which includes seeds). I'd like that feature to exist (and will implement it if I have time...), which would subsume the "target seed" part of this PR. So maybe just do the "fuzzing" part of this for now?

I second this. The fuzzing seems to work quite well. The interpolation is still not working as I'd expect it to. We can do just the fuzzing in this PR if you think you need more time to get the interpolation right.

I'd also recommend we change the name and call it `-v`/`--variant`, because it is technically that. It is a lot more descriptive to the regular user than the word fuzzing.

@morganavr

I'd also recommend we change the name and call it `-v`/`--variant`, because it is technically that. It is a lot more descriptive to the regular user than the word fuzzing.

Agree, -v/--variant is so much easier to understand for regular users.

@warner-benjamin (Contributor) commented Aug 29, 2022

I'd also recommend we change the name and call it `-v`/`--variant`, because it is technically that. It is a lot more descriptive to the regular user than the word fuzzing.

We probably don't want to tie one type of variant generation to -v, but rather use -v to pick a variant generator by name. Perhaps this one could be called sphere_interp or sphlerp for short.

@ghost (Author) commented Aug 30, 2022

With -z being the controller for the strength of the interpolation rather than fuzzing of the original seed? This is when provided in combination with -Z.

I also thought this is how it works. I expected the command below to transform the original image S2565070229 75% toward the target image 578693498349.

Example: "my great prompt" -s32 -b1 -W576 -H512 -C9.0 -A k_euler -S2565070229 -z0.75 -Z 578693498349 -n10

@blessedcoolant
@morganavr

If I'm understanding correctly, you were both expecting 10 output images (-n10), where image 1 is 100% 256~ and image 10 is 75% 578~ ? (and images 2-9 are in-betweens of that?)

I can try to have it interpolate, but it would make assumptions (always starting at 0% and then going up to -z), and I think it would be better handled by a proper general-purpose interpolation of parameters.

An alternate option is that it could do 10 variations on -S256 -z0.75 -Z 578, which might be interesting... and it sticks with how just using -z -n works.

@ghost (Author) commented Aug 30, 2022

@blessedcoolant

  1. Save the fuzzed image with their fuzzed seed rather than the original seed. This will allow regeneration of the fuzzed results through the seed later on.

This unfortunately isn't possible with just one seed. When fuzzing, the noise generated from -S is slerped toward random noise (or -Z-seeded noise), so the result always relies on the original -S seed's noise (until the slerp reaches 1.0, at which point it is 100% the second random seed's noise). There's no way to derive a single seed from the slerped noise.

You can still regenerate fuzzed results, but you need to provide -S OriginalSeed -Z RandomSeed (the random seeds are the ones that get written to the filename and png info)

However I noticed that -z actually creates images off the fuzzed seed rather than the original and lower values of z create an image that is closer to the target than the source and higher values are closer to the fuzzed version of the original seed.

Not sure if that's intended but I'd expect it to work the other way around? With -z being the controller for the strength of the interpolation rather than fuzzing of the original seed? This is when provided in combination with -Z.

-z is the controller for the strength of the interpolation, 0.0 being original seeded noise, 1.0 being the random seeded noise

But I have seen the behaviour where, as soon as the slerp hits 0.5, it switches heavily toward the random seeded noise's image. I don't know why, but it's similar to what occurs when using weighted prompts. It might be something specific to how SD works.

I will try to save out just the raw noise and see if the sudden transition at 0.5 is noticeable there too.
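The reproducibility point can be illustrated with a small sketch (NumPy stand-in for the real latents; `seeded_noise`, the array size, and the slerp fallback omission are assumptions): the blended noise is a pure function of (-S seed, -Z seed, -z), so the same triple always regenerates it, even though no single seed describes it.

```python
import numpy as np

def seeded_noise(seed, n=64):
    """Deterministic noise for a given seed (stand-in for the -S / -Z latents)."""
    return np.random.default_rng(seed).standard_normal(n)

def slerp(t, a, b):
    """Spherical interpolation; t=0 gives a, t=1 gives b."""
    omega = np.arccos(np.clip(
        np.dot(a / np.linalg.norm(a), b / np.linalg.norm(b)), -1.0, 1.0))
    return (np.sin((1 - t) * omega) * a + np.sin(t * omega) * b) / np.sin(omega)

# The fuzzed noise is fully determined by (-S, -Z, -z) ...
first = slerp(0.15, seeded_noise(2578925290), seeded_noise(777777777))
again = slerp(0.15, seeded_noise(2578925290), seeded_noise(777777777))
# ... so re-running with the same three values reproduces the same starting
# noise, even though the blend cannot be expressed as any single seed.
```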

@blessedcoolant (Collaborator)

If I'm understanding correctly, you were both expecting 10 output images (-n10), where image 1 is 100% 256~ and image 10 is 75% 578~ ? (and images 2-9 are in-betweens of that?)

Yes. I expected the interpolation to occur between -S and -Z at the rate of -n that is given.

An alternate option is, it could do 10 variations on -S256 -z0.75 -Z 578 which might be interesting... and sticks with how just using -z -n works

Do you mean you'll use -z to control the interpolation factor rather than -n? Do you want to generate a constant 10 images? I think letting the user pick the -n value for the number of variants is better, because those images can be used for creating transition effects, etc.

This unfortunately isn't possible with just one seed, when fuzzing, what is happening is noise generated from -S is being slerped toward random noise (or -Z seeded noise).

I guess it's not a major deal because we can always get back the same result by supplying the same original seed and -z value.

But I have seen the behaviour where as soon as the slerp hits 0.5 it switches heavily toward the random seeded noise's image. I don't know why, but its similar to what occurs when using weighted prompts. It might be something specific to how SD works

I've noticed that too, but I presumed it might be this particular PR handling the transition incorrectly. I guess not.

Would like @lstein to weigh in on how to implement this. I think functionally it's almost there. Just deciding the user experience factor of it and going for a final push.

Thank you for the new pushes with the asserts. We can aim to release this over the next 24 hrs.

@morganavr commented Aug 30, 2022

@xraxra

If I'm understanding correctly, you were both expecting 10 output images (-n10), where image 1 is 100% 256~ and image 10 is 75% 578~ ? (and images 2-9 are in-betweens of that?)

No. With this prompt below I expect 10 images with 75% variation towards target seed 578693498349
"my great prompt" -s32 -b1 -W576 -H512 -C9.0 -A k_euler -S2565070229 -z0.75 -Z 578693498349 -n10

I want to use this feature mainly to create different types of variations of my favorite image:

  1. By specifying the -z0.2 -S2565070229 arguments, I want to see slightly changed versions (variants) of my favorite image.
  2. By specifying the three parameters -z0.75 -S2565070229 -Z 578693498349, I want to see variants of my 1st favorite image (with seed S2565070229) that are 75% toward my 2nd favorite image (with seed 578693498349).

When I specify -z0.75 argument I want it to remain 0.75 and not be changed from 0 to 0.75 in 10 iterations (with -n10 argument).


Although I can see a use case for the behavior @blessedcoolant mentioned:

Yes. I expected the interpolation to occur between -S and -Z at the rate of -n that is given.

It would create a series of images representing a gradual interpolation between images A and B. These images, I assume, can then be turned into a movie or GIF. Maybe if you specify an additional --movie argument it will work as @blessedcoolant wanted?

@thelemuet commented Aug 30, 2022

Obviously this is a great feature for creating variations, but I would like to point out just how HUGE it is going to be for animating. I tested with txt2img, but I assume this will be implemented for img2img as well? I think that is where it will shine the most.

For txt2img, considering that the seed affects the composition in its entirety, incrementing the -v value even by a small amount has a huge effect. I tested incrementing by 0.005 and it works quite well; however, that's a lot of frames to generate if the goal is a smooth animation that interpolates between 2 seeds.

But for img2img, because the overall composition is taken from the init image, I suspect it will be able to take larger -v increments and still create a smoother animation than any other method previously possible.

Combined with prompt weighting, which we already have, putting everything into a text file and loading it using --from_file makes this a very easy workflow. This opens up a lot of ways of animating.

@bakkot (Contributor) commented Aug 31, 2022

Can you show me these results? Do 10 versions maybe.

Sure:

10 dragons

000169 398792475
000169 651744428
000169 917153278
000169 1407084198
000169 1558963732
000169 1607217722
000169 1676606545
000169 3822675744
000169 4003230496
000169 4045227157

@bakkot (Contributor) commented Aug 31, 2022

@lstein

Also, tildebyte will complain if the commit log becomes too convoluted, so please `rebase -i` to minimize commit messages.

You can use the "squash and merge" button on GitHub to get a clean, single-commit item in the history which links to the original PR. That's the usual flow for OSS projects, in my experience.

@morganavr commented Aug 31, 2022

@blessedcoolant

When I pass -S 5000 -v 0.1, I expect a VARIANT similar to the design of seed 5000. But this is NOT what happens.

You need to pass small values like -v 0.005 to get a variant.
If you want variation, increase that value to 0.1 or higher.

That's what I'm calling variants; full change in design = variation.

full change in design != variation. It's a totally different image just produced by the same prompt.

  • -v should create variants, not variations
  • -V: add a new tag for creating variations -- currently -v does this, so move that functionality there.
  • -VI: add a new tag for interpolation -- currently -V does this, so move that functionality there and fix the interpolation breaking at higher values; 1 should give the variant seed and 0 the original seed for the intended behavior.

-v creates exactly the variants.
-V is used to specify the 2nd seed value. I don't see what's wrong with it.
-VI: we don't need this argument. For interpolation we have -V, which works just fine.

I have a feeling, @blessedcoolant, that during a merge you resolved conflicts wrongly, and that's why your local code does not work as it does for others.

@blessedcoolant (Collaborator)

@blessedcoolant

When I pass -S 5000 -v 0.1, I expect a VARIANT similar to the design of seed 5000. But this is NOT what happens.

You need to pass small values like -v 0.005 to get a variant. If you want variation, increase that value to 0.1 or higher.

That's what I'm calling variants; full change in design = variation.

full change in design != variation. It's a totally different image just produced by the same prompt.

  • -v should create variants, not variations
  • -V: add a new tag for creating variations -- currently -v does this, so move that functionality there.
  • -VI: add a new tag for interpolation -- currently -V does this, so move that functionality there and fix the interpolation breaking at higher values; 1 should give the variant seed and 0 the original seed for the intended behavior.

-v creates exactly the variants. -V is used to specify the 2nd seed value; I don't see what's wrong with it. -VI: we don't need this argument. For interpolation we have -V, which works just fine.

I have a feeling, @blessedcoolant, that during a merge you resolved conflicts wrongly, and that's why your local code does not work as it does for others.

I made a fresh pull and I did not try values as low as 0.005... Let me do a fresh pull and a thorough check once again.

@thelemuet commented Aug 31, 2022

To me this is working exactly as I would expect. It is not interpolating the 2 final images you would get from each seed; it is interpolating the generated noise from 2 different seeds, which means even a value of 0.1 has a high chance of altering the composition a lot, unless you get lucky or manually pick 2 seeds where the final generated images are already close in composition.

To me the real power of this feature will be for animation, here is a quick example from interpolating between 2 handpicked seeds where final image was close in composition:

test2

This is from -v0.0 to -v0.9, in 0.02 increments.

This breaks at even smaller increments if the second seed produces a very different image from the first.

This is why I think this feature will be pure magic if implemented for img2img as well, where it would be effectively possible to "morph" very smoothly between 2 variants based on composition from init image.

@morganavr

okay, I implemented the behavior mentioned here (saving tensors): #184 (comment)

Maybe @xraxra can add this functionality to this PR?

Source file:
simplet2i.zip

@bakkot (Contributor) commented Aug 31, 2022

@morganavr as mentioned above, saving the whole tensor is overkill. You just need to save a list of the original seeds and their weights (in this case there will be exactly two - the input and the target).

@morganavr commented Aug 31, 2022

@bakkot
I have no idea how to implement it the way you describe. That's why I implemented it my way. It works and it allows creating variations of variations; that's the most important thing.

@bakkot (Contributor) commented Aug 31, 2022

I will try to implement the thing I suggested later today. Short summary: the way this PR works is by constructing an "initial noise array" X_t from a weighted average of two arrays of noise generated from two different seeds. If you have both seeds and their weights, you can repeat that process for deriving X_t, and get the same result out.
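A sketch of that bookkeeping (a NumPy illustration; the function name and metadata shape are assumptions, and this uses a plain weighted sum where the PR itself slerps):

```python
import numpy as np

def rebuild_noise(seed_weights, n=64):
    """Reconstruct the initial noise X_t from [(seed, weight), ...] metadata,
    instead of saving the whole tensor next to the image."""
    parts = [w * np.random.default_rng(s).standard_normal(n)
             for s, w in seed_weights]
    return np.sum(parts, axis=0)

# Metadata that could be stored in the PNG info: the input seed and the
# target seed with their weights. Re-running this always gives the same X_t.
history = [(2565070229, 0.9), (578693498349, 0.1)]
x_t = rebuild_noise(history)
```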

@morganavr
Copy link

I will try to implement the thing I suggested later today. Short summary: the way this PR works is by constructing an "initial noise array" X_t from a weighted average of two arrays of noise generated from two different seeds. If you have both seeds and their weights, you can repeat that process for deriving X_t, and get the same result out.

can't wait to have a look at your code!

@tildebyte (Contributor) commented Aug 31, 2022

You can use the "squash and merge" button on Github to get a clean, single-commit item in the history

See my comment #256 (comment)

Keeping a clean history is sometimes harder than doing the actual programming work. It's not fun, and sometimes it's downright miserable. It is also the best anti-techdebt activity (esp. for the amount of effort involved) that I know of.

lstein is absolutely correct that really understanding, and faithfully using, `git rebase -i` is a necessity on complex, fast-moving projects.

@bakkot (Contributor) commented Aug 31, 2022

Yeah, in cases where there are multiple logic changes, you do have to do the cleanup first. But a lot of the time (like this PR) it's logically a single thing, and you can reasonably just squash instead of worrying about cleaning it up.

@morganavr commented Aug 31, 2022

I will try to implement the thing I suggested later today. Short summary: the way this PR works is by constructing an "initial noise array" X_t from a weighted average of two arrays of noise generated from two different seeds.

Guys, I was thinking about one feature. Is it possible to construct a "noise array" that changes only a specific part of the image? It sounds like inpainting, but maybe it is possible to implement it using the algorithm from this PR.

While using this PR's feature, I noticed that very often I would love some part of the image to stay the same, and I could use a brush and paint over parts of the image... :) So yeah, basically inpainting, but with this "seed fuzzing" feature.

@bakkot (Contributor) commented Aug 31, 2022

Keep in mind this feature is for txt2img; it doesn't take an image as input at all.

With that said, with this feature (or rather a follow-on), one could in theory make it so that only part of the noise array changes. That would not guarantee that the output for the rest of the image stays the same, but it would make it more likely.
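A hypothetical sketch of that follow-on idea (all names and shapes are assumptions; 4x64x64 is the usual SD latent shape for a 512x512 image). Swapping noise only under a mask makes the rest of the image more likely, but not guaranteed, to stay put:

```python
import numpy as np

# Latent-shaped noise from the original seed and from a second seed.
base = np.random.default_rng(2565070229).standard_normal((4, 64, 64))
other = np.random.default_rng(578693498349).standard_normal((4, 64, 64))

# Mask covering the latent region the user wants to vary
# (in a UI this could come from a painted brush stroke, downscaled to latent size).
mask = np.zeros((64, 64), dtype=bool)
mask[16:48, 16:48] = True

mixed = base.copy()
mixed[:, mask] = other[:, mask]  # replace noise only under the mask
```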

@morganavr commented Aug 31, 2022

Keep in mind this feature is for txt2img; it doesn't take an image as input at all.

I know that this PR is for txt2img. That makes it all the more interesting: it is inpainting inside the trained model data, because we did not provide an external input image. Any idea how to map the noise array to image pixels? I would be excited even if this "internal inpainting" for txt2img worked only at 512x512.

@bakkot (Contributor) commented Aug 31, 2022

Any idea how to map noise array to image pixels?

I don't have any idea myself; it would require knowing more about the first stage encoder than I do to even tell if it's possible.

Anyway, you should open a new issue for this, so we can continue discussion after merging this PR.

@bakkot (Contributor) commented Sep 1, 2022

I should have an updated version of this PR within a couple hours, so hold off on any further work/reviews for the moment.

@lstein (Collaborator) commented Sep 1, 2022 via email

@bakkot (Contributor) commented Sep 1, 2022

OK, I've opened #277, which extends this PR to support reproducible outputs, variations-of-variations, and img2img.

@thelemuet

Once you have a series of generated images, how easy is it to animate them? This would make a great alternative to --grid. Lincoln

I used ImageMagick because I have the binaries installed; it's very easy and can make a GIF from the images contained in a folder. I ran it with `-delay 10 -loop 0 path\*.png -scale 360x360 -fuzz 35% output.gif`.

I am pretty sure Pillow should be able to do it as well in Python.
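For reference, a Pillow version of that ImageMagick command might look like the sketch below (assuming Pillow is installed; the frames here are synthetic stand-ins for the PNGs a -v sweep would produce):

```python
from PIL import Image

# Stand-in frames; in practice you'd load the sweep's output PNGs with
# Image.open, e.g. from sorted(Path("outputs").glob("*.png")).
frames = [Image.new("RGB", (360, 360), (i * 25, 0, 0)) for i in range(10)]

frames[0].save(
    "variations.gif",
    save_all=True,
    append_images=frames[1:],
    duration=100,  # ms per frame, roughly ImageMagick's -delay 10
    loop=0,        # loop forever, like -loop 0
)
```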

@ghost (Author) commented Sep 1, 2022

After testing this fantastic feature I have discovered an issue - seeds of the images created with -v argument can't be recreated.

  • Then I want to create variants of some image I liked, so I run:
    "happy dog" -n 50 -t -C 10 -s 50 -A k_euler -S 2578925290 -v 0.15
  • Some fantastic variant image appears and now I want to iterate on it! So I run this prompt:
    "happy dog" -n 50 -t -C 10 -s 50 -A k_euler -S 777777777 -v 0.01
    where 77777777 is the seed of this variant image, but I got a completely different image...

Yes, this is a problem: you can recreate them only with the original seed, the seed in the variant image, and the -v amount used, like this:

"happy dog" -n 50 -t -C 10 -s 50 -A k_euler -S 2578925290 -v 0.15 -V 777777777

doing the above would re-create what you saw in the image with seed 777777777 in the filename, which was output originally from: "happy dog" -n 50 -t -C 10 -s 50 -A k_euler -S 2578925290 -v 0.15

But from here there is no way to make variations on that one, aside from adjusting -v to blend between the 2 seeds.

I have some ideas for how to manage doing this (variants on variants...) but nothing I've started working on yet.

@morganavr

@xraxra

I have some ideas for how to manage doing this (variants on variants...) but nothing I've started working on yet.

I have already implemented it. By saving tensor files next to image files.

Code is here:
#184 (comment)

@ghost (Author) commented Sep 1, 2022

Did some more testing and here are my notes.
To summarize -

  • -v should create variants, not variations
  • -V: add a new tag for creating variations -- currently -v does this, so move that functionality there.
  • -VI: add a new tag for interpolation -- currently -V does this, so move that functionality there and fix the interpolation breaking at higher values; 1 should give the variant seed and 0 the original seed for the intended behavior.

@blessedcoolant thanks for the awesome investigation; unfortunately I can't reproduce the issue here either. I tested all samplers to be sure.

I know that the 2 ancestral samplers will not do "small changes" and exhibit behavior similar to how they rapidly change output between steps.
The only way to do "small changes" on ancestral samplers is to lock both seeds with -V and -S and gently adjust -v in small amounts (e.g. around -v0.1).

But doing this test on all samplers shows that the images match when setting -v to 1.0 versus using the -V seed as the main -S seed:
"an apple" -s18 -S42 -v1 -V100 -A k_euler
"an apple" -s18 -S100 -A k_euler

I would really like to have the interpolation thing, but I don't want to make it a special case for the -v/-V stuff; it seems like a more general-purpose interpolation solution would be better. So I'd prefer not to add it to this PR.

For example being able to just do -C 8:10 -n10 and it automatically blending any parameter with x : y over -n steps
So then we'd be able to do -S42 -v0:1 -V100 -n60 to get 60 frames slerping from seed 42 to 100
or stuff where you do -v0:0.1 -n60 and get a much more gradual transition
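That "x : y over -n steps" idea could be sketched like this (a hypothetical helper; the syntax is only a proposal in this thread, not something the PR implements):

```python
def expand_range(spec, n):
    """Expand a proposed "lo:hi" argument value (e.g. -v0:1 with -n60)
    into n evenly spaced steps, one per generated image.
    A plain value is simply repeated n times."""
    spec = str(spec)
    if ":" not in spec:
        return [float(spec)] * n
    lo, hi = (float(part) for part in spec.split(":"))
    if n == 1:
        return [lo]
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]

# -S42 -v0:1 -V100 -n5 would sweep v across the five frames.
v_values = expand_range("0:1", 5)
```

The same helper would cover cases like -C 8:10 -n10 by interpolating any numeric parameter over the run.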

@jnpatrick99

@xraxra
Does it work only with k_euler? I tried k_euler_a and got wildly different results, not even close to the image generated with the original seed and -v0.1, -v0.001, etc.

@bakkot (Contributor) commented Sep 2, 2022

@jnpatrick99 from my testing locally, it works with every sampler except k_euler_a and k_dpm_2_a. Those two are a lot less stable in general - they also change a lot more than other samplers when you increase the number of steps, for example. So I'm not surprised they're also unstable with respect to small variations in the seed.

That said, it does still give you something closer to the input than you'd get from a random seed. Here's an image and two variations generated with k_euler_a using -v 0.05 (you'll have to use my branch to run the second two inputs), and then the fourth image is a totally random seed.

But yeah I think this will work less well with k_euler_a/k_dpm_2_a, because they're just inherently less stable. Nothing to be done about that, as far as I can tell.

original

"a highly detailed oil painting of a dragon" -s12 -W512 -H512 -C7.5 -Ak_euler_a -S2574952750
000199 2574952750

variation 1

"a highly detailed oil painting of a dragon" -s12 -W512 -H512 -C7.5 -Ak_euler_a -V 1717379167,0.05 -S2574952750
000200 1717379167

variation 2

"a highly detailed oil painting of a dragon" -s12 -W512 -H512 -C7.5 -Ak_euler_a -V 2001884610,0.05 -S2574952750
000200 2001884610

a random other seed

"a highly detailed oil painting of a dragon" -s12 -W512 -H512 -C7.5 -Ak_euler_a -S2452409030
000203 1438000961

@ghost (Author) commented Sep 2, 2022

I'm closing this PR; please use #277, which solves the multiple-seed-history issue.

This pull request was closed.

10 participants