[Dreambooth flux] bug fix for dreambooth script (align with dreambooth lora) #9257
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
sayakpaul left a comment:
Single comment.
height=int(model_input.shape[2] * vae_scale_factor / 2),
width=int(model_input.shape[3] * vae_scale_factor / 2),
Where is this from?
latents = self._unpack_latents(latents, height, width, self.vae_scale_factor)
We don't additionally scale it by "/2".
It's just a modification of the original version of the training scripts, where we had
model_pred = FluxPipeline._unpack_latents(
model_pred,
height=int(model_input.shape[2] * 8),
width=int(model_input.shape[3] * 8),
vae_scale_factor=vae_scale_factor,
)
and it didn't work with all resolutions, so we fixed it in the previous PR for the LoRA script
(in the pipeline this is in diffusers/src/diffusers/pipelines/flux/pipeline_flux.py).
Yeah but we still don't have to do the additional scaling in the original pipeline, no? And it works with multiple resolutions without that. So, I am struggling to understand why we would need it here.
In the pipeline, we first scale the height and width in prepare_latents:
height = 2 * (int(height) // self.vae_scale_factor)
width = 2 * (int(width) // self.vae_scale_factor)
but we don't override the originals, so when they're sent to unpack_latents they are still the pixel-space values, i.e. 8x the scaled version above.
In the training script, we send
height=model_input.shape[2],
width=model_input.shape[3],
to pack_latents, which is already the scaled-down version because it happens after VAE encoding,
i.e. model_input.shape[2] is equivalent in shape to 2 * (int(height) // self.vae_scale_factor).
That's why when we call unpack_latents in the training script we need to scale back up.
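To make the shape bookkeeping concrete, here is a minimal sketch. The concrete numbers assume a 1024x1024 image and vae_scale_factor = 16 (a VAE that downsamples by 8); the variable names are purely illustrative and not taken from the training script.
pixel_height = 1024
vae_scale_factor = 16  # assumed value; in the scripts this comes from the VAE config
# pipeline path: prepare_latents computes the latent resolution ...
latent_height = 2 * (pixel_height // vae_scale_factor)  # 128
# ... but the original pixel_height is what later gets passed to unpack_latents,
# so no extra scaling is needed there.
# training-script path: model_input comes straight out of the VAE encoder,
# so model_input.shape[2] is already the latent resolution (128 here)
model_input_height = latent_height
# to hand unpack_latents the pixel-space value it expects, scale back up:
assert int(model_input_height * vae_scale_factor / 2) == pixel_height  # 128 * 16 / 2 == 1024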
Ah okay. Thanks for explaining this. Perhaps we could add a link to this comment in our script for our bookkeeping?
done :)
@sayakpaul shall we merge?
Thank you!
…h lora) (#9257)
* fix shape
* fix prompt encoding
* style
* fix device
* add comment
fixes #9204 (comment)