Conversation

@linoytsaban (Collaborator)

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@linoytsaban requested a review from sayakpaul on August 23, 2024, 11:40
@sayakpaul (Member) left a comment:


Single comment.

Comment on lines +1602 to +1603
height=int(model_input.shape[2] * vae_scale_factor / 2),
width=int(model_input.shape[3] * vae_scale_factor / 2),
@sayakpaul (Member):

Where is this from?

latents = self._unpack_latents(latents, height, width, self.vae_scale_factor)

We don't additionally scale it by "/2".

@linoytsaban (Collaborator, Author):

It's just a modification of the original version of the training scripts, where we had

model_pred = FluxPipeline._unpack_latents(
                    model_pred,
                    height=int(model_input.shape[2] * 8),
                    width=int(model_input.shape[3] * 8),
                    vae_scale_factor=vae_scale_factor,
                )

and it didn't work with all resolutions, so we fixed it in the previous PR for the LoRA script

(in the pipeline there is this: diffusers/src/diffusers/pipelines/flux/pipeline_flux.py)

@sayakpaul (Member):

Yeah but we still don't have to do the additional scaling in the original pipeline, no? And it works with multiple resolutions without that. So, I am struggling to understand why we would need it here.

@linoytsaban (Collaborator, Author):

In the pipeline, prepare_latents first scales the width and height by

height = 2 * (int(height) // self.vae_scale_factor)
width = 2 * (int(width) // self.vae_scale_factor)

but the original height and width aren't overridden, so the values later passed to _unpack_latents are still in pixel space, i.e. 8x the scaled version above.
In the training script, we send

height=model_input.shape[2],
width=model_input.shape[3],

to _pack_latents, and these are already the scaled-down values because packing happens after VAE encoding,
i.e. model_input.shape[2] is equivalent in shape to 2 * (int(height) // self.vae_scale_factor).
That's why, when we call _unpack_latents in the training script, we need to scale back up.
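
Concretely, here's a minimal numeric sketch of the shape bookkeeping (assuming vae_scale_factor = 16 and a 1024x1024 input; the values are just for illustration):

    pixel_height = 1024
    vae_scale_factor = 16

    # Pipeline path: prepare_latents computes the scaled-down value, but the
    # original `height` is not overridden, so _unpack_latents later receives
    # the pixel-space value (1024).
    latent_height = 2 * (int(pixel_height) // vae_scale_factor)  # 128, matches model_input.shape[2]

    # Training-script path: model_input.shape[2] is already 128, so it has to be
    # scaled back up before it's handed to _unpack_latents:
    unpack_height = int(latent_height * vae_scale_factor / 2)  # 128 * 16 / 2 == 1024
    assert unpack_height == pixel_height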

@sayakpaul (Member):

Ah okay. Thanks for explaining this. Perhaps we could add a link to this comment in our script for our bookkeeping?

@linoytsaban (Collaborator, Author):

done :)

@linoytsaban (Collaborator, Author):

@sayakpaul shall we merge?

@sayakpaul merged commit c977966 into huggingface:main on Aug 26, 2024
@sayakpaul (Member):

Thank you!

@linoytsaban deleted the dreambooth-flux branch on November 26, 2024, 10:18
sayakpaul pushed a commit that referenced this pull request Dec 23, 2024
…h lora) (#9257)

* fix shape

* fix prompt encoding

* style

* fix device

* add comment

Successfully merging this pull request may close these issues: DreamBooth training script / FLUX.1 [dev]
