add: logging to text2image. #2173

sayakpaul · 2023-01-31T09:50:03Z

Context: #2163

Potentially closes #2163

Command to fire training:

export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export DATASET_NAME="lambdalabs/pokemon-blip-captions"

accelerate launch --mixed_precision="fp16"  train_text_to_image.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --dataset_name=$DATASET_NAME \
  --use_ema \
  --resolution=512 --center_crop --random_flip \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --gradient_checkpointing \
  --max_train_steps=250 \
  --learning_rate=1e-05 \
  --max_grad_norm=1 \
  --lr_scheduler="constant" --lr_warmup_steps=0 \
  --validation_prompt="cute dragon creature" \
  --seed=666 \
  --report_to="wandb" \
  --output_dir="sd-pokemon-model"

Leads to:

Traceback (most recent call last):
  File "train_text_to_image.py", line 813, in <module>
    main()
  File "train_text_to_image.py", line 791, in main
    pipeline(args.validation_prompt, num_inference_steps=30, generator=generator).images[0]
  File "/opt/conda/envs/py38/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/opt/conda/envs/py38/lib/python3.8/site-packages/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py", line 636, in __call__
    image, has_nsfw_concept = self.run_safety_checker(image, device, prompt_embeds.dtype)
  File "/opt/conda/envs/py38/lib/python3.8/site-packages/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py", line 361, in run_safety_checker
    image, has_nsfw_concept = self.safety_checker(
  File "/opt/conda/envs/py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/envs/py38/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/opt/conda/envs/py38/lib/python3.8/site-packages/diffusers/pipelines/stable_diffusion/safety_checker.py", line 52, in forward
    pooled_output = self.vision_model(clip_input)[1]  # pooled_output
  File "/opt/conda/envs/py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/envs/py38/lib/python3.8/site-packages/transformers/models/clip/modeling_clip.py", line 934, in forward
    return self.vision_model(
  File "/opt/conda/envs/py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/envs/py38/lib/python3.8/site-packages/transformers/models/clip/modeling_clip.py", line 859, in forward
    hidden_states = self.embeddings(pixel_values)
  File "/opt/conda/envs/py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/envs/py38/lib/python3.8/site-packages/transformers/models/clip/modeling_clip.py", line 195, in forward
    patch_embeds = self.patch_embedding(pixel_values)  # shape = [*, width, grid, grid]
  File "/opt/conda/envs/py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/envs/py38/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 463, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/opt/conda/envs/py38/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 459, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Input type (torch.cuda.HalfTensor) and weight type (torch.FloatTensor) should be the same

The PR also follows this comment: #2163 (comment)

It's the safety checker that causes the problem.

HuggingFaceDocBuilderDev · 2023-01-31T09:58:25Z

The documentation is not available anymore as the PR was closed or merged.

sayakpaul · 2023-01-31T12:12:24Z

Adding the safety checker explicitly (f2a143f) also didn't help.

patil-suraj

Thanks a lot for the PR! Looks good to me. I left one comment about loading text_encoder and vae again for inference, which I'm not really in favour of.

Also, when doing mixed-precision training, I would expect the generation to be in mixed precision as well. For that we'll probably need to use torch.autocast, for the reasons explained here: #2163 (comment)

We don't really want to promote autocast for inference, but I don't see any other clean way of handling it here. We could add a comment in the script explaining why it's used there and why it's not needed for general inference.

Also, autocast should be enabled only when training with mixed precision; otherwise everything defaults to fp32. This can be achieved with:

with torch.autocast(accelerator.device.type, enabled=args.mixed_precision == "fp16"):
    ...  # run the validation inference here

Happy to hear suggestions :)
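
For reference, a minimal sketch of the gating described above, not the script's actual validation code. It assumes a stand-in `torch.nn.Linear` model in place of the pipeline and a hypothetical helper `run_validation`. On machines without CUDA it falls back to CPU autocast with bfloat16 purely so that the sketch still runs.

```python
import torch


def run_validation(model, inputs, device, mixed_precision):
    """Run inference under autocast only when training used fp16 (hypothetical helper)."""
    # torch.autocast expects a device-type string such as "cuda" or "cpu".
    # Assumption for this sketch: fp16 on CUDA, bfloat16 as the CPU fallback.
    low_dtype = torch.float16 if device.type == "cuda" else torch.bfloat16
    with torch.no_grad(), torch.autocast(
        device.type, dtype=low_dtype, enabled=mixed_precision == "fp16"
    ):
        return model(inputs)


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = torch.nn.Linear(4, 2).to(device)  # weights stay in fp32
x = torch.randn(1, 4, device=device)

out_amp = run_validation(model, x, device, "fp16")  # autocast enabled
out_fp32 = run_validation(model, x, device, "no")   # autocast disabled
print(out_amp.dtype, out_fp32.dtype)
```

With the flag off, the context manager is a no-op, so the same code path serves both fp32 and mixed-precision runs.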

patil-suraj · 2023-01-31T12:15:10Z

examples/text_to_image/train_text_to_image.py


+    if args.report_to == "wandb":
+        if not is_wandb_available():
+            raise ImportError("Make sure to install wandb if you want to use it for logging during training.")


Suggested change

raise ImportError("Make sure to install wandb if you want to use it for logging during training.")

raise ImportError("Make sure to install wandb if you want to use it for logging during training. You can do so by doing `pip install wandb`")
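
As an illustration of this kind of guard, here is a small standard-library sketch. The helpers `is_package_available` and `require_package` are hypothetical names for this example only and are not the script's `is_wandb_available()`; the point is just that the error message carries the install hint.

```python
import importlib.util


def is_package_available(name: str) -> bool:
    """Return True if a top-level package can be found without importing it."""
    return importlib.util.find_spec(name) is not None


def require_package(name: str, purpose: str) -> None:
    """Raise an ImportError with an install hint if the package is missing."""
    if not is_package_available(name):
        raise ImportError(
            f"Make sure to install {name} if you want to use it for {purpose}. "
            f"You can do so with `pip install {name}`."
        )


# Example: a package name that is assumed not to be installed.
try:
    require_package("surely_not_installed_pkg_xyz", "logging during training")
    message = ""
except ImportError as err:
    message = str(err)
print(message)
```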

patil-suraj · 2023-01-31T12:17:40Z

examples/text_to_image/train_text_to_image.py

+                safety_checker = StableDiffusionSafetyChecker.from_pretrained(
+                    args.pretrained_model_name_or_path, subfolder="safety_checker", revision=args.non_ema_revision
+                )


StableDiffusionPipeline.from_pretrained should automatically load safety_checker when available. Is there any reason we need to load it here explicitly?

Also, I'm not sure what difference it makes to load safety_checker separately like this; StableDiffusionPipeline.from_pretrained does pretty much the same thing.

patil-suraj · 2023-01-31T12:18:35Z

examples/text_to_image/train_text_to_image.py

+                    safety_checker=safety_checker,
+                    revision=args.revision,


We could also pass vae and text_encoder here directly. I'm not really in favour of loading them again, as that would take more memory and time and might also lead to OOM (depending on the GPU).

patil-suraj · 2023-01-31T12:59:11Z

examples/text_to_image/train_text_to_image.py

+                # safety_checker.to(accelerator.device, dtype=weight_dtype)
+                pipeline = StableDiffusionPipeline.from_pretrained(
+                    args.pretrained_model_name_or_path,
+                    unet=accelerator.unwrap_model(unet),


When training with EMA, we should use the EMA weights for inference.

To do that, we'll need to:

1. Temporarily store the non-EMA weights.
2. Copy the EMA weights to the unet.
3. Restore the non-EMA weights back into the unet.

For that we'll need to add store and restore methods to EMAModel, as defined in https://github.com/fadel/pytorch_ema/blob/master/torch_ema/ema.py#L139

Happy to take care of this if you want :)
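
The store / copy / restore sequence above can be sketched without any framework. This is a toy illustration, not EMAModel's implementation: `ToyEMA` is a hypothetical class, and each "parameter" is a plain list of floats standing in for a tensor.

```python
class ToyEMA:
    """Toy EMA tracker; each parameter is a mutable list of floats (assumption)."""

    def __init__(self, parameters, decay=0.999):
        self.decay = decay
        self.shadow_params = [list(p) for p in parameters]
        self.temp_stored_params = None

    def step(self, parameters):
        # shadow = decay * shadow + (1 - decay) * param
        for shadow, param in zip(self.shadow_params, parameters):
            for i, value in enumerate(param):
                shadow[i] = self.decay * shadow[i] + (1.0 - self.decay) * value

    def store(self, parameters):
        # Keep a copy of the current (non-EMA) weights.
        self.temp_stored_params = [list(p) for p in parameters]

    def copy_to(self, parameters):
        # Overwrite the live weights in place with the EMA weights.
        for shadow, param in zip(self.shadow_params, parameters):
            param[:] = shadow

    def restore(self, parameters):
        # Put the stored non-EMA weights back and drop the temporary copy.
        if self.temp_stored_params is None:
            raise RuntimeError("restore() called without a matching store()")
        for saved, param in zip(self.temp_stored_params, parameters):
            param[:] = saved
        self.temp_stored_params = None


params = [[1.0, 2.0], [3.0]]
ema = ToyEMA(params, decay=0.5)
params[0][0] = 3.0  # pretend an optimizer step changed a weight
ema.step(params)

ema.store(params)    # 1. store the non-EMA weights
ema.copy_to(params)  # 2. copy the EMA weights into the model
ema_view = [list(p) for p in params]  # validation would run here
ema.restore(params)  # 3. restore the non-EMA weights
```

Training then continues from the restored weights, so validation with EMA weights does not disturb the optimizer's view of the model.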

sayakpaul · 2023-01-31T13:21:35Z

@patil-suraj thanks a lot for your comments. I'll address them a bit later (it might take until the middle of tomorrow), as I have another commitment and will be travelling for some time. I hope it's not a blocker.

patil-suraj · 2023-01-31T13:57:18Z

No worries at all, not super urgent anyway : )

patil-suraj · 2023-02-01T08:54:41Z

Actually, there's already a similar PR #2157

sayakpaul · 2023-02-01T09:47:04Z

> Actually, there's already a similar PR #2157

Sure, I can close this one then.

sayakpaul · 2023-02-01T09:47:38Z

Closing in favor of #2157

patil-suraj · 2023-02-01T10:28:16Z

Thanks a lot!

sayakpaul added 3 commits January 31, 2023 15:04

add: logging to text2image.

938fa28

add: autocast block.

69af6ce

apply make style/.

67c6c56

pcuenca mentioned this pull request Jan 31, 2023

[examples] Error due to mismatches in the dtype of UNet while running train_text_to_image.py #2163

Closed

sayakpaul added 4 commits January 31, 2023 15:43

disable unwrapping.

ef2abd8

autocast context manager before final inference.

6582c80

remove autocasts.

26bc0d7

make style.

5a33617

sayakpaul requested review from patil-suraj and pcuenca and removed request for patil-suraj January 31, 2023 10:48

sayakpaul self-assigned this Jan 31, 2023

sayakpaul marked this pull request as ready for review January 31, 2023 11:29

sayakpaul added 2 commits January 31, 2023 17:12

add: safety checker.

37a35c1

fix: cli arg.

f2a143f

disable casting for safety checker.

a0e844d

patil-suraj reviewed Jan 31, 2023

patil-suraj mentioned this pull request Jan 31, 2023

[WIP] Sample images when checkpointing. #2157

Closed

patil-suraj reviewed Jan 31, 2023

sayakpaul closed this Feb 1, 2023

sayakpaul deleted the feat/text2image-logging branch February 2, 2023 03:15

sayakpaul mentioned this pull request Feb 9, 2023

[Utils] Adds store() and restore() methods to EMAModel #2302

Merged
