Skip to content

Conversation

@subpanic
Copy link

Some additional features and quality of life improvements for the dream booth script. Opening a PR in case some/all of these are worth merging. I believe all additions should be non-destructive for existing script users assuming autocast isn't a problem (but it might be, so worth investigating).

  • Support for optionally generating sample image outputs for each checkpoint
  • Supporting for optionally automatically converting each checkpoint to a diffusers model
  • Support for optionally allowing arbitrary image resolutions in training data (sample generation will still be set by the resolution argument)
  • Support for optionally setting a specific device to train on

Some questions around areas I'm not 100% on for broader audience:

  1. In order to support training on a specific device (instead of restricting purely to accelerator config), I explicitly set_device for torch and directly access the accelerator.state and set the device. Unsure if there's a cleaner way to do this
  2. Generating sample images I was consistently running into type mismatch issues either with calling the pipeline for inference or with calling the unet in the training loop. (tensor mismatch for float32 vs float16 type issues). I resolved this by enabling the autocast context around those two calls with torch.cuda.amp.autocast(enabled=True):. I have not noticed any performance or obvious training issues due to this but if someone knows why this is a bad idea it would be greatly appreciated if they could help work out how to avoid the type mismatch issues without autocrats

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

@patrickvonplaten
Copy link
Contributor

cc @patil-suraj

Copy link
Contributor

@patil-suraj patil-suraj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for the PR @subpanic , I left a few comments below.

Note that the goal of the example scripts is to have simple scripts that cover the most common use cases and are easier to read and modify if users wants to change something. For additional features like multires we encourage users to take the script and adjust on their own, so as to keep the example here simpler. Hope you understand.

Support for optionally generating sample image outputs for each checkpoint.

That's a good point, @patrickvonplaten is working on adding logging to all training examples. @patrickvonplaten maybe we could add an argument like validation_steps so users could sync checkpointing and generating.

raise ValueError(f"Instance {self.instance_data_root} images root doesn't exists.")

self.instance_images_path = list(Path(instance_data_root).iterdir())
self.instance_images_path = [path for path in self.instance_images_path if path.suffix in [".jpg", ".png", ".jpeg"]]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea!

Comment on lines +473 to +475
gpu = int(args.device.split(":")[1]) if args.device != "cuda" else "cuda"
if torch.cuda.is_available():
torch.cuda.set_device(gpu)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't think we should have this here, this will break multi-gpu. We can set the device using CUDA_VISIBLE_DEVICES env variable. Which is also recommended by torch, cf https://pytorch.org/docs/stable/generated/torch.cuda.set_device.html


# Predict the noise residual
model_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
with torch.cuda.amp.autocast(enabled=True):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we pass --mixed_precision="fp16" accelerate automatically uses autocast; we should not hardcode this here.

Comment on lines +856 to +903
# Also generate and save sample images if specified
if args.samples_per_checkpoint > 0:
# Make sure any data leftover from previous interim pipeline is cleared
if torch.cuda.is_available():
torch.cuda.empty_cache()

# Load current training state into a new diffusion pipeline to generate samples
pipeline = DiffusionPipeline.from_pretrained(
args.pretrained_model_name_or_path,
unet=accelerator.unwrap_model(unet),
text_encoder=accelerator.unwrap_model(text_encoder),
torch_dtype=weight_dtype
).to(accelerator.device)

if args.convert_checkpoints:
convertedOutPath = os.path.join(args.output_dir, "converted_checkpoints", f"checkpoint-{global_step}")
logger.info(f"Converting checkpoint-{global_step} to diffusers model at {convertedOutPath}")
pipeline.save_pretrained(convertedOutPath)

logger.info(f"Generating {args.samples_per_checkpoint} samples at step {global_step} with prompt: {args.sample_prompt}")

# Allow a statically set or random seed for for generated samples
sampleSeed = args.sample_seed if args.sample_seed != -1 else torch.Generator(accelerator.device).seed()
sampleGenerator = torch.Generator(device=accelerator.device)

for sampleIdx in range(args.samples_per_checkpoint):
sampleGenerator.manual_seed(sampleSeed)
# Generate samples
with torch.cuda.amp.autocast(enabled=True):
images = pipeline(
prompt=args.sample_prompt,
num_images_per_prompt=1,
num_inference_steps=args.sample_steps,
generator=sampleGenerator,
width=args.resolution,
height=args.resolution).images

# Save samples to 'samples' folder in output directory
for i, image in enumerate(images):
hash_image = hashlib.sha1(image.tobytes()).hexdigest()
os.makedirs(os.path.join(args.output_dir, "samples"), exist_ok=True)
image_filename = os.path.join(args.output_dir, "samples", f"step-{global_step}_seed-{sampleSeed}_loss-{loss.detach().item()}_hash-{hash_image}.png")
image.save(image_filename)

sampleSeed += 1

# Remove interim pipeline reference
del pipeline
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @subpanic, could you maybe try to use the mechanism of https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/train_dreambooth_lora.py#L902 here for generation? It should be a bit easier than what we currently have :-)

logging_dir=logging_dir,
)

# Accelerator device target is managed by an AcceleratorState object, grabbing
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not do this here. accelerate is responsible for device handling :-)

]
)
if multires_enabled:
self.image_transforms = transforms.Compose(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@patil-suraj what do you think regarding multires?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really in favour of this. I think better to keep the script simpler.

parser.add_argument("--convert_checkpoints", action="store_true", help="Auto-convert checkpoints to an inference ready structure")
parser.add_argument("--multires", action="store_true", help="Disables dataset image transforms. Allows training on image datasets of arbitrary resolutions")

parser.add_argument("--device", type=str, default="cuda", help="Set the device to use for training (cuda, cuda:0, cuda:1, etc.).")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not add this. accelerate should handle this :-)

parser.add_argument("--sample_steps", type=int, default=40, help="Number of steps for generating sample images.")
parser.add_argument("--sample_prompt", type=str, default=None, help="Prompt to use for sample image generation.")
parser.add_argument("--sample_seed", type=int, default=-1, help="Seed for the per-checkpoint sample image generation. -1 to select random seed")
parser.add_argument("--convert_checkpoints", action="store_true", help="Auto-convert checkpoints to an inference ready structure")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's also not add this. We will handle this soon directly in accelerate as well, see: #2048

"--enable_xformers_memory_efficient_attention", action="store_true", help="Whether or not to use xformers."
)
parser.add_argument("--samples_per_checkpoint", type=int, default=0, help="Whether or not to save samples for every checkpoint specified by --checkpointing_steps.")
parser.add_argument("--sample_steps", type=int, default=40, help="Number of steps for generating sample images.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we instead maybe use:

?

parser.add_argument(
"--enable_xformers_memory_efficient_attention", action="store_true", help="Whether or not to use xformers."
)
parser.add_argument("--samples_per_checkpoint", type=int, default=0, help="Whether or not to save samples for every checkpoint specified by --checkpointing_steps.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm can we maybe align this with

and not have this argument?

)
parser.add_argument("--samples_per_checkpoint", type=int, default=0, help="Whether or not to save samples for every checkpoint specified by --checkpointing_steps.")
parser.add_argument("--sample_steps", type=int, default=40, help="Number of steps for generating sample images.")
parser.add_argument("--sample_prompt", type=str, default=None, help="Prompt to use for sample image generation.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we call this:

maybe?

parser.add_argument("--samples_per_checkpoint", type=int, default=0, help="Whether or not to save samples for every checkpoint specified by --checkpointing_steps.")
parser.add_argument("--sample_steps", type=int, default=40, help="Number of steps for generating sample images.")
parser.add_argument("--sample_prompt", type=str, default=None, help="Prompt to use for sample image generation.")
parser.add_argument("--sample_seed", type=int, default=-1, help="Seed for the per-checkpoint sample image generation. -1 to select random seed")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
parser.add_argument("--sample_seed", type=int, default=-1, help="Seed for the per-checkpoint sample image generation. -1 to select random seed")
parser.add_argument("--validation_seed", type=int, default=-1, help="Seed for the per-checkpoint sample image generation. -1 to select random seed")

parser.add_argument("--sample_prompt", type=str, default=None, help="Prompt to use for sample image generation.")
parser.add_argument("--sample_seed", type=int, default=-1, help="Seed for the per-checkpoint sample image generation. -1 to select random seed")
parser.add_argument("--convert_checkpoints", action="store_true", help="Auto-convert checkpoints to an inference ready structure")
parser.add_argument("--multires", action="store_true", help="Disables dataset image transforms. Allows training on image datasets of arbitrary resolutions")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

Copy link
Contributor

@patrickvonplaten patrickvonplaten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for the PR. It would be great if we could align things here a bit with: https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/train_dreambooth_lora.py#L159 if possible :-)

@github-actions
Copy link
Contributor

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot added the stale Issues that haven't received updates label Feb 25, 2023
@github-actions github-actions bot closed this Mar 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

stale Issues that haven't received updates

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants