diff --git a/docs/source/en/_toctree.yml b/docs/source/en/_toctree.yml index aa2d907da4bd..a2bb870dc6df 100644 --- a/docs/source/en/_toctree.yml +++ b/docs/source/en/_toctree.yml @@ -28,8 +28,8 @@ title: Load community pipelines - local: using-diffusers/using_safetensors title: Load safetensors - - local: using-diffusers/kerascv - title: Load KerasCV Stable Diffusion checkpoints + - local: using-diffusers/other-formats + title: Load different Stable Diffusion formats title: Loading & Hub - sections: - local: using-diffusers/pipeline_overview diff --git a/docs/source/en/using-diffusers/kerascv.mdx b/docs/source/en/using-diffusers/kerascv.mdx deleted file mode 100644 index 06981cc8fdd1..000000000000 --- a/docs/source/en/using-diffusers/kerascv.mdx +++ /dev/null @@ -1,179 +0,0 @@ - - -# Using KerasCV Stable Diffusion Checkpoints in Diffusers - - - -This is an experimental feature. - - - -[KerasCV](https://github.com/keras-team/keras-cv/) provides APIs for implementing various computer vision workflows. It -also provides the Stable Diffusion [v1 and v2](https://github.com/keras-team/keras-cv/blob/master/keras_cv/models/stable_diffusion) -models. Many practitioners find it easy to fine-tune the Stable Diffusion models shipped by KerasCV. However, as of this writing, KerasCV offers limited support to experiment with Stable Diffusion models for inference and deployment. On the other hand, -Diffusers provides tooling dedicated to this purpose (and more), such as different [noise schedulers](https://huggingface.co/docs/diffusers/using-diffusers/schedulers), [flash attention](https://huggingface.co/docs/diffusers/optimization/xformers), and [other -optimization techniques](https://huggingface.co/docs/diffusers/optimization/fp16). - -How about fine-tuning Stable Diffusion models in KerasCV and exporting them such that they become compatible with Diffusers to combine the -best of both worlds? We have created a [tool](https://huggingface.co/spaces/sayakpaul/convert-kerascv-sd-diffusers) that -lets you do just that! It takes KerasCV Stable Diffusion checkpoints and exports them to Diffusers-compatible checkpoints. -More specifically, it first converts the checkpoints to PyTorch and then wraps them into a -[`StableDiffusionPipeline`](https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/overview) which is ready -for inference. Finally, it pushes the converted checkpoints to a repository on the Hugging Face Hub. - -We welcome you to try out the tool [here](https://huggingface.co/spaces/sayakpaul/convert-kerascv-sd-diffusers) -and share feedback via [discussions](https://huggingface.co/spaces/sayakpaul/convert-kerascv-sd-diffusers/discussions/new). - -## Getting Started - -First, you need to obtain the fine-tuned KerasCV Stable Diffusion checkpoints. We provide an -overview of the different ways Stable Diffusion models can be fine-tuned [using `diffusers`](https://huggingface.co/docs/diffusers/training/overview). 
For the Keras implementation of some of these methods, you can check out these resources:
-
-* [Teach StableDiffusion new concepts via Textual Inversion](https://keras.io/examples/generative/fine_tune_via_textual_inversion/)
-* [Fine-tuning Stable Diffusion](https://keras.io/examples/generative/finetune_stable_diffusion/)
-* [DreamBooth](https://keras.io/examples/generative/dreambooth/)
-* [Prompt-to-Prompt editing](https://github.com/miguelCalado/prompt-to-prompt-tensorflow)
-
-Stable Diffusion comprises the following models:
-
-* Text encoder
-* UNet
-* VAE
-
-Depending on the fine-tuning task, we may fine-tune one or more of these components (the VAE is almost always left untouched). Here are some common combinations:
-
-* DreamBooth: UNet and text encoder
-* Classical text-to-image fine-tuning: UNet
-* Textual Inversion: just the newly initialized embeddings in the text encoder
-
-### Performing the Conversion
-
-Let's use [this checkpoint](https://huggingface.co/sayakpaul/textual-inversion-kerasio/resolve/main/textual_inversion_kerasio.h5), which was generated by conducting Textual Inversion with the placeholder token `<my-funny-cat-token>`.
-
-On the tool, we supply the following:
-
-* Path(s) to download the fine-tuned checkpoint(s) (KerasCV)
-* A Hugging Face token
-* The placeholder token (only applicable for Textual Inversion)
-
-
-As soon as you hit "Submit", the conversion process will begin. Once it's complete, you should see the following:
-
-
-If you click the [link](https://huggingface.co/sayakpaul/textual-inversion-cat-kerascv_sd_diffusers_pipeline/tree/main), you should see something like this:
-
-
-If you head over to the [model card of the repository](https://huggingface.co/sayakpaul/textual-inversion-cat-kerascv_sd_diffusers_pipeline), the following should appear:
-
-
-Note that we're not specifying the UNet weights here since the UNet is not fine-tuned during Textual Inversion.
-
-And that's it! You now have your fine-tuned KerasCV Stable Diffusion model in Diffusers 🧨.
-
-## Using the Converted Model in Diffusers
-
-Just beside the model card of the [repository](https://huggingface.co/sayakpaul/textual-inversion-cat-kerascv_sd_diffusers_pipeline), you'll notice an inference widget to try out the model directly from the UI 🤗
-
-
-On the top right-hand side, we provide a "Use in Diffusers" button. If you click the button, you should see the following code snippet:
-
-```py
-from diffusers import DiffusionPipeline
-
-pipeline = DiffusionPipeline.from_pretrained("sayakpaul/textual-inversion-cat-kerascv_sd_diffusers_pipeline")
-```
-
-The model is in standard `diffusers` format. Let's perform inference!
-
-```py
-from diffusers import DiffusionPipeline
-
-pipeline = DiffusionPipeline.from_pretrained("sayakpaul/textual-inversion-cat-kerascv_sd_diffusers_pipeline")
-pipeline.to("cuda")
-
-placeholder_token = "<my-funny-cat-token>"
-prompt = f"two {placeholder_token} getting married, photorealistic, high quality"
-image = pipeline(prompt, num_inference_steps=50).images[0]
-```
-
-And we get:
-
-
-_**Note that if you specified a `placeholder_token` while performing the conversion, the tool will log it accordingly. Refer to the model card of [this repository](https://huggingface.co/sayakpaul/textual-inversion-cat-kerascv_sd_diffusers_pipeline) as an example.**_
-
-We welcome you to use the tool for various Stable Diffusion fine-tuning scenarios and let us know your feedback! Here are some examples of Diffusers checkpoints that were obtained using the tool:
-
-* [sayakpaul/text-unet-dogs-kerascv_sd_diffusers_pipeline](https://huggingface.co/sayakpaul/text-unet-dogs-kerascv_sd_diffusers_pipeline) (DreamBooth with both the text encoder and UNet fine-tuned)
-* [sayakpaul/unet-dogs-kerascv_sd_diffusers_pipeline](https://huggingface.co/sayakpaul/unet-dogs-kerascv_sd_diffusers_pipeline) (DreamBooth with only the UNet fine-tuned)
-
-## Incorporating Diffusers Goodies 🎁
-
-Diffusers provides various options that one can leverage to experiment with different inference setups. One particularly useful option is using a different noise scheduler during inference than the one used during fine-tuning. Let's try out the [`DPMSolverMultistepScheduler`](https://huggingface.co/docs/diffusers/main/en/api/schedulers/multistep_dpm_solver), which is different from the [`DDPMScheduler`](https://huggingface.co/docs/diffusers/main/en/api/schedulers/ddpm) used during fine-tuning.
-
-You can read more details about this process in [this section](https://huggingface.co/docs/diffusers/using-diffusers/schedulers).
-
-```py
-from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler
-
-pipeline = DiffusionPipeline.from_pretrained("sayakpaul/textual-inversion-cat-kerascv_sd_diffusers_pipeline")
-pipeline.scheduler = DPMSolverMultistepScheduler.from_config(pipeline.scheduler.config)
-pipeline.to("cuda")
-
-placeholder_token = "<my-funny-cat-token>"
-prompt = f"two {placeholder_token} getting married, photorealistic, high quality"
-image = pipeline(prompt, num_inference_steps=50).images[0]
-```
-
-
-One can also continue fine-tuning from these Diffusers checkpoints by leveraging relevant tools from Diffusers. Refer [here](https://huggingface.co/docs/diffusers/training/overview) for more details. For inference-specific optimizations, refer [here](https://huggingface.co/docs/diffusers/main/en/optimization/fp16).
-
-## Known Limitations
-
-* Only Stable Diffusion v1 checkpoints are supported for conversion in this tool.
diff --git a/docs/source/en/using-diffusers/other-formats.mdx b/docs/source/en/using-diffusers/other-formats.mdx
new file mode 100644
index 000000000000..c8dc7cca86fc
--- /dev/null
+++ b/docs/source/en/using-diffusers/other-formats.mdx
@@ -0,0 +1,126 @@
+# Load different Stable Diffusion formats
+
+Stable Diffusion models are available in different formats depending on the framework they're trained and saved with, and where you download them from. Converting these formats for use in 🤗 Diffusers allows you to use all the features supported by the library, such as [using different schedulers](schedulers) for inference, [building your custom pipeline](write_own_pipeline), and a variety of techniques and methods for [optimizing inference speed](./optimization/opt_overview).
+
+We highly recommend using the `.safetensors` format because it is more secure than traditional pickled files, which are vulnerable and can be exploited to execute any code on your machine (learn more in the [Load safetensors](using_safetensors) guide).
+
+This guide will show you how to convert other Stable Diffusion formats to be compatible with 🤗 Diffusers.
+
+## PyTorch .ckpt
+
+The checkpoint - or `.ckpt` - format is commonly used to store and save models. A `.ckpt` file contains the entire model and is typically several GBs in size. While you can load and use a `.ckpt` file directly with the [`~StableDiffusionPipeline.from_ckpt`] method, it is generally better to convert the `.ckpt` file to 🤗 Diffusers so both formats are available.
+
+There are two options for converting a `.ckpt` file: use a Space to convert the checkpoint, or convert the `.ckpt` file with a script.
+
+### Convert with a Space
+
+The easiest and most convenient way to convert a `.ckpt` file is to use the [SD to Diffusers](https://huggingface.co/spaces/diffusers/sd-to-diffusers) Space. You can follow the instructions on the Space to convert the `.ckpt` file.
+
+This approach works well for basic models, but it may struggle with more customized models. You'll know the Space failed if it returns an empty pull request or an error. In this case, you can try converting the `.ckpt` file with a script.
+
+### Convert with a script
+
+🤗 Diffusers provides a [conversion script](https://github.com/huggingface/diffusers/blob/main/scripts/convert_original_stable_diffusion_to_diffusers.py) for converting `.ckpt` files. This approach is more reliable than the Space above.
+
+Before you start, make sure you have a local clone of 🤗 Diffusers to run the script, and log in to your Hugging Face account so you can open pull requests and push your converted model to the Hub.
+
+```bash
+huggingface-cli login
+```
+
+To use the script:
+
+1. Git clone the repository containing the `.ckpt` file you want to convert. For this example, let's convert this [TemporalNet](https://huggingface.co/CiaraRowles/TemporalNet) `.ckpt` file:
+
+```bash
+git lfs install
+git clone https://huggingface.co/CiaraRowles/TemporalNet
+```
+
+2. Open a pull request on the repository where you're converting the checkpoint from:
+
+```bash
+cd TemporalNet && git fetch origin refs/pr/13:pr/13
+git checkout pr/13
+```
+
+3. There are several input arguments to configure in the conversion script, but the most important ones are:
+
+    - `checkpoint_path`: the path to the `.ckpt` file to convert.
+    - `original_config_file`: a YAML file defining the configuration of the original architecture. If you can't find this file, try searching for the YAML file in the GitHub repository where you found the `.ckpt` file.
+    - `dump_path`: the path to the converted model.
+
+    For example, you can take the `cldm_v15.yaml` file from the [ControlNet](https://github.com/lllyasviel/ControlNet/tree/main/models) repository because the TemporalNet model is a Stable Diffusion v1.5 and ControlNet model.
+
+4. Now you can run the script to convert the `.ckpt` file:
+
+```bash
+python ../diffusers/scripts/convert_original_stable_diffusion_to_diffusers.py --checkpoint_path temporalnetv3.ckpt --original_config_file cldm_v15.yaml --dump_path ./ --controlnet
+```
+
+5. Once the conversion is done, upload your converted model and test out the resulting [pull request](https://huggingface.co/CiaraRowles/TemporalNet/discussions/13)!
+
+```bash
+git push origin pr/13:refs/pr/13
+```
+
+## Keras .pb or .h5
+
+🧪 This is an experimental feature. Only Stable Diffusion v1 checkpoints are supported by the Convert KerasCV Space at the moment.
+
+[KerasCV](https://keras.io/keras_cv/) supports training for [Stable Diffusion](https://github.com/keras-team/keras-cv/blob/master/keras_cv/models/stable_diffusion) v1 and v2. However, it offers limited support for experimenting with Stable Diffusion models for inference and deployment, whereas 🤗 Diffusers has a more complete set of features for this purpose, such as different [noise schedulers](https://huggingface.co/docs/diffusers/using-diffusers/schedulers), [flash attention](https://huggingface.co/docs/diffusers/optimization/xformers), and [other optimization techniques](https://huggingface.co/docs/diffusers/optimization/fp16).
+
+The [Convert KerasCV](https://huggingface.co/spaces/sayakpaul/convert-kerascv-sd-diffusers) Space converts `.pb` or `.h5` files to PyTorch, and then wraps them in a [`StableDiffusionPipeline`] so it is ready for inference. The converted checkpoint is stored in a repository on the Hugging Face Hub.
+
+For this example, let's convert the [`sayakpaul/textual-inversion-kerasio`](https://huggingface.co/sayakpaul/textual-inversion-kerasio/tree/main) checkpoint, which was trained with Textual Inversion. It uses the special token `<my-funny-cat-token>` to personalize images with cats.
+
+The Convert KerasCV Space allows you to input the following:
+
+* Your Hugging Face token.
+* Paths to download the UNet and text encoder weights from. Depending on how the model was trained, you don't necessarily need to provide the paths to both the UNet and text encoder. For example, Textual Inversion only requires the embeddings from the text encoder, and a text-to-image model only requires the UNet weights.
+* The placeholder token, which is only applicable for textual inversion models.
+* The `output_repo_prefix`, which is the name of the repository where the converted model is stored.
+
+Click the **Submit** button to automatically convert the KerasCV checkpoint! Once the checkpoint is successfully converted, you'll see a link to the new repository containing the converted checkpoint.
+Follow the link to the new repository, and you'll see the Convert KerasCV Space generated a model card with an inference widget to try out the converted model.
+
+If you prefer to run inference with code, click on the **Use in Diffusers** button in the upper right corner of the model card to copy and paste the code snippet:
+
+```py
+from diffusers import DiffusionPipeline
+
+pipeline = DiffusionPipeline.from_pretrained("sayakpaul/textual-inversion-cat-kerascv_sd_diffusers_pipeline")
+```
+
+Then you can generate an image like:
+
+```py
+from diffusers import DiffusionPipeline
+
+pipeline = DiffusionPipeline.from_pretrained("sayakpaul/textual-inversion-cat-kerascv_sd_diffusers_pipeline")
+pipeline.to("cuda")
+
+placeholder_token = "<my-funny-cat-token>"
+prompt = f"two {placeholder_token} getting married, photorealistic, high quality"
+image = pipeline(prompt, num_inference_steps=50).images[0]
+```
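+
+The converted checkpoint behaves like any other 🤗 Diffusers pipeline, so you can also experiment with a different noise scheduler at inference time. As a minimal sketch (it reuses the converted repository above; swap in your own repository id), you could load the [`DPMSolverMultistepScheduler`](https://huggingface.co/docs/diffusers/main/en/api/schedulers/multistep_dpm_solver):
+
+```py
+from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler
+
+pipeline = DiffusionPipeline.from_pretrained("sayakpaul/textual-inversion-cat-kerascv_sd_diffusers_pipeline")
+
+# Swap the default scheduler for DPMSolverMultistep, reusing the pipeline's scheduler config
+pipeline.scheduler = DPMSolverMultistepScheduler.from_config(pipeline.scheduler.config)
+pipeline.to("cuda")
+
+placeholder_token = "<my-funny-cat-token>"
+prompt = f"two {placeholder_token} getting married, photorealistic, high quality"
+image = pipeline(prompt, num_inference_steps=50).images[0]
+```
+
+Learn more about swapping schedulers in the [schedulers guide](schedulers).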