diff --git a/docs/source/en/optimization/fp16.mdx b/docs/source/en/optimization/fp16.mdx
index d05c5aabea2b..596312a0ffe0 100644
--- a/docs/source/en/optimization/fp16.mdx
+++ b/docs/source/en/optimization/fp16.mdx
@@ -202,6 +202,8 @@ image = pipe(prompt).images[0]

 **Note**: When using `enable_sequential_cpu_offload()`, it is important to **not** move the pipeline to CUDA beforehand or else the gain in memory consumption will only be minimal. See [this issue](https://github.com/huggingface/diffusers/issues/1934) for more information.

+**Note**: `enable_sequential_cpu_offload()` is a stateful operation that installs hooks on the models.
+
 ## Model offloading for fast inference and memory savings

@@ -251,6 +253,11 @@ image = pipe(prompt).images[0]

 This feature requires `accelerate` version 0.17.0 or larger.

+**Note**: `enable_model_cpu_offload()` is a stateful operation that installs hooks on the models and state on the pipeline. In order to properly offload
+models after they are called, it is required that the entire pipeline is run and models are called in the order the pipeline expects them to be. Exercise caution
+if models are re-used outside the context of the pipeline after hooks have been installed. See [accelerate](https://huggingface.co/docs/accelerate/v0.18.0/en/package_reference/big_modeling#accelerate.hooks.remove_hook_from_module)
+for further docs on removing hooks.
+
 ## Using Channels Last memory format

 Channels last memory format is an alternative way of ordering NCHW tensors in memory preserving dimensions ordering. Channels last tensors ordered in such a way that channels become the densest dimension (aka storing images pixel-per-pixel). Since not all operators currently support channels last format it may result in a worst performance, so it's better to try it and see if it works for your model.