From 79e11c21f7cb69c229ca74e1c051511ceece465a Mon Sep 17 00:00:00 2001
From: William Berman
Date: Wed, 26 Apr 2023 19:34:16 -0700
Subject: [PATCH 1/3] [docs] add notes for stateful model changes

---
 docs/source/en/optimization/fp16.mdx | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/docs/source/en/optimization/fp16.mdx b/docs/source/en/optimization/fp16.mdx
index d05c5aabea2b..d934808f3718 100644
--- a/docs/source/en/optimization/fp16.mdx
+++ b/docs/source/en/optimization/fp16.mdx
@@ -202,6 +202,8 @@ image = pipe(prompt).images[0]
 
 **Note**: When using `enable_sequential_cpu_offload()`, it is important to **not** move the pipeline to CUDA beforehand or else the gain in memory consumption will only be minimal. See [this issue](https://github.com/huggingface/diffusers/issues/1934) for more information.
 
+**Note**: `enable_sequential_cpu_offload()` is a stateful operation that installs hooks on the models.
+
 ## Model offloading for fast inference and memory savings
 
@@ -251,6 +253,10 @@ image = pipe(prompt).images[0]
 
 This feature requires `accelerate` version 0.17.0 or larger.
 
+**Note**: `enable_model_cpu_offload()` is a stateful operation that installs hooks on the models and state on the pipeline. In order to properly re-offload
+models after they are called, it is required that the entire pipeline is run and models are called in the order the pipeline expects them to be. Exercise caution
+if models are re-used outside the context of the pipeline after hooks have been installed.
+
 ## Using Channels Last memory format
 
 Channels last memory format is an alternative way of ordering NCHW tensors in memory preserving dimensions ordering. Channels last tensors ordered in such a way that channels become the densest dimension (aka storing images pixel-per-pixel). Since not all operators currently support channels last format it may result in a worst performance, so it's better to try it and see if it works for your model.
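The "stateful operation that installs hooks" wording above can be illustrated with a small pure-Python sketch. This is a hypothetical toy, not accelerate's actual implementation or API: `DummyModule`, `install_offload_hook`, and `remove_offload_hook` are invented names standing in for `torch.nn.Module` and accelerate's hook machinery, which wrap a module's forward so weights are onloaded just before the call and offloaded right after.

```python
class DummyModule:
    """Stand-in for an nn.Module; tracks which 'device' it is on."""

    def __init__(self):
        self.device = "cpu"

    def to(self, device):
        self.device = device
        return self

    def forward(self, x):
        # Computation requires the module to be resident on the GPU.
        assert self.device == "cuda", "module must be on GPU to run"
        return x * 2


def install_offload_hook(module):
    """Stateful change: wrap forward so the module is moved to the GPU
    just before the call and back to the CPU immediately after."""
    original_forward = module.forward

    def hooked_forward(x):
        module.to("cuda")           # onload just-in-time
        out = original_forward(x)
        module.to("cpu")            # offload right after the call
        return out

    module.forward = hooked_forward
    module._original_forward = original_forward  # kept so the hook can be removed
    return module


def remove_offload_hook(module):
    """Undo the stateful change (what accelerate's
    remove_hook_from_module is for in the real library)."""
    module.forward = module._original_forward
    del module._original_forward
    return module
```

Because the hook lives on the module itself, the module behaves differently after `install_offload_hook` even when used outside any pipeline, which is exactly the caution the notes raise.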
From 87259a253b83e2af39d97bf028e11f7e730b99ed Mon Sep 17 00:00:00 2001
From: Will Berman
Date: Thu, 27 Apr 2023 10:42:07 -0700
Subject: [PATCH 2/3] Update docs/source/en/optimization/fp16.mdx

Co-authored-by: Pedro Cuenca
---
 docs/source/en/optimization/fp16.mdx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/en/optimization/fp16.mdx b/docs/source/en/optimization/fp16.mdx
index d934808f3718..009fd2de2744 100644
--- a/docs/source/en/optimization/fp16.mdx
+++ b/docs/source/en/optimization/fp16.mdx
@@ -253,7 +253,7 @@ image = pipe(prompt).images[0]
 
 This feature requires `accelerate` version 0.17.0 or larger.
 
-**Note**: `enable_model_cpu_offload()` is a stateful operation that installs hooks on the models and state on the pipeline. In order to properly re-offload
+**Note**: `enable_model_cpu_offload()` is a stateful operation that installs hooks on the models and state on the pipeline. In order to properly offload
 models after they are called, it is required that the entire pipeline is run and models are called in the order the pipeline expects them to be. Exercise caution
 if models are re-used outside the context of the pipeline after hooks have been installed.

From c099a8fa5608231c991c5608ac4bf5df6b194b39 Mon Sep 17 00:00:00 2001
From: William Berman
Date: Thu, 27 Apr 2023 10:45:28 -0700
Subject: [PATCH 3/3] link to accelerate docs for discarding hooks

---
 docs/source/en/optimization/fp16.mdx | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/docs/source/en/optimization/fp16.mdx b/docs/source/en/optimization/fp16.mdx
index 009fd2de2744..596312a0ffe0 100644
--- a/docs/source/en/optimization/fp16.mdx
+++ b/docs/source/en/optimization/fp16.mdx
@@ -255,7 +255,8 @@ This feature requires `accelerate` version 0.17.0 or larger.
 
 **Note**: `enable_model_cpu_offload()` is a stateful operation that installs hooks on the models and state on the pipeline. In order to properly offload
 models after they are called, it is required that the entire pipeline is run and models are called in the order the pipeline expects them to be. Exercise caution
-if models are re-used outside the context of the pipeline after hooks have been installed.
+if models are re-used outside the context of the pipeline after hooks have been installed. See [accelerate](https://huggingface.co/docs/accelerate/v0.18.0/en/package_reference/big_modeling#accelerate.hooks.remove_hook_from_module)
+for further docs on removing hooks.
 
 ## Using Channels Last memory format
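The call-order requirement these patches document can be sketched with a toy pure-Python model of the pipeline state. Everything here is hypothetical (`OffloadStatefulPipeline` and its model names are invented, not diffusers' real classes): `enable_model_cpu_offload()`-style offloading keeps at most one model on the GPU, and a model is only offloaded when the *next* model in the sequence is called, so skipping or reordering calls leaves a model stranded on the GPU.

```python
class OffloadStatefulPipeline:
    """Toy model of enable_model_cpu_offload()-style bookkeeping:
    at most one model resides on the GPU, and calling a model
    offloads the previously active one. Illustrative names only."""

    def __init__(self, model_names):
        self.device = {name: "cpu" for name in model_names}
        self.active = None  # model currently resident on the GPU

    def call(self, name):
        if self.active is not None and self.active != name:
            self.device[self.active] = "cpu"  # offload the previous model
        self.device[name] = "cuda"            # onload the requested model
        self.active = name


pipe = OffloadStatefulPipeline(["text_encoder", "unet", "vae"])

# A full run in the expected order: each call offloads the previous model,
# so only the last model called remains on the GPU afterwards.
for name in ["text_encoder", "unet", "vae"]:
    pipe.call(name)
```

After the run, `vae` is still on the GPU; if a model is then re-used outside the pipeline, the bookkeeping shifts to that model rather than being reset, which is why the notes advise running the entire pipeline in order and exercising caution with out-of-band re-use.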