Implement TeaCache #12652
Conversation
Work done. Waiting for feedback and review :)

Hi @sayakpaul @dhruvrnaik any updates?

@LawJarp-A sorry about the delay on our end. @DN6 will review it soon.

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Hi @LawJarp-A I think we would need TeaCache to be implemented in a model-agnostic way in order to merge the PR. The First Block Cache implementation is a good reference for this.

Yep @DN6, I agree. I wanted to first implement it just for a single model and get feedback on that before working on the full model-agnostic implementation. I'm working on it but haven't pushed it yet. I'll take a look at First Block Cache for reference as well.

@DN6 updated it in a more model-agnostic way.

…th auto-detection

Added multi-model support, still testing it thoroughly though.

Hi @DN6 @sayakpaul
In the meantime, any feedback would be appreciated.
Thanks @LawJarp-A!
You can refer to #12569 for testing
Yes, I think that is informative for users.
sayakpaul left a comment
Some initial feedback. The most important question is that it seems like we need to craft different logic based on the model. Can we not keep it model agnostic?
I am trying to think of ways we can avoid having a forward method for each model. Initially that seemed like the best approach and was fine when I wrote it for Flux, but Lumina needed multi-stage preprocessing.
Signed-off-by: Prajwal A <prajwalanagani@gmail.com>
@sayakpaul @DN6 checking in again :)
DN6 left a comment
Some high-level feedback on the design. The control flow is hard to follow as it switches between the hook object and adapter. The adapters themselves are thin wrappers around a modified forward function, so it would be better to just define them as standalone functions, e.g.

def _flux_forward(
    state: "TeaCacheState",  # pass the state to the function, not the hook object
    coefficients: List[float],
    rel_l1_thresh: float,
    module: torch.nn.Module,
    hidden_states: torch.Tensor,
    timestep: torch.Tensor,
    pooled_projections: torch.Tensor,
    encoder_hidden_states: torch.Tensor,
    txt_ids: torch.Tensor,
    img_ids: torch.Tensor,
    return_dict: bool = True,
    **kwargs,
):
    if _should_use_cache(state, modulated_inp, coefficients, rel_l1_thresh):
        hidden_states = _apply_cached_residual(state, hidden_states, modulated_inp)
    else:
        # run compute
        _update_cache(state, hidden_states, original_hidden_states, modulated_inp)

Since we're hooking the top-level forward of the model, we can map this forward function using the class name during hook initialization.
def initialize_hook(self, module):
    """Initialize hook with model-specific configuration."""
    model_config = _MODEL_CONFIG.get(module.__class__.__name__)
    if model_config is None:
        raise ValueError
    if self.config.coefficients is not None:
        self.coefficients = self.config.coefficients
    else:
        self.coefficients = model_config["coefficients"]
    # Initialize state
    self.state_manager = StateManager(TeaCacheState)
    self.forward_fn = model_config["forward_func"]
    return module

Where _MODEL_CONFIG is just a mapping for the forward functions and coefficients:
_MODEL_CONFIG = {
    "FluxTransformer2DModel": {
        "forward_func": _flux_forward,
        "coefficients": [4.98651651e02, -2.83781631e02, 5.58554382e01, -3.82021401e00, 2.64230861e-01],
    },
}

Similarly, the methods defined in the hook object could also be turned into utility functions.
def _compute_rescaled_distance(rel_distance: float, coefficients: List[float]) -> float:
    return (
        coefficients[0] * rel_distance**4
        + coefficients[1] * rel_distance**3
        + coefficients[2] * rel_distance**2
        + coefficients[3] * rel_distance
        + coefficients[4]
    )


def _should_use_cache(state: "TeaCacheState", ...):
    # Return True or False based on whether to use cache.
    return


def _update_cache(state: "TeaCacheState", ...):
    return


def _apply_cached_residual(
    state: "TeaCacheState", input_base: torch.Tensor, modulated_inp: torch.Tensor
) -> torch.Tensor:
    """
    Apply cached residual to input (fast path).
    """
    output = input_base + state.previous_residual
    state.previous_modulated_input = modulated_inp
    state.cnt += 1
    return output

Let's remove passing cache_fn and compute_fn between the hook and the adapter. Use operations directly on the cache state plus globally available utility methods. We can also remove the modulation extractors and move that logic into the model-specific forward functions.
Thanks for the feedback @DN6

The per-model forward code is unavoidable due to different model architectures. The adapter pattern was an attempt to organize this, but I agree standalone functions would be cleaner. I'll refactor.
Signed-off-by: Prajwal A <prajwalanagani@gmail.com>
…ctions Signed-off-by: Prajwal A <prajwalanagani@gmail.com>
Signed-off-by: Prajwal A <prajwalanagani@gmail.com>
Hi @DN6, I've updated the implementation as you requested.

This does introduce some code duplication - each forward function now has the same if/else pattern:

if _should_compute(state, modulated_inp, hook.coefficients, hook.config.rel_l1_thresh):
    # compute full transformer
    _update_state(state, output, original, modulated_inp)
else:
    output = _apply_cached_residual(state, input, modulated_inp)

But the control flow is now much clearer - you can read each forward function top-to-bottom without jumping between closures and hook methods. Let me know if you'd like any further changes!
Signed-off-by: Prajwal A <prajwalanagani@gmail.com>
Signed-off-by: Prajwal A <prajwalanagani@gmail.com>
… isolation Signed-off-by: Prajwal A <prajwalanagani@gmail.com>
Signed-off-by: Prajwal A <prajwalanagani@gmail.com>
…elpers Signed-off-by: Prajwal A <prajwalanagani@gmail.com>
@DN6 @sayakpaul I spent the weekend going over the code again to understand and simplify it.
I have kept the per-model forward functions as you requested, instead of the common adapter pattern I was using before. Btw, below are the images generated with and without cache.
…orward methods Signed-off-by: Prajwal A <prajwalanagani@gmail.com>
I left some comments. LMK if they make sense.
# Fallback to default context for backward compatibility with
# pipelines that don't call cache_context()
context = "_default"
Should this branch not error out like previous?
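For illustration, a minimal sketch of the stricter behavior being asked about; the helper name and the error message are assumptions, not the PR's actual code:

```python
from typing import Optional


def _resolve_context(current_context: Optional[str]) -> str:
    # Hypothetical helper: raise instead of silently falling back to "_default"
    # when no cache context has been opened.
    if current_context is None:
        raise ValueError(
            "No cache context is active. Call `cache_context()` on the pipeline "
            "before running the denoising loop."
        )
    return current_context
```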
# Iterate over all attributes of the hook to see if any of them have the type `StateManager`. If so, call `set_context` on them.
"""Set context on all StateManager attributes of this hook."""
Can we please revert the changes unrelated to this PR? Makes reviewing a bit easier since the diff becomes smaller.
def _get_model_config() -> Dict[str, Dict[str, Any]]:
    """Get model configuration mapping. Order matters: more specific variants before generic ones."""
    return {
        "FluxKontext": {
WDYT of using the actual model class names here? Will likely be easier to maintain and read.
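A minimal sketch of what that could look like, keyed by exact transformer class names. `_flux_teacache_forward` and the Flux coefficients are taken from elsewhere in this thread; the builder and lookup helpers are assumptions:

```python
from typing import Any, Callable, Dict

import torch


def _build_model_config(flux_forward: Callable) -> Dict[str, Dict[str, Any]]:
    # Keys are exact model class names instead of loose identifiers like "FluxKontext".
    return {
        "FluxTransformer2DModel": {
            "forward_func": flux_forward,
            "coefficients": [4.98651651e02, -2.83781631e02, 5.58554382e01, -3.82021401e00, 2.64230861e-01],
        },
    }


def _lookup_model_config(module: torch.nn.Module, model_config: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    # Exact class-name lookup; no substring matching against the config path.
    return model_config[module.__class__.__name__]
```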
return {
    "FluxKontext": {
        "forward_func": _flux_teacache_forward,
        "coefficients": [-1.04655119e03, 3.12563399e02, -1.69500694e01, 4.10995971e-01, 3.74537863e-02],
(nit): would make a note on how these values were obtained.
if prev_mean.item() > 1e-9:
    return ((current - previous).abs().mean() / prev_mean).item()
Do we need to make it data-dependent (item() call)? Raising it because it makes torch.compile cry.
return 0.0 if current.abs().mean().item() < 1e-9 else float("inf")

@torch.compiler.disable
Is it because of the item() call?
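For context, a compile-friendly sketch that keeps the relative L1 distance as a tensor; the epsilon clamp replaces the explicit zero/inf branches quoted above, and the names are assumptions:

```python
import torch


def _relative_l1_distance(current: torch.Tensor, previous: torch.Tensor, eps: float = 1e-9) -> torch.Tensor:
    # No .item() here, so there is no data-dependent control flow and
    # torch.compile can trace the whole computation without a graph break.
    prev_mean = previous.abs().mean()
    return (current - previous).abs().mean() / prev_mean.clamp_min(eps)
```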
attention_kwargs, lora_scale = _extract_lora_scale(attention_kwargs)
if USE_PEFT_BACKEND:
    scale_lora_layers(module, lora_scale)
We should check if the underlying model class inherits from PeftLoaderMixin and if so, we should do it.
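A rough sketch of that guard, assuming the relevant mixin is `PeftAdapterMixin` from `diffusers.loaders` (the comment refers to it as `PeftLoaderMixin`):

```python
from diffusers.loaders import PeftAdapterMixin
from diffusers.utils import USE_PEFT_BACKEND, scale_lora_layers


def _maybe_scale_lora(module, lora_scale):
    # Only touch LoRA layers when the model class actually supports PEFT.
    if USE_PEFT_BACKEND and lora_scale is not None and isinstance(module, PeftAdapterMixin):
        scale_lora_layers(module, lora_scale)
```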
| """Get model configuration mapping. Order matters: more specific variants before generic ones.""" | ||
| return { | ||
| "FluxKontext": { | ||
| "forward_func": _flux_teacache_forward, |
One problem that I see with this is what happens when the core forward method undergoes some changes and we fail to propagate them in these modified forwards.
@bot /style

Style bot fixed some files and pushed the changes.
Pull request overview
This PR implements TeaCache (Timestep Embedding Aware Cache), a training-free caching technique that speeds up diffusion model inference by 1.5x-2.6x by reusing transformer block computations when consecutive timestep embeddings are similar.
Changes:
- Adds TeaCache hook system with model-specific forward implementations for FLUX, Mochi, Lumina2, and CogVideoX models
- Integrates TeaCache with the existing CacheMixin infrastructure for unified cache management
- Implements StateManager improvements for context-aware state isolation (CFG support)
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 6 comments.
Summary per file:
| File | Description |
|---|---|
| src/diffusers/hooks/teacache.py | Core TeaCache implementation with polynomial rescaling, model auto-detection, and specialized forward functions for each supported model |
| src/diffusers/models/cache_utils.py | Integration of TeaCacheConfig into enable_cache/disable_cache methods |
| src/diffusers/hooks/__init__.py | Export TeaCacheConfig, apply_teacache, and StateManager |
| src/diffusers/hooks/hooks.py | StateManager enhancement with default context fallback for backward compatibility |
| src/diffusers/models/transformers/transformer_lumina2.py | Add CacheMixin to Lumina2Transformer2DModel |
| tests/hooks/test_teacache.py | Comprehensive unit tests for config validation, state management, and model detection |
}
cache = state.cache_dict[cache_key]

is_boundary_step = state.cnt == 0 or state.cnt == state.num_steps - 1
Copilot AI (Jan 20, 2026)
The boundary step check may fail when num_steps is 0 (not initialized). If state.num_steps is 0, the condition state.cnt == state.num_steps - 1 evaluates to state.cnt == -1, which would never be true for a non-negative counter. Consider adding state.num_steps > 0 as a guard similar to line 118.
Suggested change:
- is_boundary_step = state.cnt == 0 or state.cnt == state.num_steps - 1
+ is_boundary_step = state.cnt == 0 or (state.num_steps > 0 and state.cnt == state.num_steps - 1)
# Track sequence length for step counting (CFG)
if state.uncond_seq_len is None:
    state.uncond_seq_len = cache_key
if cache_key != state.uncond_seq_len:
Copilot AI (Jan 20, 2026)
The step counter logic for Lumina2 is confusing and potentially incorrect. The counter is incremented when cache_key != state.uncond_seq_len (line 725), but this assumes CFG will always process unconditional first, then conditional. If the order changes or if CFG is not used, the counter may not increment correctly. Consider documenting this assumption more clearly or making the logic more robust.
Suggested change:
- # Track sequence length for step counting (CFG)
- if state.uncond_seq_len is None:
-     state.uncond_seq_len = cache_key
- if cache_key != state.uncond_seq_len:
+ # Track sequence length for step counting.
+ # We keep state.uncond_seq_len for backward compatibility but avoid assuming
+ # that unconditional is always processed before conditional.
+ if state.uncond_seq_len is None:
+     state.uncond_seq_len = cache_key
+ # Step counting strategy:
+ # * If we ever observe more than one distinct cache_key, we assume CFG-like
+ #   behavior and increment state.cnt once per "pair" (i.e., once we've seen
+ #   at least two distinct cache_keys in a step), independent of ordering.
+ # * If we only ever observe a single cache_key, we assume CFG is disabled and
+ #   increment state.cnt on every call.
+ if not hasattr(state, "_seen_cache_keys_in_step"):
+     state._seen_cache_keys_in_step = set()
+ if not hasattr(state, "_all_cache_keys_seen"):
+     state._all_cache_keys_seen = set()
+ state._seen_cache_keys_in_step.add(cache_key)
+ state._all_cache_keys_seen.add(cache_key)
+ has_multiple_cache_keys = len(state._all_cache_keys_seen) > 1
+ if has_multiple_cache_keys:
+     # CFG-like behavior: increment once per step after seeing multiple keys.
+     if len(state._seen_cache_keys_in_step) > 1:
+         state.cnt += 1
+         if state.cnt >= state.num_steps:
+             state.cnt = 0
+         # Reset for the next diffusion step.
+         state._seen_cache_keys_in_step.clear()
+ else:
+     # No-CFG behavior: only one cache_key is ever seen, so increment every call.
def _auto_detect_model_type(module: torch.nn.Module) -> str:
    """Auto-detect model type from class name and config path."""
    class_name = module.__class__.__name__
    config_path = getattr(getattr(module, "config", None), "_name_or_path", "").lower()
    model_config = _get_model_config()

    for model_type in model_config:
        if model_type.lower() in config_path or model_type in class_name:
            return model_type
Copilot AI (Jan 20, 2026)
The model auto-detection uses substring matching with model_type.lower() in config_path or model_type in class_name. This can lead to false positives. For example, "CogVideoX" would match "CogVideoX1.5-5B" configs. While the iteration order is designed to check specific variants first, the order dependency is fragile. Consider using more precise matching (e.g., checking for exact model identifier patterns or using startswith/endswith) to avoid potential mismatches.
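One possible direction, sketched here under the assumption that config-path matching is still wanted as a fallback: prefer an exact class-name hit and only then do a delimiter-bounded match against the config path (function name and regex are illustrative, not the PR's code):

```python
import re
from typing import Dict, Optional


def _detect_model_type(class_name: str, config_path: str, model_config: Dict[str, dict]) -> Optional[str]:
    # Exact class-name match first.
    if class_name in model_config:
        return class_name
    # Fall back to whole-token matches in the config path, so "CogVideoX"
    # does not accidentally match a "CogVideoX1.5-5B" path.
    path = config_path.lower()
    for model_type in model_config:
        if re.search(rf"(?:^|[/_\-]){re.escape(model_type.lower())}(?:$|[/_\-.])", path):
            return model_type
    return None
```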
| f"Please provide a float value between 0.1 and 1.0." | ||
| ) | ||
| if self.rel_l1_thresh <= 0: | ||
| raise ValueError( | ||
| f"rel_l1_thresh must be positive, got {self.rel_l1_thresh}. " | ||
| f"Based on the TeaCache paper, values between 0.1 and 0.3 work best. " | ||
| f"Try 0.25 for 1.5x speedup or 0.6 for 2x speedup." |
Copilot AI (Jan 20, 2026)
The validation logic checks for rel_l1_thresh <= 0 but zero values are arguably valid since they would force computation at every step (effectively disabling caching). Consider whether the check should be < 0 instead, or document why zero is explicitly disallowed.
| f"Please provide a float value between 0.1 and 1.0." | |
| ) | |
| if self.rel_l1_thresh <= 0: | |
| raise ValueError( | |
| f"rel_l1_thresh must be positive, got {self.rel_l1_thresh}. " | |
| f"Based on the TeaCache paper, values between 0.1 and 0.3 work best. " | |
| f"Try 0.25 for 1.5x speedup or 0.6 for 2x speedup." | |
| f"Please provide a float value >= 0.0 (values between 0.1 and 1.0 are recommended)." | |
| ) | |
| if self.rel_l1_thresh < 0: | |
| raise ValueError( | |
| f"rel_l1_thresh must be non-negative, got {self.rel_l1_thresh}. " | |
| f"Based on the TeaCache paper, values between 0.1 and 0.3 work best. " | |
| f"Try 0.25 for 1.5x speedup or 0.6 for 2x speedup. " | |
| f"Note that rel_l1_thresh=0.0 effectively disables caching by forcing computation at every step." |
state.cnt = 0
state.accumulated_rel_l1_distance = 0.0
state.previous_modulated_input = None
state.previous_residual = None
Copilot AI (Jan 20, 2026)
The _maybe_reset_state_for_new_inference method doesn't reset cache_dict and uncond_seq_len which are used by Lumina2. This could cause stale cache data to persist across inference runs when using Lumina2 models. Consider calling state.reset() instead of manually resetting individual fields, or add these Lumina2-specific fields to the reset logic.
Suggested change:
- state.previous_residual = None
+ state.previous_residual = None
+ # Reset Lumina2-specific state to avoid stale cache/data between inference runs
+ if hasattr(state, "cache_dict") and state.cache_dict is not None:
+     # Clear in-place to preserve any existing references to the cache dict
+     state.cache_dict.clear()
+ if hasattr(state, "uncond_seq_len"):
+     state.uncond_seq_len = None
Example:
    ```python
    >>> from diffusers import FluxPipeline
    >>> from diffusers.hooks import TeaCacheConfig
    >>> pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16)
    >>> pipe.to("cuda")
    >>> config = TeaCacheConfig(rel_l1_thresh=0.2)
    >>> pipe.transformer.enable_cache(config)
    >>> image = pipe("A cat sitting on a windowsill", num_inference_steps=4).images[0]
    >>> pipe.transformer.disable_cache()
    ```
Copilot AI (Jan 20, 2026)
The example code in the docstring references torch but doesn't show the import statement. Consider adding import torch to the example for completeness.




What does this PR do?
What is TeaCache?
TeaCache (Timestep Embedding Aware Cache) is a training-free caching technique that speeds up diffusion model inference by 1.5x-2.6x by reusing transformer block computations when consecutive timestep embeddings are similar.
Architecture & Design
TeaCache uses a `ModelHook` to intercept transformer forward passes without modifying model code. The algorithm rescales the relative L1 distance between consecutive timestep-modulated inputs with a fitted polynomial, `c[0]*x^4 + c[1]*x^3 + c[2]*x^2 + c[3]*x + c[4]`, accumulates the result, and reuses the cached transformer residual while the accumulated value stays below the threshold (sketched below).

Key Design Features:
- Integrates with `HookRegistry` and `CacheMixin` for lifecycle management
- Uses `StateManager` with context-aware state for CFG conditional/unconditional branches
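An illustrative sketch of that decision; variable and function names here are assumptions, not the PR's actual helpers:

```python
from typing import List, Tuple


def should_recompute(
    rel_distance: float, accumulated: float, coefficients: List[float], rel_l1_thresh: float
) -> Tuple[bool, float]:
    # Rescale the raw relative L1 distance with the fitted polynomial,
    # accumulate it, and keep reusing the cached residual until the
    # accumulated value crosses the threshold.
    rescaled = (
        coefficients[0] * rel_distance**4
        + coefficients[1] * rel_distance**3
        + coefficients[2] * rel_distance**2
        + coefficients[3] * rel_distance
        + coefficients[4]
    )
    accumulated += rescaled
    if accumulated < rel_l1_thresh:
        return False, accumulated  # fast path: reuse the cached residual
    return True, 0.0  # recompute the transformer blocks and reset the accumulator
```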
Supported Models

FLUX, Mochi, Lumina2, and CogVideoX. All models support automatic coefficient detection based on model class name and config path. Custom coefficients can also be provided via `TeaCacheConfig`.

Benchmark Results (FLUX.1-dev)
Benchmark Results (Lumina2)
Benchmark Results (CogVideoX-2b)
Benchmark Results (Mochi)
Test Hardware: NVIDIA H100
Framework: Diffusers with TeaCache hooks
All tests: Same seed (42) for reproducibility
Usage
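A minimal usage sketch, mirroring the docstring example quoted earlier in the review (with the missing `torch` import added); the exact model and step count are illustrative:

```python
import torch

from diffusers import FluxPipeline
from diffusers.hooks import TeaCacheConfig

pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16)
pipe.to("cuda")

# Enable TeaCache on the transformer with the default threshold from this PR.
pipe.transformer.enable_cache(TeaCacheConfig(rel_l1_thresh=0.2))
image = pipe("A cat sitting on a windowsill", num_inference_steps=4).images[0]

# Restore the uncached forward pass.
pipe.transformer.disable_cache()
```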
Configuration Options
The `TeaCacheConfig` supports the following parameters:

- `rel_l1_thresh` (float, default=0.2): Threshold for accumulated relative L1 distance. Recommended values: 0.25 for ~1.5x speedup, 0.4 for ~1.8x, 0.6 for ~2.0x. Mochi models require lower thresholds (0.06-0.09).
- `coefficients` (List[float], optional): Polynomial coefficients for rescaling L1 distance. Auto-detected based on model type if not provided.
- `num_inference_steps` (int, optional): Total inference steps. Ensures first/last timesteps are always computed. Auto-detected if not provided.
- `num_inference_steps_callback` (Callable[[], int], optional): Callback returning total inference steps. Alternative to `num_inference_steps`.
- `current_timestep_callback` (Callable[[], int], optional): Callback returning current timestep. Used for debugging/statistics.

Files Changed
- `src/diffusers/hooks/teacache.py` - Core implementation with model-specific forward functions
- `src/diffusers/models/cache_utils.py` - CacheMixin integration
- `src/diffusers/hooks/__init__.py` - Export TeaCacheConfig and apply_teacache
- `tests/hooks/test_teacache.py` - Comprehensive unit tests

Fixes # (issue)
#12589
#12635
Before submitting
- See the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
@sayakpaul @yiyixuxu @DN6