HiCache

A Plug-in Scaled-Hermite Upgrade for Taylor-Style Cache-then-Forecast Diffusion Acceleration

arXiv License ICLR 2026

English | 简体中文

Quick Start · Backends · Story · Usage · Results · License


📄 Abstract

Diffusion models have achieved remarkable success in content generation but suffer from prohibitive computational costs due to iterative sampling. While recent feature caching methods accelerate inference through temporal extrapolation, they still suffer severe quality loss because they fail to model the complex dynamics of feature evolution.

To address this, this paper presents HiCache (Hermite Polynomial-based Feature Cache), a training-free acceleration framework that fundamentally improves feature prediction by aligning the mathematical tools with the empirical properties of the features. Our key insight is that feature-derivative approximations in Diffusion Transformers exhibit multivariate Gaussian characteristics, motivating Hermite polynomials, potentially the theoretically optimal basis for Gaussian-correlated processes. In addition, we introduce a dual-scaling mechanism that ensures numerical stability while preserving predictive accuracy; it is also effective when applied standalone to TaylorSeer.
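To make the cache-then-forecast idea concrete, here is a minimal sketch of a scaled-Hermite forecast. This is an illustrative reconstruction, not the repository's implementation (see `models/hicache_fast_impl.py` for that): the finite-difference derivative estimates, the probabilists' Hermite recurrence He_k(x) = x·He_{k−1}(x) − (k−1)·He_{k−2}(x), and the single `scale` parameter standing in for the paper's dual-scaling mechanism are all simplifying assumptions.

```python
import math
import numpy as np

def hermite_forecast(history, steps_ahead=1.0, order=2, scale=1.0):
    """Forecast the next feature from cached activations using a scaled
    probabilists' Hermite basis (illustrative sketch, not the repo code).

    history:     cached features at consecutive cached steps (len > order)
    steps_ahead: extrapolation distance, in cache-interval units
    order:       highest derivative / Hermite order used
    scale:       basis scaling factor (stand-in for the dual-scaling trick)
    """
    # Finite-difference derivative estimates from the cache, as in
    # Taylor-style cache-then-forecast methods such as TaylorSeer.
    diffs = [np.asarray(h, dtype=np.float64) for h in history]
    derivs = [diffs[-1]]
    for k in range(1, order + 1):
        diffs = [b - a for a, b in zip(diffs[:-1], diffs[1:])]
        derivs.append(diffs[-1])

    # Probabilists' Hermite polynomials He_k at the scaled step:
    # He_0 = 1, He_1 = x, He_k = x*He_{k-1} - (k-1)*He_{k-2}.
    x = steps_ahead / scale
    he = [1.0, x]
    for k in range(2, order + 1):
        he.append(x * he[-1] - (k - 1) * he[-2])

    # Replace the Taylor monomial t^k / k! with the scaled Hermite term.
    pred = np.zeros_like(derivs[0])
    for k in range(order + 1):
        pred = pred + derivs[k] * he[k] * (scale ** k) / math.factorial(k)
    return pred
```

With `scale = 1` the lowest orders coincide with plain Taylor extrapolation (He_0 = 1, He_1 = x); the point of the method is the choice of basis for the higher-order terms and the scaling that keeps them numerically stable.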

Extensive experiments demonstrate HiCache's superiority: it achieves a 5.55× speedup on FLUX.1-dev while exceeding baseline quality, and maintains strong performance across text-to-image, video generation, and super-resolution tasks. Moreover, HiCache combines naturally with prior caching methods to boost them further, e.g., improving ClusCa's ImageReward from 0.9480 to 0.9840.


✨ Features

  • HiCache Acceleration — Hermite polynomial-based feature caching for efficient diffusion sampling
  • Multi-Backend Support — FLUX, Qwen-Image, Qwen-Image-Edit, Chipmunk-Flux
  • Flexible Deployment — Single GPU & Multi-GPU launcher included
  • Easy Integration — Simple pip install -e ".[all]" setup

> **Note**
> Model weights / checkpoints are not included in this repository.

**TODO**

  • Release Inf-DiT super-resolution acceleration code

🧩 Story

HiCache is my first top-conference first-author paper. Research is delayed gratification under uncertainty: the project was finished in late July 2025, first submitted to AAAI and rejected, then re-submitted to ICLR and finally accepted at ICLR 2026 after rebuttal—often it takes half a year (or more) from “done” to “officially on the list”.

Behind HiCache is also the second half of my own transition—from being recommended for admission to Fudan’s Master of Finance (professional degree) to becoming an AI pre-PhD researcher (a prospective PhD applicant). Starting from March 2025, I decided to stop job-hunting and start applying for PhD programs, interning at Prof. Zhang’s EPIC Lab while still taking finance courses; I spent long days commuting between Fudan and SJTU (Xuhui), and the lab community was a big part of what kept me going.

Before that, I also went through a “finance → engineering” turn: after securing my recommendation for admission (to Fudan’s Master of Finance), I spent months learning backend engineering and later interned as a backend engineer at ByteDance, which helped me clarify what I truly wanted to pursue.

Technically, HiCache almost didn’t happen: it grew out of a PlanB Hermite direction that my junior collaborator (Shikang Zheng) was close to abandoning. We initially worked on applying TaylorSeer to super-resolution acceleration; after repeatedly hitting walls on video models, we pivoted to better cache-forecast algorithms. Shikang found two promising paths (FOCA and Hermite) and chose to focus on FOCA. At that moment, I was under the pressure of needing a first-author paper for my AI PhD applications; with Prof. Zhang’s suggestion, I revisited Hermite and contributed a key improvement (e.g., the dual-scaling trick for basis functions), turning it into HiCache. In roughly four months, we completed two conference works: FOCA (accepted at AAAI) and HiCache (rejected at AAAI first, then accepted at ICLR).

More context and notes on my journey (FinTechMath):


🚀 Quick Start

# Create and activate virtual environment
python3.10 -m venv .venv
source .venv/bin/activate

# Install dependencies
pip install --upgrade pip
pip install -e ".[all]"

🎯 Supported Backends

| Backend | Description | Status |
|---|---|---|
| FLUX | Text-to-image generation | |
| Qwen-Image | Text-to-image generation | |
| Qwen-Image-Edit | Image editing | |
| Chipmunk-Flux | Backend experiments | |

📖 Usage

Unified Launcher (Single / Multi-GPU)

FLUX

# Single GPU: --gpus 0 ; Multi GPU: --gpus 0,1
bash RUN/multi_gpu_launcher.sh --backend flux --mode HiCache --gpus 0 \
  --prompt_file resources/prompts/prompt.txt --output_dir outputs/hicache

Qwen-Image

# Recommended: pass a python interpreter from your Qwen env
bash RUN/multi_gpu_launcher.sh --backend qwen-image --python /path/to/python -- \
  --model_path /path/to/Qwen-Image --output_dir outputs/qwen_image

📊 Results

Text-to-Image Generation (FLUX.1-dev)

HiCache achieves 5.55× speedup with superior image quality compared to baseline methods.

Text-to-Image Comparison

Qualitative comparison across diverse prompts. HiCache produces higher-fidelity results than TaylorSeer and other baselines.

Detail Preservation & Style Consistency

Detail Comparison

High-frequency detail preservation: HiCache retains fine details better than competing methods.

Style Consistency

Consistent style and clean backgrounds under varying acceleration ratios.

Text-to-Video Generation (HunyuanVideo)

Video Generation Comparison

Superior temporal consistency and frame quality compared to other acceleration methods.

Image Super-Resolution (Inf-DiT)

Super-Resolution Comparison

HiCache achieves ~5.93× theoretical speedup while maintaining comparable PSNR and SSIM.


📦 Weights & Paths

FLUX Weights

Place FLUX weights under resources/weights/ (not tracked by git):

huggingface-cli download black-forest-labs/FLUX.1-dev \
  --local-dir resources/weights/FLUX.1-dev \
  --local-dir-use-symlinks False

Qwen Weights

Qwen-Image / Qwen-Image-Edit weights should be provided via:

  • --model_path argument, or
  • Environment variable: QWEN_IMAGE_MODEL_PATH
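For example, the environment-variable route might look like the following (the checkpoint path is a placeholder; the launcher flags match those shown in the Usage section):

```shell
# Point the launcher at a local Qwen-Image checkout via the env var
export QWEN_IMAGE_MODEL_PATH=/path/to/Qwen-Image
bash RUN/multi_gpu_launcher.sh --backend qwen-image --gpus 0 \
  --output_dir outputs/qwen_image
```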

📁 Project Structure

HiCache/
├── models/                 # Model implementations
│   ├── flux/              # FLUX backend
│   ├── qwen_image/        # Qwen-Image backend
│   ├── qwen_image_edit/   # Qwen-Image-Edit backend
│   ├── chipmunk/          # Chipmunk-Flux experiments
│   └── hicache_fast_impl.py  # Core HiCache implementation
├── scripts/               # Utility scripts
├── RUN/                   # Launch scripts
└── resources/             # Prompts, weights, licenses

📄 License

| Component | License |
|---|---|
| This Repository | GPL-3.0 |
| Model Weights | See resources/third_party/model_licenses/ |
| Vendored Code | See resources/third_party/code_licenses/ (e.g., Apache-2.0) |

📚 Citation

If you find HiCache useful, please cite our paper:

@inproceedings{feng2026hicache,
  title={HiCache: A Plug-in Scaled-Hermite Upgrade for Taylor-Style Cache-then-Forecast Diffusion Acceleration},
  author={Feng, Liang and Zheng, Shikang and Liu, Jiacheng and Lin, Yuqi and Zhou, Qinming and Cai, Peiliang and Wang, Xinyu and Chen, Junjie and Zou, Chang and Ma, Yue and Zhang, Linfeng},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2026}
}
