A Gradio web UI for Z-Image-Turbo on Intel XPU (e.g. Intel Arc B580) using native PyTorch XPU support.
| Component | Source | Implementation |
|---|---|---|
| Transformer | Comfy-Org/z_image_turbo (single .safetensors) | modules/transformer.py — self-contained nn modules ported from diffusers ZImagePipeline |
| Scheduler | — | modules/scheduler.py — FlowMatchEulerDiscreteScheduler with exponential shift |
| Text encoder | Tongyi-MAI/Z-Image-Turbo | Qwen3 loaded from local .safetensors with transformers.AutoModel.from_config(...) + load_state_dict(...) |
| VAE | Tongyi-MAI/Z-Image-Turbo | AutoencoderKL loaded from local .safetensors + local config |
- Intel Arc GPU with up-to-date drivers
- Python 3.10+
```
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/xpu
pip install -r requirements.txt
python app.py
```

Open your browser at http://localhost:7860.
All model files must exist locally before launch.
For low-VRAM GPUs, runtime stages are loaded on-demand during generation:
- Text-encoding stage: load the tokenizer and text encoder, encode the prompt, then unload both.
- Transformer stage (preloaded to CPU at startup):
  - Mode `offload` (default): stream blocks CPU→GPU during denoising, minimizing VRAM.
  - Mode `persistent`: keep the full transformer resident on GPU across generations (higher VRAM, lower latency).

  The default mode prioritizes compatibility with lower-VRAM GPUs.
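The `offload` mode can be sketched as follows — a minimal, hypothetical illustration of streaming transformer blocks CPU→GPU one at a time, not the repo's actual `modules/transformer.py` code (`TinyBlock` and `run_offloaded` are invented names):

```python
# Sketch of the "offload" runtime mode: blocks live on CPU and are moved to
# the GPU one at a time, so only one block's weights occupy VRAM at once.
import torch
import torch.nn as nn

class TinyBlock(nn.Module):
    """Stand-in for one transformer block."""
    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):
        return x + self.proj(x)

def run_offloaded(blocks: nn.ModuleList, x: torch.Tensor, device: torch.device):
    x = x.to(device)
    for block in blocks:
        block.to(device)   # stream weights CPU -> GPU
        x = block(x)
        block.to("cpu")    # free VRAM before loading the next block
    return x.cpu()

# Fall back to CPU when no XPU device is present.
device = torch.device("xpu") if hasattr(torch, "xpu") and torch.xpu.is_available() else torch.device("cpu")
blocks = nn.ModuleList(TinyBlock(8) for _ in range(3))
out = run_offloaded(blocks, torch.randn(2, 8), device)
```

`persistent` mode corresponds to calling `blocks.to(device)` once and skipping the per-block moves, trading VRAM for latency.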
Expected local layout:
```
models/
  diffusion_models/
    z_image_turbo_bf16.safetensors
  text_encoders/
    qwen_3_4b.safetensors
    config.json            # optional; built-in fallback exists for Z-Image-Turbo
    tokenizer.json
    tokenizer_config.json
    merges.txt
    vocab.json
  vae/
    ae.safetensors
    config.json            # optional; built-in fallback exists for Z-Image-Turbo
```
The tokenizer files (tokenizer.json, tokenizer_config.json, merges.txt, vocab.json) come from the upstream tokenizer/ folder (https://huggingface.co/Tongyi-MAI/Z-Image-Turbo/tree/main/tokenizer). The text encoder and VAE weights are loaded directly from local .safetensors files rather than downloaded via from_pretrained(...).
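Since all model files must exist locally before launch, a pre-launch check like the following sketch could verify the layout above (the `missing_files` helper is illustrative, not part of the repo; the optional config.json files are deliberately excluded):

```python
# Verify that the required model files from the expected layout are present.
from pathlib import Path

REQUIRED = [
    "diffusion_models/z_image_turbo_bf16.safetensors",
    "text_encoders/qwen_3_4b.safetensors",
    "text_encoders/tokenizer.json",
    "text_encoders/tokenizer_config.json",
    "text_encoders/merges.txt",
    "text_encoders/vocab.json",
    "vae/ae.safetensors",
]

def missing_files(root: str = "models") -> list[str]:
    """Return the relative paths of required files that are absent."""
    base = Path(root)
    return [rel for rel in REQUIRED if not (base / rel).exists()]
```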
| Parameter | Default | Description |
|---|---|---|
| Prompt | — | Text description of the image to generate |
| Negative prompt | — | Features to suppress (only used when Guidance scale > 0) |
| Width / Height | 1024 | Output resolution; must be multiples of 16, max 1536 |
| Inference steps | 9 | Recommended by upstream; more steps = higher quality |
| Guidance scale | 0.0 | CFG weight — 0 = turbo mode (no classifier-free guidance) |
| Transformer runtime mode | offload | offload (default, lower VRAM) or persistent (higher VRAM, faster after first load) |
| Seed | -1 | Fixed seed for reproducibility; -1 uses a random seed |
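The resolution and seed rules in the table can be expressed as a small sketch (function names are illustrative, not the app's actual API):

```python
# Width/height must be multiples of 16 and at most 1536; seed == -1 means
# "pick a random seed", anything else is used verbatim for reproducibility.
import random

def validate_resolution(width: int, height: int) -> None:
    for name, value in (("width", width), ("height", height)):
        if value % 16 != 0 or not 16 <= value <= 1536:
            raise ValueError(f"{name} must be a multiple of 16 in [16, 1536], got {value}")

def resolve_seed(seed: int) -> int:
    return random.randrange(2**31) if seed == -1 else seed

validate_resolution(1024, 1024)   # the default resolution passes
assert resolve_seed(42) == 42     # a fixed seed is returned unchanged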
Z-Image-Turbo is a flow-matching text-to-image model by Alibaba Tongyi.
The transformer uses a single-stream architecture with adaLN modulation, RoPE position embeddings, and a Qwen3 text encoder.