UltraFlux

arXiv Paper

UltraFlux is a diffusion transformer that extends Flux backbones to native 4K synthesis with consistent quality across a wide range of aspect ratios. The project unifies data, architecture, objectives, and optimization so that positional encoding, VAE compression, and loss design reinforce each other rather than compete.

UltraFlux samples

[Gallery: 16 generation examples (samples 01–16), each rendered at 4096×4096 resolution.]

👥 Authors

Tian Ye1*‡, Song Fei1*, Lei Zhu1,2

1The Hong Kong University of Science and Technology (Guangzhou)
2The Hong Kong University of Science and Technology

*Equal Contribution, ‡Project Leader, †Corresponding Author


📰 News ✨✨

[2026.04.09] - UltraFlux is selected as a CVPR 2026 Highlight (top 3%).

[2026.04.01] - We released the MultiAspect-4K-1M dataset and the filtering pipeline.

[2026.02.21] - UltraFlux is accepted to CVPR 2026.

[2025.12.17] - Thanks to the community's help, we fixed the implementation of Resonance alignment for the 2D RoPE.

[2025.11.26] - Thanks to smthemex for developing ComfyUI_UltraFlux T2I&I2I, which enables UltraFlux to run with as little as 8 GB of memory through the GGUF integration!

[2025.11.21] - We released the UltraFlux-v1.1 transformer checkpoint. It is fine-tuned on a carefully curated set of high-aesthetic synthetic images to further improve visual aesthetics and composition quality. Enable it by uncommenting the corresponding lines in inf_ultraflux.py.

[2025.11.20] - We released the UltraFlux-v1 checkpoint, inference code, and the accompanying tech report.


Inference Quickstart

  • The script inf_ultraflux.py downloads the latest Owen777/UltraFlux-v1 weights (transformer + VAE) and runs a set of curated prompts.
  • Ensure PyTorch, diffusers, and CUDA are available, then run:
python inf_ultraflux.py
  • Generated images are saved into results/ultra_flux_*.jpeg at 4096×4096 resolution; edit the prompt list or pipeline arguments inside the script to customize inference.

MultiAspect-4K-1M Dataset and Filtering Pipeline

We have released the MultiAspect-4K-1M dataset, together with the filtering pipeline.

Each sample in MultiAspect-4K-1M provides an image_url for downloading the image. The metadata also contains rich attributes, including bilingual captions, character tags, VLM-based quality and aesthetic scores, and classical interpretable signals such as flatness and information entropy. To better respect image provenance and the original creators, about 98% of the dataset also includes source attribution metadata: work_url refers to the original webpage where the image was published, photographer gives the creator name, and photographer_url links to the creator's profile or source page.

Images can be downloaded and filtering scores can be computed with:

# download the image
python tools/download_from_image_url.py "image_url in metadata"

# compute filtering scores
python tools/filtering_pipeline.py /path/to/image.jpg
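The scored attributes above lend themselves to simple threshold-based subset selection. The sketch below is illustrative only: the field names (`aesthetic_score`, `entropy`) and thresholds are assumptions, not the exact schema or cutoffs used for MultiAspect-4K-1M.

```python
# Hypothetical subset selection over MultiAspect-4K-1M metadata records.
# Field names and thresholds are assumptions for illustration.

def select_high_aesthetic(records, min_aesthetic=6.0, min_entropy=4.0):
    """Keep records whose VLM aesthetic score and information entropy
    both clear the given thresholds."""
    return [
        r for r in records
        if r.get("aesthetic_score", 0.0) >= min_aesthetic
        and r.get("entropy", 0.0) >= min_entropy
    ]

records = [
    {"image_url": "https://example.com/a.jpg", "aesthetic_score": 7.2, "entropy": 5.1},
    {"image_url": "https://example.com/b.jpg", "aesthetic_score": 4.8, "entropy": 6.0},
]
print(len(select_high_aesthetic(records)))  # 1: only the first record passes both
```

The same pattern extends to any of the other metadata signals (flatness, quality score, character tags) for stage-specific training subsets.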

Why UltraFlux?

  • 4K positional robustness. Resonance 2D RoPE with YaRN keeps training-window awareness while remaining band-aware and aspect-ratio-aware to avoid ghosting.
  • Detail-preserving compression. A lightweight, non-adversarial post-training routine sharpens Flux VAE reconstructions at 4K without sacrificing throughput, resolving the usual trade-off between speed and micro-detail.
  • 4K-aware objectives. The SNR-Aware Huber Wavelet Training Objective emphasizes high-frequency fidelity in the latent space so gradients stay balanced across timesteps and frequency bands.
  • Aesthetic-aware scheduling. Stage-wise Aesthetic Curriculum Learning (SACL) routes high-aesthetic supervision toward high-noise steps, sculpting the model prior where it matters most for vivid detail and alignment.
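To give a feel for the YaRN component in isolation, the sketch below scales per-band RoPE inverse frequencies with the standard NTK-by-parts ramp: fast-rotating bands are kept, slow bands are interpolated, and bands in between are blended. All parameter values here are illustrative, and this is a generic YaRN sketch, not the repo's Resonance-aligned implementation.

```python
import math

def yarn_scaled_freqs(dim, base=10000.0, scale=2.0,
                      beta_fast=32.0, beta_slow=1.0, train_len=256):
    """YaRN-style per-band inverse frequencies for one RoPE axis.

    Bands completing many rotations inside the training window are kept
    as-is (local detail); bands with few rotations are interpolated by
    `scale` (global position); bands in between are blended linearly.
    Parameter values are illustrative defaults, not the repo's settings.
    """
    out = []
    for i in range(0, dim, 2):
        inv = base ** (-i / dim)                     # original inverse frequency
        rotations = train_len * inv / (2 * math.pi)  # turns within training window
        if rotations >= beta_fast:
            ramp = 0.0                               # fast band: keep
        elif rotations <= beta_slow:
            ramp = 1.0                               # slow band: fully interpolate
        else:
            ramp = (beta_fast - rotations) / (beta_fast - beta_slow)
        out.append(inv * ((1.0 - ramp) + ramp / scale))
    return out

# For 2D RoPE the same scaling is applied independently per spatial axis.
freqs = yarn_scaled_freqs(64)
```

Under this ramp the highest-frequency band is left untouched while the slowest band is divided by the full scale factor, which is what preserves local texture while extending positional reach.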

MultiAspect-4K-1M Dataset

  • Scale and coverage. 1M native and near-4K images with controlled aspect-ratio sampling to ensure both wide and portrait regimes are equally represented.
  • Content balance. A dual-channel collection pipeline debiases landscape-heavy sources toward human-centric content.
  • Rich metadata. Every sample includes bilingual captions, subject tags, CLIP/VLM-based quality and aesthetic scores, and classical IQA metrics, enabling targeted subset sampling for specific training stages.
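Controlled aspect-ratio sampling is typically implemented with resolution bucketing. The sketch below illustrates the idea with a hypothetical bucket list; the actual buckets used for MultiAspect-4K-1M are not specified here.

```python
# Hypothetical aspect-ratio bucketing: assign each image to the bucket
# whose width/height ratio is closest, so wide and portrait regimes can
# be balanced explicitly. The bucket list is illustrative only.
BUCKETS = [(4096, 4096), (4608, 3456), (3456, 4608), (5120, 2880), (2880, 5120)]

def nearest_bucket(width, height):
    """Return the (width, height) bucket with the closest aspect ratio."""
    ratio = width / height
    return min(BUCKETS, key=lambda wh: abs(wh[0] / wh[1] - ratio))

print(nearest_bucket(4000, 3000))  # (4608, 3456): closest to 4:3
```

Counting images per bucket then makes over-represented regimes (e.g. landscape) visible and correctable at sampling time.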

Model & Training Recipe

  1. Backbone. Flux-style DiT trained directly on MultiAspect-4K-1M with token-efficient blocks and Resonance 2D RoPE + YaRN for AR-aware positional encoding.
  2. Objective. SNR-Aware Huber Wavelet loss aligns gradient magnitudes with 4K statistics, reinforcing high-frequency fidelity under strong VAE compression.
  3. Curriculum. SACL injects high-aesthetic data primarily into high-noise timesteps so the model’s prior captures human-desired structure early in the trajectory.
  4. VAE Post-training. A simple, non-adversarial fine-tuning pass boosts 4K reconstruction quality while keeping inference cost low.
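As a rough sketch of the SNR-aware Huber idea in step 2, the function below combines a Huber residual with an SNR-dependent weight. The min-SNR-style clipping is a stand-in for the paper's exact weighting, and the wavelet sub-band decomposition is omitted entirely; consult the tech report for the real objective.

```python
def snr_aware_huber(pred, target, snr, delta=1.0, snr_clip=5.0):
    """Huber loss with an SNR-dependent weight (illustrative sketch).

    The min(snr, snr_clip)/snr_clip weight is borrowed from min-SNR
    weighting as a stand-in; the paper's actual objective additionally
    operates on wavelet sub-bands of the latent, omitted here.
    """
    weight = min(snr, snr_clip) / snr_clip
    total = 0.0
    for p, t in zip(pred, target):
        r = abs(p - t)
        if r <= delta:
            total += 0.5 * r * r                 # quadratic region: small residuals
        else:
            total += delta * (r - 0.5 * delta)   # linear region: robust to outliers
    return weight * total / len(pred)
```

The weight damps gradients at high-noise (low-SNR) timesteps, which is how such objectives keep gradient magnitudes balanced across the trajectory.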

Results

UltraFlux surpasses recent native-4K and training-free scaling baselines on standard 4K benchmarks spanning:

  • Image fidelity at 4096×4096 and higher
  • Aesthetic preference scores
  • Text-image alignment metrics across diverse aspect ratios

Resources

We will release the full stack upon publication:

  • MultiAspect-4K-1M dataset with metadata loaders
  • Training pipelines
  • Evaluation code covering fidelity, aesthetic, and alignment metrics

🚀 Updates

To foster research and support the open-source community, we plan to open-source the entire project, including training, inference, and weights. Thank you for your patience and support! 🌟

  • Release GitHub repo.
  • Release inference code (inf_ultraflux.py).
  • Release training code.
  • Release model checkpoints.
  • Release arXiv paper.
  • Release HuggingFace Space demo.
  • Release dataset (MultiAspect-4K-1M).

Stay tuned for links and usage instructions. For updates, please watch this repository or open an issue.

Acknowledgement

We are grateful for the following projects:

BibTeX citation

@misc{ye2025ultrafluxdatamodelcodesignhighquality,
      title={UltraFlux: Data-Model Co-Design for High-quality Native 4K Text-to-Image Generation across Diverse Aspect Ratios}, 
      author={Tian Ye and Song Fei and Lei Zhu},
      year={2025},
      eprint={2511.18050},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2511.18050}, 
}
