[2026.03.26] Thanks to smthemex for developing ComfyUI_LucidNFT.
Song Fei1,†, Tian Ye1,†, Sixiang Chen1, Zhaohu Xing1, Jianyu Lai1, Lei Zhu1,2,*
1The Hong Kong University of Science and Technology (Guangzhou)
2The Hong Kong University of Science and Technology

† Equal Contribution, * Corresponding Author
💡 We also have other projects on 4K text-to-image generation and photo-realistic image restoration that may interest you. ✨
[CVPR 2026 Highlight] UltraFlux: Data-Model Co-Design for High-quality Native 4K Text-to-Image Generation across Diverse Aspect Ratios
Tian Ye1*‡, Song Fei1*, Lei Zhu1,2†
[ICLR 2026] LucidFlux: Caption-Free Photo-Realistic Image Restoration via a Large-Scale Diffusion Transformer
Song Fei1*, Tian Ye1*‡, Lujia Wang1, Lei Zhu1,2†
LucidNFT is a multi-reward preference optimization framework for flow-matching real-world image super-resolution. Built on top of LucidFlux, it improves perceptual quality while preserving LR-anchored faithfulness under diverse real-world degradations.
Compared with naive multi-reward preference optimization, LucidNFT focuses on the part that is actually difficult in Real-ISR: outputs may look realistic, yet drift away from the semantic and structural evidence contained in the low-quality input. LucidNFT addresses this with a faithfulness-aware reward design and a more stable multi-reward optimization strategy.
- Faithfulness is hard without HR ground truth. In real-world SR, visually plausible outputs can still contradict the LR evidence.
- Naive scalarized rewards are unstable. Directly mixing heterogeneous reward objectives before normalization can compress rollout-wise contrasts and weaken preference optimization.
- Perceptual metrics alone are insufficient. Metrics that reward sharpness or realism do not directly measure LR-anchored faithfulness.
- Real-world data diversity matters. Small benchmark-only datasets limit rollout diversity and reduce the quality of preference signals.
LucidNFT consists of three key ingredients:
- LucidConsistency. A frozen Qwen3-VL embedding backbone plus a lightweight trainable projection head aligns LR and HR semantics in a shared representation space and yields a degradation-robust consistency score.
- Decoupled advantage normalization. Each reward objective is normalized per rollout group before fusion, preserving perceptual-faithfulness contrasts and mitigating advantage collapse.
- LucidLR-supported preference optimization. Large-scale real-world low-quality data improves degradation coverage and rollout diversity.
Overview of LucidConsistency. Left: inference stage for LR-SR semantic consistency scoring. Right: training stage for projection-head optimization with LR-HR pairs.
LR-HR consistency on matched pairs (higher is better) and mismatched cross-benchmark pairs (lower is better):

| Domain | Pairing | Baseline | LucidConsistency |
|---|---|---|---|
| Synthetic | LSDIR-Val (paired) | 0.759 | 0.890 (+0.131) |
| Real-World | RealSR | 0.799 | 0.925 (+0.126) |
| Real-World | DRealSR | 0.786 | 0.921 (+0.135) |
| Cross-Bench | RealSR LR → DRealSR HR | 0.144 | 0.100 (-0.044) |
| Cross-Bench | DRealSR LR → RealSR HR | 0.140 | 0.131 (-0.009) |
This is the core signal used to distinguish perceptually strong but semantically drifting outputs from those that remain faithful to the LR input.
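To make the scoring path concrete, here is a minimal NumPy sketch of the idea: embed both images, project into a shared space, and compare by cosine similarity. The random matrix `W` below is only a stand-in for the frozen Qwen3-VL backbone plus the trained projection head, and the single linear map is an assumption; the released head may be deeper.

```python
import numpy as np

def project(emb, W):
    """Hypothetical projection head: one linear map followed by
    L2 normalization (the released head may differ)."""
    z = emb @ W
    return z / np.linalg.norm(z, axis=-1, keepdims=True)

def lucid_consistency(lr_emb, hr_emb, W):
    """Cosine similarity of projected LR and HR/SR embeddings, in [-1, 1];
    higher means the output stays closer to the LR semantics."""
    return (project(lr_emb, W) * project(hr_emb, W)).sum(axis=-1)

rng = np.random.default_rng(0)
W = rng.standard_normal((32, 8))            # stand-in for trained head weights
lr = rng.standard_normal((4, 32))           # stand-in for frozen backbone embeddings
sr_faithful = lr + 0.05 * rng.standard_normal((4, 32))  # small semantic drift
sr_drifting = rng.standard_normal((4, 32))              # unrelated content

print(lucid_consistency(lr, sr_faithful, W).mean())  # close to 1
print(lucid_consistency(lr, sr_drifting, W).mean())  # well below the faithful score
```

The toy pairs behave as the table suggests: near-duplicates score high, semantically unrelated pairs score low, which is exactly the contrast the reward needs.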
LucidLR is a 20K-image real-world low-quality dataset curated for preference optimization and unsupervised Real-ISR fine-tuning. It contains diverse natural degradations such as blur and compression artifacts, and provides stronger rollout diversity than small benchmark-oriented datasets.
| Dataset | Pairing | Primary Usage | Type | # Images |
|---|---|---|---|---|
| RealSR | Paired | Testing / Benchmark | Real-captured | 100 |
| DRealSR | Paired | Testing / Benchmark | Real-captured | 93 |
| RealLQ250 | Unpaired | Testing / Benchmark | Real-world | 250 |
| LucidLR (ours) | Unpaired | Preference Optimization / Unsupervised Training | Real-world | 20K |
Advantage separability analysis. LucidNFT yields stronger advantage gaps and higher separability than naive scalarized optimization.
Training dynamics. Both LucidConsistency and IQA-oriented rewards improve steadily during preference optimization on LucidFlux.
LucidNFT improves the perceptual-faithfulness trade-off over LucidFlux across RealLQ250, DRealSR, and RealSR, while maintaining stable optimization behavior.
Quantitative comparison with state-of-the-art Real-ISR methods on RealLQ250, DRealSR, and RealSR. Higher is better for all metrics except NIQE. Values in parentheses denote improvements over the corresponding backbone baseline.
Visual comparison on RealLQ250. LucidNFT further improves semantic consistency and perceptual quality over the baseline LucidFlux, producing more faithful structures and richer texture details.
```bash
git clone https://github.com/W2GenAI-Lab/LucidNFT.git
cd LucidNFT
python -m venv .venv
source .venv/bin/activate
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
pip install -r requirements.txt
```

Run the downloader to populate weights/ with the required assets, including the FLUX base model, SwinIR, the LucidFlux checkpoint, prompt embeddings, the LucidNFT LoRA, the UltraFlux VAE, and SigLIP:
```bash
python -m tools.hf_login --token "$HF_TOKEN"
python -m tools.download_weights --dest weights
```

This script also generates weights/env.sh. Source it before inference so the FLUX base paths are exported correctly:

```bash
source weights/env.sh
```

Run the LucidFlux baseline:
```bash
python inference.py \
    --checkpoint weights/lucidflux/lucidflux.pth \
    --control_image /path/to/lr_image_or_dir \
    --output_dir outputs \
    --width 1024 \
    --height 1024 \
    --num_steps 24 \
    --swinir_pretrained weights/swinir.pth \
    --siglip_ckpt weights/siglip \
    --offload
```

Run LucidFlux + LucidNFT LoRA:
```bash
python inference.py \
    --checkpoint weights/lucidflux/lucidflux.pth \
    --control_image /path/to/lr_image_or_dir \
    --output_dir outputs-lora \
    --width 1024 \
    --height 1024 \
    --num_steps 24 \
    --swinir_pretrained weights/swinir.pth \
    --siglip_ckpt weights/siglip \
    --lora_path weights/lucidflux/LucidFlux+LucidNFT_lora \
    --offload
```

The repository also includes a lightweight LucidConsistency scoring entrypoint for comparing an LR image against an HR/SR image, or two benchmark folders with matched file counts. The learned projection-head score is reported as LucidConsistency.
Score a single image pair:
```bash
python test_LucidConsistency.py \
    --model_name_or_path weights/LucidConsistency/Qwen3-VL-Embedding-8B \
    --proj_head weights/LucidConsistency/proj_head.pt \
    --lr /path/to/lr.png \
    --hr /path/to/hr_or_sr.png
```

Score two benchmark folders:
```bash
python test_LucidConsistency.py \
    --model_name_or_path weights/LucidConsistency/Qwen3-VL-Embedding-8B \
    --proj_head weights/LucidConsistency/proj_head.pt \
    --lr /path/to/lr_benchmark \
    --hr /path/to/hr_benchmark
```

```bibtex
@article{fei2026lucidnft,
  title={LucidNFT: LR-Anchored Multi-Reward Preference Optimization for Generative Real-World Super-Resolution},
  author={Fei, Song and Ye, Tian and Chen, Sixiang and Xing, Zhaohu and Lai, Jianyu and Zhu, Lei},
  journal={arXiv preprint arXiv:2603.05947},
  year={2026}
}
```

This repository is released under the license specified in LICENSE.













