
OneReward

Official implementation of OneReward: Unified Mask-Guided Image Generation via Multi-Task Human Preference Learning


🚀 TODO

  • Release arXiv paper.
  • Release inference code.
  • Release FLUX.1-Fill-dev[OneReward] and FLUX.1-Fill-dev[OneRewardDynamic] mask-guided edit checkpoints.
  • Release FLUX.1-dev[OneReward] text-to-image checkpoints.
  • ComfyUI support.
  • Future open-source plan.

Introduction

We propose OneReward, a novel RLHF methodology for the visual domain that employs Qwen2.5-VL as a generative reward model to enhance multi-task reinforcement learning, significantly improving the policy model's generation ability across multiple subtasks. Building on OneReward, we develop Seedream 3.0 Fill, a unified SOTA image editing model capable of effectively handling diverse tasks including image fill, image extend, object removal, and text rendering. It surpasses several leading commercial and open-source systems, including Ideogram, Adobe Photoshop, and FLUX Fill [Pro]. Finally, based on FLUX Fill [dev], we are thrilled to release FLUX.1-Fill-dev-OneReward, which outperforms the closed-source FLUX Fill [Pro] on inpainting and outpainting tasks, serving as a powerful new baseline for future research in unified image editing.

Seedream 3.0 Fill Performance Overview

Comparison figures in the repository cover four tasks: Image Fill, Image Extend with Prompt, Image Extend without Prompt, and Object Removal.

Quick Start

  1. Make sure you have transformers>=4.51.3 (which supports Qwen2.5-VL)

  2. Install the latest version of diffusers (>=0.35.0)

pip install -U diffusers
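
If you are unsure whether your environment already satisfies these requirements, a quick check of the installed versions (a minimal sketch using Python's standard importlib.metadata) can confirm:

from importlib.metadata import version

# Verify the requirements stated above: transformers>=4.51.3, diffusers>=0.35.0
print("transformers:", version("transformers"))  # expect >= 4.51.3
print("diffusers:", version("diffusers"))        # expect >= 0.35.0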

The following code snippet illustrates how to use the model to generate images from a text prompt and an input mask. It supports inpainting (image fill), outpainting (image extend), and erasing (object removal). Because the model is fully trained, FluxFillCFGPipeline with true CFG is required; it is provided in pipeline_flux_fill_with_cfg.py.

import torch
from diffusers.utils import load_image
from diffusers import FluxTransformer2DModel

from src.pipeline_flux_fill_with_cfg import FluxFillCFGPipeline

transformer_onereward = FluxTransformer2DModel.from_pretrained(
    "bytedance-research/OneReward",
    subfolder="flux.1-fill-dev-OneReward-transformer",
    torch_dtype=torch.bfloat16
)

pipe = FluxFillCFGPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Fill-dev", 
    transformer=transformer_onereward,
    torch_dtype=torch.bfloat16).to("cuda")

# Image Fill
image = load_image('assets/image.png')
mask = load_image('assets/mask_fill.png')
image = pipe(
    prompt='the words "ByteDance", and in the next line "OneReward"',
    negative_prompt="nsfw",
    image=image,
    mask_image=mask,
    height=image.height,
    width=image.width,
    guidance_scale=1.0,
    true_cfg=4.0,
    num_inference_steps=50,
    generator=torch.Generator("cpu").manual_seed(0)
).images[0]
image.save(f"image_fill.jpg")


Alternatively, you can run the full inference demos in demo_one_reward.py and demo_one_reward_dynamic.py:

python3 -m src.examples.demo_one_reward
python3 -m src.examples.demo_one_reward_dynamic

Model

FLUX.1-Fill-dev[OneReward], trained with Algorithm 1 in the paper:

transformer_onereward = FluxTransformer2DModel.from_pretrained(
    "bytedance-research/OneReward",
    subfolder="flux.1-fill-dev-OneReward-transformer",
    torch_dtype=torch.bfloat16
)

pipe = FluxFillCFGPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Fill-dev", 
    transformer=transformer_onereward,
    torch_dtype=torch.bfloat16).to("cuda")

FLUX.1-Fill-dev[OneRewardDynamic], trained with Algorithm 2 in the paper:

transformer_onereward_dynamic = FluxTransformer2DModel.from_pretrained(
    "bytedance-research/OneReward",
    subfolder="flux.1-fill-dev-OneRewardDynamic-transformer",
    torch_dtype=torch.bfloat16
)

pipe = FluxFillCFGPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Fill-dev", 
    transformer=transformer_onereward_dynamic,
    torch_dtype=torch.bfloat16).to("cuda")
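
To compare the two checkpoints in one session, it should not be necessary to rebuild the whole pipeline: diffusers pipelines expose their components as attributes, so the transformer can be swapped in place. A sketch under that assumption (note that keeping both transformers on the GPU roughly doubles VRAM usage):

# Sketch: reuse the already-loaded text encoders and VAE, swapping only the transformer.
pipe.transformer = transformer_onereward_dynamic.to("cuda")
# ... run OneRewardDynamic inference ...
pipe.transformer = transformer_onereward.to("cuda")
# ... run OneReward inference ...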

Multi-task Usage

Image Extend with prompt

image = load_image('assets/image2.png')
mask = load_image('assets/mask_extend.png')
image = pipe(
    prompt='Deep in the forest, surrounded by colorful flowers',
    negative_prompt="nsfw",
    image=image,
    mask_image=mask,
    height=image.height,
    width=image.width,
    guidance_scale=1.0,
    true_cfg=4.0,
    num_inference_steps=50,
    generator=torch.Generator("cpu").manual_seed(0)
).images[0]
image.save(f"image_extend_w_prompt.jpg")

Image Extend without prompt

image = load_image('assets/image2.png')
mask = load_image('assets/mask_extend.png')
image = pipe(
    prompt='high-definition, perfect composition',  # use a fixed prompt for image extend without a user prompt
    negative_prompt="nsfw",
    image=image,
    mask_image=mask,
    height=image.height,
    width=image.width,
    guidance_scale=1.0,
    true_cfg=4.0,
    num_inference_steps=50,
    generator=torch.Generator("cpu").manual_seed(0)
).images[0]
image.save(f"image_extend_wo_prompt.jpg")

Object Removal

image = load_image('assets/image.png')
mask = load_image('assets/mask_remove.png')
image = pipe(
    prompt='remove',  # use the fixed prompt for object removal
    negative_prompt="nsfw",
    image=image,
    mask_image=mask,
    height=image.height,
    width=image.width,
    guidance_scale=1.0,
    true_cfg=4.0,
    num_inference_steps=50,
    generator=torch.Generator("cpu").manual_seed(0)
).images[0]
image.save(f"object_removal.jpg")

Object Removal with LoRA

Because the base FLUX Fill model has undergone heavy SFT for object generation, which leaves it with only about 15% usability on object removal, the improvement on removal is not obvious. We therefore release a separate LoRA for object removal, which may be helpful.

import torch
from diffusers.utils import load_image
from diffusers import FluxTransformer2DModel

from src.pipeline_flux_fill_with_cfg import FluxFillCFGPipeline

transformer_onereward = FluxTransformer2DModel.from_pretrained(
    "bytedance-research/OneReward",
    subfolder="flux.1-fill-dev-OneReward-transformer",
    torch_dtype=torch.bfloat16
)

pipe = FluxFillCFGPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Fill-dev", 
    transformer=transformer_onereward,
    torch_dtype=torch.bfloat16).to("cuda")

pipe.load_lora_weights(
    "bytedance-research/OneReward",
    subfolder="flux.1-fill-dev-object-removal-lora",
    weight_name="pytorch_lora_weights.safetensors",
    adapter_name="object_removal_lora"
)
print("Loaded adapters:", pipe.get_list_adapters())  
pipe.set_adapters(["object_removal_lora"], adapter_weights=[1.0])

# Object Removal
image = load_image('assets/image.png')
mask = load_image('assets/mask_remove.png')
image = pipe(
    prompt='remove',  # use the fixed prompt for object removal
    negative_prompt="nsfw",
    image=image,
    mask_image=mask,
    height=image.height,
    width=image.width,
    guidance_scale=1.0,
    true_cfg=4.0,
    num_inference_steps=50,
    generator=torch.Generator("cpu").manual_seed(0),
).images[0]
image.save(f"object_removal_lora.jpg")

License Agreement

The code is licensed under Apache 2.0. The model is licensed under CC BY-NC 4.0.

Citation

@article{gong2025onereward,
  title={OneReward: Unified Mask-Guided Image Generation via Multi-Task Human Preference Learning},
  author={Gong, Yuan and Wang, Xionghui and Wu, Jie and Wang, Shiyin and Wang, Yitong and Wu, Xinglong},
  journal={arXiv preprint arXiv:2508.21066},
  year={2025}
}
