ComfyUI Remove Background nodes (SET)

This repository provides a set of custom nodes for ComfyUI focused on background removal and/or replacement.

Remove Background

Important

ComfyUI 0.3.48 is currently needed (Aug 1, 2025)

โš™๏ธ Main features

✅ No bizarre extra dependencies; we use the same modules as ComfyUI

✅ Warnings and errors visible in the browser, configurable debug information in the console

✅ Support for BEN1/2, BiRefNet, BRIA 1.4/2, Depth Anything V2, DiffDIS, InSPyReNet, MODNet, MVANet, PDFNet, U-2-Net, IS-Net

✅ Automatic model download (only the SD Turbo VAE might be needed, for DiffDIS)


✨ Nodes

Loaders

The loaders are used to load a background removal model. We have a general loader that looks for models in the ComfyUI/models/rembg folder. You can reconfigure this path using the RemBG_SET key in ComfyUI's extra_model_paths.yaml file.

Note that we also look for the rembg and birefnet keys. If these keys aren't defined we assume they point to ComfyUI/models/rembg and ComfyUI/models/BiRefNet. Also note that models downloaded to ~/.transparent-background/ (or ${TRANSPARENT_BACKGROUND_FILE_PATH}.transparent-background/) will also be available.

In addition we have automatic downloaders for each supported model family.

Load RemBG model by file

  • Display Name: Load RemBG model by file
  • Internal Name: LoadRembgByBiRefNetModel_SET
  • Category: RemBG_SET/Load
  • Description: Loads a model from the ComfyUI/models/rembg folder; you can connect its output to any of the processing nodes
  • Purpose: Used for models that you already downloaded, or perhaps trained yourself.
  • Inputs:
    • model (FILENAME): The name of the model in the ComfyUI/models/rembg folder, use R to refresh the list
    • device (DEVICE): The device where the model will be executed. With AUTO the default ComfyUI target is used (i.e. your GPU)
    • dtype (DTYPE_OPS): Selects the data type used during inference. AUTO means we use the same data type as the model weights loaded from disk. Most of the models perform quite well with 16-bit floating point. You can force 16 bits to save VRAM, or even force 16-bit values to be converted to 32 bits.
    • vae (VAE, optional): Only needed for DiffDIS; you have to connect a "Load VAE" node here. The model needs the SD Turbo VAE, please look at the examples.
    • positive (CONDITIONING, optional): Experimental and used only for DiffDIS. In practice you should leave it unconnected; the model was trained with an empty conditioning text.
  • Output:
    • model (SET_REMBG): The loaded model, ready to be connected to a processing node

Load XXXXXX model by name

  • Display Name: Load XXXXXX model by name
  • Internal Name: AutoDownloadXXXXXXModel_SET
  • Category: RemBG_SET/Load
  • Description: Loads a model from the XXXXXX family; if the model isn't on disk it is downloaded automatically. XXXXXX is one of the supported families (i.e. 'BiRefNet', 'MVANet/BEN', 'InSPyReNet', 'U-2-Net', 'IS-Net', 'MODNet', 'PDFNet', 'DiffDIS')
  • Purpose: Downloads a background removal model from the internet and loads it into memory. The names are descriptive and indicate how big the file is.
  • Inputs:
    • model (FILENAME): The descriptive name of the model
    • device (DEVICE): The device where the model will be executed. With AUTO the default ComfyUI target is used (i.e. your GPU)
    • dtype (DTYPE_OPS): Selects the data type used during inference. AUTO means we use the same data type as the model weights loaded from disk. Most of the models perform quite well with 16-bit floating point. You can force 16 bits to save VRAM, or even force 16-bit values to be converted to 32 bits.
    • vae (VAE, only for DiffDIS): You have to connect a "Load VAE" node here. The model needs the SD Turbo VAE, please look at the examples.
    • positive (CONDITIONING, only for DiffDIS): Experimental. In practice you should leave it unconnected; the model was trained with an empty conditioning text.
  • Output:
    • model (SET_REMBG): The loaded model, ready to be connected to a processing node
    • train_w (INT): Width of the images used during training. You should use this size for optimal results. Note that most models accept any size that is a multiple of 32. The General 2K Lite BiRefNet model was trained with some flexibility in the size; DiffDIS is more restrictive. Only needed for manual pre-processing.
    • train_h (INT): Height of the images used during training. You should use this size for optimal results. Note that most models accept any size that is a multiple of 32. The General 2K Lite BiRefNet model was trained with some flexibility in the size; DiffDIS is more restrictive. Only needed for manual pre-processing.
    • norm_params (NORM_PARAMS): Normalization parameters for the input images. This is needed only for advanced use, when you want to manually pre-process the images. The Arbitrary Normalize node from Image Misc can use these parameters to apply the correct normalization.

Processing nodes

These nodes apply the loaded model to estimate the foreground object. For normal use the simplest node is Remove background; it will generate an RGBA image with a transparent background, or replace the background using a provided image.

This simple node doesn't expose many options, so you may also want to use Remove background (full). Note that this node will also consume more RAM.

What the models actually do is generate a map where each pixel represents the estimated probability that it belongs to the foreground. This is the mask that you can apply to remove the background (as sketched below). The Get background mask node is oriented to just getting this mask; no background removal or replacement is done.
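
The sketch below, assuming ComfyUI's usual tensor layout (IMAGE as a (batch, height, width, 3) float tensor in 0..1 and MASK as (batch, height, width)), shows conceptually how such a mask is applied; the function name and exact handling are illustrative, not the nodes' actual code.

    import torch

    def composite(images: torch.Tensor, masks: torch.Tensor,
                  background: torch.Tensor | None = None) -> torch.Tensor:
        # images: (B, H, W, 3) in 0..1; masks: (B, H, W), 1.0 = foreground
        alpha = masks.unsqueeze(-1)                        # (B, H, W, 1)
        if background is None:
            # no background given: attach the mask as the alpha channel (RGBA)
            return torch.cat((images, alpha), dim=-1)
        # background given: alpha-blend the foreground over it
        return images * alpha + background * (1.0 - alpha)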

If you want to work at the lowest level use the Get background mask low level node. This node doesn't pre-process the input image and returns the mask without extra post-processing.

Remove background

  • Display Name: Remove background
  • Internal Name: RembgByBiRefNet_SET
  • Category: RemBG_SET/Basic
  • Description: Removes or replaces the background from the input image.
  • Inputs:
    • model (SET_REMBG): The model to use
    • images (IMAGE): One or more images to process; they will be scaled to a size that is good for the model
    • batch_size (INT): How many images will be processed at once. Useful for videos; depending on the model and the GPU this can make things faster, but it consumes more VRAM. Note that for boards like the RTX 3060 it is better to just use 1.
    • depths (MASK, optional): Can be used for PDFNet to provide externally computed depth maps
    • background (IMAGE, optional): Image to use as background; it will be scaled to the size of images. If you don't provide an image the output will be an RGBA image with transparency. Note that this can be 1 or more images. If images is a video this input can be another video with the same number of frames.
    • out_dtype (AUTO, float32, float16): The data type used for the output image. Using AUTO is recommended. To save RAM when processing long videos use float16.
  • Output:
    • images (IMAGE): The images with the background removed (transparent) or replaced by the background image

Remove background (full)

  • Display Name: Remove background (full)
  • Internal Name: RembgByBiRefNetAdvanced_SET
  • Category: RemBG_SET/Advanced
  • Description: Removes or replaces the background from the input image. It gives more options and also generates masks and other outputs.
  • Inputs:
    • model (SET_REMBG): The model to use
    • images (IMAGE): One or more images to process; they will be scaled to a size that is good for the model
    • width (INT): The width to scale the image to before applying the model. Should be supported by the model. Usually this is the train_w output from Load XXXXXX model by name
    • height (INT): The height to scale the image to before applying the model. Should be supported by the model. Usually this is the train_h output from Load XXXXXX model by name
    • upscale_method (area, bicubic, nearest-exact, bilinear, lanczos): The algorithm used to scale the image to and from the model size. Usually bicubic is a good choice.
    • blur_size (INT): Diameter for the coarse gaussian blur used for the Approximate Fast Foreground Colour Estimation (see the sketch after this node's description)
    • blur_size_two (INT): Diameter for the fine gaussian blur (see blur_size)
    • fill_color (BOOLEAN): When enabled and no background is provided we fill the background using a color.
    • color (STRING): Color to use when filling the background. You can specify it in multiple ways, even by name.
    • mask_threshold (FLOAT): Most models generate masks whose values range from 0 to 1 and can be anything in between. Matte models use this to estimate transparency. If you need the mask to be strictly 0 or 1, with nothing in between, you can provide a threshold here: values above it become 1 and the rest 0.
    • batch_size (INT): How many images will be processed at once. Useful for videos; depending on the model and the GPU this can make things faster, but it consumes more VRAM. Note that for boards like the RTX 3060 it is better to just use 1.
    • depths (MASK, optional): Can be used for PDFNet to provide externally computed depth maps. This is the map generated by Depth Anything V2
    • background (IMAGE, optional): Image to use as background; it will be scaled to the size of images. If you don't provide an image the output will be an RGBA image with transparency. Note that this can be 1 or more images. If images is a video this input can be another video with the same number of frames.
    • out_dtype (AUTO, float32, float16): The data type used for the output image. Using AUTO is recommended. To save RAM when processing long videos use float16.
  • Output:
    • images (IMAGE): The images with the background removed (transparent) or replaced by the background image
    • masks (MASK): The estimated masks, where a higher value means the model estimates it belongs to the foreground with more confidence.
    • depths (MASK): The estimated depth map. Either from the depths input or computed. Note this applies only to PDFNet. This is the map generated by Depth Anything V2
    • edges (MASK): The estimated edges. This is only generated by the DiffDIS model.
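
To give an idea of what blur_size and blur_size_two control, here is a minimal sketch of the two-pass blur-fusion idea behind Approximate Fast Foreground Colour Estimation (a coarse pass followed by a fine one). The tensor layout, the helper names and the box blur used as a stand-in for the gaussian blur are assumptions, not the node's actual implementation.

    import torch
    import torch.nn.functional as F

    def blur(x: torch.Tensor, size: int) -> torch.Tensor:
        # box blur as a stand-in for the gaussian blur; size should be odd here
        return F.avg_pool2d(x, size, stride=1, padding=size // 2, count_include_pad=False)

    def blur_fusion_pass(image, fg, bg, alpha, size):
        # one pass of the blur-fusion foreground/background estimation
        blurred_alpha = blur(alpha, size)
        fg = blur(fg * alpha, size) / (blurred_alpha + 1e-5)
        bg = blur(bg * (1.0 - alpha), size) / (1.0 - blurred_alpha + 1e-5)
        # pull the smoothed foreground back towards what the input image allows
        fg = fg + alpha * (image - alpha * fg - (1.0 - alpha) * bg)
        return fg.clamp(0.0, 1.0), bg

    def estimate_foreground(image, alpha, blur_size, blur_size_two):
        # image: (B, 3, H, W) in 0..1, alpha: (B, 1, H, W)
        fg, bg = blur_fusion_pass(image, image, image, alpha, blur_size)  # coarse pass
        fg, _ = blur_fusion_pass(image, fg, bg, alpha, blur_size_two)     # fine pass
        return fg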

Get background mask

  • Display Name: Get background mask
  • Internal Name: GetMaskByBiRefNet_SET
  • Category: RemBG_SET/Basic
  • Description: Computes the foreground mask. It normalizes the input images, scales them to the model size, computes the masks and then scales the masks back to the image size. No background removal/replacement is done.
  • Inputs:
    • model (SET_REMBG): The model to use
    • images (IMAGE): One or more images to process; they will be scaled to a size that is good for the model
    • width (INT): The width to scale the image to before applying the model. Should be supported by the model. Usually this is the train_w output from Load XXXXXX model by name
    • height (INT): The height to scale the image to before applying the model. Should be supported by the model. Usually this is the train_h output from Load XXXXXX model by name
    • upscale_method (area, bicubic, nearest-exact, bilinear, lanczos): The algorithm used to scale the image to and from the model size. Usually bicubic is a good choice.
    • mask_threshold (FLOAT): Most models generate masks whose values range from 0 to 1 and can be anything in between. Matte models use this to estimate transparency. If you need the mask to be strictly 0 or 1, with nothing in between, you can provide a threshold here: values above it become 1 and the rest 0 (see the sketch after this node's description).
    • batch_size (INT): How many images will be processed at once. Useful for videos; depending on the model and the GPU this can make things faster, but it consumes more VRAM. Note that for boards like the RTX 3060 it is better to just use 1.
    • depths (MASK, optional): Can be used for PDFNet to provide externally computed depth maps. This is the map generated by Depth Anything V2
    • out_dtype (AUTO, float32, float16): The data type used for the output image. Using AUTO is recommended. To save RAM when processing long videos use float16.
  • Output:
    • masks (MASK): The estimated masks, where a higher value means the model estimates it belongs to the foreground with more confidence.
    • depths (MASK): The estimated depth map. Either from the depths input or computed. Note this applies only to PDFNet. This is the map generated by Depth Anything V2
    • edges (MASK): The estimated edges. This is only generated by the DiffDIS model.
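
As a rough illustration of mask_threshold, a minimal sketch follows; it is not the node's code, and the assumption that a threshold of 0 leaves the soft mask untouched is mine.

    import torch

    def apply_threshold(mask: torch.Tensor, mask_threshold: float) -> torch.Tensor:
        # assumed behaviour: a threshold of 0 keeps the soft (matte) mask unchanged
        if mask_threshold <= 0.0:
            return mask
        # binarize: values above the threshold become 1, the rest 0
        return (mask > mask_threshold).to(mask.dtype)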

Get background mask low level

  • Display Name: Get background mask low level
  • Internal Name: GetMaskLowByBiRefNet_SET
  • Category: RemBG_SET/Advanced
  • Description: Computes the foreground mask. No pre- or post-processing is applied; you must do it outside the node.
  • Inputs:
    • model (SET_REMBG): The model to use
    • images (IMAGE): One or more images to process. They must be normalized to a range that is good for the model and their size must be similar to the size used to train the model (see the pre-processing sketch after this node's description).
    • batch_size (INT): How many images will be processed at once. Useful for videos; depending on the model and the GPU this can make things faster, but it consumes more VRAM. Note that for boards like the RTX 3060 it is better to just use 1.
    • depths (MASK, optional): Can be used for PDFNet to provide externally computed depth maps. This is the map generated by Depth Anything V2
    • out_dtype (AUTO, float32, float16): The data type used for the output image. Using AUTO is recommended. To save RAM when processing long videos use float16.
  • Output:
    • masks (MASK): The estimated masks, where a higher value means the model estimates it belongs to the foreground with more confidence.
    • depths (MASK): The estimated depth map. Either from the depths input or computed. Note this applies only to PDFNet. This is the map generated by Depth Anything V2
    • edges (MASK): The estimated edges. This is only generated by the DiffDIS model.
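
If you do the pre- and post-processing yourself, it could look roughly like the following sketch. The exact content of norm_params isn't documented here, so the mean/std normalization and the helper names are assumptions, not the nodes' actual code.

    import torch
    import torch.nn.functional as F

    def preprocess(images: torch.Tensor, train_w: int, train_h: int,
                   mean: tuple, std: tuple) -> torch.Tensor:
        # images: ComfyUI IMAGE batch (B, H, W, 3) in 0..1
        x = images.permute(0, 3, 1, 2)                       # -> (B, 3, H, W)
        x = F.interpolate(x, size=(train_h, train_w), mode="bicubic", antialias=True)
        m = torch.tensor(mean, device=x.device).view(1, 3, 1, 1)
        s = torch.tensor(std, device=x.device).view(1, 3, 1, 1)
        return ((x - m) / s).permute(0, 2, 3, 1)             # -> (B, train_h, train_w, 3)

    def postprocess(masks: torch.Tensor, height: int, width: int) -> torch.Tensor:
        # masks: (B, h, w) at the model resolution; scale back to the original size
        return F.interpolate(masks.unsqueeze(1), size=(height, width),
                             mode="bilinear").squeeze(1)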

Other nodes

The PDFNet model is a special case: instead of using just the image, it also uses an estimation of the image depth computed with Depth Anything V2. To allow automatic computation of the depth maps I took Kijai's nodes and adapted them for this use.

Load Depth Anything by name

  • Display Name: Load Depth Anything by name
  • Internal Name: DownloadAndLoadDepthAnythingV2Model_SET
  • Category: RemBG_SET/Load
  • Description: Downloads and loads to memory one of the Depth Anything V2 models.
  • Inputs:
    • model (STRING): The name of the model to use. Small, Base and Large are available in 16 and 32 bits. The 16-bit versions work quite well. PDFNet was trained using the Base version.
  • Output:
    • da_v2_model (DAMODEL): The model ready to be used.

Depth Anything V2

  • Display Name: Depth Anything V2
  • Internal Name: DepthAnything_V2_SET
  • Category: RemBG_SET/Advanced
  • Description: Computes an estimated depth map of the image; larger values mean the pixel is closer to the camera
  • Inputs:
    • da_model (DAMODEL): The model from the Load Depth Anything by name node.
    • images (IMAGE): One or more images to process; they will be normalized and scaled.
    • batch_size (INT): How many images will be processed at once.
  • Output:
    • depths (MASK): The depth map
    • depth_imgs (IMAGE): The same map in a format compatible with nodes that need an image. The three channels (R, G, B) are the same buffer, shared with the mask.
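
The depth_imgs output is essentially the depth mask replicated into three channels; conceptually (a sketch, not the node's code):

    # depths: (B, H, W) MASK -> depth_imgs: (B, H, W, 3) IMAGE, the three channels identical
    depth_imgs = depths.unsqueeze(-1).expand(-1, -1, -1, 3)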

🚀 Installation

You can install the nodes from the ComfyUI nodes manager (the name is Remove Background (SET) / remove-background), or do it manually:

  1. Clone this repository into your ComfyUI/custom_nodes/ directory:
    cd ComfyUI/custom_nodes/
    git clone https://github.com/set-soft/ComfyUI-RemoveBackground_SET ComfyUI-RemoveBackground_SET
  2. Install dependencies: pip install -r ComfyUI/custom_nodes/ComfyUI-RemoveBackground_SET/requirements.txt

Important

The SeCoNoHe lib is developed in parallel with my nodes; when installing the nodes from the repo you might need to install a fresh copy of SeCoNoHe: pip install git+https://github.com/set-soft/seconohe.git

  3. Restart ComfyUI.

The nodes should then appear under the "RemBG_SET" category in the "Add Node" menu.

📦 Dependencies

  • SeCoNoHe (seconohe): This is just some functionality I wrote that is shared by my nodes; it only depends on ComfyUI.
  • PyTorch: Installed by ComfyUI
  • einops: Installed by ComfyUI
  • kornia: Installed by ComfyUI
  • safetensors: Installed by ComfyUI
  • Requests (optional): Usually an indirect ComfyUI dependency. If installed it will be used for downloads; it should be more robust than the built-in urllib, which is used as a fallback.
  • Colorama (optional): Might help to get colored log messages on some terminals. We use ANSI escape sequences when it isn't installed.

🖼️ Examples

Once installed, the examples are available in the ComfyUI workflow templates, in the remove-background section (or ComfyUI-RemoveBackground_SET).

Simple

These examples show how to remove the background, obtaining an image with transparency, or how to replace it with an image. Note that RGBA images, the ones with transparency, aren't supported by all nodes. The correct way to handle them is to have the image and a mask, but using RGBA is what most background removal tools do.

More advanced

These examples show how to have more control over the process.

Video

Examples for video processing, using ComfyUI video nodes and advanced ComfyUI-VideoHelperSuite nodes.

  • 05_Video: Simple video workflow to replace the background of a video using a still image. Uses the Comfy-Core nodes.
  • 05_Video_Advanced: Video workflow to replace the background of a video using another video. Uses ComfyUI-VideoHelperSuite, which allows resizing, skipping frames, limiting frames, etc.

Foreground input video:

v1.mp4

Background input video:

v2.mp4

Output using InSPyReNet Base model:

out.mp4

Model specific

Examples related to particular models. PDFNet uses a depth map and DiffDIS is a diffusion model repurposed for DIS.

Comparison

Example workflows showing how to compare the models.

📝 Usage Notes

Informal explanation of the terms

I don't intend to define these terms strictly, just to give you an idea of their meaning.

  • DIS stands for Dichotomous Image Segmentation, a technical term used for tasks where you separate an image into two different things, in particular the foreground and the background. There are some specialized DIS tasks like COD and HRSOD
    • SOD (Salient Object Detection): a term also used for this task; we want to separate the object in the foreground, the one that is "salient"
    • HRSOD (High Resolution SOD): used when we want a highly detailed separation, preserving fine detail of the boundary
    • COD (Camouflaged Object Detection): as the name implies, here the object is camouflaged, making the task harder
    • Matte: this term is used when we want to separate translucent objects, getting a mask that shows how much of the background is blended with the foreground
    • Portrait: refers to human portraits, used for the task of separating a human from the background

Why so many models?

Each model has its own strengths; one might excel on an image and fail miserably on another.

There are many things that determine how well a model works:

  • Its architecture, how it tackles the task. This is the most technical issue: how to do it well, fast and using fewer resources
  • Its size, how many parameters are used. Bigger implementations of the same architecture might perform better, at the cost of time and resources
  • Its training, the dataset used and the mechanism used to guide the model

BiRefNet is a good example where you can see one architecture trained using:

  • Different sizes, Lite models use fewer parameters
  • Different datasets, you'll find models trained for General DIS, COD, HRSOD and Matte tasks using specific datasets

You'll also find a version of this model trained by a company with a curated dataset, and perhaps some twist in the strategy: BRIA v2.0

Notes about the architectures

Some random notes you might find interesting:

  • Models using the Swin Transformer as backbone:
    • BiRefNet Lite uses the Tiny size
    • MVANet/BEN, InSPyReNet and PDFNet use the Base size
    • BiRefNet Full uses the Large size
  • BEN and BEN2 models use the same architecture; the difference is in the training
    • Both are basically MVANet with a few changes in activation functions and similar details
  • BRIA v1.4 is a U-2-Net model with small changes, trained with a proprietary dataset. Not for commercial use.
  • BRIA v2.0 is a BiRefNet model, again trained with a proprietary dataset and not for commercial use.
  • PDFNet uses a clever strategy: it leverages the power of Depth Anything V2 (using the DINO v2 backbone and training) to assist the task. The cost is twice the time of similar models.
  • DiffDIS uses a completely different approach. This is the fast SD Turbo (a 1-4 step diffusion model) repurposed for the DIS task. It processes two latents, one for the mask and the other for the edges. The image is generated in one step, but the cost in time and resources is huge compared with the other models.
  • MODNet was designed for fast separation and is by far the fastest. The cost is that it isn't a general model; the available trained model is for portraits.
  • IS-Net is an evolution of U-2-Net
    • These models use a nested UNet (a UNet in each stage of the bigger UNet)
    • No Swin Transformer (or ResNet) backbone
    • Even though they are older than the current generation that uses Swin as a backbone, they can deliver very good results with fewer resources

Resources comparison

This is not a formal benchmark; it is just the result of a few tests using an RTX 3060 with 12 GiB of VRAM on a system with 32 GiB of RAM.

Model                     Time (ms)   Memory (MiB)   Image Size
MODNet Photo portrait            60            175          512
U-2-Net Base                    147            371          320
IS-Net Base                     196            776         1024
IS-Net BRIA v1.4                200            776         1024
MVANet General BEN2 F16         421           1605         1024
BiRefNet General F16            516           1592         1024
InSPyReNet Base 1.2.12          661           2910         1024
BiRefNet BRIA v2.0             1029           3181         1024
PDFNet Base                    1684           3551         1024
DiffDIS Base F16               3109           6102         1024
DiffDIS Base F32               5249           5674         1024

Note that BRIA v2 and BiRefNet General F16 are the same architecture, but one works on 32 bits and the other on 16 bits. The impact on speed and memory is significant: the 16-bit weights run twice as fast using half the memory.

For DiffDIS this difference doesn't hold and, for some reason, the 16-bit weights need more memory.

Also note that IS-Net models are much faster and need much less memory than the rest, even when using a 1024x1024 image size.

Debug

  • Logging: 🔊 The nodes use Python's logging module. Debug messages can be helpful for understanding the transformations being applied. You can control log verbosity through ComfyUI's startup arguments (e.g., --preview-method auto --verbose DEBUG for more detailed ComfyUI logs, which might also affect custom node loggers if they are configured to inherit levels). The logger name used is "RemoveBackground_SET". You can force the debugging level for these nodes by setting the REMOVEBACKGROUND_SET_NODES_DEBUG environment variable to 1 or 2.
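
For example, to raise the verbosity of just these nodes from Python (the logger name and the environment variable come from the note above):

    import logging
    import os

    # option 1: set the environment variable before ComfyUI loads the nodes
    os.environ["REMOVEBACKGROUND_SET_NODES_DEBUG"] = "2"
    # option 2: raise the level of the nodes' logger directly
    logging.getLogger("RemoveBackground_SET").setLevel(logging.DEBUG)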

📜 Project History

  • 1.0.0 2025-10-23: Initial release

⚖️ License

GPL-3.0

🙏 Attributions

  • A good part of the initial code and this README was generated using Gemini 2.5 Pro.
  • I took various ideas from ComfyUI_BiRefNet_ll
  • These nodes contain the inference code for these models:
    • BiRefNet: Peng Zheng, Dehong Gao, Deng-Ping Fan, Li Liu, Jorma Laaksonen, Wanli Ouyang, Nicu Sebe
    • Depth Anything: Lihe Yang, Bingyi Kang, Zilong Huang, Zhen Zhao, Xiaogang Xu, Jiashi Feng, Hengshuang Zhao (HKU/TikTok)
    • DiffDIS: Qian Yu, Peng-Tao Jiang, Hao Zhang, Jinwei Chen, Bo Li, Lihe Zhang, Huchuan Lu
    • Diffusers: The HuggingFace Team
    • DINO: Meta AI Research
    • InSPyReNet: Taehun Kim, Kunhee Kim, Joonyeong Lee, Dongmin Cha, Jiho Lee, Daijin Kim
    • MODNet: Zhanghan Ke, Jiayu Sun, Kaican Li, Qiong Yan, Rynson W.H. Lau
    • MVANet: Qian Yu, Xiaoqi Zhao, Youwei Pang, Lihe Zhang, Huchuan Lu
      • BEN: Maxwell Meyer and Jack Spruyt
    • PDFNet: Xianjie Liu, Keren Fu, Qijun Zhao
    • Swin: Ze Liu, Yutong Lin, Yixuan Wei
    • U-2-Net: Xuebin Qin, Zichen Zhang, Chenyang Huang, Masood Dehghan, Osmar R. Zaiane and Martin Jagersand
      • IS-Net: Xuebin Qin, Hang Dai, Xiaobin Hu, Deng-Ping Fan, Ling Shao, Luc Van Gool
  • Code for Depth Anything v2 by Kijai (Jukka Seppänen)
  • All working together by Salvador E. Tropea
