This repository provides a set of custom nodes for ComfyUI focused on background removal and replacement.
Important
ComfyUI 0.3.48 is currently needed (Aug 1, 2025)
- No bizarre extra dependencies, we use the same modules as ComfyUI
- Warnings and errors visible in the browser, configurable debug information in the console
- Support for BEN1/2, BiRefNet, BRIA 1.4/2, Depth Anything V2, DiffDIS, InSPyReNet, MODNet, MVANet, PDFNet, U-2-Net, IS-Net
- Automatic model download (only the SD Turbo VAE might be needed for DiffDIS)
- Installation
- Dependencies
- Examples
  - Simple (01_Simple, 01_Change_Background)
  - More advanced (02_Full_example, 03_Web_page_examples, 04_Advanced)
  - Video (05_Video, 05_Video_Advanced)
  - Model specific (01_PDFNet_simple, 06_PDFNet_external_map, 01_Simple_DiffDIS)
  - Comparison (07_PDFNet_vs_BiRefNet, 09_Compare_Models)
- Nodes
- Usage Notes
- Project History
- License
- Attributions
The loaders are used to load a background removal model. We have a general loader that looks for models in the `ComfyUI/models/rembg` folder.
You can reconfigure this path using the `RemBG_SET` key in ComfyUI's `extra_model_paths.yaml` file (see the sketch below).
Note that we also look for the `rembg` and `birefnet` keys. If these keys aren't defined we assume they point to `ComfyUI/models/rembg` and
`ComfyUI/models/BiRefNet`. Also note that models downloaded to `~/.transparent-background/`
(or `${TRANSPARENT_BACKGROUND_FILE_PATH}/.transparent-background/`) will also be available.
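As an illustration, a hypothetical `extra_model_paths.yaml` fragment could look like this (the section name and `base_path` are just examples; the `RemBG_SET`, `rembg` and `birefnet` keys are the ones these nodes look for, as described above):

```yaml
# Hypothetical fragment of ComfyUI's extra_model_paths.yaml
my_paths:                      # arbitrary section name
    base_path: /data/ai        # adjust to your setup
    RemBG_SET: models/rembg    # where the general loader looks for models
    birefnet: models/BiRefNet  # also searched for BiRefNet models
```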
In addition we have automatic downloaders for each supported model family.
- Display Name: Load RemBG model by file
- Internal Name: LoadRembgByBiRefNetModel_SET
- Category: RemBG_SET/Load
- Description: Loads a model from the `ComfyUI/models/rembg` folder; you can connect its output to any of the processing nodes
- Purpose: Used for models that you already downloaded, or perhaps trained yourself.
- Inputs:
  - `model` (FILENAME): The name of the model in the `ComfyUI/models/rembg` folder, use `R` to refresh the list
  - `device` (DEVICE): The device where the model will be executed. Using `AUTO` you'll use the default ComfyUI target (i.e. your GPU)
  - `dtype` (DTYPE_OPS): Selects the data type used during inference. `AUTO` means we use the same data type as the model weights loaded from disk. Most of the models perform quite well on 16 bits floating point. You can force 16 bits to save VRAM, or even force the conversion of 16 bits values to 32 bits.
  - `vae` (VAE, optional): Only needed for DiffDIS, you have to connect a "Load VAE" node here. The model needs the SD Turbo VAE, please look at the examples.
  - `positive` (CONDITIONING, optional): Experimental and used only for DiffDIS. In practice you should leave it unconnected, the model was trained with an empty conditioning text.
- Output:
  - `model` (SET_REMBG): The loaded model, ready to be connected to a processing node
- Display Name: Load XXXXXX model by name
- Internal Name: AutoDownloadXXXXXXModel_SET
- Category: RemBG_SET/Load
- Description: Loads a model of the XXXXXX family; if the model isn't on disk it is automatically downloaded. XXXXXX is one of the supported families (i.e. BiRefNet, MVANet/BEN, InSPyReNet, U-2-Net, IS-Net, MODNet, PDFNet, DiffDIS)
- Purpose: Downloads a background removal model from the internet and loads it into memory. The names are descriptive and indicate how big the file is.
- Inputs:
  - `model` (FILENAME): The descriptive name of the model
  - `device` (DEVICE): The device where the model will be executed. Using `AUTO` you'll use the default ComfyUI target (i.e. your GPU)
  - `dtype` (DTYPE_OPS): Selects the data type used during inference. `AUTO` means we use the same data type as the model weights loaded from disk. Most of the models perform quite well on 16 bits floating point. You can force 16 bits to save VRAM, or even force the conversion of 16 bits values to 32 bits.
  - `vae` (VAE, only for DiffDIS): You have to connect a "Load VAE" node here. The model needs the SD Turbo VAE, please look at the examples.
  - `positive` (CONDITIONING, only for DiffDIS): Experimental. In practice you should leave it unconnected, the model was trained with an empty conditioning text.
- Output:
  - `model` (SET_REMBG): The loaded model, ready to be connected to a processing node
  - `train_w` (INT): Width of the images used during training. You should use this size for optimal results. Note that most models accept any size multiple of 32. The `General 2K Lite` BiRefNet model was trained with some flexibility in the size. DiffDIS is more restrictive. Only for manual pre-processing (see the sketch after the *Get background mask low level* node).
  - `train_h` (INT): Height of the images used during training. Same notes as for `train_w`. Only for manual pre-processing.
  - `norm_params` (NORM_PARAMS): Normalization parameters for the input images. This is needed only for advanced use when you want to manually pre-process the images. The `Arbitrary Normalize` node from Image Misc can use these parameters to apply the correct normalization.
These nodes apply the loaded model to estimate the foreground object.
For normal use the simplest node is Remove background, it generates an RGBA image with a transparent background, or replaces the background with a provided image.
This simple node doesn't expose many options, so you might also want to use Remove background (full). But note this node will also consume more RAM.
What the models do is generate a map where each pixel represents the estimated probability that it belongs to the foreground. This is a mask that you can apply to remove the background, as sketched below.
The Get background mask node is oriented to just get this mask, with no background removal or replacement.
If you want to play at the lowest level use Get background mask low level. This node doesn't pre-process the input image and returns the mask without extra post-processing.
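To make the role of the mask concrete, here is a minimal sketch (not the nodes' actual code) of how a foreground-probability mask can be turned into an RGBA image or used to blend in a new background, assuming `image`, `mask` and `background` are 0..1 tensors:

```python
import torch

def composite(image, mask, background=None):
    """Illustrative only. image: (H, W, 3), mask: (H, W), background: (H, W, 3) or None.
    All values are in the 0..1 range, as ComfyUI IMAGE/MASK tensors use."""
    alpha = mask.unsqueeze(-1)                    # (H, W, 1)
    if background is None:
        # RGBA output: keep the colors, use the mask as the alpha channel
        return torch.cat([image, alpha], dim=-1)  # (H, W, 4)
    # Background replacement: a simple alpha blend
    return image * alpha + background * (1.0 - alpha)
```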
- Display Name: Remove background
- Internal Name: RembgByBiRefNet_SET
- Category: RemBG_SET/Basic
- Description: Removes or replaces the background of the input images.
- Inputs:
  - `model` (SET_REMBG): The model to use
  - `images` (IMAGE): One or more images to process, they will be scaled to a size that is good for the model
  - `batch_size` (INT): How many images will be processed at once. Useful for videos; depending on the model and the GPU you have, this can make things faster. Consumes more VRAM. Note that for boards like the RTX 3060 it is better to just use 1.
  - `depths` (MASK, optional): Can be used with PDFNet to provide externally computed depth maps
  - `background` (IMAGE, optional): Image to use as background, will be scaled to the size of `images`. If you don't provide an image the output will be an RGBA image with transparency. Note that this can be 1 or more images. If `images` is a video this input can be another video with the same number of frames.
  - `out_dtype` (AUTO, float32, float16): Which data type will be used for the output image. Using `AUTO` is recommended. To save RAM when processing long videos, use `float16`.
- Output:
  - `images` (IMAGE): The images with the background removed (transparent) or replaced by the background image
- Display Name: Remove background (full)
- Internal Name: RembgByBiRefNetAdvanced_SET
- Category: RemBG_SET/Advanced
- Description: Removes or replaces the background of the input images. Gives more options and also generates masks and other extras.
- Inputs:
  - `model` (SET_REMBG): The model to use
  - `images` (IMAGE): One or more images to process, they will be scaled to a size that is good for the model
  - `width` (INT): The width to scale the image to before applying the model. Should be supported by the model. Usually this is the `train_w` from *Load XXXXXX model by name*
  - `height` (INT): The height to scale the image to before applying the model. Should be supported by the model. Usually this is the `train_h` from *Load XXXXXX model by name*
  - `upscale_method` (area, bicubic, nearest-exact, bilinear, lanczos): Which algorithm will be used to scale the image to and from the model size. Usually `bicubic` is a good choice.
  - `blur_size` (INT): Diameter for the coarse gaussian blur used for the Approximate Fast Foreground Colour Estimation (see the sketch after this node).
  - `blur_size_two` (INT): Diameter for the fine gaussian blur (see `blur_size`)
  - `fill_color` (BOOLEAN): When enabled and no background is provided we fill the background using a color.
  - `color` (STRING): Color to use when filling the background. You can specify it in multiple ways, even by name.
  - `mask_threshold` (FLOAT): Most models generate masks with values from 0 to 1, including anything in between. Matte models can estimate transparency using it. If you need a mask that is strictly 0 or 1, with nothing in between, you can provide a threshold here. Values above it become 1 and the rest 0.
  - `batch_size` (INT): How many images will be processed at once. Useful for videos; depending on the model and the GPU you have, this can make things faster. Consumes more VRAM. Note that for boards like the RTX 3060 it is better to just use 1.
  - `depths` (MASK, optional): Can be used with PDFNet to provide externally computed depth maps. This is the map generated by Depth Anything V2
  - `background` (IMAGE, optional): Image to use as background, will be scaled to the size of `images`. If you don't provide an image the output will be an RGBA image with transparency. Note that this can be 1 or more images. If `images` is a video this input can be another video with the same number of frames.
  - `out_dtype` (AUTO, float32, float16): Which data type will be used for the output image. Using `AUTO` is recommended. To save RAM when processing long videos, use `float16`.
- Output:
  - `images` (IMAGE): The images with the background removed (transparent) or replaced by the background image
  - `masks` (MASK): The estimated masks, where a higher value means the model is more confident the pixel belongs to the foreground.
  - `depths` (MASK): The estimated depth maps, either from the `depths` input or computed. Note this applies only to PDFNet. This is the map generated by Depth Anything V2
  - `edges` (MASK): The estimated edges. This is only generated by the DiffDIS model.
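The two blur diameters of the full node drive the Approximate Fast Foreground Colour Estimation step: a coarse pass estimates the foreground colors, then a fine pass refines them near the mask edges, removing background color bleeding from semi-transparent pixels. The following is a hedged sketch of the commonly used blur-fusion technique, not necessarily the exact implementation of this node (the default diameters are illustrative):

```python
import torch
import torchvision.transforms.functional as TF

def _fusion_pass(image, fg, bg, alpha, diameter):
    """One blur-fusion pass refining the foreground/background color estimates."""
    k = diameter if diameter % 2 else diameter + 1          # kernel size must be odd
    blur = lambda t: TF.gaussian_blur(t, kernel_size=k)
    blurred_alpha = blur(alpha)
    blurred_fg = blur(fg * alpha) / (blurred_alpha + 1e-5)
    blurred_bg = blur(bg * (1 - alpha)) / (1 - blurred_alpha + 1e-5)
    # Pull the estimate towards the original image where the mask is confident
    fg = blurred_fg + alpha * (image - alpha * blurred_fg - (1 - alpha) * blurred_bg)
    return fg.clamp(0, 1), blurred_bg

def estimate_foreground(image, alpha, blur_size=91, blur_size_two=7):
    """image: (C, H, W), alpha: (1, H, W), both 0..1. Diameters are illustrative."""
    fg, bg = _fusion_pass(image, image, image, alpha, blur_size)   # coarse pass
    fg, _ = _fusion_pass(image, fg, bg, alpha, blur_size_two)      # fine pass
    return fg
```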
- Display Name: Get background mask
- Internal Name: GetMaskByBiRefNet_SET
- Category: RemBG_SET/Basic
- Description: Computes the foreground mask. It normalizes the input images, scales them to the model size, computes the masks and then scales the masks back to the image size. No background removal/replacement is done.
- Inputs:
  - `model` (SET_REMBG): The model to use
  - `images` (IMAGE): One or more images to process, they will be scaled to a size that is good for the model
  - `width` (INT): The width to scale the image to before applying the model. Should be supported by the model. Usually this is the `train_w` from *Load XXXXXX model by name*
  - `height` (INT): The height to scale the image to before applying the model. Should be supported by the model. Usually this is the `train_h` from *Load XXXXXX model by name*
  - `upscale_method` (area, bicubic, nearest-exact, bilinear, lanczos): Which algorithm will be used to scale the image to and from the model size. Usually `bicubic` is a good choice.
  - `mask_threshold` (FLOAT): Most models generate masks with values from 0 to 1, including anything in between. Matte models can estimate transparency using it. If you need a mask that is strictly 0 or 1, with nothing in between, you can provide a threshold here. Values above it become 1 and the rest 0.
  - `batch_size` (INT): How many images will be processed at once. Useful for videos; depending on the model and the GPU you have, this can make things faster. Consumes more VRAM. Note that for boards like the RTX 3060 it is better to just use 1.
  - `depths` (MASK, optional): Can be used with PDFNet to provide externally computed depth maps. This is the map generated by Depth Anything V2
  - `out_dtype` (AUTO, float32, float16): Which data type will be used for the output masks. Using `AUTO` is recommended. To save RAM when processing long videos, use `float16`.
- Output:
  - `masks` (MASK): The estimated masks, where a higher value means the model is more confident the pixel belongs to the foreground.
  - `depths` (MASK): The estimated depth maps, either from the `depths` input or computed. Note this applies only to PDFNet. This is the map generated by Depth Anything V2
  - `edges` (MASK): The estimated edges. This is only generated by the DiffDIS model.
- Display Name: Get background mask low level
- Internal Name: GetMaskLowByBiRefNet_SET
- Category: RemBG_SET/Advanced
- Description: Computes the foreground mask. No pre or post processing is applied, you must do it outside the node (see the sketch below).
- Inputs:
  - `model` (SET_REMBG): The model to use
  - `images` (IMAGE): One or more images to process, they must be normalized to a range that is good for the model. Their size must be similar to the size used to train the model.
  - `batch_size` (INT): How many images will be processed at once. Useful for videos; depending on the model and the GPU you have, this can make things faster. Consumes more VRAM. Note that for boards like the RTX 3060 it is better to just use 1.
  - `depths` (MASK, optional): Can be used with PDFNet to provide externally computed depth maps. This is the map generated by Depth Anything V2
  - `out_dtype` (AUTO, float32, float16): Which data type will be used for the output masks. Using `AUTO` is recommended. To save RAM when processing long videos, use `float16`.
- Output:
  - `masks` (MASK): The estimated masks, where a higher value means the model is more confident the pixel belongs to the foreground.
  - `depths` (MASK): The estimated depth maps, either from the `depths` input or computed. Note this applies only to PDFNet. This is the map generated by Depth Anything V2
  - `edges` (MASK): The estimated edges. This is only generated by the DiffDIS model.
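Since the low-level node skips pre- and post-processing, you are expected to resize and normalize the images yourself, which is what the `train_w`, `train_h` and `norm_params` loader outputs (and the `Arbitrary Normalize` node from Image Misc) are for. Conceptually, the missing steps look like this sketch (illustrative only; `mean` and `std` would come from `norm_params`, the sizes from `train_w`/`train_h`):

```python
import torch
import torch.nn.functional as F

def pre_process(images, train_w, train_h, mean, std):
    """images: (B, H, W, C) in 0..1 (ComfyUI layout). Returns model-ready tensors."""
    x = images.permute(0, 3, 1, 2)                                 # -> (B, C, H, W)
    x = F.interpolate(x, size=(train_h, train_w), mode="bicubic")  # model input size
    return (x - mean.view(1, -1, 1, 1)) / std.view(1, -1, 1, 1)    # normalize

def post_process(raw_masks, out_h, out_w, threshold=None):
    """raw_masks: (B, 1, h, w) model output. Scale back and optionally binarize."""
    m = F.interpolate(raw_masks, size=(out_h, out_w), mode="bicubic").clamp(0, 1)
    if threshold is not None:
        m = (m > threshold).float()   # same idea as the mask_threshold input
    return m.squeeze(1)               # (B, H, W), ComfyUI MASK layout
```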
The PDFNet model is a special case: instead of just using the image, it also uses an estimation of the image's depth computed with Depth Anything V2. To allow automatic computation of the depth maps I took Kijai's nodes and adapted them to this use.
- Display Name: Load Depth Anything by name
- Internal Name: DownloadAndLoadDepthAnythingV2Model_SET
- Category: RemBG_SET/Load
- Description: Downloads and loads into memory one of the Depth Anything V2 models.
- Inputs:
  - `model` (STRING): The name of the model to use. Small, Base and Large are available in 16 and 32 bits. The 16 bits versions work quite well. PDFNet was trained using the Base version.
- Output:
  - `da_v2_model` (DAMODEL): The model ready to be used.
- Display Name: Depth Anything V2
- Internal Name: DepthAnything_V2_SET
- Category: RemBG_SET/Advanced
- Description: Computes an estimated depth map of the image, where larger values mean the pixel is closer to the camera
- Inputs:
  - `da_model` (DAMODEL): The model from the *Load Depth Anything by name* node.
  - `images` (IMAGE): One or more images to process, they will be normalized and scaled.
  - `batch_size` (INT): How many images will be processed at once.
- Output:
  - `depths` (MASK): The depth maps
  - `depth_imgs` (IMAGE): The same maps in a format compatible with nodes that need an image. The three channels (R, G, B) are the same buffer, shared with the mask.
You can install the nodes from the ComfyUI nodes manager, the name is Remove Background (SET) (`remove-background`), or just do it manually:
- Clone this repository into your `ComfyUI/custom_nodes/` directory:
  `cd ComfyUI/custom_nodes/`
  `git clone https://github.com/set-soft/ComfyUI-RemoveBackground_SET ComfyUI-RemoveBackground_SET`
- Install dependencies:
  `pip install -r ComfyUI/custom_nodes/ComfyUI-RemoveBackground_SET/requirements.txt`
Important
The SeCoNoHe lib is developed in parallel with my nodes; when installing the nodes from the repo you might need to install
a fresh copy of SeCoNoHe: `pip install git+https://github.com/set-soft/seconohe.git`
- Restart ComfyUI.
The nodes should then appear under the "RemBG_SET" category in the "Add Node" menu.
- SeCoNoHe (seconohe): This is just some functionality I wrote that is shared by my nodes; it only depends on ComfyUI.
- PyTorch: Installed by ComfyUI
- einops: Installed by ComfyUI
- kornia: Installed by ComfyUI
- safetensors: Installed by ComfyUI
- Requests (optional): Usually an indirect ComfyUI dependency. If installed it will be used for downloads; it should be more robust than the built-in `urllib`, which is used as a fallback.
- Colorama (optional): Might help to get colored log messages on some terminals. We use ANSI escape sequences when it isn't installed.
Once installed the examples are available in the ComfyUI workflow templates, in the remove-background section (or ComfyUI-RemoveBackground_SET).
These examples show how to remove the background, obtaining an image with transparency, or how to replace it with an image. Note that RGBA images, the ones with transparency, aren't supported by all nodes. The correct way to handle them is to have the image and a mask, but using RGBA is what most background removal tools do.
- 01_Simple: Basic use to get an RGBA image
- 01_Change_Background: Basic example showing how to replace the background of an image
These examples show how to have more control over the process.
- 02_Full_example: Shows how to use the full node to get an RGBA image. Needs Image Misc to download the example image.
- 03_Web_page_examples: Allows comparing the result with the original image. Downloads the BiRefNet examples. Needs Image Misc to download the example images and rgthree-comfy to compare the images.

- 04_Advanced: Shows how to do custom pre and post processing, including filling with a color, background image replacement and object highlight. Needs Image Misc and rgthree-comfy to compare the images.

- 04_Advanced_subgraphs: Same as `04_Advanced` but using subgraphs.

- 08_Batch: Shows how to process multiple images in the same run. Needs Image Misc and rgthree-comfy to compare the images.
Examples for video processing, using ComfyUI video nodes and advanced ComfyUI-VideoHelperSuite nodes.
- 05_Video: Simple video workflow to replace the background of a video using a still image. Uses the Comfy-Core nodes.
- 05_Video_Advanced: Video workflow to replace the background of a video using another video. Uses ComfyUI-VideoHelperSuite, which allows resizing, skipping frames, limiting frames, etc.

Foreground input video:
v1.mp4
Background input video:
v2.mp4
Output using InSPyReNet Base model:
out.mp4
Examples related to particular models. PDFNet uses a depth map and DiffDIS is a diffusion model repurposed for DIS.
- 01_PDFNet_simple: Shows how to use the PDFNet model and the automatically computed maps.
- 06_PDFNet_external_map: Shows how to use the PDFNet model and the externally computed maps.
- 01_Simple_DiffDIS: Basic use to get an RGBA image using DiffDIS model
Example workflows showing how to compare the models.
- 07_PDFNet_vs_BiRefNet: Example to compare two models, in this case PDFNet vs BiRefNet

- 09_Compare_Models: Compares 10 models and generates an image showing the output of the 10 models. Needs Image Misc to compose the final image.


I don't pretend to define them strictly, just to give you an idea of their meaning:
- DIS (Dichotomous Image Segmentation): a technical term used for tasks where you separate an image into two different things, in particular the foreground and the background.
  There are some specialized DIS tasks like COD and HRSOD.
- SOD (Salient Object Detection): a term also used for this task; we want to separate the object in the foreground, the one that is "salient".
- HRSOD (High Resolution SOD): used when we want a highly detailed separation, preserving fine detail at the boundary.
- COD (Camouflaged Object Detection): as the name implies the object is camouflaged, making the task harder.
- Matte: this term is used when we want to separate translucent objects, getting a mask that shows how much of the background is blended with the foreground.
- Portrait: refers to human portraits; used for the task of separating a human from the background.
Each model has its own strengths; one might excel on one image and miserably fail on another.
There are many things that determine how well a model works:
- Its architecture, how it tackles the task. This is the most technical issue, how to do it well, fast and using fewer resources
- Its size, how many parameters are used. Bigger implementations of the same architecture might perform better, at the cost of time and resources
- Its training, the dataset used and the mechanism used to guide the model
BiRefNet is a good example where you can see one architecture trained using:
- Different sizes, the Lite models use fewer parameters
- Different datasets, you'll find models trained for General DIS, COD, HRSOD and Matte tasks using specific datasets
You'll also find a version of this model trained by a company with a curated dataset, and perhaps some twist in the strategy: BRIA v2.0
Some random notes you might find interesting:
- Models using the Swin Transformer as backbone:
  - BiRefNet Lite uses the Tiny size
  - MVANet/BEN, InSPyReNet and PDFNet use the Base size
  - BiRefNet Full uses the Large size
- BEN and BEN2 models use the same architecture, the difference is in the training
  - Both are basically MVANet with a few changes in activation functions and similar details
- BRIA v1.4 is a U-2-Net model with small changes, trained with a proprietary dataset. Not for commercial use.
- BRIA v2.0 is a BiRefNet model, again trained with a proprietary dataset and not for commercial use.
- PDFNet uses a clever strategy: it leverages the power of Depth Anything V2 (DINO v2 backbone and training) to assist the task. The cost is twice the time of similar models.
- DiffDIS uses a completely different approach. This is the fast SD Turbo (a 1-4 step diffusion model) repurposed for the DIS task. It processes two latents, one for the mask and the other for the edges. The image is generated in one step, but the cost in time and resources is huge compared with the other models.
- MODNet was designed for fast separation and is by far the fastest. The cost is that it isn't a general model; the available trained model is for portraits.
- IS-Net is an evolution of U-2-Net
  - These models use a nested UNet (a UNet in each stage of the bigger UNet)
  - No Swin Transformer (or ResNet) backbone
  - Even though they are older than the current generation that uses Swin as the backbone, they can deliver very good results with fewer resources
This is not a formal benchmark; it is just the result of a few tests using an RTX 3060 with 12 GiB of VRAM on a system with 32 GiB of RAM.
| Model | Time (ms) | Memory (MiB) | Image Size (px) |
|---|---|---|---|
| MODNet Photo portrait | 60 | 175 | 512 |
| U-2-Net Base | 147 | 371 | 320 |
| IS-Net Base | 196 | 776 | 1024 |
| IS-Net BRIA v1.4 | 200 | 776 | 1024 |
| MVANet General BEN2 F16 | 421 | 1605 | 1024 |
| BiRefNet General F16 | 516 | 1592 | 1024 |
| InSPyReNet Base 1.2.12 | 661 | 2910 | 1024 |
| BiRefNet BRIA v2.0 | 1029 | 3181 | 1024 |
| PDFNet Base | 1684 | 3551 | 1024 |
| DiffDIS Base F16 | 3109 | 6102 | 1024 |
| DiffDIS Base F32 | 5249 | 5674 | 1024 |
Note that BRIA v2 and BiRefNet General F16 are the same architecture, but one works on 32 bits and the other on 16 bits. The impact on speed and memory is significant:
the 16 bits weights run about twice as fast using half the memory.
For DiffDIS this difference doesn't hold and, for some reason, the 16 bits weights need more memory; I'm not sure why.
Also note that IS-Net models are much faster and need much less memory than the rest, even when using a 1024x1024 image size.
- Logging: The nodes use Python's `logging` module. Debug messages can be helpful for understanding the transformations being applied. You can control log verbosity through ComfyUI's startup arguments (e.g., `--preview-method auto --verbose DEBUG` for more detailed ComfyUI logs, which might also affect custom node loggers if they are configured to inherit levels). The logger name used is "RemoveBackground_SET". You can force the debugging level for these nodes by setting the `REMOVEBACKGROUND_SET_NODES_DEBUG` environment variable to `1` or `2`.
- 1.0.0 2025-10-23: Initial release
- A good part of the initial code and of this README was generated using Gemini 2.5 Pro.
- I took various ideas from ComfyUI_BiRefNet_ll
- These nodes contain the inference code for the following models:
- BiRefNet: Peng Zheng, Dehong Gao, Deng-Ping Fan, Li Liu, Jorma Laaksonen, Wanli Ouyang, Nicu Sebe
- Depth Anything: Lihe Yang, Bingyi Kang, Zilong Huang, Zhen Zhao, Xiaogang Xu, Jiashi Feng, Hengshuang Zhao (HKU/TikTok)
- DiffDIS: Qian Yu, Peng-Tao Jiang, Hao Zhang, Jinwei Chen, Bo Li, Lihe Zhang, Huchuan Lu
- Diffusers: The HuggingFace Team
- DINO: Meta AI Research
- InSPyReNet: Taehun Kim, Kunhee Kim, Joonyeong Lee, Dongmin Cha, Jiho Lee, Daijin Kim
- MODNet: Zhanghan Ke, Jiayu Sun, Kaican Li, Qiong Yan, Rynson W.H. Lau
- MVANet: Qian Yu, Xiaoqi Zhao, Youwei Pang, Lihe Zhang, Huchuan Lu
- BEN: Maxwell Meyer and Jack Spruyt
- PDFNet: Xianjie Liu, Keren Fu, Qijun Zhao
- Swin: Ze Liu, Yutong Lin, Yixuan Wei
- U-2-Net: Xuebin Qin, Zichen Zhang, Chenyang Huang, Masood Dehghan, Osmar R. Zaiane and Martin Jagersand
- IS-Net: Xuebin Qin, Hang Dai, Xiaobin Hu, Deng-Ping Fan, Ling Shao, Luc Van Gool
- Code for Depth Anything v2 by Kijai (Jukka Seppänen)
- All working together by Salvador E. Tropea
