ComfyUI Remove Background nodes (SET)

This repository provides a set of custom nodes for ComfyUI focused on background removal and/or replacement.

Remove Background

Important

ComfyUI 0.3.48 is currently needed (Aug 1, 2025)

โš™๏ธ Main features

✅ No bizarre extra dependencies; we use the same modules as ComfyUI

✅ Warnings and errors visible in the browser, configurable debug information in the console

✅ Support for BEN1/2, BiRefNet, BRIA 1.4/2, Depth Anything V2, DiffDIS, InSPyReNet, MODNet, MVANet, PDFNet, U-2-Net, IS-Net

✅ Automatic model download (only the SD Turbo VAE might be needed, for DiffDIS)


✨ Nodes

Loaders

The loaders are used to load a background removal model. We have a general loader that looks for models in the ComfyUI/models/rembg folder. You can reconfigure this path using the RemBG_SET key in ComfyUI's extra_model_paths.yaml file.

Note that we also look for the rembg and birefnet keys. If these keys aren't defined we assume they point to ComfyUI/models/rembg and ComfyUI/models/BiRefNet. Also note that models downloaded to ~/.transparent-background/ (or ${TRANSPARENT_BACKGROUND_FILE_PATH}.transparent-background/) will also be available.

In addition we have automatic downloaders for each supported model family.

Load RemBG model by file

  • Display Name: Load RemBG model by file
  • Internal Name: LoadRembgByBiRefNetModel_SET
  • Category: RemBG_SET/Load
  • Description: Loads a model from the ComfyUI/models/rembg folder; you can connect its output to any of the processing nodes
  • Purpose: Used for models that you already downloaded, or perhaps trained yourself.
  • Inputs:
    • model (FILENAME): The name of the model in the ComfyUI/models/rembg folder, use R to refresh the list
    • device (DEVICE): The device where the model will be executed. With AUTO the default ComfyUI target is used (i.e. your GPU)
    • dtype (DTYPE_OPS): Selects the data type used during inference. AUTO means we use the same data type as the model weights loaded from disk. Most of the models perform quite well with 16-bit floating point. You can force 16 bits to save VRAM, or even force 16-bit values to be converted to 32 bits.
    • vae (VAE, optional): Only needed for DiffDIS; you have to connect a "Load VAE" node here. The model needs the SD Turbo VAE, please look at the examples.
    • positive (CONDITIONING, optional): Experimental and used only for DiffDIS. In practice you should leave it unconnected; the model was trained with an empty conditioning text.
  • Output:
    • model (SET_REMBG): The loaded model, ready to be connected to a processing node

Load XXXXXX model by name

  • Display Name: Load XXXXXX model by name
  • Internal Name: AutoDownloadXXXXXXModel_SET
  • Category: RemBG_SET/Load
  • Description: Loads a model from the XXXXXX family; if the model isn't on disk it is downloaded automatically. XXXXXX is one of the supported families (i.e. 'BiRefNet', 'MVANet/BEN', 'InSPyReNet', 'U-2-Net', 'IS-Net', 'MODNet', 'PDFNet', 'DiffDIS')
  • Purpose: Downloads a background removal model from the internet and loads it into memory. The names are descriptive and indicate how big the file is.
  • Inputs:
    • model (FILENAME): The descriptive name of the model
    • device (DEVICE): The device where the model will be executed. With AUTO the default ComfyUI target is used (i.e. your GPU)
    • dtype (DTYPE_OPS): Selects the data type used during inference. AUTO means we use the same data type as the model weights loaded from disk. Most of the models perform quite well with 16-bit floating point. You can force 16 bits to save VRAM, or even force 16-bit values to be converted to 32 bits.
    • vae (VAE, only for DiffDIS): You have to connect a "Load VAE" node here. The model needs the SD Turbo VAE, please look at the examples.
    • positive (CONDITIONING, only for DiffDIS): Experimental. In practice you should leave it unconnected; the model was trained with an empty conditioning text.
  • Output:
    • model (SET_REMBG): The loaded model, ready to be connected to a processing node
    • train_w (INT): Width of the images used during training. You should use this size for optimal results. Note that most models accept any size that is a multiple of 32. The General 2K Lite BiRefNet model was trained with some flexibility in the size; DiffDIS is more restrictive. Only needed for manual pre-processing.
    • train_h (INT): Height of the images used during training. You should use this size for optimal results. Note that most models accept any size that is a multiple of 32. The General 2K Lite BiRefNet model was trained with some flexibility in the size; DiffDIS is more restrictive. Only needed for manual pre-processing.
    • norm_params (NORM_PARAMS): Normalization parameters for the input images. This is needed only for advanced use, when you want to manually pre-process the images. The Arbitrary Normalize node from Image Misc can use these parameters to apply the correct normalization.

Processing nodes

These nodes apply the loaded model to estimate the foreground object. For normal use the simplest node is Remove background; it will generate an RGBA image with a transparent background, or replace the background using a provided image.

This simple node doesn't expose many options, so you may also want to use Remove background (full). Note that this node will also consume more RAM.

What the models actually do is generate a map where each pixel represents the estimated probability that it belongs to the foreground. This is the mask that you can apply to remove the background (as sketched below). The Get background mask node is oriented to just getting this mask; no background removal or replacement is done.
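
The sketch below, assuming ComfyUI's usual tensor layout (IMAGE as a (batch, height, width, 3) float tensor in 0..1 and MASK as (batch, height, width)), shows conceptually how such a mask is applied; the function name and exact handling are illustrative, not the nodes' actual code.

    import torch

    def composite(images: torch.Tensor, masks: torch.Tensor,
                  background: torch.Tensor | None = None) -> torch.Tensor:
        # images: (B, H, W, 3) in 0..1; masks: (B, H, W), 1.0 = foreground
        alpha = masks.unsqueeze(-1)                        # (B, H, W, 1)
        if background is None:
            # no background given: attach the mask as the alpha channel (RGBA)
            return torch.cat((images, alpha), dim=-1)
        # background given: alpha-blend the foreground over it
        return images * alpha + background * (1.0 - alpha)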

If you want to work at the lowest level use the Get background mask low level node. This node doesn't pre-process the input image and returns the mask without extra post-processing.

Remove background

  • Display Name: Remove background
  • Internal Name: RembgByBiRefNet_SET
  • Category: RemBG_SET/Basic
  • Description: Removes or replaces the background from the input image.
  • Inputs:
    • model (SET_REMBG): The model to use
    • images (IMAGE): One or more images to process; they will be scaled to a size that is good for the model
    • batch_size (INT): How many images will be processed at once. Useful for videos; depending on the model and the GPU this can make things faster, but it consumes more VRAM. Note that for boards like the RTX 3060 it is better to just use 1.
    • depths (MASK, optional): Can be used for PDFNet to provide externally computed depth maps
    • background (IMAGE, optional): Image to use as background; it will be scaled to the size of images. If you don't provide an image the output will be an RGBA image with transparency. Note that this can be 1 or more images. If images is a video this input can be another video with the same number of frames.
    • out_dtype (AUTO, float32, float16): The data type used for the output image. Using AUTO is recommended. To save RAM when processing long videos use float16.
  • Output:
    • images (IMAGE): The images with the background removed (transparent) or replaced by the background image

Remove background (full)

  • Display Name: Remove background (full)
  • Internal Name: RembgByBiRefNetAdvanced_SET
  • Category: RemBG_SET/Advanced
  • Description: Removes or replaces the background from the input image. It gives more options and also generates masks and other outputs.
  • Inputs:
    • model (SET_REMBG): The model to use
    • images (IMAGE): One or more images to process; they will be scaled to a size that is good for the model
    • width (INT): The width to scale the image to before applying the model. Should be supported by the model. Usually this is the train_w output from Load XXXXXX model by name
    • height (INT): The height to scale the image to before applying the model. Should be supported by the model. Usually this is the train_h output from Load XXXXXX model by name
    • upscale_method (area, bicubic, nearest-exact, bilinear, lanczos): The algorithm used to scale the image to and from the model size. Usually bicubic is a good choice.
    • blur_size (INT): Diameter for the coarse gaussian blur used for the Approximate Fast Foreground Colour Estimation (see the sketch after this node's description)
    • blur_size_two (INT): Diameter for the fine gaussian blur (see blur_size)
    • fill_color (BOOLEAN): When enabled and no background is provided we fill the background using a color.
    • color (STRING): Color to use when filling the background. You can specify it in multiple ways, even by name.
    • mask_threshold (FLOAT): Most models generate masks whose values range from 0 to 1 and can be anything in between. Matte models use this to estimate transparency. If you need the mask to be strictly 0 or 1, with nothing in between, you can provide a threshold here: values above it become 1 and the rest 0.
    • batch_size (INT): How many images will be processed at once. Useful for videos; depending on the model and the GPU this can make things faster, but it consumes more VRAM. Note that for boards like the RTX 3060 it is better to just use 1.
    • depths (MASK, optional): Can be used for PDFNet to provide externally computed depth maps. This is the map generated by Depth Anything V2
    • background (IMAGE, optional): Image to use as background; it will be scaled to the size of images. If you don't provide an image the output will be an RGBA image with transparency. Note that this can be 1 or more images. If images is a video this input can be another video with the same number of frames.
    • out_dtype (AUTO, float32, float16): The data type used for the output image. Using AUTO is recommended. To save RAM when processing long videos use float16.
  • Output:
    • images (IMAGE): The images with the background removed (transparent) or replaced by the background image
    • masks (MASK): The estimated masks, where a higher value means the model estimates it belongs to the foreground with more confidence.
    • depths (MASK): The estimated depth map. Either from the depths input or computed. Note this applies only to PDFNet. This is the map generated by Depth Anything V2
    • edges (MASK): The estimated edges. This is only generated by the DiffDIS model.
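
To give an idea of what blur_size and blur_size_two control, here is a minimal sketch of the two-pass blur-fusion idea behind Approximate Fast Foreground Colour Estimation (a coarse pass followed by a fine one). The tensor layout, the helper names and the box blur used as a stand-in for the gaussian blur are assumptions, not the node's actual implementation.

    import torch
    import torch.nn.functional as F

    def blur(x: torch.Tensor, size: int) -> torch.Tensor:
        # box blur as a stand-in for the gaussian blur; size should be odd here
        return F.avg_pool2d(x, size, stride=1, padding=size // 2, count_include_pad=False)

    def blur_fusion_pass(image, fg, bg, alpha, size):
        # one pass of the blur-fusion foreground/background estimation
        blurred_alpha = blur(alpha, size)
        fg = blur(fg * alpha, size) / (blurred_alpha + 1e-5)
        bg = blur(bg * (1.0 - alpha), size) / (1.0 - blurred_alpha + 1e-5)
        # pull the smoothed foreground back towards what the input image allows
        fg = fg + alpha * (image - alpha * fg - (1.0 - alpha) * bg)
        return fg.clamp(0.0, 1.0), bg

    def estimate_foreground(image, alpha, blur_size, blur_size_two):
        # image: (B, 3, H, W) in 0..1, alpha: (B, 1, H, W)
        fg, bg = blur_fusion_pass(image, image, image, alpha, blur_size)  # coarse pass
        fg, _ = blur_fusion_pass(image, fg, bg, alpha, blur_size_two)     # fine pass
        return fg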

Get background mask

  • Display Name: Get background mask
  • Internal Name: GetMaskByBiRefNet_SET
  • Category: RemBG_SET/Basic
  • Description: Computes the foreground mask. It normalizes the input images, scales them to the model size, computes the masks and then scales the masks back to the image size. No background removal/replacement is done.
  • Inputs:
    • model (SET_REMBG): The model to use
    • images (IMAGE): One or more images to process; they will be scaled to a size that is good for the model
    • width (INT): The width to scale the image to before applying the model. Should be supported by the model. Usually this is the train_w output from Load XXXXXX model by name
    • height (INT): The height to scale the image to before applying the model. Should be supported by the model. Usually this is the train_h output from Load XXXXXX model by name
    • upscale_method (area, bicubic, nearest-exact, bilinear, lanczos): The algorithm used to scale the image to and from the model size. Usually bicubic is a good choice.
    • mask_threshold (FLOAT): Most models generate masks whose values range from 0 to 1 and can be anything in between. Matte models use this to estimate transparency. If you need the mask to be strictly 0 or 1, with nothing in between, you can provide a threshold here: values above it become 1 and the rest 0 (see the sketch after this node's description).
    • batch_size (INT): How many images will be processed at once. Useful for videos; depending on the model and the GPU this can make things faster, but it consumes more VRAM. Note that for boards like the RTX 3060 it is better to just use 1.
    • depths (MASK, optional): Can be used for PDFNet to provide externally computed depth maps. This is the map generated by Depth Anything V2
    • out_dtype (AUTO, float32, float16): The data type used for the output image. Using AUTO is recommended. To save RAM when processing long videos use float16.
  • Output:
    • masks (MASK): The estimated masks, where a higher value means the model estimates it belongs to the foreground with more confidence.
    • depths (MASK): The estimated depth map. Either from the depths input or computed. Note this applies only to PDFNet. This is the map generated by Depth Anything V2
    • edges (MASK): The estimated edges. This is only generated by the DiffDIS model.
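
As a rough illustration of mask_threshold, a minimal sketch follows; it is not the node's code, and the assumption that a threshold of 0 leaves the soft mask untouched is mine.

    import torch

    def apply_threshold(mask: torch.Tensor, mask_threshold: float) -> torch.Tensor:
        # assumed behaviour: a threshold of 0 keeps the soft (matte) mask unchanged
        if mask_threshold <= 0.0:
            return mask
        # binarize: values above the threshold become 1, the rest 0
        return (mask > mask_threshold).to(mask.dtype)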

Get background mask low level

  • Display Name: Get background mask low level
  • Internal Name: GetMaskLowByBiRefNet_SET
  • Category: RemBG_SET/Advanced
  • Description: Computes the foreground mask. No pre- or post-processing is applied; you must do it outside the node.
  • Inputs:
    • model (SET_REMBG): The model to use
    • images (IMAGE): One or more images to process. They must be normalized to a range that is good for the model and their size must be similar to the size used to train the model (see the pre-processing sketch after this node's description).
    • batch_size (INT): How many images will be processed at once. Useful for videos; depending on the model and the GPU this can make things faster, but it consumes more VRAM. Note that for boards like the RTX 3060 it is better to just use 1.
    • depths (MASK, optional): Can be used for PDFNet to provide externally computed depth maps. This is the map generated by Depth Anything V2
    • out_dtype (AUTO, float32, float16): The data type used for the output image. Using AUTO is recommended. To save RAM when processing long videos use float16.
  • Output:
    • masks (MASK): The estimated masks, where a higher value means the model estimates it belongs to the foreground with more confidence.
    • depths (MASK): The estimated depth map. Either from the depths input or computed. Note this applies only to PDFNet. This is the map generated by Depth Anything V2
    • edges (MASK): The estimated edges. This is only generated by the DiffDIS model.
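
If you do the pre- and post-processing yourself, it could look roughly like the following sketch. The exact content of norm_params isn't documented here, so the mean/std normalization and the helper names are assumptions, not the nodes' actual code.

    import torch
    import torch.nn.functional as F

    def preprocess(images: torch.Tensor, train_w: int, train_h: int,
                   mean: tuple, std: tuple) -> torch.Tensor:
        # images: ComfyUI IMAGE batch (B, H, W, 3) in 0..1
        x = images.permute(0, 3, 1, 2)                       # -> (B, 3, H, W)
        x = F.interpolate(x, size=(train_h, train_w), mode="bicubic", antialias=True)
        m = torch.tensor(mean, device=x.device).view(1, 3, 1, 1)
        s = torch.tensor(std, device=x.device).view(1, 3, 1, 1)
        return ((x - m) / s).permute(0, 2, 3, 1)             # -> (B, train_h, train_w, 3)

    def postprocess(masks: torch.Tensor, height: int, width: int) -> torch.Tensor:
        # masks: (B, h, w) at the model resolution; scale back to the original size
        return F.interpolate(masks.unsqueeze(1), size=(height, width),
                             mode="bilinear").squeeze(1)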

Other nodes

The PDFNet model is a special case: instead of using just the image, it also uses an estimation of the image depth computed with Depth Anything V2. To allow automatic computation of the depth maps I took Kijai's nodes and adapted them for this use.

Load Depth Anything by name

  • Display Name: Load Depth Anything by name
  • Internal Name: DownloadAndLoadDepthAnythingV2Model_SET
  • Category: RemBG_SET/Load
  • Description: Downloads and loads to memory one of the Depth Anything V2 models.
  • Inputs:
    • model (STRING): The name of the model to use. Small, Base and Large are available in 16 and 32 bits. The 16-bit versions work quite well. PDFNet was trained using the Base version.
  • Output:
    • da_v2_model (DAMODEL): The model ready to be used.

Depth Anything V2

  • Display Name: Depth Anything V2
  • Internal Name: DepthAnything_V2_SET
  • Category: RemBG_SET/Advanced
  • Description: Computes an estimated depth map of the image; larger values mean the pixel is closer to the camera
  • Inputs:
    • da_model (DAMODEL): The model from the Load Depth Anything by name node.
    • images (IMAGE): One or more images to process; they will be normalized and scaled.
    • batch_size (INT): How many images will be processed at once.
  • Output:
    • depths (MASK): The depth map
    • depth_imgs (IMAGE): The same map in a format compatible with nodes that need an image. The three channels (R, G, B) are the same buffer, shared with the mask.
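
The depth_imgs output is essentially the depth mask replicated into three channels; conceptually (a sketch, not the node's code):

    # depths: (B, H, W) MASK -> depth_imgs: (B, H, W, 3) IMAGE, the three channels identical
    depth_imgs = depths.unsqueeze(-1).expand(-1, -1, -1, 3)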

🚀 Installation

You can install the nodes from the ComfyUI nodes manager (the name is Remove Background (SET) / remove-background), or do it manually:

  1. Clone this repository into your ComfyUI/custom_nodes/ directory:
    cd ComfyUI/custom_nodes/
    git clone https://github.com/set-soft/ComfyUI-RemoveBackground_SET ComfyUI-RemoveBackground_SET
  2. Install dependencies: pip install -r ComfyUI/custom_nodes/ComfyUI-RemoveBackground_SET/requirements.txt

Important

The SeCoNoHe lib is developed in parallel with my nodes; when installing the nodes from the repo you might need to install a fresh copy of SeCoNoHe: pip install git+https://github.com/set-soft/seconohe.git

  3. Restart ComfyUI.

The nodes should then appear under the "RemBG_SET" category in the "Add Node" menu.

📦 Dependencies

  • SeCoNoHe (seconohe): This is just some functionality I wrote that is shared by my nodes; it only depends on ComfyUI.
  • PyTorch: Installed by ComfyUI
  • einops: Installed by ComfyUI
  • kornia: Installed by ComfyUI
  • safetensors: Installed by ComfyUI
  • Requests (optional): Usually an indirect ComfyUI dependency. If installed it will be used for downloads; it should be more robust than the built-in urllib, which is used as a fallback.
  • Colorama (optional): Might help to get colored log messages on some terminals. We use ANSI escape sequences when it isn't installed.

🖼️ Examples

Once installed, the examples are available in the ComfyUI workflow templates, in the remove-background section (or ComfyUI-RemoveBackground_SET).

Simple

These examples show how to remove the background, obtaining an image with transparency, or how to replace it with an image. Note that RGBA images, the ones with transparency, aren't supported by all nodes. The correct way to handle them is to have the image and a mask, but using RGBA is what most background removal tools do.

More advanced

These examples show how to have more control over the process.

Video

Examples for video processing, using ComfyUI video nodes and advanced ComfyUI-VideoHelperSuite nodes.

  • 05_Video: Simple video workflow to replace the background of a video using a still image. Uses the Comfy-Core nodes.
  • 05_Video_Advanced: Video workflow to replace the background of a video using another video. Uses ComfyUI-VideoHelperSuite, which allows resizing, skipping frames, limiting frames, etc.

Foreground input video:

v1.mp4

Background input video:

v2.mp4

Output using InSPyReNet Base model:

out.mp4

Model specific

Examples related to particular models. PDFNet uses a depth map and DiffDIS is a diffusion model repurposed for DIS.

Comparison

Example workflows showing how to compare the models.

📝 Usage Notes

Informal explanation of the terms

I don't intend to define these terms strictly, just to give you an idea of their meaning.

  • DIS stands for Dichotomous Image Segmentation, a technical term used for tasks where you separate an image into two different things, in particular the foreground and the background. There are some specialized DIS tasks like COD and HRSOD
    • SOD (Salient Object Detection): a term also used for this task; we want to separate the object in the foreground, the one that is "salient"
    • HRSOD (High Resolution SOD): used when we want a highly detailed separation, preserving fine detail of the boundary
    • COD (Camouflaged Object Detection): as the name implies, here the object is camouflaged, making the task harder
    • Matte: this term is used when we want to separate translucent objects, getting a mask that shows how much of the background is blended with the foreground
    • Portrait: refers to human portraits, used for the task of separating a human from the background

Why so many models?

Each model has its own strengths; one might excel on an image and fail miserably on another.

There are many things that determine how well a model works:

  • Its architecture, how it tackles the task. This is the most technical issue: how to do it well, fast and using fewer resources
  • Its size, how many parameters are used. Bigger implementations of the same architecture might perform better, at the cost of time and resources
  • Its training, the dataset used and the mechanism used to guide the model

BiRefNet is a good example where you can see one architecture trained using:

  • Different sizes, Lite models use fewer parameters
  • Different datasets, you'll find models trained for General DIS, COD, HRSOD and Matte tasks using specific datasets

You'll also find a version of this model trained by a company with a curated dataset, and perhaps some twist in the strategy: BRIA v2.0

Notes about the architectures

Some random notes you might find interesting:

  • Models using the Swin Transformer as backbone:
    • BiRefNet Lite uses the Tiny size
    • MVANet/BEN, InSPyReNet and PDFNet use the Base size
    • BiRefNet Full uses the Large size
  • BEN and BEN2 models use the same architecture; the difference is in the training
    • Both are basically MVANet with a few changes in activation functions and similar details
  • BRIA v1.4 is a U-2-Net model with small changes, trained with a proprietary dataset. Not for commercial use.
  • BRIA v2.0 is a BiRefNet model, again trained with a proprietary dataset and not for commercial use.
  • PDFNet uses a clever strategy: it leverages the power of Depth Anything V2 (using the DINO v2 backbone and training) to assist the task. The cost is twice the time of similar models.
  • DiffDIS uses a completely different approach. This is the fast SD Turbo (a 1-4 step diffusion model) repurposed for the DIS task. It processes two latents, one for the mask and the other for the edges. The image is generated in one step, but the cost in time and resources is huge compared with the other models.
  • MODNet was designed for fast separation and is by far the fastest. The cost is that it isn't a general model; the available trained model is for portraits.
  • IS-Net is an evolution of U-2-Net
    • These models use a nested UNet (a UNet in each stage of the bigger UNet)
    • No Swin Transformer (or ResNet) backbone
    • Even though they are older than the current generation that uses Swin as a backbone, they can deliver very good results with fewer resources

Resources comparison

This is not a formal benchmark; it is just the result of a few tests using an RTX 3060 with 12 GiB of VRAM on a system with 32 GiB of RAM.

Model                     Time (ms)   Memory (MiB)   Image Size
MODNet Photo portrait            60            175          512
U-2-Net Base                    147            371          320
IS-Net Base                     196            776         1024
IS-Net BRIA v1.4                200            776         1024
MVANet General BEN2 F16         421           1605         1024
BiRefNet General F16            516           1592         1024
InSPyReNet Base 1.2.12          661           2910         1024
BiRefNet BRIA v2.0             1029           3181         1024
PDFNet Base                    1684           3551         1024
DiffDIS Base F16               3109           6102         1024
DiffDIS Base F32               5249           5674         1024

Note that BRIA v2 and BiRefNet General F16 are the same architecture, but one works on 32 bits and the other on 16 bits. The impact on speed and memory is significant: the 16-bit weights run twice as fast using half the memory.

For DiffDIS this difference doesn't hold and, for some reason, the 16-bit weights need more memory.

Also note that IS-Net models are much faster and need much less memory than the rest, even when using a 1024x1024 image size.

Debug

  • Logging: 🔊 The nodes use Python's logging module. Debug messages can be helpful for understanding the transformations being applied. You can control log verbosity through ComfyUI's startup arguments (e.g., --preview-method auto --verbose DEBUG for more detailed ComfyUI logs, which might also affect custom node loggers if they are configured to inherit levels). The logger name used is "RemoveBackground_SET". You can force the debugging level for these nodes by setting the REMOVEBACKGROUND_SET_NODES_DEBUG environment variable to 1 or 2.
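
For example, to raise the verbosity of just these nodes from Python (the logger name and the environment variable come from the note above):

    import logging
    import os

    # option 1: set the environment variable before ComfyUI loads the nodes
    os.environ["REMOVEBACKGROUND_SET_NODES_DEBUG"] = "2"
    # option 2: raise the level of the nodes' logger directly
    logging.getLogger("RemoveBackground_SET").setLevel(logging.DEBUG)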

📜 Project History

  • 1.0.0 2025-10-23: Initial release

⚖️ License

GPL-3.0

🙏 Attributions

  • A good part of the initial code and this README was generated using Gemini 2.5 Pro.
  • I took various ideas from ComfyUI_BiRefNet_ll
  • These nodes contain the inference code for these models:
    • BiRefNet: Peng Zheng, Dehong Gao, Deng-Ping Fan, Li Liu, Jorma Laaksonen, Wanli Ouyang, Nicu Sebe
    • Depth Anything: Lihe Yang, Bingyi Kang, Zilong Huang, Zhen Zhao, Xiaogang Xu, Jiashi Feng, Hengshuang Zhao (HKU/TikTok)
    • DiffDIS: Qian Yu, Peng-Tao Jiang, Hao Zhang, Jinwei Chen, Bo Li, Lihe Zhang, Huchuan Lu
    • Diffusers: The HuggingFace Team
    • DINO: Meta AI Research
    • InSPyReNet: Taehun Kim, Kunhee Kim, Joonyeong Lee, Dongmin Cha, Jiho Lee, Daijin Kim
    • MODNet: Zhanghan Ke, Jiayu Sun, Kaican Li, Qiong Yan, Rynson W.H. Lau
    • MVANet: Qian Yu, Xiaoqi Zhao, Youwei Pang, Lihe Zhang, Huchuan Lu
      • BEN: Maxwell Meyer and Jack Spruyt
    • PDFNet: Xianjie Liu, Keren Fu, Qijun Zhao
    • Swin: Ze Liu, Yutong Lin, Yixuan Wei
    • U-2-Net: Xuebin Qin, Zichen Zhang, Chenyang Huang, Masood Dehghan, Osmar R. Zaiane and Martin Jagersand
      • IS-Net: Xuebin Qin, Hang Dai, Xiaobin Hu, Deng-Ping Fan, Ling Shao, Luc Van Gool
  • Code for Depth Anything v2 by Kijai (Jukka Seppänen)
  • All working together by Salvador E. Tropea
