Skip to content

art8engine/TatVTON

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

18 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

TatVTON — Tattoo Virtual Try-On

Realistic tattoo compositing on body photos using SAM 2 mask extraction and SDXL Inpainting + ControlNet + IP-Adapter synthesis.

Given a body photo, a tattoo design, and a target region, TatVTON generates a photorealistic result with the tattoo naturally blended onto the skin — preserving skin texture, lighting, and body contours.

한국어

Architecture

Inference Pipeline

flowchart TB
    subgraph Input
        BODY[Body Image]
        TATTOO[Tattoo Design]
        REGION[Region Prompt<br/>Point / BBox]
    end

    subgraph Phase1["Phase 1 — Mask Extraction"]
        SAM2[SAM 2<br/>Segment Anything 2]
        MASK[Skin Mask]
        BODY --> SAM2
        REGION --> SAM2
        SAM2 --> MASK
    end

    subgraph Phase1b["Phase 1b — Body Structure (Optional)"]
        DP[DensePose]
        IUV[IUV Map]
        BODY --> DP
        DP --> IUV
    end

    subgraph Phase2["Phase 2 — Preprocessing"]
        RESIZE[Resize to 1024×1024]
        DILATE[Mask Dilation + Feathering]
        CLIP_EMB[CLIP Embedding<br/>224×224]
        MASK --> DILATE
        DILATE --> RESIZE
        BODY --> RESIZE
        TATTOO --> CLIP_EMB
    end

    subgraph Phase3["Phase 3 — Diffusion Inpainting"]
        SDXL[SDXL Inpainting]
        CN[ControlNet<br/>Body Structure Guide]
        IPA[IP-Adapter<br/>Tattoo Style Guide]
        RESIZE --> SDXL
        IUV --> CN
        CN --> SDXL
        CLIP_EMB --> IPA
        IPA --> SDXL
        SDXL --> RAW[Raw Inpainted]
    end

    subgraph Phase4["Phase 4 — Compositing"]
        COMP[Alpha Blending<br/>Mask Feathering]
        RAW --> COMP
        RESIZE --> COMP
        DILATE --> COMP
        COMP --> OUTPUT[Output Image]
    end

    style Phase1 fill:#e8f4fd,stroke:#1e88e5
    style Phase1b fill:#fce4ec,stroke:#e53935
    style Phase2 fill:#fff3e0,stroke:#fb8c00
    style Phase3 fill:#e8f5e9,stroke:#43a047
    style Phase4 fill:#f3e5f5,stroke:#8e24aa
Loading

Module Structure

graph LR
    subgraph Entry["Entry Points"]
        CLI[cli.py]
        API["__init__.py<br/>(Python API)"]
    end

    subgraph Core["Core Pipeline"]
        PIPE[TatVTONPipeline<br/>pipeline/tatvton_pipeline.py]
        CFG[TatVTONConfig<br/>config.py]
        TYPES[types.py<br/>PointPrompt, BBoxPrompt<br/>MaskResult, PipelineOutput]
    end

    subgraph Models["Model Management"]
        ML[ModelLoader<br/>models/model_loader.py]
        HUB[Hub Download<br/>models/hub.py]
    end

    subgraph Pre["Preprocessing"]
        SME[SkinMaskExtractor<br/>SAM 2]
        DPE[DensePoseExtractor<br/>detectron2]
        VAL[InputValidator]
    end

    subgraph Engine["Inference Engine"]
        IE[InpaintingEngine<br/>SDXL + ControlNet<br/>+ IP-Adapter]
    end

    subgraph Post["Postprocessing"]
        COMP2[Compositor<br/>Alpha Blending]
    end

    subgraph Utils["Utilities"]
        IMG[image.py]
        MSK[mask.py]
        DEV[device.py]
        IMP[imports.py]
    end

    CLI --> PIPE
    API --> PIPE
    CFG --> PIPE
    PIPE --> ML
    ML --> HUB
    PIPE --> SME
    PIPE --> DPE
    PIPE --> VAL
    PIPE --> IE
    PIPE --> COMP2
    IE --> ML
    SME --> ML

    style Entry fill:#e3f2fd
    style Core fill:#e8f5e9
    style Models fill:#fff3e0
    style Pre fill:#fce4ec
    style Engine fill:#f3e5f5
    style Post fill:#e0f7fa
    style Utils fill:#f5f5f5
Loading

Training Pipeline (Colab)

flowchart LR
    subgraph Data["Data Collection"]
        CRAWL[Reddit Crawler<br/>+ CLIP Filter]
        RAW_IMG[5,400+ Tattoo Images]
        CRAWL --> RAW_IMG
    end

    subgraph Label["Labeling"]
        DPOSE[DensePose]
        IUV_MAP[IUV Maps]
        BODY_LABEL[Body Part Labels]
        RAW_IMG --> DPOSE
        DPOSE --> IUV_MAP
        DPOSE --> BODY_LABEL
    end

    subgraph Build["Dataset Build"]
        CAPTION[Caption Generation]
        HF_DS[HuggingFace Dataset<br/>image + conditioning + text]
        BODY_LABEL --> CAPTION
        RAW_IMG --> HF_DS
        IUV_MAP --> HF_DS
        CAPTION --> HF_DS
    end

    subgraph Train["ControlNet Training"]
        SDXL_BASE[SDXL Base Model]
        CN_TRAIN[ControlNet SDXL<br/>Fine-tuning]
        CN_MODEL[Trained ControlNet<br/>Weights]
        SDXL_BASE --> CN_TRAIN
        HF_DS --> CN_TRAIN
        CN_TRAIN --> CN_MODEL
    end

    style Data fill:#e8f4fd,stroke:#1e88e5
    style Label fill:#fce4ec,stroke:#e53935
    style Build fill:#fff3e0,stroke:#fb8c00
    style Train fill:#e8f5e9,stroke:#43a047
Loading

Dataset

We provide a curated tattoo image dataset on HuggingFace Hub:

rlaope/tatvton-tattoo-raw — 5,400+ tattoo images crawled from Reddit, filtered with CLIP for quality (tattoo-on-skin verification). Covers 6 body parts x 12 styles.

Installation

pip install -e .

SAM 2 (installed separately):

pip install "sam-2 @ git+https://github.com/facebookresearch/sam2.git"

With DensePose support (optional):

pip install -e ".[densepose]"

Development tools:

pip install -e ".[dev]"

Quick Start

Python API

from PIL import Image
from tatvton import TatVTONPipeline, PointPrompt

body = Image.open("body.jpg")
tattoo = Image.open("tattoo.png")

pipe = TatVTONPipeline()
result = pipe(
    body_image=body,
    tattoo_image=tattoo,
    region=PointPrompt(coords=[(300, 400)]),
)
result.image.save("output.png")

CLI

# Point prompt
tatvton body.jpg tattoo.png --point 300,400 -o output.png

# Bounding box prompt
tatvton body.jpg tattoo.png --bbox 100,150,400,600 -o output.png

# Multiple points + options
tatvton body.jpg tattoo.png --point 300,400 --point 350,420 \
    --steps 20 --strength 0.9 --seed 42 -o output.png

# Save mask and raw inpainted result too
tatvton body.jpg tattoo.png --point 300,400 --save-mask --save-raw

You can also run via module:

python -m tatvton body.jpg tattoo.png --point 300,400

CLI Options

Option Description Default
body Path to body image (required) -
tattoo Path to tattoo design image (required) -
--point X,Y Point prompt (repeatable) -
--bbox X1,Y1,X2,Y2 Bounding box prompt -
-o, --output Output path output.png
--resolution Pipeline resolution 1024
--steps Inference steps 30
--strength Inpainting strength 0.85
--guidance-scale Guidance scale 7.5
--ip-adapter-scale IP-Adapter scale 0.6
--seed Random seed random
--device cuda / cpu / mps cuda
--offload none / model / sequential model
--save-mask Save mask image -
--save-raw Save raw inpainted result -

Configuration

Three levels of configuration:

Level Method Example
Defaults TatVTONPipeline() 12 GB GPU optimised
Config TatVTONPipeline(TatVTONConfig(resolution=768)) Session-wide
Per-call pipe(..., strength=0.95, seed=42) Call-specific

Requirements

  • Python 3.9+
  • PyTorch 2.0+
  • NVIDIA GPU with 12+ GB VRAM (recommended)

License

MIT

About

[TatVTON] A 3D-aware, diffusion-based virtual try-on library for photorealistic tattoo synthesis.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors