Ea3124/TextCtrl-Translate

 
 

TextCtrl-Translate: Extending TextCtrl with TOSPI and Multilingual Support

About

This repository is based on TextCtrl [Zeng et al., 2024].
We extend the original framework with:

  • TOSPI (Text Object Shape Powered Inference)
  • Multilingual Training: enabling text editing and translation from other languages into Korean.

Differences from the original

  • Original TextCtrl focused on English text editing.
  • Our extension introduces cross-lingual capabilities: it currently supports translating and editing text from other languages into Korean.
  • Compatible with the original checkpoints while supporting new translation tasks.

1 Installation

  • python = 3.11.13
  • torch = 2.5.1+cu124
  • cuda = 12.4
    • used for training: NVIDIA RTX A5000 24GB * 4
    • used for inference: NVIDIA RTX A5000 24GB * 1
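
To confirm that an environment matches the versions listed above, a small check along the following lines may help. This is a minimal sketch, not part of the repository: the function names are hypothetical, and it assumes that a matching major/minor Python version and a torch version starting with 2.5.1 are sufficient.

```python
import importlib.util
import sys

# Versions listed in this README; patch-level differences are assumed acceptable.
EXPECTED_PYTHON = (3, 11)
EXPECTED_TORCH_PREFIX = "2.5.1"
EXPECTED_CUDA = "12.4"


def python_matches(version_info=sys.version_info, expected=EXPECTED_PYTHON):
    """Return True when the major/minor Python version equals the expected pair."""
    return tuple(version_info[:2]) == tuple(expected)


def describe_environment():
    """Collect a small report without failing when torch is not installed yet."""
    report = {
        "python": ".".join(map(str, sys.version_info[:3])),
        "python_ok": python_matches(),
    }
    if importlib.util.find_spec("torch") is None:
        report["torch"] = None
        return report
    import torch  # imported lazily so the check also runs before installation

    report["torch"] = torch.__version__
    report["torch_ok"] = torch.__version__.startswith(EXPECTED_TORCH_PREFIX)
    report["cuda"] = torch.version.cuda  # None for CPU-only builds
    report["cuda_ok"] = torch.version.cuda == EXPECTED_CUDA
    report["gpu_count"] = torch.cuda.device_count()
    return report


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```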

1.1 Code Preparation

# Clone the repo
$ git clone https://github.com/PNU-CSE-Graduation-TMOJI/TextCtrl-Translate.git
$ cd TextCtrl-Translate/
# Install required packages
$ conda create --name tospi python=3.11 -y
$ conda activate tospi
$ pip install -r requirement.txt

1.2 Checkpoints Preparation

Download the checkpoints from the following links:

  • Link_1 (project-provided custom weights: style encoder, VGG19, monitor, etc.)
  • Link_2 (pretrained Stable Diffusion v1.5: UNet, VAE, scheduler)
  • Link_3 (text/ocr-related weights: style encoder, text encoder, TrOCR, tmp checkpoint)

The file structure should be set as follows:

TextCtrl-Translate/
├── weights
│   ├── model.pth                             # weight of style encoder and unet [Link_1]
│   ├── sd                                    # pretrained weight of stable-diffusion-v1-5 [Link_2]
│   │   ├── scheduler
│   │   ├── unet
│   │   └── vae
│   ├── style_encoder.ckpt                    # pretrained style encoder [Link_3]
│   ├── text_encoder.ckpt                     # pretrained glyph encoder [Link_3]
│   ├── trocr-ko                              # OCR weight [Link_3]
│   │   ├── config.json
│   │   └── trocr_model.bin                   
│   ├── vgg19.pth                             # VGG19 feature extractor [Link_1]
│   ├── vision_model.pth                      # monitor model [Link_1]
│   └── vitstr_base_patch16_224.pth           # ViTSTR model [Link_1]
├── ...
├── tmp
│   └── model69.pt                            # tmp checkpoint [Link_3]
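
Before running inference or training, it can be useful to confirm that every file in the tree above is in place. The following is a minimal sketch: the path list is copied from the tree, while the helper name missing_checkpoints is hypothetical and not part of the repository.

```python
from pathlib import Path

# Relative paths copied from the file structure shown above.
EXPECTED_PATHS = [
    "weights/model.pth",
    "weights/sd/scheduler",
    "weights/sd/unet",
    "weights/sd/vae",
    "weights/style_encoder.ckpt",
    "weights/text_encoder.ckpt",
    "weights/trocr-ko/config.json",
    "weights/trocr-ko/trocr_model.bin",
    "weights/vgg19.pth",
    "weights/vision_model.pth",
    "weights/vitstr_base_patch16_224.pth",
    "tmp/model69.pt",
]


def missing_checkpoints(repo_root, expected=EXPECTED_PATHS):
    """Return the expected relative paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_checkpoints(".")
    if missing:
        print("Missing checkpoint files or directories:")
        for rel in missing:
            print(f"  {rel}")
    else:
        print("All expected checkpoint paths are present.")
```

Run it from the repository root, or pass the path to the cloned TextCtrl-Translate/ directory.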

2 Inference

2.1 Data Preparation

Organize the inference data in the same structure as example/:

TextCtrl-Translate/
├── example/
│   ├── i_s/                # source cropped text images
│   ├── i_s.txt             # filename and text label of source images in i_s/
│   └── i_t.txt             # filename and text label of target images
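
A quick consistency check of the inference data might look like the sketch below. It assumes that each line of i_s.txt and i_t.txt holds a filename followed by a single space and the text label; the README only states that the files contain "filename and text label", so adjust the parsing if the actual format differs. The function names are hypothetical.

```python
from pathlib import Path


def parse_labels(text):
    """Map filename -> label, assuming one 'filename label' pair per line."""
    labels = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first space only, so labels may themselves contain spaces.
        name, _, label = line.partition(" ")
        labels[name] = label.strip()
    return labels


def check_inference_dir(dataset_dir):
    """Return a list of human-readable problems found in dataset_dir."""
    root = Path(dataset_dir)
    source = parse_labels((root / "i_s.txt").read_text(encoding="utf-8"))
    target = parse_labels((root / "i_t.txt").read_text(encoding="utf-8"))
    problems = []
    for name in source:
        if not (root / "i_s" / name).is_file():
            problems.append(f"missing image: i_s/{name}")
        if name not in target:
            problems.append(f"no target label for: {name}")
    for name in target:
        if name not in source:
            problems.append(f"no source label for: {name}")
    return problems
```

An empty list from check_inference_dir("example/") would indicate that every labelled source image exists and has a target label.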

2.2 Edit Arguments

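Edit the argument defaults in inference.py to match your setup, and make sure --ckpt_path points to the checkpoint file actually placed under tmp/: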

parser.add_argument("--ckpt_path", type=str, default="tmp/model69.pth")
parser.add_argument("--dataset_dir", type=str, default="example/")
parser.add_argument("--output_dir", type=str, default="example_result/")
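
Because these options are declared with argparse, they can presumably also be overridden on the command line instead of editing the defaults, assuming inference.py calls parse_args() in the usual way. The sketch below recreates only the three arguments quoted above to show the behavior; build_parser and the my_data/ and my_result/ paths are hypothetical.

```python
import argparse


def build_parser():
    """Recreate only the three arguments quoted above; inference.py may define more."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--ckpt_path", type=str, default="tmp/model69.pth")
    parser.add_argument("--dataset_dir", type=str, default="example/")
    parser.add_argument("--output_dir", type=str, default="example_result/")
    return parser


if __name__ == "__main__":
    # Equivalent to: python inference.py --dataset_dir my_data/ --output_dir my_result/
    args = build_parser().parse_args(
        ["--dataset_dir", "my_data/", "--output_dir", "my_result/"]
    )
    # Arguments that are not passed keep their defaults.
    print(args.ckpt_path, args.dataset_dir, args.output_dir)
```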

2.3 Generate Images

The inference results are written to example_result/ after running:

$ PYTHONPATH=.../TextCtrl-Translate/ python inference.py
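
The same invocation can be scripted from Python, for example when running inference over several datasets. This is a minimal sketch under the assumption that inference.py sits at the repository root; inference_env and run_inference are hypothetical helpers, not part of the repository.

```python
import os
import subprocess
import sys
from pathlib import Path


def inference_env(repo_root, base_env=None):
    """Return a copy of the environment with repo_root prepended to PYTHONPATH."""
    env = dict(os.environ if base_env is None else base_env)
    root = str(Path(repo_root).resolve())
    existing = env.get("PYTHONPATH")
    env["PYTHONPATH"] = root if not existing else root + os.pathsep + existing
    return env


def run_inference(repo_root):
    """Run inference.py from repo_root; raises CalledProcessError on failure."""
    return subprocess.run(
        [sys.executable, "inference.py"],
        cwd=repo_root,
        env=inference_env(repo_root),
        check=True,
    )
```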

2.4 Inference Results

Source Image    Target Text    Inference Result
[image]         "정지"         [image]
[image]         "경고"         [image]
[image]         "서행"         [image]
[image]         "가수"         [image]

3 Training

3.1 Data Preparation

Training relies on synthetic data generated by SRNet-Datagen_kr, organized as follows:

Syn_data/
├── fonts/
│   ├── arial.ttf/              
│   └── .../  
├── train/
│   ├── train-50k-1/                    
│   ├── train-50k-2/            
│   ├── train-50k-3/              
│   └── train-50k-4/                     
│       ├── i_s/
│       ├── mask_s/
│       ├── i_s.txt
│       ├── t_f/
│       ├── mask_t/
│       ├── i_t.txt
│       ├── t_t/
│       ├── t_b/
│       └── font.txt
└── eval/
    └── eval-1k/
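
A generated dataset can be checked against this layout before training starts. The sketch below is an illustration only: the entry names come from the train-50k-4/ listing above, the other subsets are assumed (not confirmed by this README) to share the same layout, and missing_entries is a hypothetical helper.

```python
from pathlib import Path

# Entries listed under train-50k-4/ in the tree above; the other subsets are
# assumed to follow the same layout.
SUBSET_ENTRIES = [
    "i_s", "mask_s", "i_s.txt", "t_f", "mask_t", "i_t.txt", "t_t", "t_b", "font.txt",
]


def missing_entries(data_root, split="train"):
    """Map each subset directory under data_root/split to its missing entries."""
    result = {}
    split_dir = Path(data_root) / split
    for subset in sorted(p for p in split_dir.iterdir() if p.is_dir()):
        missing = [name for name in SUBSET_ENTRIES if not (subset / name).exists()]
        if missing:
            result[subset.name] = missing
    return result
```

An empty dictionary from missing_entries("Syn_data") would mean that every subset under train/ contains all of the listed entries.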

3.2 Text Style Pretraining

$ cd prestyle/
# Set the directory paths in the config file
$ cd configs/
$ vi StyleTrain.yaml
# Start pretraining
$ cd ..
$ python train.py

3.3 Text Glyph Pretraining

$ cd preglyph/
# Set the directory paths in the config file
$ cd configs/
$ vi GlyphTrain.yaml
# Start pretraining
$ cd ..
$ python pretrain.py

3.4 Prior Guided Training

$ cd TextCtrl/
# Set the directory paths in the config file
$ cd configs/
$ vi train.yaml
# Start training
$ cd ..
$ python train.py

Related Resources

Our work is built upon and inspired by the following projects:
