This repository is based on TextCtrl [Zeng et al., 2024].
We extend the original framework with:
- TOSPI (Text Object Shape Powered Inference)
- Multilingual Training: enables text editing and translation from other languages into Korean.
  - The original TextCtrl focused on Japanese text editing.
  - Our extension adds cross-lingual capability: it currently supports other-language → Korean text translation/editing.
  - It remains compatible with the original checkpoints while supporting the new translation tasks.
- python = 3.11.13
- torch = 2.5.1+cu124
- cuda = 12.4
- used for training: 4 × NVIDIA RTX A5000 (24 GB)
- used for inference: 1 × NVIDIA RTX A5000 (24 GB)
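As a quick sanity check of the environment above, a small sketch like the following can report the installed versions (this helper is hypothetical, not part of the repo; `torch` may not be installed yet, so a missing import is reported rather than raised):

```python
# Sketch: report the local environment against the versions listed above.
import sys

def report_env() -> dict:
    info = {"python": sys.version.split()[0]}     # expect 3.11.x
    try:
        import torch
        info["torch"] = torch.__version__          # expect 2.5.1+cu124
        info["cuda"] = torch.version.cuda          # expect 12.4
        info["gpus"] = torch.cuda.device_count()   # 4 for training, 1 for inference
    except ImportError:
        info["torch"] = None                       # install steps below not run yet
    return info

print(report_env())
```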
# Clone the repo
$ git clone https://github.com/PNU-CSE-Graduation-TMOJI/TextCtrl-Translate.git
$ cd TextCtrl-Translate/
# Install required packages
$ conda create --name tospi python=3.11 -y
$ conda activate tospi
$ pip install -r requirement.txt

Download the checkpoints from:
- Link_1 (project-provided custom weights: style encoder, VGG19, monitor, etc.)
- Link_2 (pretrained Stable Diffusion v1.5: UNet, VAE, scheduler)
- Link_3 (text/ocr-related weights: style encoder, text encoder, TrOCR, tmp checkpoint)
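Before moving the downloaded checkpoints into place, the expected directory layout can be pre-created with a short sketch (directory names are taken from the tree below; how you obtain the files from Link_1–Link_3 is up to you):

```shell
# Sketch: pre-create the weights/ layout expected by the tree below,
# then drop the downloaded checkpoint files into the matching folders.
mkdir -p weights/sd/scheduler weights/sd/unet weights/sd/vae
mkdir -p weights/trocr-ko
mkdir -p tmp
```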
The file structure should be set as follows:
TextCtrl-Translate/
├── weights
│ ├── model.pth # weight of style encoder and unet [Link_1]
│ ├── sd # pretrained weight of stable-diffusion-v1-5 [Link_2]
│ │ ├── scheduler
│ │ ├── unet
│ │ └── vae
│ ├── style_encoder.ckpt # pretrained style encoder [Link_3]
│ ├── text_encoder.ckpt # pretrained glyph encoder [Link_3]
│ ├── trocr-ko # OCR weight [Link_3]
│ │ ├── config.json
│ │ └── trocr_model.bin
│ ├── vgg19.pth # VGG19 feature extractor [Link_1]
│ ├── vision_model.pth # monitor model [Link_1]
│ └── vitstr_base_patch16_224.pth # ViTSTR model [Link_1]
├── ...
├── tmp
│   └── model69.pt # tmp checkpoint [Link_3]

The file structure of the inference data should be set up as in example/:
TextCtrl/
├── example/
│ ├── i_s/ # source cropped text images
│ ├── i_s.txt # filename and text label of source images in i_s/
│   └── i_t.txt # filename and text label of target images

Edit the arguments in inference.py, especially:
parser.add_argument("--ckpt_path", type=str, default="tmp/model69.pth")
parser.add_argument("--dataset_dir", type=str, default="example/")
parser.add_argument("--output_dir", type=str, default="example_result/")

The inference results can be found in example_result/ after running:
$ PYTHONPATH=.../TextCtrl-Translate/ python inference.py

| Source Images | Target Text | Infer Results |
|---|---|---|
| ![]() | "정지" | ![]() |
| ![]() | "경고" | ![]() |
| ![]() | "서행" | ![]() |
| ![]() | "가수" | ![]() |
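The i_s.txt / i_t.txt label files above pair each cropped image with its text. A minimal sketch for generating them follows; the `filename text` line format is an assumption and should be verified against the repo's dataset loader:

```python
# Sketch of writing the i_s.txt / i_t.txt label files.
# NOTE: the 'filename text' line format is assumed, not confirmed by the repo.
from pathlib import Path

def write_labels(pairs, out_path):
    """Write one 'filename text' line per image to out_path."""
    Path(out_path).parent.mkdir(parents=True, exist_ok=True)
    lines = [f"{name} {text}" for name, text in pairs]
    Path(out_path).write_text("\n".join(lines) + "\n", encoding="utf-8")

# One source crop with its current text, and the desired Korean target text.
write_labels([("0.png", "STOP")], "example/i_s.txt")
write_labels([("0.png", "정지")], "example/i_t.txt")
```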
The training relies on synthetic data generated by SRNet-Datagen_kr.
Syn_data/
├── fonts/
│   ├── arial.ttf
│ └── .../
├── train/
│   ├── train-50k-1/
│   ├── train-50k-2/
│   ├── train-50k-3/
│   └── train-50k-4/        # each train-50k-* shard contains:
│       ├── i_s/
│       ├── mask_s/
│       ├── i_s.txt
│       ├── t_f/
│       ├── mask_t/
│       ├── i_t.txt
│       ├── t_t/
│       ├── t_b/
│       └── font.txt
└── eval/
└── eval-1k/
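Before kicking off training, a quick structural check of the synthetic data can catch missing folders early. This is a sketch; the entry names are taken from the tree above, and the `Syn_data/` root path is assumed:

```python
# Sketch: verify each train-50k-* shard has the entries shown in the tree above.
from pathlib import Path

REQUIRED_DIRS = ["i_s", "mask_s", "t_f", "mask_t", "t_t", "t_b"]
REQUIRED_FILES = ["i_s.txt", "i_t.txt", "font.txt"]

def missing_entries(shard_dir: str) -> list:
    """Return the subdirectories and label files a shard is missing."""
    root = Path(shard_dir)
    missing = [d for d in REQUIRED_DIRS if not (root / d).is_dir()]
    missing += [f for f in REQUIRED_FILES if not (root / f).is_file()]
    return missing

for shard in sorted(Path("Syn_data/train").glob("train-50k-*")):
    problems = missing_entries(str(shard))
    print(shard.name, "OK" if not problems else f"missing: {problems}")
```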
$ cd prestyle/
# Modify the path of dir in the config file
$ cd configs/
$ vi StyleTrain.yaml
# Start pretraining
$ cd ..
$ python train.py

$ cd preglyph/
# Modify the path of dir in the config file
$ cd configs/
$ vi GlyphTrain.yaml
# Start pretraining
$ cd ..
$ python pretrain.py

$ cd TextCtrl/
# Modify the path of dir in the config file
$ cd configs/
$ vi train.yaml
# Start training
$ cd ..
$ python train.py

Our work is built upon and inspired by the following projects:
- TextCtrl
- SRNet-Datagen_kr