This repository is based on TextCtrl [Zeng et al., 2024].
We extend the original framework with:
- TOSPI (Text Object Shape Powered Inference)
- Multilingual Training: enables text editing and translation from other languages into Korean.
  - The original TextCtrl focused on Japanese text editing.
  - Our extension adds cross-lingual capability: it currently supports other-language → Korean text translation/editing.
  - It remains compatible with the original checkpoints while supporting the new translation tasks.
- python = 3.11.13
- torch = 2.5.1+cu124
- cuda = 12.4
- used for training: 4 × NVIDIA RTX A5000 (24 GB)
- used for inference: 1 × NVIDIA RTX A5000 (24 GB)
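As a quick sanity check of the environment above, a small sketch like the following can report the installed versions (this helper is hypothetical, not part of the repo; `torch` may not be installed yet, so a missing import is reported rather than raised):

```python
# Sketch: report the local environment against the versions listed above.
import sys

def report_env() -> dict:
    info = {"python": sys.version.split()[0]}     # expect 3.11.x
    try:
        import torch
        info["torch"] = torch.__version__          # expect 2.5.1+cu124
        info["cuda"] = torch.version.cuda          # expect 12.4
        info["gpus"] = torch.cuda.device_count()   # 4 for training, 1 for inference
    except ImportError:
        info["torch"] = None                       # install steps below not run yet
    return info

print(report_env())
```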
# Clone the repo
$ git clone https://github.com/PNU-CSE-Graduation-TMOJI/TextCtrl-Translate.git
$ cd TextCtrl-Translate/
# Install required packages
$ conda create --name tospi python=3.11 -y
$ conda activate tospi
$ pip install -r requirement.txt

Download the checkpoints from:
- Link_1 (project-provided custom weights: style encoder, VGG19, monitor, etc.)
- Link_2 (pretrained Stable Diffusion v1.5: UNet, VAE, scheduler)
- Link_3 (text/ocr-related weights: style encoder, text encoder, TrOCR, tmp checkpoint)
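Before moving the downloaded checkpoints into place, the expected directory layout can be pre-created with a short sketch (directory names are taken from the tree below; how you obtain the files from Link_1–Link_3 is up to you):

```shell
# Sketch: pre-create the weights/ layout expected by the tree below,
# then drop the downloaded checkpoint files into the matching folders.
mkdir -p weights/sd/scheduler weights/sd/unet weights/sd/vae
mkdir -p weights/trocr-ko
mkdir -p tmp
```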
The file structure should be set as follows:
TextCtrl-Translate/
├── weights
│ ├── model.pth # weight of style encoder and unet [Link_1]
│ ├── sd # pretrained weight of stable-diffusion-v1-5 [Link_2]
│ │ ├── scheduler
│ │ ├── unet
│ │ └── vae
│ ├── style_encoder.ckpt # pretrained style encoder [Link_3]
│ ├── text_encoder.ckpt # pretrained glyph encoder [Link_3]
│ ├── trocr-ko # OCR weight [Link_3]
│ │ ├── config.json
│ │ └── trocr_model.bin
│ ├── vgg19.pth # VGG19 feature extractor [Link_1]
│ ├── vision_model.pth # monitor model [Link_1]
│ └── vitstr_base_patch16_224.pth # ViTSTR model [Link_1]
├── ...
├── tmp
│   └── model69.pt # tmp checkpoint [Link_3]

The file structure of the inference data should be set up as in example/:
TextCtrl/
├── example/
│ ├── i_s/ # source cropped text images
│ ├── i_s.txt # filename and text label of source images in i_s/
│   └── i_t.txt # filename and text label of target images

Edit the arguments in inference.py, especially:
parser.add_argument("--ckpt_path", type=str, default="tmp/model69.pth")
parser.add_argument("--dataset_dir", type=str, default="example/")
parser.add_argument("--output_dir", type=str, default="example_result/")

The inference results can be found in example_result/ after running:
$ PYTHONPATH=.../TextCtrl-Translate/ python inference.py

| Source Images | Target Text | Infer Results |
|---|---|---|
| ![]() | "정지" | ![]() |
| ![]() | "경고" | ![]() |
| ![]() | "서행" | ![]() |
| ![]() | "가수" | ![]() |
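The i_s.txt / i_t.txt label files above pair each cropped image with its text. A minimal sketch for generating them follows; the `filename text` line format is an assumption and should be verified against the repo's dataset loader:

```python
# Sketch of writing the i_s.txt / i_t.txt label files.
# NOTE: the 'filename text' line format is assumed, not confirmed by the repo.
from pathlib import Path

def write_labels(pairs, out_path):
    """Write one 'filename text' line per image to out_path."""
    Path(out_path).parent.mkdir(parents=True, exist_ok=True)
    lines = [f"{name} {text}" for name, text in pairs]
    Path(out_path).write_text("\n".join(lines) + "\n", encoding="utf-8")

# One source crop with its current text, and the desired Korean target text.
write_labels([("0.png", "STOP")], "example/i_s.txt")
write_labels([("0.png", "정지")], "example/i_t.txt")
```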
The training relies on synthetic data generated by SRNet-Datagen_kr.
Syn_data/
├── fonts/
│   ├── arial.ttf
│ └── .../
├── train/
│   ├── train-50k-1/
│   ├── train-50k-2/
│   ├── train-50k-3/
│   └── train-50k-4/        # each train-50k-* shard contains:
│       ├── i_s/
│       ├── mask_s/
│       ├── i_s.txt
│       ├── t_f/
│       ├── mask_t/
│       ├── i_t.txt
│       ├── t_t/
│       ├── t_b/
│       └── font.txt
└── eval/
└── eval-1k/
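Before kicking off training, a quick structural check of the synthetic data can catch missing folders early. This is a sketch; the entry names are taken from the tree above, and the `Syn_data/` root path is assumed:

```python
# Sketch: verify each train-50k-* shard has the entries shown in the tree above.
from pathlib import Path

REQUIRED_DIRS = ["i_s", "mask_s", "t_f", "mask_t", "t_t", "t_b"]
REQUIRED_FILES = ["i_s.txt", "i_t.txt", "font.txt"]

def missing_entries(shard_dir: str) -> list:
    """Return the subdirectories and label files a shard is missing."""
    root = Path(shard_dir)
    missing = [d for d in REQUIRED_DIRS if not (root / d).is_dir()]
    missing += [f for f in REQUIRED_FILES if not (root / f).is_file()]
    return missing

for shard in sorted(Path("Syn_data/train").glob("train-50k-*")):
    problems = missing_entries(str(shard))
    print(shard.name, "OK" if not problems else f"missing: {problems}")
```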
$ cd prestyle/
# Modify the path of dir in the config file
$ cd configs/
$ vi StyleTrain.yaml
# Start pretraining
$ cd ..
$ python train.py

$ cd preglyph/
# Modify the path of dir in the config file
$ cd configs/
$ vi GlyphTrain.yaml
# Start pretraining
$ cd ..
$ python pretrain.py

$ cd TextCtrl/
# Modify the path of dir in the config file
$ cd configs/
$ vi train.yaml
# Start training
$ cd ..
$ python train.py

Our work is built upon and inspired by the following projects:
- TextCtrl
- SRNet-Datagen_kr