65 changes: 16 additions & 49 deletions examples/images/vit/README.md
@@ -1,61 +1,28 @@
-# Vision Transformer with ColoTensor
+## Overview

-# Overview
Vision Transformer is a class of Transformer model tailored for computer vision tasks. It was first proposed in the paper [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929) and achieved SOTA results on various tasks at that time.

-In this example, we will run Vision Transformer with ColoTensor.
+In our example, we use pretrained ViT weights loaded from HuggingFace.
+We adapt the ViT training code to ColossalAI by leveraging the [Booster API](https://colossalai.org/docs/basics/booster_api) loaded with a chosen plugin, where each plugin corresponds to a specific kind of training strategy. This example supports the TorchDDPPlugin, LowLevelZeroPlugin, and GeminiPlugin plugins.

-We use model **ViTForImageClassification** from Hugging Face [Link](https://huggingface.co/docs/transformers/model_doc/vit) for unit test.
-You can change world size or decide whether use DDP in our code.
+## Run Demo

-We use model **vision_transformer** from timm [Link](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/vision_transformer.py) for training example.
-
-(2022/6/28) The default configuration now supports 2DP+2TP with gradient accumulation and checkpoint support. Zero is not supported at present.
-
-# Requirement
-
-Install colossalai version >= 0.1.11
-
-## Unit test
-To run unit test, you should install pytest, transformers with:
-```shell
-pip install pytest transformers
+By running the following script:
+```bash
+bash run_demo.sh
```
+You will finetune a [ViT-base](https://huggingface.co/google/vit-base-patch16-224) model on this [dataset](https://huggingface.co/datasets/beans), which contains more than 8000 images of bean leaves. The dataset is for an image classification task with 3 labels: ['angular_leaf_spot', 'bean_rust', 'healthy'].

-## Training example
-To run training example with ViT-S, you should install **NVIDIA DALI** from [Link](https://docs.nvidia.com/deeplearning/dali/user-guide/docs/installation.html) for dataloader support.
-You also need to install timm and titans for model/dataloader support with:
-```shell
-pip install timm titans
-```
+The script can be modified if you want to try another set of hyperparameters or switch to a ViT model of a different size.

-### Data preparation
-You can download the ImageNet dataset from the [ImageNet official website](https://www.image-net.org/download.php). You should get the raw images after downloading the dataset. As we use **NVIDIA DALI** to read data, we use the TFRecords dataset instead of raw Imagenet dataset. This offers better speedup to IO. If you don't have TFRecords dataset, follow [imagenet-tools](https://github.com/ver217/imagenet-tools) to build one.
+The demo code refers to this [blog](https://huggingface.co/blog/fine-tune-vit).

-Before you start training, you need to set the environment variable `DATA` so that the script knows where to fetch the data for DALI dataloader.
-```shell
-export DATA=/path/to/ILSVRC2012
-```
-
-
-# How to run
+## Run Benchmark

-## Unit test
-In your terminal
-```shell
-pytest test_vit.py
+You can run a benchmark for the ViT model with the following script:
+```bash
+bash run_benchmark.sh
```
-
-This will evaluate models with different **world_size** and **use_ddp**.
-
-## Training example
-Modify the settings in run.sh according to your environment.
-For example, if you set `--nproc_per_node=8` in `run.sh` and `TP_WORLD_SIZE=2` in your config file,
-data parallel size will be automatically calculated as 4.
-Thus, the parallel strategy is set to 4DP+2TP.
-
-Then in your terminal
-```shell
-sh run.sh
-```
-
-This will start ViT-S training with ImageNet.
+The script will test performance (throughput & peak memory usage) for each combination of hyperparameters. You can also adapt this script to configure your own set of hyperparameters for testing.
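For readers unfamiliar with the Booster API mentioned in the new README, the sketch below shows how a plugin is typically wired into a training script. This is illustrative only, not the exact demo code in this PR: the names follow the ColossalAI docs, and details such as the `config` argument to `launch_from_torch` vary between releases.

```python
# Minimal sketch of the Booster/plugin flow described in the README above.
# Assumes a colossalai release that ships the booster module; run under torchrun.
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import GeminiPlugin  # or TorchDDPPlugin, LowLevelZeroPlugin
from colossalai.nn.optimizer import HybridAdam
from transformers import ViTForImageClassification

colossalai.launch_from_torch(config={})  # older releases require a (possibly empty) config

model = ViTForImageClassification.from_pretrained("google/vit-base-patch16-224")
optimizer = HybridAdam(model.parameters(), lr=3e-4)

# Each plugin encapsulates one training strategy; swap it to change strategies.
booster = Booster(plugin=GeminiPlugin())
model, optimizer, _, _, _ = booster.boost(model, optimizer)

# Inside the training loop, the backward pass goes through the booster:
#   loss = model(pixel_values=pixel_values, labels=labels).loss
#   booster.backward(loss, optimizer)
#   optimizer.step(); optimizer.zero_grad()
```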
124 changes: 124 additions & 0 deletions examples/images/vit/args.py
@@ -0,0 +1,124 @@
from colossalai import get_default_parser


def parse_demo_args():
    parser = get_default_parser()
    parser.add_argument(
        "--model_name_or_path",
        type=str,
        default="google/vit-base-patch16-224",
        help="Path to pretrained model or model identifier from huggingface.co/models."
    )
    parser.add_argument(
        "--output_path",
        type=str,
        default="./output_model.bin",
        help="The path of your saved model after finetuning."
    )
    parser.add_argument(
        "--plugin",
        type=str,
        default="gemini",
        help="Plugin to use. Valid plugins include 'torch_ddp','torch_ddp_fp16','gemini','low_level_zero'."
    )
    parser.add_argument(
        "--num_epoch",
        type=int,
        default=3,
        help="Number of epochs."
    )
    parser.add_argument(
        "--batch_size",
        type=int,
        default=32,
        help="Batch size (per dp group) for the training dataloader."
    )
    parser.add_argument(
        "--learning_rate",
        type=float,
        default=3e-4,
        help="Initial learning rate (after the potential warmup period) to use."
    )
    parser.add_argument(
        "--warmup_ratio",
        type=float,
        default=0.3,
        help="Ratio of warmup steps against total training steps."
    )
    parser.add_argument(
        "--weight_decay",
        type=float,
        default=0.1,
        help="Weight decay to use."
    )
    parser.add_argument(
        "--seed",
        type=int,
        default=42,
        help="A seed for reproducible training."
    )

    args = parser.parse_args()
    return args


def parse_benchmark_args():
    parser = get_default_parser()

    parser.add_argument(
        "--model_name_or_path",
        type=str,
        default="google/vit-base-patch16-224",
        help="Path to a pretrained model or model identifier from huggingface.co/models."
    )
    parser.add_argument(
        "--plugin",
        type=str,
        default="gemini",
        help="Plugin to use. Valid plugins include 'torch_ddp','torch_ddp_fp16','gemini','low_level_zero'."
    )
    parser.add_argument(
        "--batch_size",
        type=int,
        default=8,
        help="Batch size (per dp group) for the training dataloader."
    )
    parser.add_argument(
        "--num_labels",
        type=int,
        default=10,
        help="Number of labels for classification."
    )
    parser.add_argument(
        "--learning_rate",
        type=float,
        default=5e-5,
        help="Initial learning rate (after the potential warmup period) to use."
    )
    parser.add_argument(
        "--weight_decay",
        type=float,
        default=0.0,
        help="Weight decay to use."
    )
    parser.add_argument(
        "--max_train_steps",
        type=int,
        default=20,
        help="Total number of training steps to perform."
    )
    parser.add_argument(
        "--seed",
        type=int,
        default=42,
        help="A seed for reproducible training."
    )
    parser.add_argument(
        "--mem_cap",
        type=int,
        default=0,
        help="Limit on the usage of space for each GPU (in GB)."
    )
    args = parser.parse_args()

    return args
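To make the intended use of these parsers concrete, here is a hypothetical driver snippet. Note that `colossalai.get_default_parser()` also pre-registers ColossalAI's distributed launcher flags (such as `--host` and `--port`), so the arguments above coexist with those.

```python
# Hypothetical usage of the parsers defined in args.py above; the real entry
# scripts in this example are expected to import and call them the same way.
from args import parse_demo_args

args = parse_demo_args()
print(args.model_name_or_path)   # "google/vit-base-patch16-224" unless overridden
print(args.plugin, args.batch_size)  # "gemini" 32 by default
```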
32 changes: 0 additions & 32 deletions examples/images/vit/configs/vit_1d_tp2.py

This file was deleted.

32 changes: 0 additions & 32 deletions examples/images/vit/configs/vit_1d_tp2_ci.py

This file was deleted.

32 changes: 32 additions & 0 deletions examples/images/vit/data.py
@@ -0,0 +1,32 @@
import torch
from torch.utils.data import Dataset
from datasets import load_dataset


class BeansDataset(Dataset):

    def __init__(self, image_processor, split='train'):
        super().__init__()
        self.image_processor = image_processor
        self.ds = load_dataset('beans')[split]
        self.label_names = self.ds.features['labels'].names
        self.num_labels = len(self.label_names)
        self.inputs = []
        for example in self.ds:
            self.inputs.append(self.process_example(example))

    def __len__(self):
        return len(self.inputs)

    def __getitem__(self, idx):
        return self.inputs[idx]

    def process_example(self, example):
        input = self.image_processor(example['image'], return_tensors='pt')
        input['labels'] = example['labels']
        return input


def beans_collator(batch):
    return {'pixel_values': torch.cat([data['pixel_values'] for data in batch], dim=0),
            'labels': torch.tensor([data['labels'] for data in batch], dtype=torch.int64)}
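A short usage sketch for the dataset and collator above. It assumes transformers' `ViTImageProcessor` (available in recent releases; the `transformers>=4.20.0` floor in requirements.txt would instead provide the equivalent `ViTFeatureExtractor`).

```python
# Usage sketch for BeansDataset and beans_collator defined in data.py above.
from torch.utils.data import DataLoader
from transformers import ViTImageProcessor  # ViTFeatureExtractor on older transformers

from data import BeansDataset, beans_collator

processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224")
train_set = BeansDataset(processor, split="train")

# Each item holds a (1, 3, 224, 224) "pixel_values" tensor; the collator
# concatenates them into a (B, 3, 224, 224) batch and stacks the labels.
loader = DataLoader(train_set, batch_size=32, shuffle=True, collate_fn=beans_collator)
batch = next(iter(loader))
print(batch["pixel_values"].shape, batch["labels"].shape)
```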
6 changes: 2 additions & 4 deletions examples/images/vit/requirements.txt
@@ -1,8 +1,6 @@
colossalai >= 0.1.12
torch >= 1.8.1
numpy>=1.24.1
-timm>=0.6.12
-titans>=0.0.7
tqdm>=4.61.2
-transformers>=4.25.1
-nvidia-dali-cuda110>=1.8.0 --extra-index-url https://developer.download.nvidia.com/compute/redist
+transformers>=4.20.0
+datasets
15 changes: 0 additions & 15 deletions examples/images/vit/run.sh

This file was deleted.

27 changes: 27 additions & 0 deletions examples/images/vit/run_benchmark.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
set -xe
pip install -r requirements.txt

export BS=8
export MEMCAP=0
export GPUNUM=1

for BS in 8 32 128
do
  for PLUGIN in "torch_ddp" "torch_ddp_fp16" "low_level_zero" "gemini"
  do
    for GPUNUM in 1 4
    do

      MODEL_PATH="google/vit-base-patch16-224"
      torchrun \
        --standalone \
        --nproc_per_node ${GPUNUM} \
        vit_benchmark.py \
        --model_name_or_path ${MODEL_PATH} \
        --mem_cap ${MEMCAP} \
        --plugin ${PLUGIN} \
        --batch_size ${BS}

    done
  done
done
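The `MEMCAP` variable feeds the `--mem_cap` argument defined in args.py. How the cap is enforced is up to vit_benchmark.py, which is not shown in this diff; one plausible implementation in plain PyTorch (an assumption, not the PR's actual code) is:

```python
# Hedged sketch: one way vit_benchmark.py could honor --mem_cap.
# The actual implementation in this PR may differ.
import torch


def memory_cap(size_in_gb: int) -> None:
    """Limit this process's usable CUDA memory to roughly size_in_gb GB."""
    if size_in_gb <= 0:
        return  # 0 means "no cap", matching the argparse default
    device = torch.cuda.current_device()
    total = torch.cuda.get_device_properties(device).total_memory
    fraction = min(1.0, size_in_gb * (1024 ** 3) / total)
    torch.cuda.set_per_process_memory_fraction(fraction, device)
```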