Async-Grad

Your new tool for asynchronous deep learning

Description

This repository contains the code to reproduce the results presented in "PETRA: Parallel End-to-end Training with Reversible Architectures". (Link available soon)

Figure: Standard Delayed Gradient vs. PETRA.

Requirements

pip install -r requirements.txt

Preparing data

If you want to download the ImageNet32 dataset before training, you can run:

python scripts/imagenet32loader.py

The data is assumed to be located in the ./data folder. For ImageNet, you will need to provide the path to the dataset:

python main.py --dataset imagenet --dir PATH_TO_DATASET

Usage

Dataset

For CIFAR-10 and ImageNet32, the data is assumed to be located in the ./data folder. The code will automatically download those datasets if they are not on disk.

python main.py --dataset cifar10
python main.py --dataset imagenet32

For ImageNet, you will need to provide the path to the dataset:

python main.py --dataset imagenet --dir PATH_TO_DATASET

Model

The supported models are resnet18, resnet34, resnet50, revnet18, revnet34, and revnet50.

python main.py --model MODEL

Training method

To train a model using standard backpropagation, you can use:

python main.py --synchronous --store-vjp

To train a model using PETRA, you can use:

python main.py --remove-ctx-param --remove-ctx-input

Note that PETRA only applies to reversible architectures: on non-reversible models, the option --remove-ctx-input has no effect. Note also that using --remove-ctx-param alone corresponds to the Diversely Stale Parameters (DSP) approach. A minimal sketch of the reversibility property that makes --remove-ctx-input possible is given below.
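
For intuition only, the sketch below illustrates a RevNet-style reversible coupling: the input of a block can be reconstructed from its output, so activations do not need to be stored between the forward and backward phases. The class and layer choices are assumptions for exposition, not the repository's actual implementation.

```python
import torch
import torch.nn as nn

# Illustrative reversible coupling block (RevNet-style); not the repo's code.
class ReversibleBlock(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.g = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x1, x2):
        y1 = x1 + self.f(x2)
        y2 = x2 + self.g(y1)
        return y1, y2

    def inverse(self, y1, y2):
        # Reconstruct the input from the output: no activation needs to be kept
        # for the backward phase (the idea behind --remove-ctx-input).
        x2 = y2 - self.g(y1)
        x1 = y1 - self.f(x2)
        return x1, x2

block = ReversibleBlock(8)
x1, x2 = torch.randn(4, 8), torch.randn(4, 8)
with torch.no_grad():
    y1, y2 = block(x1, x2)
    r1, r2 = block.inverse(y1, y2)
print(torch.allclose(r1, x1, atol=1e-5), torch.allclose(r2, x2, atol=1e-5))
```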

Optimizer

This code supports the SGD, Adam, and LARS optimizers.

To use the SGD optimizer:

python main.py --optimizer sgd --lr LEARNING_RATE --momentum MOMENTUM --dampening DAMPENING [--nesterov]

To use the LARS optimizer:

python main.py --optimizer lars --lr LEARNING_RATE --momentum MOMENTUM --dampening DAMPENING [--nesterov]

To use the Adam optimizer:

python main.py --optimizer adam --lr LEARNING_RATE --beta1 BETA1 --beta2 BETA2 [--amsgrad]

You can set the weight decay with the option --weight-decay WEIGHT_DECAY. You can also remove weight decay on biases and batch-norm parameters with the option --no-bn-weight-decay.
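
As a rough illustration of what --no-bn-weight-decay typically corresponds to, the sketch below splits parameters into two groups so that biases and batch-norm parameters receive zero weight decay. This is an assumption about the mechanism, not the repository's actual code.

```python
import torch
import torchvision

# Hypothetical sketch: exclude biases and batch-norm parameters from weight decay.
def param_groups(model, weight_decay):
    decay, no_decay = [], []
    for name, p in model.named_parameters():
        if not p.requires_grad:
            continue
        # 1-D tensors cover biases and batch-norm affine parameters.
        if p.ndim == 1 or name.endswith(".bias"):
            no_decay.append(p)
        else:
            decay.append(p)
    return [{"params": decay, "weight_decay": weight_decay},
            {"params": no_decay, "weight_decay": 0.0}]

model = torchvision.models.resnet18()
optimizer = torch.optim.SGD(param_groups(model, weight_decay=5e-4),
                            lr=0.1, momentum=0.9, nesterov=True)
```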

To perform gradient accumulation:

python main.py --accumulation-steps ACCUMULATION_STEPS [--accumulation-averaging]

The option --accumulation-averaging averages the gradients over the accumulation steps. If you want to use the linear scaling rule from Goyal et al., use the option --goyal-lr-scaling. This will scale the learning rate according to the equation:

scaled_lr = lr * accumulation_steps * batch_size / 256
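
For a worked example matching the CIFAR-10 commands further below (--lr 0.1, --accumulation-steps 2, --batch-size 64):

```python
# scaled_lr = lr * accumulation_steps * batch_size / 256
lr, accumulation_steps, batch_size = 0.1, 2, 64
scaled_lr = lr * accumulation_steps * batch_size / 256
print(scaled_lr)  # 0.05
```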

Scheduler

This code supports multiple schedulers along with linear warm-up.

python main.py --max-epoch MAX_EPOCH --warm-up WARM_UP_EPOCHS --scheduler SCHEDULER

To use the STEPLR scheduler:

python main.py --scheduler steplr --lr-decay-milestones MILESTONE_1 MILESTONE_2 ... --lr-decay-fact DECAY_FACTOR

To use the Polynomial scheduler:

python main.py --scheduler polynomial

To use the Cosine scheduler:

python main.py --scheduler cosine
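
To illustrate how linear warm-up can be chained with step decay, here is a minimal sketch built from stock PyTorch schedulers. It approximates --warm-up 5 --scheduler steplr --lr-decay-milestones 150 225 --lr-decay-fact 0.1, but is an assumption about the behaviour rather than this repository's code.

```python
import torch
from torch.optim.lr_scheduler import LinearLR, MultiStepLR, SequentialLR

model = torch.nn.Linear(10, 10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

# 5 warm-up epochs, then decay by 0.1 at epochs 150 and 225.
warmup = LinearLR(optimizer, start_factor=1e-2, total_iters=5)
decay = MultiStepLR(optimizer, milestones=[150, 225], gamma=0.1)
scheduler = SequentialLR(optimizer, schedulers=[warmup, decay], milestones=[5])

for epoch in range(300):
    # train_one_epoch(...)
    scheduler.step()
```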

Checkpoint

To save a checkpoint after each epoch:

python main.py --name-checkpoint CHECKPOINT_PATH

To resume from a checkpoint:

python main.py --resume CHECKPOINT_PATH
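
For reference, per-epoch checkpointing and resuming usually amount to something like the sketch below. The keys and helper names are illustrative assumptions, not the repository's actual checkpoint format.

```python
import torch

def save_checkpoint(path, model, optimizer, epoch):
    # Save everything needed to resume training after this epoch.
    torch.save({"epoch": epoch,
                "model": model.state_dict(),
                "optimizer": optimizer.state_dict()}, path)

def load_checkpoint(path, model, optimizer):
    ckpt = torch.load(path, map_location="cpu")
    model.load_state_dict(ckpt["model"])
    optimizer.load_state_dict(ckpt["optimizer"])
    return ckpt["epoch"] + 1  # epoch to resume from
```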

Reproducing results

  1. To reproduce the CIFAR-10 results for revnet18:

    python main.py --no-git --dataset cifar10 --batch-size 64 -p 78 --workers 4 --model resnet18 --synchronous --store-vjp --remove-ctx-input --remove-ctx-param --optimizer sgd --lr 0.1 --weight-decay 0.0005 --no-bn-weight-decay --nesterov --accumulation-steps 2 --accumulation-averaging --goyal-lr-scaling --scheduler steplr --max-epoch 300 --warm-up 5 --lr-decay-fact 0.1 --lr-decay-milestones 150 225
    python main.py --no-git --dataset cifar10 --batch-size 64 -p 78 --workers 4 --model revnet18 --synchronous --store-vjp --remove-ctx-input --remove-ctx-param --optimizer sgd --lr 0.1 --weight-decay 0.0005 --no-bn-weight-decay --nesterov --accumulation-steps 2 --accumulation-averaging --goyal-lr-scaling --scheduler steplr --max-epoch 300 --warm-up 5 --lr-decay-fact 0.1 --lr-decay-milestones 150 225
    python main.py --no-git --dataset cifar10 --batch-size 64 -p 78 --workers 4 --model revnet18 --remove-ctx-input --remove-ctx-param --optimizer sgd --lr 0.1 --weight-decay 0.0005 --no-bn-weight-decay --nesterov --accumulation-steps 2 --accumulation-averaging --goyal-lr-scaling --scheduler steplr --max-epoch 300 --warm-up 5 --lr-decay-fact 0.1 --lr-decay-milestones 150 225
  2. To reproduce the ImageNet32 results for revnet34:

    python main.py --no-git --dataset imagenet32 --batch-size 64 -p 2001 --workers 4 --model resnet34 --synchronous --store-vjp --remove-ctx-input --remove-ctx-param --optimizer sgd --lr 0.1 --weight-decay 0.0001 --no-bn-weight-decay --nesterov --accumulation-steps 2 --accumulation-averaging --goyal-lr-scaling --scheduler steplr --max-epoch 90 --warm-up 5 --lr-decay-fact 0.1 --lr-decay-milestones 30 60 80
    python main.py --no-git --dataset imagenet32 --batch-size 64 -p 2001 --workers 4 --model revnet34 --synchronous --store-vjp --remove-ctx-input --remove-ctx-param --optimizer sgd --lr 0.1 --weight-decay 0.0001 --no-bn-weight-decay --nesterov --accumulation-steps 2 --accumulation-averaging --goyal-lr-scaling --scheduler steplr --max-epoch 90 --warm-up 5 --lr-decay-fact 0.1 --lr-decay-milestones 30 60 80
    python main.py --no-git --dataset imagenet32 --batch-size 64 -p 2001 --workers 4 --model revnet34 --remove-ctx-input --remove-ctx-param --optimizer sgd --lr 0.1 --weight-decay 0.0001 --no-bn-weight-decay --nesterov --accumulation-steps 2 --accumulation-averaging --goyal-lr-scaling --scheduler steplr --max-epoch 90 --warm-up 5 --lr-decay-fact 0.1 --lr-decay-milestones 30 60 80
  3. To reproduce the ImageNet results for revnet50:

    python main.py --no-git --dataset imagenet --dir [PATH_TO_DATASET] --batch-size 64 -p 2001 --workers 16 --model resnet50 --synchronous --store-vjp --remove-ctx-input --remove-ctx-param --optimizer sgd --lr 0.1 --weight-decay 0.0001 --no-bn-weight-decay --nesterov --accumulation-steps 4 --accumulation-averaging --goyal-lr-scaling --scheduler steplr --max-epoch 90 --warm-up 5 --lr-decay-fact 0.1 --lr-decay-milestones 30 60 80
    python main.py --no-git --dataset imagenet --dir [PATH_TO_DATASET] --batch-size 64 -p 2001 --workers 16 --model revnet50 --synchronous --store-vjp --remove-ctx-input --remove-ctx-param --optimizer sgd --lr 0.1 --weight-decay 0.0001 --no-bn-weight-decay --nesterov --accumulation-steps 4 --accumulation-averaging --goyal-lr-scaling --scheduler steplr --max-epoch 90 --warm-up 5 --lr-decay-fact 0.1 --lr-decay-milestones 30 60 80
    python main.py --no-git --dataset imagenet --dir [PATH_TO_DATASET] --batch-size 64 -p 2001 --workers 16 --model revnet50 --remove-ctx-input --remove-ctx-param --optimizer sgd --lr 0.1 --weight-decay 0.0001 --no-bn-weight-decay --nesterov --accumulation-steps 4 --accumulation-averaging --goyal-lr-scaling --scheduler steplr --max-epoch 90 --warm-up 5 --lr-decay-fact 0.1 --lr-decay-milestones 30 60 80
