97 changes: 97 additions & 0 deletions modules/dynunet_pipeline/README.md
@@ -0,0 +1,97 @@
# Overview
This pipeline is modified from NNUnet [1][2], which won the "Medical Segmentation Decathlon Challenge 2018" and is open sourced at https://github.com/MIC-DKFZ/nnUNet.

## Data
The source decathlon datasets can be found from http://medicaldecathlon.com/.

After getting the dataset, please run `create_datalist.py` to create the datalists (please check the command line arguments first). The default seed helps to reproduce the same 5-fold data splits that NNUnet uses, and the created datalists will be placed in `config/`.
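
A minimal sketch of what a seeded 5-fold datalist might look like is shown below; the seed value, output file name, and JSON fields are placeholders, so please rely on `create_datalist.py` and its command line arguments for the actual format.

```python
import json
import random

def create_fold_datalist(image_label_pairs, seed=12345, num_folds=5,
                         out_path="config/dataset_task04.json"):
    """Shuffle the cases with a fixed seed and assign each one a fold index (illustrative only)."""
    rng = random.Random(seed)          # fixed seed -> reproducible splits
    items = list(image_label_pairs)
    rng.shuffle(items)
    training = [
        {"image": img, "label": lab, "fold": i % num_folds}
        for i, (img, lab) in enumerate(items)
    ]
    with open(out_path, "w") as f:
        json.dump({"training": training}, f, indent=2)
```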

My running environment:

- OS: Ubuntu 20.04.1 LTS
- Python: 3.8.5
- PyTorch: 1.8.0

To avoid inconsistencies, all JSON files are already included in `config/`.

## Training
Please run `train.py` for training, and modify the command line arguments according to your actual setup, for example `determinism_flag` for deterministic training and `amp` for automatic mixed precision.
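
For reference, here is a minimal sketch of what the `determinism_flag`/`determinism_seed` and `amp` options typically correspond to in MONAI/PyTorch; the actual wiring lives in `train.py`, and the helper below is only illustrative.

```python
import torch
from monai.utils import set_determinism

# deterministic training, as toggled by -determinism_flag / -determinism_seed
set_determinism(seed=0)

def train_one_epoch_amp(model, loss_fn, optimizer, loader, device="cuda"):
    """One training epoch with automatic mixed precision (roughly what `-amp True` implies)."""
    # in real training, create the scaler once and reuse it across epochs
    scaler = torch.cuda.amp.GradScaler()
    model.train()
    for batch in loader:
        images, labels = batch["image"].to(device), batch["label"].to(device)
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = loss_fn(model(images), labels)
        scaler.scale(loss).backward()  # scale the loss to avoid fp16 underflow
        scaler.step(optimizer)
        scaler.update()
```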

## Validation
Please run `train.py` and set the argument `mode` to `val` for validation.

## Inference
Please run `inference.py` for inference.
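
As a rough illustration, test-time-augmented sliding-window inference (the `-tta_val`, `-eval_overlap`, and `-sw_batch_size` options in the example scripts) could look like the sketch below; the flip strategy and `roi_size` are assumptions, not the pipeline's exact implementation.

```python
import torch
from monai.inferers import sliding_window_inference

def tta_sliding_window(model, image, roi_size=(128, 128, 128), sw_batch_size=2, overlap=0.5):
    """Sliding-window inference with flip-based TTA, averaged in probability space.
    image: tensor of shape (N, C, H, W, D); roi_size is a placeholder patch size."""
    model.eval()
    with torch.no_grad():
        prob = torch.softmax(
            sliding_window_inference(image, roi_size, sw_batch_size, model, overlap=overlap), dim=1
        )
        for dims in [(2,), (3,), (4,)]:  # flip each spatial axis, predict, flip back
            flipped = sliding_window_inference(
                torch.flip(image, dims), roi_size, sw_batch_size, model, overlap=overlap
            )
            prob += torch.softmax(torch.flip(flipped, dims), dim=1)
    return (prob / 4.0).argmax(dim=1, keepdim=True)
```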

## Examples
All training scripts for 10 tasks are included in `commands/`.
For instance:

- `train.sh` is used for training.
- `finetune.sh` is used for finetuning.
- `val.sh` is used for validation.
- `infer.sh` is used for inference.
- If you need to use multiple GPUs, please run the scripts whose names contain `multi_gpu`.

You can take task 04's scripts as the reference, since for the other tasks only the training scripts are included. A task folder that contains `train.sh` needs only 1 GPU for training, while one that contains `train_multi_gpu.sh` needs at least 2 GPUs.

The devices I used for training each task are shown as follows:

| task | number of GPUs used (Tesla V100 32GB) |
|:----:|:-------------------------------------:|
| 1 | 2 |
| 2 | 1 |
| 3 | 4 |
| 4 | 1 |
| 5 | 1 |
| 6 | 1 |
| 7 | 2 |
| 8 | 2 |
| 9 | 1 |
| 10 | 1 |

I used these scripts and trained all 5 folds for all 10 tasks. For the test set, I built the ensemble by averaging the 5 feature maps (coming from the 5 folds' models) before the `argmax` operation (for task 03, since the feature maps are very large, I instead did majority voting over the 5 final predictions); a minimal sketch of both strategies is shown after the result tables. By submitting the ensembled results to the Decathlon Challenge's leaderboard, I got the following results:

| | DynUNet class 1 | 2 | 3 | NNUNet class 1 | 2 | 3 |
|:-------:|:---------------:|:----:|:----:|:--------------:|:----:|:----:|
| task 01 | 0.68 | 0.47 | 0.69 | 0.68 | 0.47 | 0.68 |


| | DynUNet class 1 | NNUNet class 1 |
|:-------:|:---------------:|:--------------:|
| task 02 | 0.93 | 0.93 |
| task 06 | 0.67 | 0.74 |
| task 09 | 0.96 | 0.97 |
| task 10 | 0.55 | 0.58 |


| | DynUNet class 1 | 2 | NNUNet class 1 | 2 |
|:-------:|:---------------:|:----:|:--------------:|:----:|
| task 03 | 0.95 | 0.72 | 0.96 | 0.76 |
| task 04 | 0.90 | 0.88 | 0.90 | 0.89 |
| task 05 | 0.71 | 0.87 | 0.77 | 0.90 |
| task 07 | 0.81 | 0.54 | 0.82 | 0.53 |
| task 08 | 0.66 | 0.71 | 0.66 | 0.72 |
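
Below is a minimal sketch of the two fold-ensembling strategies described above (averaging before `argmax`, and majority voting); the tensor shapes and helper names are illustrative and do not come from the repository's code.

```python
import torch

def ensemble_by_averaging(fold_probs):
    """Average the per-fold probability/feature maps, then take argmax.
    fold_probs: list of float tensors of shape (C, H, W, D), one per fold."""
    return torch.stack(fold_probs).mean(dim=0).argmax(dim=0)

def ensemble_by_voting(fold_labels):
    """Majority vote over the per-fold label maps (used for task 03, where the
    full feature maps are too large to keep in memory).
    fold_labels: list of integer label tensors of shape (H, W, D), one per fold."""
    return torch.stack(fold_labels).mode(dim=0).values
```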

Comments:
- The results of DynUNet come from the `3D_fullres` version re-implemented in MONAI, without postprocessing.

- The results of NNUnet come from different versions (`3D_fullres` for tasks 01, 02 and 04, `3D_cascade` for task 10, and an ensemble of two versions for the other tasks) and may include postprocessing [1].

- Therefore, the two sets of results may not be fully comparable, and the above tables are for reference only.

- After implementing this repository, I re-trained task 04 and attach the validation results below; these comparisons between DynUNet and NNUnet are all for the single `3D_fullres` version.

For task 04, with the default settings in `train.sh` and `finetune.sh`, you can expect approximately the following validation results:

|         | fold 0 | fold 1 | fold 2 | fold 3 | fold 4 | mean   | NNUNet val |
|---------|--------|--------|--------|--------|--------|--------|------------|
| class 1 | 0.9007 | 0.8930 | 0.8985 | 0.8979 | 0.9015 | 0.8983 | 0.8975 |
| class 2 | 0.8835 | 0.8774 | 0.8826 | 0.8818 | 0.8828 | 0.8816 | 0.8807 |


# References
[1] Isensee F, Jäger P F, Kohl S A A, et al. Automated design of deep learning methods for biomedical image segmentation. arXiv preprint arXiv:1904.08128, 2019.

[2] Isensee F, Petersen J, Klein A, et al. nnU-Net: Self-adapting framework for U-Net-based medical image segmentation. arXiv preprint arXiv:1809.10486, 2018.
12 changes: 12 additions & 0 deletions modules/dynunet_pipeline/commands/task01/finetune_multi_gpu.sh
@@ -0,0 +1,12 @@
# train step 2, finetune with small learning rate
# please replace the weight variable with your actual weight file

lr=1e-2
fold=0
weight=model.pt

python -m torch.distributed.launch --nproc_per_node=2 --nnodes=1 --node_rank=0 \
--master_addr="localhost" --master_port=1234 \
train.py -fold $fold -train_num_workers 4 -interval 10 -num_samples 1 \
-learning_rate $lr -max_epochs 1000 -task_id 01 -pos_sample_num 1 \
-expr_name baseline -tta_val True -checkpoint $weight -multi_gpu True
12 changes: 12 additions & 0 deletions modules/dynunet_pipeline/commands/task01/train_multi_gpu.sh
@@ -0,0 +1,12 @@
# train step 1, with large learning rate
# although max_epochs here is 3000, my results show that for all 5 folds,
# the best epoch is less than 400, so you may want to stop training early manually.

lr=1e-1
fold=0

python -m torch.distributed.launch --nproc_per_node=2 --nnodes=1 --node_rank=0 \
--master_addr="localhost" --master_port=1234 \
train.py -fold $fold -train_num_workers 4 -interval 10 -num_samples 1 \
-learning_rate $lr -max_epochs 3000 -task_id 01 -pos_sample_num 1 \
-expr_name baseline -tta_val True -multi_gpu True
11 changes: 11 additions & 0 deletions modules/dynunet_pipeline/commands/task02/finetune.sh
@@ -0,0 +1,11 @@
# train step 2, finetune with small learning rate
# please replace the weight variable with your actual weight file

lr=1e-2
fold=0
weight=model.pt

python train.py -fold $fold -train_num_workers 4 -interval 1 -num_samples 4 \
-learning_rate $lr -max_epochs 500 -task_id 02 -pos_sample_num 1 \
-expr_name baseline -tta_val True -checkpoint $weight -determinism_flag True \
-determinism_seed 0
8 changes: 8 additions & 0 deletions modules/dynunet_pipeline/commands/task02/train.sh
@@ -0,0 +1,8 @@
# train step 1, with large learning rate

lr=1e-1
fold=0

python train.py -fold $fold -train_num_workers 4 -interval 1 -num_samples 4 \
-learning_rate $lr -max_epochs 3000 -task_id 02 -pos_sample_num 1 \
-expr_name baseline -tta_val True -determinism_flag True -determinism_seed 0
13 changes: 13 additions & 0 deletions modules/dynunet_pipeline/commands/task03/finetune_multi_gpu.sh
@@ -0,0 +1,13 @@
# train step 2, finetune with small learning rate
# please replace the weight variable with your actual weight file

lr=1e-2
fold=0
weight=model.pt

python -m torch.distributed.launch --nproc_per_node=2 --nnodes=1 --node_rank=0 \
--master_addr="localhost" --master_port=1234 \
train.py -fold $fold -train_num_workers 8 -interval 20 -num_samples 1 \
-learning_rate $lr -max_epochs 2000 -task_id 03 -pos_sample_num 1 \
-expr_name baseline -tta_val True -checkpoint $weight -multi_gpu True \
-eval_overlap 0.5 -sw_batch_size 2 -batch_dice True
11 changes: 11 additions & 0 deletions modules/dynunet_pipeline/commands/task03/train_multi_gpu.sh
@@ -0,0 +1,11 @@
# train step 1, with large learning rate

lr=2e-2
fold=0

python -m torch.distributed.launch --nproc_per_node=2 --nnodes=1 --node_rank=0 \
--master_addr="localhost" --master_port=1234 \
train.py -fold $fold -train_num_workers 8 -interval 20 -num_samples 1 \
-learning_rate $lr -max_epochs 3000 -task_id 03 -pos_sample_num 1 \
-expr_name baseline -tta_val True -multi_gpu True -eval_overlap 0.1 \
-sw_batch_size 2 -batch_dice True
11 changes: 11 additions & 0 deletions modules/dynunet_pipeline/commands/task04/finetune.sh
@@ -0,0 +1,11 @@
# train step 2, finetune with small learning rate
# please replace the weight variable with your actual weight file

lr=1e-3
fold=0
weight=model.pt

python train.py -fold $fold -train_num_workers 4 -interval 1 -num_samples 1 \
-learning_rate $lr -max_epochs 50 -task_id 04 -pos_sample_num 1 \
-expr_name baseline -tta_val True -checkpoint $weight -determinism_flag True \
-determinism_seed 0
12 changes: 12 additions & 0 deletions modules/dynunet_pipeline/commands/task04/finetune_multi_gpu.sh
@@ -0,0 +1,12 @@
# train step 2, finetune with small learning rate
# please replace the weight variable with your actual weight file

lr=1e-3
fold=0
weight=model.pt

python -m torch.distributed.launch --nproc_per_node=2 --nnodes=1 --node_rank=0 \
--master_addr="localhost" --master_port=1234 \
train.py -fold $fold -train_num_workers 4 -interval 1 -num_samples 1 \
-learning_rate $lr -max_epochs 50 -task_id 04 -pos_sample_num 1 \
-expr_name baseline -tta_val True -checkpoint $weight -multi_gpu True
7 changes: 7 additions & 0 deletions modules/dynunet_pipeline/commands/task04/infer.sh
@@ -0,0 +1,7 @@
# please replace the weight variable with your actual weight file

weight=model.pt
fold=0

python inference.py -fold $fold -expr_name baseline -task_id 04 -tta_val True \
-checkpoint $weight
9 changes: 9 additions & 0 deletions modules/dynunet_pipeline/commands/task04/infer_multi_gpu.sh
@@ -0,0 +1,9 @@
# please replace the weight variable with your actual weight file

weight=model.pt
fold=0

python -m torch.distributed.launch --nproc_per_node=2 --nnodes=1 --node_rank=0 \
--master_addr="localhost" --master_port=1234 \
inference.py -fold $fold -expr_name baseline -task_id 04 -tta_val True \
-checkpoint $weight -multi_gpu True
8 changes: 8 additions & 0 deletions modules/dynunet_pipeline/commands/task04/train.sh
@@ -0,0 +1,8 @@
# train step 1, with large learning rate

lr=1e-1
fold=0

python train.py -fold $fold -train_num_workers 4 -interval 1 -num_samples 1 \
-learning_rate $lr -max_epochs 500 -task_id 04 -pos_sample_num 2 \
-expr_name baseline -tta_val True -determinism_flag True -determinism_seed 0
10 changes: 10 additions & 0 deletions modules/dynunet_pipeline/commands/task04/train_multi_gpu.sh
@@ -0,0 +1,10 @@
# train step 1, with large learning rate

lr=1e-1
fold=0

python -m torch.distributed.launch --nproc_per_node=2 --nnodes=1 --node_rank=0 \
--master_addr="localhost" --master_port=1234 \
train.py -fold $fold -train_num_workers 4 -interval 1 -num_samples 1 \
-learning_rate $lr -max_epochs 500 -task_id 04 -pos_sample_num 2 \
-expr_name baseline -tta_val True -multi_gpu True
7 changes: 7 additions & 0 deletions modules/dynunet_pipeline/commands/task04/val.sh
@@ -0,0 +1,7 @@
# please replace the weight variable with your actual weight file

weight=model.pt
fold=0

python train.py -fold $fold -expr_name baseline -task_id 04 -tta_val True \
-checkpoint $weight -mode val
9 changes: 9 additions & 0 deletions modules/dynunet_pipeline/commands/task04/val_multi_gpu.sh
@@ -0,0 +1,9 @@
# please replace the weight variable with your actual weight file

weight=model.pt
fold=0

python -m torch.distributed.launch --nproc_per_node=2 --nnodes=1 --node_rank=0 \
--master_addr="localhost" --master_port=1234 \
train.py -fold $fold -expr_name baseline -task_id 04 -tta_val True \
-checkpoint $weight -mode val -multi_gpu True
11 changes: 11 additions & 0 deletions modules/dynunet_pipeline/commands/task05/finetune.sh
@@ -0,0 +1,11 @@
# train step 2, finetune with small learning rate
# please replace the weight variable with your actual weight file

lr=1e-2
fold=0
weight=model.pt

python train.py -fold $fold -train_num_workers 4 -interval 1 -num_samples 4 \
-learning_rate $lr -max_epochs 1000 -task_id 05 -pos_sample_num 1 \
-expr_name baseline -tta_val True -checkpoint $weight -determinism_flag True \
-determinism_seed 0
8 changes: 8 additions & 0 deletions modules/dynunet_pipeline/commands/task05/train.sh
@@ -0,0 +1,8 @@
# train step 1, with large learning rate

lr=1e-1
fold=0

python train.py -fold $fold -train_num_workers 4 -interval 5 -num_samples 4 \
-learning_rate $lr -max_epochs 1000 -task_id 05 -pos_sample_num 1 \
-expr_name baseline -tta_val True -determinism_flag True -determinism_seed 0
11 changes: 11 additions & 0 deletions modules/dynunet_pipeline/commands/task06/finetune.sh
@@ -0,0 +1,11 @@
# train step 2, finetune with small learning rate
# please replace the weight variable with your actual weight file

lr=1e-3
fold=0
weight=model.pt

python train.py -fold $fold -train_num_workers 4 -interval 5 -num_samples 1 \
-learning_rate $lr -max_epochs 1000 -task_id 06 -pos_sample_num 1 \
-expr_name baseline -tta_val True -checkpoint $weight -determinism_flag True \
-determinism_seed 0 -batch_dice True
9 changes: 9 additions & 0 deletions modules/dynunet_pipeline/commands/task06/train.sh
@@ -0,0 +1,9 @@
# train step 1, with large learning rate

lr=1e-2
fold=0

python train.py -fold $fold -train_num_workers 4 -interval 10 -num_samples 1 \
-learning_rate $lr -max_epochs 3000 -task_id 06 -pos_sample_num 1 \
-expr_name baseline -tta_val True -determinism_flag True -determinism_seed 0 \
-batch_dice True
17 changes: 17 additions & 0 deletions modules/dynunet_pipeline/commands/task07/finetune_multi_gpu.sh
@@ -0,0 +1,17 @@
# train step 2, finetune with small learning rate
# please replace the weight variable with your actual weight file
# since this task uses an lr scheduler, please set the lr and max_epochs here
# according to the step 1 training results. The value of max_epochs equals
# 2000 minus the best epoch in step 1.

lr=5e-3
max_epochs=1000
fold=0
weight=model.pt

python -m torch.distributed.launch --nproc_per_node=2 --nnodes=1 --node_rank=0 \
--master_addr="localhost" --master_port=1234 \
train.py -fold $fold -train_num_workers 4 -interval 10 -num_samples 1 \
-learning_rate $lr -max_epochs $max_epochs -task_id 07 -pos_sample_num 1 \
-expr_name baseline -tta_val True -checkpoint $weight -multi_gpu True \
-lr_decay True
10 changes: 10 additions & 0 deletions modules/dynunet_pipeline/commands/task07/train_multi_gpu.sh
@@ -0,0 +1,10 @@
# train step 1, with large learning rate

lr=1e-2
fold=0

python -m torch.distributed.launch --nproc_per_node=2 --nnodes=1 --node_rank=0 \
--master_addr="localhost" --master_port=1234 \
train.py -fold $fold -train_num_workers 4 -interval 10 -num_samples 1 \
-learning_rate $lr -max_epochs 2000 -task_id 07 -pos_sample_num 1 \
-expr_name baseline -tta_val True -multi_gpu True -lr_decay True
17 changes: 17 additions & 0 deletions modules/dynunet_pipeline/commands/task08/finetune_multi_gpu.sh
@@ -0,0 +1,17 @@
# train step 2, finetune with small learning rate
# please replace the weight variable with your actual weight file
# since this task uses an lr scheduler, please set the lr and max_epochs here
# according to the step 1 training results. The value of max_epochs equals
# 2000 minus the best epoch in step 1.

lr=5e-3
max_epochs=1000
fold=0
weight=model.pt

python -m torch.distributed.launch --nproc_per_node=2 --nnodes=1 --node_rank=0 \
--master_addr="localhost" --master_port=1234 \
train.py -fold $fold -train_num_workers 4 -interval 10 -num_samples 1 \
-learning_rate $lr -max_epochs $max_epochs -task_id 08 -pos_sample_num 1 \
-expr_name baseline -tta_val True -checkpoint $weight -multi_gpu True \
-lr_decay True
10 changes: 10 additions & 0 deletions modules/dynunet_pipeline/commands/task08/train_multi_gpu.sh
@@ -0,0 +1,10 @@
# train step 1, with large learning rate

lr=1e-2
fold=0

python -m torch.distributed.launch --nproc_per_node=2 --nnodes=1 --node_rank=0 \
--master_addr="localhost" --master_port=1234 \
train.py -fold $fold -train_num_workers 4 -interval 10 -num_samples 1 \
-learning_rate $lr -max_epochs 2000 -task_id 08 -pos_sample_num 1 \
-expr_name baseline -tta_val True -multi_gpu True -lr_decay True
11 changes: 11 additions & 0 deletions modules/dynunet_pipeline/commands/task09/finetune.sh
@@ -0,0 +1,11 @@
# train step 2, finetune with small learning rate
# please replace the weight variable with your actual weight file

lr=1e-2
fold=0
weight=model.pt

python train.py -fold $fold -train_num_workers 4 -interval 5 -num_samples 3 \
-learning_rate $lr -max_epochs 1000 -task_id 09 -pos_sample_num 2 \
-expr_name baseline -tta_val True -checkpoint $weight -determinism_flag True \
-determinism_seed 0 -lr_decay True -batch_dice True
9 changes: 9 additions & 0 deletions modules/dynunet_pipeline/commands/task09/train.sh
@@ -0,0 +1,9 @@
# train step 1, with large learning rate

lr=1e-2
fold=0

python train.py -fold $fold -train_num_workers 4 -interval 10 -num_samples 3 \
-learning_rate $lr -max_epochs 3000 -task_id 09 -pos_sample_num 2 \
-expr_name baseline -tta_val True -determinism_flag True -determinism_seed 0 \
-batch_dice True