zero3 hangs in inference #860

@stas00

Training works with zero3, but when I then run inference by calling deepspeed.forward(), it works on a very small sample and hangs at 100% GPU utilization with a slightly bigger one:

Thread 0x00007f57caf71740 (most recent call first):
  File "/home/stas/anaconda3/envs/main-38/lib/python3.8/site-packages/torch/cuda/streams.py", line 95 in synchronize
  File "/mnt/nvme1/code/github/00optimize/DeepSpeed/deepspeed/runtime/zero/stage3.py", line 490 in _synchronize_communication
  File "/mnt/nvme1/code/github/00optimize/DeepSpeed/deepspeed/runtime/zero/stage3.py", line 406 in fetch_sub_module
  File "/mnt/nvme1/code/github/00optimize/DeepSpeed/deepspeed/runtime/zero/stage3.py", line 1139 in pre_sub_module_forward_function
  File "/mnt/nvme1/code/github/00optimize/DeepSpeed/deepspeed/runtime/zero/stage3.py", line 1071 in _pre_forward_module_hook
  File "/home/stas/anaconda3/envs/main-38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 881 in _call_impl
  File "/mnt/nvme1/code/huggingface/transformers-ds-zero-3/src/transformers/models/t5/modeling_t5.py", line 451 in project
  File "/mnt/nvme1/code/huggingface/transformers-ds-zero-3/src/transformers/models/t5/modeling_t5.py", line 474 in forward
  File "/home/stas/anaconda3/envs/main-38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 892 in _call_impl
  File "/mnt/nvme1/code/huggingface/transformers-ds-zero-3/src/transformers/models/t5/modeling_t5.py", line 540 in forward
  File "/home/stas/anaconda3/envs/main-38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 892 in _call_impl
  File "/mnt/nvme1/code/huggingface/transformers-ds-zero-3/src/transformers/models/t5/modeling_t5.py", line 633 in forward
  File "/home/stas/anaconda3/envs/main-38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 892 in _call_impl
  File "/mnt/nvme1/code/huggingface/transformers-ds-zero-3/src/transformers/models/t5/modeling_t5.py", line 954 in forward
  File "/home/stas/anaconda3/envs/main-38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 892 in _call_impl
  File "/mnt/nvme1/code/huggingface/transformers-ds-zero-3/src/transformers/models/t5/modeling_t5.py", line 1505 in forward
  File "/home/stas/anaconda3/envs/main-38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 892 in _call_impl
  File "/mnt/nvme1/code/github/00optimize/DeepSpeed/deepspeed/runtime/engine.py", line 893 in forward
  File "/home/stas/anaconda3/envs/main-38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 872 in _call_impl
  File "/mnt/nvme1/code/huggingface/transformers-ds-zero-3/src/transformers/trainer_seq2seq.py", line 185 in prediction_step
  File "/mnt/nvme1/code/huggingface/transformers-ds-zero-3/src/transformers/trainer.py", line 1800 in prediction_loop
  File "/mnt/nvme1/code/huggingface/transformers-ds-zero-3/src/transformers/trainer.py", line 1647 in evaluate
  File "/mnt/nvme1/code/huggingface/transformers-ds-zero-3/src/transformers/trainer_seq2seq.py", line 74 in evaluate
  File "examples/seq2seq/run_seq2seq.py", line 607 in main
  File "examples/seq2seq/run_seq2seq.py", line 655 in <module>

The trace is from faulthandler, so please read it in reverse (the most recent call comes first).
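
For reference, a trace in this format can be captured with Python's standard faulthandler module. The following is a minimal sketch and not necessarily how the trace above was produced; the helper names dump_all_stacks and watch_for_hang are hypothetical, chosen only for illustration.

```python
import faulthandler
import sys
import tempfile


def dump_all_stacks() -> str:
    """Return the current stack of every thread, formatted by faulthandler."""
    # faulthandler writes straight to a file descriptor, so use a real file.
    with tempfile.TemporaryFile(mode="w+") as f:
        faulthandler.dump_traceback(file=f, all_threads=True)
        f.seek(0)
        return f.read()


def watch_for_hang(timeout_s: float) -> None:
    """Dump every thread's stack to stderr each timeout_s seconds until cancelled."""
    # Call faulthandler.cancel_dump_traceback_later() once the risky code returns.
    faulthandler.dump_traceback_later(timeout_s, repeat=True, file=sys.stderr)


# Demonstrate the output format: frames are listed most recent call first.
trace = dump_all_stacks()
print(trace)
```

To diagnose a hang, one option is to call watch_for_hang(...) with a timeout longer than a normal step just before the evaluation loop, so that a stuck process prints where each thread is blocked.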

I'm not sure if you have inference tests. Maybe this can be reproduced with just model.eval()?

Config:

{
    "fp16": {
        "enabled": true,
        "loss_scale": 0,
        "loss_scale_window": 1000,
        "initial_scale_power": 16,
        "hysteresis": 2,
        "min_loss_scale": 1
    },

    "zero_optimization": {
        "stage": 3,
        "cpu_offload": true,
        "cpu_offload_params": true,
        "cpu_offload_use_pin_memory" : true,
        "overlap_comm": true,
        "contiguous_gradients": true,
        "stage3_max_live_parameters": 1e9,
        "stage3_max_reuse_distance": 1e8,
        "stage3_prefetch_bucket_size": 2e5,
        "stage3_param_persitance_threshold": 1e5,
        "reduce_bucket_size": 3e6,
        "prefetch_bucket_size": 3e6,
        "sub_group_size": 1e6
    },

    "optimizer": {
        "type": "AdamW",
        "params": {
            "lr": 3e-5,
            "betas": [0.8, 0.999],
            "eps": 1e-8,
            "weight_decay": 3e-7
        }
    },

    "scheduler": {
        "type": "WarmupLR",
        "params": {
            "warmup_min_lr": 0,
            "warmup_max_lr": 3e-5,
            "warmup_num_steps": 500
        }
    },

    "steps_per_print": 2000,
    "wall_clock_breakdown": false
}

Thanks.
