Fix bug with deepspeed and accelerator args in training_args.py #39297

Open
MuyaoLi-jimo wants to merge 3 commits into huggingface:main from MuyaoLi-jimo:dev

@MuyaoLi-jimo

System Info

  • transformers version: 4.54.0.dev0
  • Platform: Linux-5.4.119-19.0009.28-x86_64-with-glibc2.31
  • Python version: 3.10.16
  • Huggingface_hub version: 0.31.1
  • Safetensors version: 0.4.5
  • Accelerate version: 0.22.0.dev0
  • PyTorch version (GPU?): 2.6.0+cu124 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: T
  • Using distributed or parallel set-up in script?: T

Who can help?

@SunMarc and @qgallouedec

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Right now, any command that passes --deepspeed /path/to/json fails with the following error:

--deepspeed: invalid Dict value

This bug has occurred before and was previously fixed in #24974, but unfortunately, it’s broken again.

The root cause seems to be that Python’s dataclass fields do not support Union[str, dict] when parsed from the CLI.
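
To illustrate the suspected failure mode outside of transformers, here is a minimal sketch using plain argparse. It is not the code in training_args.py or HfArgumentParser; `parse_dict`, `str_or_dict`, and `build_parser` are hypothetical names made up for this example. The assumption is that the CLI value ends up being handed to a dict-only converter, which rejects a plain file path, whereas a converter that falls back to the raw string accepts both forms.

```python
import argparse
import json
from typing import Union


def parse_dict(value: str) -> dict:
    """Strict converter (hypothetical): accepts only an inline JSON object."""
    parsed = json.loads(value)  # raises ValueError (JSONDecodeError) for a file path
    if not isinstance(parsed, dict):
        raise ValueError("expected a JSON object")
    return parsed


def str_or_dict(value: str) -> Union[str, dict]:
    """Tolerant converter (hypothetical): inline JSON object, else the raw string."""
    if value.lstrip().startswith("{"):
        return parse_dict(value)
    return value  # treated as a path to a config file


def build_parser(type_fn) -> argparse.ArgumentParser:
    # exit_on_error=False (Python 3.9+) makes argparse raise instead of exiting.
    parser = argparse.ArgumentParser(exit_on_error=False)
    parser.add_argument("--deepspeed", type=type_fn, default=None)
    return parser


if __name__ == "__main__":
    argv = ["--deepspeed", "/path/to/json"]

    try:
        build_parser(parse_dict).parse_args(argv)
    except argparse.ArgumentError as err:
        print("strict converter rejected the path:", err)

    args = build_parser(str_or_dict).parse_args(argv)
    print("tolerant converter kept the path:", args.deepspeed)
```

With the strict converter, argparse reports an "invalid ... value" error for the path, which is the same class of failure as the message above. The tolerant converter is only one possible design; whether the actual fix should live in the field type or in the parser is a separate question.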

Expected behavior

The --deepspeed flag should accept a string, i.e. a path to a JSON config file.

@SunMarc
Member

SunMarc commented Jul 11, 2025

This should actually have been fixed by #30227, so I'm not sure why it is complaining now. Mind exploring the cause a bit?
