Errors with nn.RMSNorm in DeepSpeed #33176

@loadams

System Info

Using the latest transformers from source (newer than the latest release tag, 4.44.2), the changes to pytorch_utils from this PR add nn.RMSNorm to the list of modules. However, nn.RMSNorm was not added to torch until the 2.4 release, so DeepSpeed CI fails on older torch versions unless we either update torch or pin the transformers version.
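One way to avoid this kind of failure is to build the module list by feature detection rather than by referencing nn.RMSNorm directly. The following is a minimal sketch, not the actual transformers code: `collect_available` is a hypothetical helper, and `old_nn` is a stand-in namespace that imitates `torch.nn` on a torch release older than 2.4, so the example runs without torch installed.

```python
from types import SimpleNamespace


def collect_available(namespace, names):
    """Return the attributes of `namespace` named in `names` that actually exist.

    Hypothetical helper: names that are missing (for example, a class added in a
    newer release of the library) are skipped instead of raising AttributeError.
    """
    return [getattr(namespace, name) for name in names if hasattr(namespace, name)]


# Stand-in for `torch.nn` on a torch release older than 2.4:
# LayerNorm exists, RMSNorm does not.
class LayerNorm:
    pass


old_nn = SimpleNamespace(LayerNorm=LayerNorm)

# Referencing old_nn.RMSNorm directly would raise AttributeError at import time;
# the guarded lookup simply leaves it out of the list.
norm_layers = collect_available(old_nn, ["LayerNorm", "RMSNorm"])
print([cls.__name__ for cls in norm_layers])  # prints ['LayerNorm']
```

With a real installation, the same call would presumably be `collect_available(torch.nn, ["LayerNorm", "RMSNorm"])`, which returns both classes on torch 2.4 or newer and only `LayerNorm` on older releases.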

Who can help?

@muellerzr

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Clone the latest DeepSpeed, or run CI from the hpu_gaudi2.yml workflow; the failure is here.

Expected behavior

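When transformers requires a specific torch version that is not installed, it should fail with a clear error saying so (or handle the missing feature in a similar way), rather than failing on a missing attribute.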
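As an illustration of the "error out" option, here is a rough sketch of a fail-fast version check. The names `parse_release` and `require_min_version` are hypothetical and not part of transformers or torch. The installed version is passed in as a string so the example runs without torch; in practice it might come from `torch.__version__`.

```python
import re


def parse_release(version):
    """Extract the leading numeric components, e.g. '2.3.1+cu121' -> (2, 3, 1).

    Simplification: pre-release and local suffixes ('a0', '+cu121') are ignored,
    so '2.4.0a0' is treated the same as '2.4.0'.
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def require_min_version(package, installed, minimum, feature):
    """Raise ImportError with an explicit message if `installed` < `minimum`."""
    have, need = parse_release(installed), parse_release(minimum)
    # Pad with zeros so that '2.4' and '2.4.0' compare as equal.
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    if have < need:
        raise ImportError(
            f"{feature} requires {package}>={minimum}, "
            f"but {package} {installed} is installed."
        )


try:
    require_min_version("torch", "2.3.1+cu121", "2.4", "nn.RMSNorm")
except ImportError as err:
    print(err)
```

A message like this names the missing feature and the required version up front, which is easier to act on in CI logs than an attribute lookup failure.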
