Enable GRPO Qwen 3 32B with 128k context length #883

@wangshangsam

Description

Is your feature request related to a problem? Please describe.

Nemotron folks (i.e., @pjin-nvidia and @bxyu-nvidia) would like to run GRPO experiments on Qwen 3 32B with 128k context length.

So far, they have been able to launch the following configs:

For this task, there's no restrictions on:

  • DTensor path vs. Megatron Core path. Anything that works would be fine.
  • Runtime. It would be ideal if the experiments could finish overnight, but as of now the Nemotron folks are not blocked by how long the experiments take to finish.

The current understanding is that the technical challenge lies in GPU global-memory footprint: activations dominate the memory footprint, and their size scales linearly with the sequence length.
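To make the linear scaling concrete, here is a back-of-envelope sketch of activation memory vs. sequence length. The model shape (64 layers, hidden size 5120 for Qwen 3 32B) and the ~34 bytes per hidden unit per token multiplier are assumptions — a common rule of thumb for bf16 training without activation recomputation — not measured numbers from these experiments.

```python
def activation_gib(seq_len, num_layers=64, hidden=5120, bytes_per_unit=34):
    """Rough activation footprint in GiB; scales linearly with seq_len.

    Assumptions: Qwen3-32B-like shape (64 layers, hidden 5120) and a
    ~34 B/hidden-unit/token activation multiplier for bf16 training
    without recomputation. Illustrative only.
    """
    return seq_len * hidden * bytes_per_unit * num_layers / 2**30

for s in (8_192, 32_768, 131_072):
    print(f"{s:>7} tokens -> ~{activation_gib(s):,.0f} GiB of activations")
```

The 16x jump from 8k to 128k context is exactly the linear scaling described above, and at 128k the estimate lands in the terabyte range, which is why recomputation and/or context parallelism becomes unavoidable.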

Describe the solution you'd like

Being able to run GRPO on Qwen 3 32B with 128k context length on a reasonable number of nodes.

  • The initial target is eight 8xH100 nodes (64 GPUs).
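A quick sanity check on the aggregate memory budget of that target. The 80 GiB/GPU figure assumes H100 SXM parts, and the ~2 B/param weight and ~12 B/param Adam optimizer-state multipliers for a 32B-parameter model are standard rules of thumb, not figures from the issue.

```python
# Aggregate HBM for the initial target: eight nodes of 8x H100
# (assumed 80 GiB HBM each).
nodes, gpus_per_node, hbm_gib = 8, 8, 80
total_hbm_gib = nodes * gpus_per_node * hbm_gib

params = 32e9                          # ~32B parameters
weights_gib = params * 2 / 2**30       # bf16 weights, ~2 B/param
optim_gib = params * 12 / 2**30        # Adam fp32 master + moments, ~12 B/param

print(f"aggregate HBM: {total_hbm_gib} GiB")
print(f"weights: ~{weights_gib:,.0f} GiB, optimizer state: ~{optim_gib:,.0f} GiB")
```

Even before activations, weights plus optimizer state consume a meaningful slice of the ~5 TiB aggregate budget, so the per-GPU activation footprint at 128k context is the binding constraint.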

Describe alternatives you've considered

Additional context
