
feat: fp8 block scaling #543

Merged: terrykong merged 40 commits into main from jiemingz/fp8_block on Aug 22, 2025
Conversation

@jiemingz (Contributor)

What does this PR do ?

Add a one line overview of what this PR aims to accomplish.

Issues

List issues that this PR closes (syntax):

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this 
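The snippet above was left unfilled in the PR description. As a hedged sketch only, the fp8 generation settings this PR adds could be enabled through the run config; the key names are taken from the example config discussed later in this thread, while the exact nesting under `policy.generation` and the file placement are assumptions:

```yaml
# Hypothetical usage sketch (not from the PR itself): key names come from the
# reviewed example config; nesting and placement are illustrative assumptions.
policy:
  generation:
    backend: "vllm"
    vllm_cfg:
      precision: 'fp8'            # fp8 block-scaled weights for generation
      use_deep_gemm: true         # use DeepGEMM fp8 block-scaled kernels
      num_last_layers_in_bf16: 0  # optionally keep the last N layers in bf16
      num_first_layers_in_bf16: 0 # optionally keep the first N layers in bf16
```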

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
  • Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

  • ...

@jiemingz jiemingz force-pushed the jiemingz/fp8_block branch 4 times, most recently from fb57ec1 to 5b9c1ba on June 26, 2025 14:35
Comment thread nemo_rl/models/generation/fp8.py Outdated
Comment thread nemo_rl/models/generation/fp8.py
Comment thread nemo_rl/models/generation/fp8.py Outdated
@jiemingz jiemingz force-pushed the jiemingz/fp8_block branch 4 times, most recently from 53d8ec3 to 59e8b12 on July 8, 2025 19:47
@jiemingz jiemingz force-pushed the jiemingz/fp8_block branch 3 times, most recently from 975df8c to 36c1710 on July 14, 2025 15:48
@jiemingz jiemingz changed the title from "draft: fp8 block scaling" to "feat: fp8 block scaling" on Jul 14, 2025
@terrykong terrykong added the r0.3.0 (Release r0.3.0) label Jul 14, 2025
@jiemingz jiemingz force-pushed the jiemingz/fp8_block branch from c8304c0 to 5bc8868 on July 14, 2025 18:57
@jiemingz jiemingz force-pushed the jiemingz/fp8_block branch from d68514a to e3a8daf on July 14, 2025 19:49
@jiemingz jiemingz requested a review from vcuinv July 14, 2025 21:55
@terrykong terrykong removed the r0.3.0 (Release r0.3.0) label Jul 15, 2025
rybakov previously approved these changes Jul 16, 2025

@rybakov rybakov left a comment:
Should we also add a config, e.g. RL/examples/configs/grpo_math_8B_fp8_L3_F1_G_i.yaml? For example, the config below could be a good candidate (optionally with num_last_layers_in_bf16: 0 and num_first_layers_in_bf16: 0):

```yaml
# GRPO Algorithm Configuration
defaults: "grpo_math_1B.yaml"

grpo:
  num_prompts_per_step: 64
  num_generations_per_prompt: 32

loss_fn:
  use_importance_sampling_correction: true

policy:
  model_name: "meta-llama/Llama-3.1-8B-Instruct"
  tokenizer:
    name: ${policy.model_name} ## specify if you'd like to use a tokenizer different from the model's default
  train_global_batch_size: 512
  train_micro_batch_size: 1
  generation_batch_size: 32 # Only used when generating using HF backend
  logprob_batch_size: 2
  max_total_sequence_length: 4096
  precision: "bfloat16"
  fsdp_offload_enabled: false
  activation_checkpointing_enabled: false

  dtensor_cfg:
    enabled: True

  dynamic_batching:
    train_mb_tokens: 4096
    logprob_mb_tokens: 8192

  optimizer:
    name: "torch.optim.AdamW"
    kwargs:
      lr: 3.0e-7
      weight_decay: 0.01
      betas: [0.9, 0.999]
      eps: 1e-8

  scheduler:
    - name: "torch.optim.lr_scheduler.LinearLR"
      kwargs:
        start_factor: 0.1
        end_factor: 1.0
        # The scheduler iteration is per GRPO step and is decoupled from the optimizer step (may be >=1 per GRPO step)
        total_iters: 13
    - name: "torch.optim.lr_scheduler.ConstantLR"
      kwargs:
        factor: 1.0
        total_iters: 10000000000
    - milestones: [13]

  generation:
    backend: "vllm"
    max_new_tokens: ${policy.max_total_sequence_length}
    temperature: 1.0
    top_p: 1.0
    top_k: null
    stop_token_ids: null
    stop_strings: null
    vllm_cfg:
      precision: 'fp8'
      use_deep_gemm: true
      num_last_layers_in_bf16: 3
      num_first_layers_in_bf16: 1
      tensor_parallel_size: 1
      gpu_memory_utilization: 0.6
      max_model_len: ${policy.max_total_sequence_length}

cluster:
  gpus_per_node: 8
  num_nodes: 1
```

@jiemingz jiemingz force-pushed the jiemingz/fp8_block branch 2 times, most recently from 36a127e to b899f3b on July 23, 2025 14:59
@SahilJain314 SahilJain314 (Contributor) left a comment:
Not super necessary immediately, but I think it'd be nice to include convergence plots for proof in the repo.

Comment thread nemo_rl/models/generation/vllm_backend.py Outdated
Comment thread nemo_rl/models/generation/vllm_backend.py Outdated
Comment thread nemo_rl/models/generation/fp8.py Outdated
Comment thread nemo_rl/algorithms/grpo.py Outdated
Comment thread pyproject.toml
@jiemingz jiemingz force-pushed the jiemingz/fp8_block branch from b899f3b to f5401dc on July 24, 2025 03:49
jiemingz and others added 19 commits August 20, 2025 13:16
@jiemingz jiemingz force-pushed the jiemingz/fp8_block branch from 8e74171 to 2573ae5 on August 20, 2025 21:56
@jiemingz jiemingz added and then removed the CI:L1 (Run doctests, unit tests, and functional tests) label Aug 20, 2025
jiemingz and others added 4 commits August 21, 2025 08:06

Labels

  • CI:L1 (Run doctests, unit tests, and functional tests)
  • Documentation (Improvements or additions to documentation)

Development

Successfully merging this pull request may close these issues.

FP8 vLLM inference

6 participants