diff --git a/README.md b/README.md
index 447c20c668..5561fbcdb1 100644
--- a/README.md
+++ b/README.md
@@ -54,11 +54,75 @@ uv pip install -e '.[dev,test]'
 
 **Reminder**: Don't forget to set your HF_HOME and WANDB_API_KEY (if needed). You'll need to do a `huggingface-cli login` as well for Llama models.
 
+### SFT
+
+We provide a sample SFT experiment that uses the [SQuAD dataset](https://rajpurkar.github.io/SQuAD-explorer/).
+
+#### Single Node
+
+The experiment is configured to run on 8 GPUs. If your machine has access to 8 GPUs, you can launch the experiment as follows:
+
+```sh
+uv run python examples/run_sft.py
+```
+
+This trains `Llama-3.1-8B` on 8 GPUs. To run on a single GPU, we have to override a few of the experiment settings: we replace the 8B model with a smaller 1B model, decrease the batch size, and update the cluster configuration to use a single GPU:
+
+```sh
+uv run python examples/run_sft.py \
+  policy.model_name="meta-llama/Llama-3.2-1B" \
+  policy.train_global_batch_size=16 \
+  sft.val_global_batch_size=16 \
+  cluster.gpus_per_node=1
+```
+
+Refer to [sft.yaml](examples/configs/sft.yaml) for a full list of parameters that can be overridden.
+
+#### Multi-node
+
+For distributed training across multiple nodes:
+
+Before running any `uv run` command, set `UV_CACHE_DIR` to a directory that all workers can read:
+```sh
+export UV_CACHE_DIR=/path/that/all/workers/can/access/uv_cache
+```
+
+```sh
+# Run from the root of NeMo-Reinforcer repo
+NUM_ACTOR_NODES=2
+# Add a timestamp to make each job name unique
+TIMESTAMP=$(date +%Y%m%d_%H%M%S)
+
+# The SFT experiment uses the Llama-3.1-8B model
+COMMAND="uv pip install -e .; uv run ./examples/run_sft.py --config examples/configs/sft.yaml cluster.num_nodes=2 cluster.gpus_per_node=8 checkpointing.checkpoint_dir='results/sft_llama8b_2nodes' logger.wandb_enabled=True logger.wandb.name='sft-llama8b'" \
+RAY_DEDUP_LOGS=0 \
+UV_CACHE_DIR=YOUR_UV_CACHE_DIR \
+CONTAINER=YOUR_CONTAINER \
+MOUNTS="$PWD:$PWD" \
+sbatch \
+  --nodes=${NUM_ACTOR_NODES} \
+  --account=YOUR_ACCOUNT \
+  --job-name=YOUR_JOBNAME_${TIMESTAMP} \
+  --partition=YOUR_PARTITION \
+  --time=4:0:0 \
+  --gres=gpu:8 \
+  ray.sub
+```
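+
+Once the job is submitted, you can check on it with standard Slurm tooling (the commands below are generic Slurm usage rather than anything specific to this repo; by default `sbatch` writes its output to `slurm-<jobid>.out` in the submission directory):
+
+```sh
+# Confirm the job is pending or running
+squeue -u $USER
+# Follow the job output once it starts (ray.sub may also write its own logs)
+tail -f slurm-*.out
+```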
+
 ### GRPO
 
 We have a reference GRPO experiment config set up for math benchmarks using the [OpenMathInstruct-2](https://huggingface.co/datasets/nvidia/OpenMathInstruct-2) dataset.
 
-#### Single GPU
+#### Single Node
 
 To run GRPO on a single GPU for `Llama-3.2-1B-Instruct`:
 
@@ -67,7 +131,15 @@ To run GRPO on a single GPU for `Llama-3.2-1B-Instruct`:
 uv run python examples/run_grpo_math.py
 ```
 
-By default, this uses the configuration in `examples/configs/grpo_math_1B.yaml`. You can customize parameters with command-line overrides:
+By default, this uses the configuration in `examples/configs/grpo_math_1B.yaml`. You can customize parameters with command-line overrides. For example, to run on 8 GPUs:
+
+```sh
+# Run the GRPO math example with a 1B parameter model on 8 GPUs
+uv run python examples/run_grpo_math.py \
+  cluster.gpus_per_node=8
+```
+
+You can override any of the parameters listed in the yaml configuration file.
 For example,
 ```sh
 uv run python examples/run_grpo_math.py \
@@ -75,17 +147,12 @@ uv run python examples/run_grpo_math.py \
   checkpointing.checkpoint_dir="results/qwen1_5b_math" \
   logger.wandb_enabled=True \
   logger.wandb.name="grpo-qwen1_5b_math" \
   logger.num_val_samples_to_print=10
 ```
 
 #### Multi-node
 
-For distributed training across multiple nodes:
-
-Set `UV_CACHE_DIR` to a directory that can be read from all workers before running any uv run command.
-```sh
-export UV_CACHE_DIR=/path/that/all/workers/can/access/uv_cache
-```
+For the general multi-node setup, refer to the [SFT multi-node](#multi-node) documentation. The only thing that differs from SFT is the `COMMAND`:
 
 ```sh
 # Run from the root of NeMo-Reinforcer repo