diff --git a/tutorials/llm/mamba/mamba.rst b/tutorials/llm/mamba/mamba.rst
index 525be296730a..2ce5ee5f616b 100644
--- a/tutorials/llm/mamba/mamba.rst
+++ b/tutorials/llm/mamba/mamba.rst
@@ -28,6 +28,7 @@ In order to proceed, ensure that you have met the following requirements:
 * A Docker-enabled environment, with `NVIDIA Container Runtime `_ installed, which will make the container GPU-aware.
+* `Authenticate with NVIDIA NGC `_, generate an API key from `NGC `__, add the key to your credentials by following the instructions in `this guide `__, and get into the NVIDIA NeMo dev container ``nvcr.io/nvidia/nemo:dev``.
 Step-by-step Guide for Fine-Tuning
@@ -51,13 +52,13 @@ Convert the Pytorch Checkpoint to a NeMo Checkpoint
 .. code:: bash
-    CUDA_VISIBLE_DEVICES="0" python /NeMo/scripts/checkpoint_converters/convert_mamba2_pyt_to_nemo.py \
+    CUDA_VISIBLE_DEVICES="0" python /opt/NeMo/scripts/checkpoint_converters/convert_mamba2_pyt_to_nemo.py \
         --input_name_or_path \
         --output_path \
-        --ngroups_mamba 8 \
+        --mamba_ssm_ngroups 8 \
         --precision bf16
-* Note: the ``ngroups_mamba`` parameter should be 1 for the Mamba2 models from the `Transformers are SSMs paper `__ (130m, 370m, 780m, 1.3b, and 2.7b) and 8 for the Mamba2 and Mamba2-Hybrid models by `NVIDIA `__ (both 8b).
+* Note: the ``mamba_ssm_ngroups`` parameter should be 1 for the Mamba2 models from the `Transformers are SSMs paper `__ (130m, 370m, 780m, 1.3b, and 2.7b) and 8 for the Mamba2 and Mamba2-Hybrid models by `NVIDIA `__ (both 8b).
 Model (Tensor) Parallelism for the 8b Models
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -106,8 +107,8 @@ Run Fine-Tuning
    export NVTE_FUSED_ATTN=1
    export NVTE_FLASH_ATTN=0
-   MASTER_PORT=15008 torchrun --nproc_per_node=${NUM_DEVICES}
-   /home/ataghibakhsh/NeMo/examples/nlp/language_modeling/tuning/megatron_mamba_finetuning.py \
+   torchrun --nproc_per_node=${NUM_DEVICES}
+   /opt/NeMo/examples/nlp/language_modeling/tuning/megatron_mamba_finetuning.py \
    --config-path=${CONFIG_PATH} \
    --config-name=${CONFIG_NAME} \
    trainer.devices=${NUM_DEVICES} \
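
Not part of the diff: a minimal sketch of satisfying the new container requirement added above, assuming the NGC API key has already been generated; the ``$oauthtoken`` login convention is standard for ``nvcr.io``, and the ``-v`` workspace mount path is illustrative only.

.. code:: bash

    # Log in to the NGC container registry; the username is the literal
    # string $oauthtoken and the password is the generated NGC API key.
    docker login nvcr.io

    # Pull the NeMo dev container named in the new requirement.
    docker pull nvcr.io/nvidia/nemo:dev

    # Start an interactive, GPU-aware session; the workspace mount is
    # illustrative and can point at wherever checkpoints and data live.
    docker run --gpus all -it --rm -v /path/to/workspace:/workspace nvcr.io/nvidia/nemo:dev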
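
Likewise, for the smaller Mamba2 checkpoints covered by the renamed flag's note (130m through 2.7b), the same converter would be invoked with ``--mamba_ssm_ngroups 1``; the input and output paths below are placeholders, not values from the tutorial.

.. code:: bash

    # Sketch: converting a small Mamba2 checkpoint, where the note above
    # says mamba_ssm_ngroups must be 1 instead of 8. Paths are placeholders.
    CUDA_VISIBLE_DEVICES="0" python /opt/NeMo/scripts/checkpoint_converters/convert_mamba2_pyt_to_nemo.py \
        --input_name_or_path /path/to/mamba2-130m/pytorch_model.bin \
        --output_path /path/to/mamba2-130m.nemo \
        --mamba_ssm_ngroups 1 \
        --precision bf16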