Add patch to locomanipulation SDG pipeline to avoid use of flash attn#5596
Conversation
There was a problem hiding this comment.
Isaac Lab Review Bot 🤖
Re-review after commits 511bc8d → 15c318f
Summary: This PR adds an optional patch to bypass flash-attn dependency in the locomanipulation SDG pipeline for GR00T. Good fix for compatibility with newer GPU architectures (like Blackwell) where flash-attn prebuilt kernels don't exist yet.
📝 Changes in this push
The new commit adds a small but important fix to rollout_policy.py:
- Line 379-381: Resets
env_cfg.recordersto a freshActionStateRecorderManagerCfg()before setting the export path. This fixes a bug where the SDG output-data recorder term would try to pullenv._locomanipulation_sdg_output_datawhich is only populated during data generation, not during policy rollout. Good defensive fix.
✅ What looks good
- Well-scoped change - Only 3 files modified, focused on the specific issue
- Optional approach - The patch is opt-in, not forced on all users
- Good documentation - Clear explanation in the RST file about when/why to use the patch
- Graceful fallback - The patch handles missing
flash-attnimports gracefully with try/except blocks - Uses SDPA - PyTorch's native Scaled Dot Product Attention is a reasonable fallback that works across CUDA architectures
- Rollout fix - The new change properly isolates rollout from SDG-specific recorder state
📋 Minor suggestions (non-blocking)
- Checklist items unchecked - The PR checklist shows all items unchecked. Consider marking the applicable ones before merge.
⚠️ One consideration (unchanged from previous review)
The patch sets flash_attn_unpadded_qkvpacked_func = None and pad_input = unpad_input = None when imports fail, but FlashAttention class still exists and may be instantiated. If RADIO vision encoder is used (rather than SigLIP), this could hit a runtime error when the None functions are called. The comment notes "RADIO vision won't work, SigLIP does" — consider adding a runtime check that raises a clear error message if RADIO is selected without flash-attn installed.
Verdict: ✅ LGTM — straightforward compatibility fix with a good rollout bugfix in the latest commit.
Update (e3868cf): Branch rebased onto develop. The 3 PR files are unchanged — no new review items.
| @@ -0,0 +1,63 @@ | |||
| diff --git a/gr00t/model/backbone/eagle2_hg_model/config.json b/gr00t/model/backbone/eagle2_hg_model/config.json | |||
There was a problem hiding this comment.
Is this git patch sensitive to the particular commit of GR00T? I.e. if someone changes to a different commit and the code changes slightly this will result in a git apply failure. Maybe worth mentioning in the docs the particular commit that this patch works against.
There was a problem hiding this comment.
The gr00t checkout is already pinned to a specific version in the documentation that matches the patch
peterd-NV
left a comment
There was a problem hiding this comment.
lgtm, left a comment regarding documenting the commit the git patch works against.
…isaac-sim#5596) # Make locomanipulation SDG GR00T flow runnable without flash-attn ## Summary Two small fixes that let users finetune and roll out the locomanipulation SDG GR00T policy on hardware where `flash-attn` is unavailable (e.g. Blackwell, or any environment where the wheel fails to build). ## Changes - **`scripts/imitation_learning/locomanipulation_sdg/gr00t/no_flash_attn.patch`** (new): patch against the Isaac-GR00T repo that switches the bundled Eagle 2.5 VL model from `flash_attention_2` to PyTorch SDPA, and guards the RADIO vision module's `flash_attn` imports so the package becomes importable without flash-attn installed. SigLIP path works; RADIO path is unsupported without flash-attn (documented in the patch). - **`docs/source/overview/imitation-learning/humanoids_imitation.rst`**: adds a note in the GR00T install section explaining when to apply the patch (build failure, or `RuntimeError: FlashAttention only supports Ampere GPUs or newer`) and how to apply it from the sibling Isaac-GR00T checkout. - **`scripts/imitation_learning/locomanipulation_sdg/gr00t/rollout_policy.py`**: override `env_cfg.recorders` with `ActionStateRecorderManagerCfg()` so the rollout doesn't try to record `env._locomanipulation_sdg_output_data`, which is only populated by the data-generation state machine in `generate_data.py` and is absent during policy rollout. Without this, the recorder raises `AttributeError` on the first pre-step. <!-- As a practice, it is recommended to open an issue to have discussions on the proposed pull request. This makes it easier for the community to keep track of what is being developed or added, and if a given feature is demanded by more than one party. --> ## Type of change <!-- As you go through the list, delete the ones that are not applicable. --> - Bug fix (non-breaking change which fixes an issue) ## Screenshots Please attach before and after screenshots of the change if applicable. <!-- Example: | Before | After | | ------ | ----- | | _gif/png before_ | _gif/png after_ | To upload images to a PR -- simply drag and drop an image while in edit mode and it should upload the image directly. You can then paste that source into the above before/after sections. --> ## Checklist - [ ] I have read and understood the [contribution guidelines](https://isaac-sim.github.io/IsaacLab/main/source/refs/contributing.html) - [ ] I have run the [`pre-commit` checks](https://pre-commit.com/) with `./isaaclab.sh --format` - [ ] I have made corresponding changes to the documentation - [ ] My changes generate no new warnings - [ ] I have added tests that prove my fix is effective or that my feature works - [ ] I have updated the changelog and the corresponding version in the extension's `config/extension.toml` file - [ ] I have added my name to the `CONTRIBUTORS.md` or my name already exists there <!-- As you go through the checklist above, you can mark something as done by putting an x character in it For example, - [x] I have done this task - [ ] I have not done this task --> --------- Co-authored-by: Kelly Guo <kellyg@nvidia.com>
Make locomanipulation SDG GR00T flow runnable without flash-attn
Summary
Two small fixes that let users finetune and roll out the locomanipulation SDG
GR00T policy on hardware where
flash-attnis unavailable (e.g. Blackwell, orany environment where the wheel fails to build).
Changes
scripts/imitation_learning/locomanipulation_sdg/gr00t/no_flash_attn.patch(new): patch against the Isaac-GR00T repo that switches the bundled Eagle 2.5
VL model from
flash_attention_2to PyTorch SDPA, and guards the RADIOvision module's
flash_attnimports so the package becomes importablewithout flash-attn installed. SigLIP path works; RADIO path is unsupported
without flash-attn (documented in the patch).
docs/source/overview/imitation-learning/humanoids_imitation.rst: addsa note in the GR00T install section explaining when to apply the patch
(build failure, or
RuntimeError: FlashAttention only supports Ampere GPUs or newer) and how to apply it from the sibling Isaac-GR00T checkout.scripts/imitation_learning/locomanipulation_sdg/gr00t/rollout_policy.py:override
env_cfg.recorderswithActionStateRecorderManagerCfg()so therollout doesn't try to record
env._locomanipulation_sdg_output_data, whichis only populated by the data-generation state machine in
generate_data.pyand is absent during policy rollout. Without this, the recorder raises
AttributeErroron the first pre-step.Type of change
Screenshots
Please attach before and after screenshots of the change if applicable.
Checklist
pre-commitchecks with./isaaclab.sh --formatconfig/extension.tomlfileCONTRIBUTORS.mdor my name already exists there