Skip to content

Add patch to locomanipulation SDG pipeline to avoid use of flash attn#5596

Merged
kellyguo11 merged 7 commits into
isaac-sim:developfrom
jaybdub:jwelsh/locomanip_sdg_no_flash_attn_patch
May 15, 2026
Merged

Add patch to locomanipulation SDG pipeline to avoid use of flash attn#5596
kellyguo11 merged 7 commits into
isaac-sim:developfrom
jaybdub:jwelsh/locomanip_sdg_no_flash_attn_patch

Conversation

@jaybdub
Copy link
Copy Markdown
Contributor

@jaybdub jaybdub commented May 12, 2026

Make locomanipulation SDG GR00T flow runnable without flash-attn

Summary

Two small fixes that let users finetune and roll out the locomanipulation SDG
GR00T policy on hardware where flash-attn is unavailable (e.g. Blackwell, or
any environment where the wheel fails to build).

Changes

  • scripts/imitation_learning/locomanipulation_sdg/gr00t/no_flash_attn.patch
    (new): patch against the Isaac-GR00T repo that switches the bundled Eagle 2.5
    VL model from flash_attention_2 to PyTorch SDPA, and guards the RADIO
    vision module's flash_attn imports so the package becomes importable
    without flash-attn installed. SigLIP path works; RADIO path is unsupported
    without flash-attn (documented in the patch).

  • docs/source/overview/imitation-learning/humanoids_imitation.rst: adds
    a note in the GR00T install section explaining when to apply the patch
    (build failure, or RuntimeError: FlashAttention only supports Ampere GPUs or newer) and how to apply it from the sibling Isaac-GR00T checkout.

  • scripts/imitation_learning/locomanipulation_sdg/gr00t/rollout_policy.py:
    override env_cfg.recorders with ActionStateRecorderManagerCfg() so the
    rollout doesn't try to record env._locomanipulation_sdg_output_data, which
    is only populated by the data-generation state machine in generate_data.py
    and is absent during policy rollout. Without this, the recorder raises
    AttributeError on the first pre-step.

Type of change

  • Bug fix (non-breaking change which fixes an issue)

Screenshots

Please attach before and after screenshots of the change if applicable.

Checklist

  • I have read and understood the contribution guidelines
  • I have run the pre-commit checks with ./isaaclab.sh --format
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the changelog and the corresponding version in the extension's config/extension.toml file
  • I have added my name to the CONTRIBUTORS.md or my name already exists there

@jaybdub jaybdub changed the base branch from main to develop May 12, 2026 21:03
@github-actions github-actions Bot added documentation Improvements or additions to documentation isaac-sim Related to Isaac Sim team isaac-mimic Related to Isaac Mimic team infrastructure labels May 12, 2026
Copy link
Copy Markdown

@isaaclab-review-bot isaaclab-review-bot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isaac Lab Review Bot 🤖

Re-review after commits 511bc8d15c318f

Summary: This PR adds an optional patch to bypass flash-attn dependency in the locomanipulation SDG pipeline for GR00T. Good fix for compatibility with newer GPU architectures (like Blackwell) where flash-attn prebuilt kernels don't exist yet.

📝 Changes in this push

The new commit adds a small but important fix to rollout_policy.py:

  • Line 379-381: Resets env_cfg.recorders to a fresh ActionStateRecorderManagerCfg() before setting the export path. This fixes a bug where the SDG output-data recorder term would try to pull env._locomanipulation_sdg_output_data which is only populated during data generation, not during policy rollout. Good defensive fix.

✅ What looks good

  1. Well-scoped change - Only 3 files modified, focused on the specific issue
  2. Optional approach - The patch is opt-in, not forced on all users
  3. Good documentation - Clear explanation in the RST file about when/why to use the patch
  4. Graceful fallback - The patch handles missing flash-attn imports gracefully with try/except blocks
  5. Uses SDPA - PyTorch's native Scaled Dot Product Attention is a reasonable fallback that works across CUDA architectures
  6. Rollout fix - The new change properly isolates rollout from SDG-specific recorder state

📋 Minor suggestions (non-blocking)

  1. Checklist items unchecked - The PR checklist shows all items unchecked. Consider marking the applicable ones before merge.

⚠️ One consideration (unchanged from previous review)

The patch sets flash_attn_unpadded_qkvpacked_func = None and pad_input = unpad_input = None when imports fail, but FlashAttention class still exists and may be instantiated. If RADIO vision encoder is used (rather than SigLIP), this could hit a runtime error when the None functions are called. The comment notes "RADIO vision won't work, SigLIP does" — consider adding a runtime check that raises a clear error message if RADIO is selected without flash-attn installed.

Verdict: ✅ LGTM — straightforward compatibility fix with a good rollout bugfix in the latest commit.


Update (e3868cf): Branch rebased onto develop. The 3 PR files are unchanged — no new review items.

@@ -0,0 +1,63 @@
diff --git a/gr00t/model/backbone/eagle2_hg_model/config.json b/gr00t/model/backbone/eagle2_hg_model/config.json
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this git patch sensitive to the particular commit of GR00T? I.e. if someone changes to a different commit and the code changes slightly this will result in a git apply failure. Maybe worth mentioning in the docs the particular commit that this patch works against.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The gr00t checkout is already pinned to a specific version in the documentation that matches the patch

Copy link
Copy Markdown
Collaborator

@peterd-NV peterd-NV left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, left a comment regarding documenting the commit the git patch works against.

@kellyguo11 kellyguo11 moved this to In review in Isaac Lab May 15, 2026
@kellyguo11 kellyguo11 moved this from In review to Ready to merge in Isaac Lab May 15, 2026
@kellyguo11 kellyguo11 merged commit d84b905 into isaac-sim:develop May 15, 2026
32 of 33 checks passed
@github-project-automation github-project-automation Bot moved this from Ready to merge to Done in Isaac Lab May 15, 2026
matthewtrepte pushed a commit to matthewtrepte/IsaacLab that referenced this pull request May 18, 2026
…isaac-sim#5596)

# Make locomanipulation SDG GR00T flow runnable without flash-attn

## Summary

Two small fixes that let users finetune and roll out the
locomanipulation SDG
GR00T policy on hardware where `flash-attn` is unavailable (e.g.
Blackwell, or
any environment where the wheel fails to build).

## Changes

-
**`scripts/imitation_learning/locomanipulation_sdg/gr00t/no_flash_attn.patch`**
(new): patch against the Isaac-GR00T repo that switches the bundled
Eagle 2.5
VL model from `flash_attention_2` to PyTorch SDPA, and guards the RADIO
  vision module's `flash_attn` imports so the package becomes importable
without flash-attn installed. SigLIP path works; RADIO path is
unsupported
  without flash-attn (documented in the patch).

- **`docs/source/overview/imitation-learning/humanoids_imitation.rst`**:
adds
  a note in the GR00T install section explaining when to apply the patch
(build failure, or `RuntimeError: FlashAttention only supports Ampere
GPUs
  or newer`) and how to apply it from the sibling Isaac-GR00T checkout.

-
**`scripts/imitation_learning/locomanipulation_sdg/gr00t/rollout_policy.py`**:
override `env_cfg.recorders` with `ActionStateRecorderManagerCfg()` so
the
rollout doesn't try to record `env._locomanipulation_sdg_output_data`,
which
is only populated by the data-generation state machine in
`generate_data.py`
  and is absent during policy rollout. Without this, the recorder raises
  `AttributeError` on the first pre-step.


<!-- As a practice, it is recommended to open an issue to have
discussions on the proposed pull request.
This makes it easier for the community to keep track of what is being
developed or added, and if a given feature
is demanded by more than one party. -->

## Type of change

<!-- As you go through the list, delete the ones that are not
applicable. -->

- Bug fix (non-breaking change which fixes an issue)
## Screenshots

Please attach before and after screenshots of the change if applicable.

<!--
Example:

| Before | After |
| ------ | ----- |
| _gif/png before_ | _gif/png after_ |

To upload images to a PR -- simply drag and drop an image while in edit
mode and it should upload the image directly. You can then paste that
source into the above before/after sections.
-->

## Checklist

- [ ] I have read and understood the [contribution
guidelines](https://isaac-sim.github.io/IsaacLab/main/source/refs/contributing.html)
- [ ] I have run the [`pre-commit` checks](https://pre-commit.com/) with
`./isaaclab.sh --format`
- [ ] I have made corresponding changes to the documentation
- [ ] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have updated the changelog and the corresponding version in the
extension's `config/extension.toml` file
- [ ] I have added my name to the `CONTRIBUTORS.md` or my name already
exists there

<!--
As you go through the checklist above, you can mark something as done by
putting an x character in it

For example,
- [x] I have done this task
- [ ] I have not done this task
-->

---------

Co-authored-by: Kelly Guo <kellyg@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation infrastructure isaac-mimic Related to Isaac Mimic team isaac-sim Related to Isaac Sim team

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

4 participants