[Exp] Cherry-pick manager-based warp env infrastructure from dev/newton#4829
Conversation
|
@greptile review |
|
WIP but put in review for bot rebiew. |
Greptile SummaryThis PR cherry-picks and refactors the Warp manager-based RL environment infrastructure from Most issues raised in prior review rounds have been addressed:
Two remaining issues were identified:
Confidence Score: 4/5
Important Files Changed
Sequence DiagramsequenceDiagram
participant RL as RL Library
participant Env as ManagerBasedRLEnvWarp
participant MCS as ManagerCallSwitch
participant WGC as WarpGraphCache
participant Mgr as Warp Managers
RL->>Env: step(action)
Env->>Env: wp.copy(_action_in_wp)
Env->>MCS: call_stage("ActionManager_process_action")
MCS->>Mgr: WARP_CAPTURED → WarpGraphCache.capture_or_replay()
Note over WGC: 1st call: warm-up + capture<br/>2nd+ call: wp.capture_launch()
loop decimation
Env->>MCS: call_stage("ActionManager_apply_action")
Env->>MCS: call_stage("Scene_write_data_to_sim")
Note over MCS: Scene capped at WARP_NOT_CAPTURED
Env->>Env: sim.step()
Env->>Env: scene.update()
end
Env->>MCS: call_stage("TerminationManager_compute")
Env->>MCS: call_stage("RewardManager_compute")
Env->>Env: _reset_idx(reset_env_ids)
Env->>MCS: call_stage("EventManager_apply_interval")
Env->>MCS: call_stage("ObservationManager_compute_update_history")
Env-->>RL: obs, reward, terminated, truncated, extras
Reviews (10): Last reviewed commit: "Add warm-up before CUDA graph capture an..." | Re-trigger Greptile |
| def _load_cfg(self, cfg_source: str | None) -> dict[str, int]: | ||
| if cfg_source is not None and not isinstance(cfg_source, str): | ||
| raise TypeError(f"cfg_source must be a string or None, got: {type(cfg_source)}") | ||
| if cfg_source is None or cfg_source.strip() == "": | ||
| return dict(self.DEFAULT_CONFIG) |
There was a problem hiding this comment.
MAX_MODE_OVERRIDES not applied for default config.
When cfg_source is None (the common case without --manager_call_config), _load_cfg returns early at line 162 without applying MAX_MODE_OVERRIDES. The PR description and class comment both state that Scene_write_data_to_sim must be capped at WARP_NOT_CAPTURED (mode 1) because _apply_actuator_model is not CUDA-graph-capture-safe. With the default config {"default": 2}, get_mode_for_manager("Scene") will return 2, bypassing the cap entirely and allowing unsafe CUDA graph capture.
The fix is to apply MAX_MODE_OVERRIDES on the default-config path:
| def _load_cfg(self, cfg_source: str | None) -> dict[str, int]: | |
| if cfg_source is not None and not isinstance(cfg_source, str): | |
| raise TypeError(f"cfg_source must be a string or None, got: {type(cfg_source)}") | |
| if cfg_source is None or cfg_source.strip() == "": | |
| return dict(self.DEFAULT_CONFIG) | |
| def _load_cfg(self, cfg_source: str | None) -> dict[str, int]: | |
| if cfg_source is not None and not isinstance(cfg_source, str): | |
| raise TypeError(f"cfg_source must be a string or None, got: {type(cfg_source)}") | |
| if cfg_source is None or cfg_source.strip() == "": | |
| cfg = dict(self.DEFAULT_CONFIG) | |
| # Apply caps even for the default config | |
| default_mode = cfg[self.DEFAULT_KEY] | |
| for name, max_mode in self.MAX_MODE_OVERRIDES.items(): | |
| resolved = cfg.get(name, default_mode) | |
| if resolved > max_mode: | |
| cfg[name] = max_mode | |
| return cfg |
There was a problem hiding this comment.
Done. Restructured _load_cfg so both paths (default config and parsed env var) flow into the shared MAX_MODE_OVERRIDES application at the end.
| @Timer(name="env_step", msg="Step took:", enable=True, format="us") | ||
| def step(self, action: torch.Tensor) -> VecEnvStepReturn: |
There was a problem hiding this comment.
@Timer(enable=True) hardcoded to always-on for stable step().
The decorator is hardcoded to enable=True, meaning it will print timing output to stdout on every environment step during training. The module-level constant TIMER_ENABLED_STEP = False was clearly introduced to gate timing output globally, but this decorator bypasses it. All inner context-manager timers correctly respect TIMER_ENABLED_STEP (lines 184, 191, 209, etc.), making this inconsistency appear unintentional.
This will generate extremely verbose and undesirable output for all users of the stable environment.
| @Timer(name="env_step", msg="Step took:", enable=True, format="us") | |
| def step(self, action: torch.Tensor) -> VecEnvStepReturn: | |
| @Timer(name="env_step", msg="Step took:", enable=TIMER_ENABLED_STEP, format="us") | |
| def step(self, action: torch.Tensor) -> VecEnvStepReturn: |
There was a problem hiding this comment.
Not in scope — this comment is about the stable env (source/isaaclab/isaaclab/envs/manager_based_rl_env.py), which is not modified in this PR.
6d1ac95 to
7138023
Compare
|
@greptileai Review |
| self.reset_terminated = self.termination_manager.terminated | ||
| self.reset_time_outs = self.termination_manager.time_outs | ||
|
|
||
| @Timer(name="env_step", msg="Step took:", enable=True, format="us") |
There was a problem hiding this comment.
Outer step() timer always enabled, ignores TIMER_ENABLED_STEP
The @Timer decorator on step() has enable=True hardcoded, which means the outer step timer prints "Step took: …" to stdout on every environment step during training — even when neither DEBUG_TIMER_STEP nor DEBUG_TIMERS env vars are set.
The module-level constant TIMER_ENABLED_STEP was specifically introduced to gate this output (DEBUG_TIMER_STEP = os.environ.get("DEBUG_TIMER_STEP", "0") == "1"), but it is ignored here. All inner timers (lines 230, 265, 274, etc.) correctly respect TIMER_ENABLED_STEP. The inconsistency appears unintentional and will produce extremely verbose training output for all users.
| @Timer(name="env_step", msg="Step took:", enable=True, format="us") | |
| @Timer(name="env_step", msg="Step took:", enable=TIMER_ENABLED_STEP, format="us") |
There was a problem hiding this comment.
Done. Changed to enable=TIMER_ENABLED_STEP.
| """Select call path based on mode, execute, and apply output.""" | ||
| mode = self.get_mode_for_manager(self._manager_name_from_stage(stage)) | ||
| if mode == ManagerCallMode.STABLE: | ||
| call, result = stable_call, self._run_call(stable_call) |
There was a problem hiding this comment.
STABLE mode (mode=0) crashes unconditionally — stable_call is never provided by call sites
When mode == ManagerCallMode.STABLE, _dispatch immediately calls self._run_call(stable_call). However, stable_call defaults to None in call_stage, and every call site in manager_based_rl_env_warp.py (e.g. ActionManager_process_action, ActionManager_apply_action, Scene_write_data_to_sim, TerminationManager_compute, RewardManager_compute, ObservationManager_compute_update_history, etc.) omits stable_call entirely.
_run_call(None) will immediately raise TypeError: 'NoneType' object is not subscriptable because it indexes call["fn"]. This means configuring any manager — or "default" — to mode 0 via MANAGER_CALL_CONFIG will crash the training loop, despite STABLE mode being documented as a supported execution path that routes to isaaclab.managers.
A guard should be added so the failure is caught early and the error message is actionable:
if mode == ManagerCallMode.STABLE:
if stable_call is None:
raise ValueError(
f"Stage '{stage}' requested STABLE mode but no stable_call was provided."
)
call, result = stable_call, self._run_call(stable_call)There was a problem hiding this comment.
Done. Added a guard that raises ValueError with an actionable message when stable_call is None.
7138023 to
e044ddc
Compare
|
@greptileai Review |
| assert asset_cfg.joint_ids_wp is not None | ||
| assert env.rng_state_wp is not None |
There was a problem hiding this comment.
assert for runtime validation is unsafe
These assert statements are used for runtime validation, but assertions are silently skipped when Python is run with the -O (optimize) flag. If a caller passes a regular isaaclab.managers.SceneEntityCfg (which lacks joint_ids_wp), or an env without rng_state_wp, the failure will produce a cryptic AttributeError instead of a clear message.
Replace with explicit exception raises:
| assert asset_cfg.joint_ids_wp is not None | |
| assert env.rng_state_wp is not None | |
| if asset_cfg.joint_ids_wp is None: | |
| raise ValueError( | |
| f"reset_joints_by_offset requires an experimental SceneEntityCfg with resolved joint_ids_wp, " | |
| f"but got None for asset '{asset_cfg.name}'. " | |
| "Use isaaclab_experimental.managers.SceneEntityCfg and ensure joint_names are set." | |
| ) | |
| if not hasattr(env, "rng_state_wp") or env.rng_state_wp is None: | |
| raise AttributeError( | |
| "reset_joints_by_offset requires env.rng_state_wp to be initialized. " | |
| "Use ManagerBasedEnvWarp or ManagerBasedRLEnvWarp as the base environment." | |
| ) |
The same applies to reset_joints_by_scale at lines 153–154.
There was a problem hiding this comment.
Done. Replaced assert with explicit raise ValueError/AttributeError with actionable messages in both reset_joints_by_offset and reset_joints_by_scale.
|
|
||
| with contextlib.suppress(ImportError): | ||
| from isaaclab_experimental.envs import DirectRLEnvWarp, ManagerBasedRLEnvWarp |
There was a problem hiding this comment.
TYPE_CHECKING import inside contextlib.suppress leaves names undefined for type checkers
The contextlib.suppress(ImportError) pattern at runtime is correct, but under TYPE_CHECKING, static analysis tools (mypy, pyright) also evaluate this block literally. If the import fails in that context, DirectRLEnvWarp and ManagerBasedRLEnvWarp will be undefined when the type checker evaluates the unwrapped return annotation on line 132.
The same pattern appears in source/isaaclab_rl/isaaclab_rl/rl_games/rl_games.py, source/isaaclab_rl/isaaclab_rl/sb3.py, and source/isaaclab_rl/isaaclab_rl/skrl.py.
A cleaner alternative is to use a plain try/except ImportError that assigns stub type aliases when the import fails:
if TYPE_CHECKING:
try:
from isaaclab_experimental.envs import DirectRLEnvWarp, ManagerBasedEnvWarp, ManagerBasedRLEnvWarp
except ImportError:
from isaaclab.envs import ManagerBasedRLEnv as DirectRLEnvWarp # type: ignore[assignment]
from isaaclab.envs import ManagerBasedRLEnv as ManagerBasedEnvWarp # type: ignore[assignment]
from isaaclab.envs import ManagerBasedRLEnv as ManagerBasedRLEnvWarp # type: ignore[assignment]There was a problem hiding this comment.
Low priority — not fixing in this PR. The contextlib.suppress pattern is consistent with the other RL wrappers (rl_games, sb3, skrl) and works correctly at runtime.
e044ddc to
5f3bc76
Compare
0b666f6 to
59d82f4
Compare
Latest changesAddressed review feedback:
VerificationCartpole Warp (manager-based), 300 iterations, 4096 envs,
|
|
@greptileai Review |
3cd41f3 to
224a902
Compare
|
@greptileai Review |
| assert self._env.rng_state_wp is not None | ||
|
|
||
| # iterate over all the interval terms (fixed list; captured graph-friendly) | ||
| for i, term_cfg in enumerate(self._mode_term_cfgs["interval"]): | ||
| lower, upper = self._interval_term_ranges[i] | ||
| if self._interval_term_is_global[i]: | ||
| assert self._interval_global_rng_state_wp is not None |
There was a problem hiding this comment.
assert in hot-path methods — inconsistent with fixes applied elsewhere in this PR
_apply_interval (and _apply_reset at line 370) still use assert for internal invariant checks:
assert self._env.rng_state_wp is not None # line 331
assert self._interval_global_rng_state_wp is not None # line 337
assert self._scratch_term_mask_wp is not None # line 370Python silently strips all assert statements when running with the -O (optimize) flag, which is common in production training scripts. These are the exact same patterns that were already fixed in this PR (in events.py and terminations.py) by replacing them with explicit raise. The inconsistency is easy to miss since these live in the manager implementation while the prior fixes were in MDP term functions.
The conditions themselves are always True under normal construction order (all fields are set in __init__ before _apply_* can be called), so these won't trigger in practice. However, if initialization order changes during future refactoring, the failure will be a cryptic TypeError/AttributeError deep inside a Warp kernel rather than a clear message at the assertion site.
For consistency with the rest of this PR, consider replacing with explicit RuntimeError raises — e.g.:
if self._env.rng_state_wp is None:
raise RuntimeError("EventManager._apply_interval: env.rng_state_wp is not initialized.")The same applies to line 455 (_prepare_terms), which is a one-time init path but is still an assert.
There was a problem hiding this comment.
Done — all asserts already replaced with raise RuntimeError in the latest push.
0b71e14 to
028a7fd
Compare
|
@greptileai Review |
| self._cfg = self._load_cfg(cfg_source) | ||
| print("[INFO] ManagerCallSwitch configuration:") | ||
| print(f" - {self.DEFAULT_KEY}: {self._cfg[self.DEFAULT_KEY]}") | ||
| for manager_name in self.MANAGER_NAMES: | ||
| mode = int(self.get_mode_for_manager(manager_name)) | ||
| cap = self._max_modes.get(manager_name) | ||
| cap_str = f" (cap={cap})" if cap is not None else "" | ||
| print(f" - {manager_name}: {mode}{cap_str}") |
There was a problem hiding this comment.
"Scene" stages invisible in configuration printout despite having a hard cap
MAX_MODE_OVERRIDES enforces Scene → WARP_NOT_CAPTURED (mode 1), which is a critical constraint documented in the PR description. However, "Scene" is not in MANAGER_NAMES, so neither the default mode for Scene stages nor the cap ever appears in the printed configuration block. A user reviewing logs to understand why their Scene_write_data_to_sim stages aren't captured will find no evidence of the cap.
Consider appending the overrides section to the printout:
# After the manager-names loop:
if self._max_modes:
print(" [MAX_MODE_OVERRIDES]")
for name, max_mode in self._max_modes.items():
resolved = int(self.get_mode_for_manager(name))
print(f" - {name}: {resolved} (cap={max_mode})")There was a problem hiding this comment.
Done. Added "Scene" to MANAGER_NAMES so the cap is visible in the init printout.
| self.recorder_manager.record_post_reset(env_ids) | ||
|
|
||
| # compute observations | ||
| self.obs_buf = self.observation_manager.compute(update_history=True) |
There was a problem hiding this comment.
reset() bypasses _manager_call_switch for observation compute
ManagerBasedEnvWarp.reset() (inherited by ManagerBasedRLEnvWarp) calls observation_manager.compute() directly, bypassing the ManagerCallSwitch. This means the reset-time observation compute is always eager — even when ObservationManager is configured for WARP_CAPTURED (mode 2). The step() path correctly uses call_stage, but reset() does not.
This is inconsistent and could lead to subtle differences between reset and step time observations (e.g., no CUDA-graph warmup on the first reset). Wrapping this call in call_stage (or at minimum adding a code comment explaining the intentional bypass) would prevent confusion for future maintainers.
There was a problem hiding this comment.
Acknowledged — reset() calls observation_manager.compute() directly. This is the base env reset() path, called once at initialization. The RL env step() routes through ManagerCallSwitch correctly. Low risk, will align in a follow-up.
028a7fd to
5108ca8
Compare
|
@greptileai Review |
| def joint_vel_l1(env: ManagerBasedRLEnv, out, asset_cfg: SceneEntityCfg) -> None: | ||
| """Penalize joint velocities on the articulation using an L1-kernel. Writes into ``out``.""" | ||
| asset: Articulation = env.scene[asset_cfg.name] | ||
| wp.launch( | ||
| kernel=_sum_abs_masked_kernel, | ||
| dim=env.num_envs, | ||
| inputs=[asset.data.joint_vel, asset_cfg.joint_mask, out], | ||
| device=env.device, | ||
| ) |
There was a problem hiding this comment.
Missing joint_mask null guard before Warp kernel launch
joint_vel_l1 passes asset_cfg.joint_mask directly to the Warp kernel without checking whether it is None. If a caller passes a plain isaaclab.managers.SceneEntityCfg (which lacks joint_mask), this will crash inside the Warp runtime with an unhelpful error about a null array pointer rather than a clear message at the call site.
The same pattern was already fixed in terminations.py for joint_pos_out_of_manual_limit (lines 75-83), which explicitly checks asset_cfg.joint_mask is None and raises a descriptive ValueError. The same guard is needed here for consistency:
| def joint_vel_l1(env: ManagerBasedRLEnv, out, asset_cfg: SceneEntityCfg) -> None: | |
| """Penalize joint velocities on the articulation using an L1-kernel. Writes into ``out``.""" | |
| asset: Articulation = env.scene[asset_cfg.name] | |
| wp.launch( | |
| kernel=_sum_abs_masked_kernel, | |
| dim=env.num_envs, | |
| inputs=[asset.data.joint_vel, asset_cfg.joint_mask, out], | |
| device=env.device, | |
| ) | |
| def joint_vel_l1(env: ManagerBasedRLEnv, out, asset_cfg: SceneEntityCfg) -> None: | |
| """Penalize joint velocities on the articulation using an L1-kernel. Writes into ``out``.""" | |
| asset: Articulation = env.scene[asset_cfg.name] | |
| if asset_cfg.joint_mask is None: | |
| raise ValueError( | |
| f"joint_vel_l1 requires SceneEntityCfg with resolved joint_mask, " | |
| f"but got None for asset '{asset_cfg.name}'." | |
| ) | |
| wp.launch( | |
| kernel=_sum_abs_masked_kernel, | |
| dim=env.num_envs, | |
| inputs=[asset.data.joint_vel, asset_cfg.joint_mask, out], | |
| device=env.device, | |
| ) |
| def step(self, action: torch.Tensor) -> tuple[VecEnvObs, dict]: | ||
| """Execute one time-step of the environment's dynamics. | ||
|
|
||
| The environment steps forward at a fixed time-step, while the physics simulation is | ||
| decimated at a lower time-step. This is to ensure that the simulation is stable. These two | ||
| time-steps can be configured independently using the :attr:`ManagerBasedEnvCfg.decimation` (number of | ||
| simulation steps per environment step) and the :attr:`ManagerBasedEnvCfg.sim.dt` (physics time-step) | ||
| parameters. Based on these parameters, the environment time-step is computed as the product of the two. | ||
|
|
||
| Args: | ||
| action: The actions to apply on the environment. Shape is (num_envs, action_dim). | ||
|
|
||
| Returns: | ||
| A tuple containing the observations and extras. | ||
| """ | ||
| # process actions | ||
| action_device = action.to(self.device) | ||
| if action_device.dtype != torch.float32: | ||
| action_device = action_device.float() | ||
| if not action_device.is_contiguous(): | ||
| action_device = action_device.contiguous() | ||
| action_wp = wp.from_torch(action_device, dtype=wp.float32) | ||
| self.action_manager.process_action(action_wp) | ||
|
|
||
| self.recorder_manager.record_pre_step() | ||
|
|
||
| # check if we need to do rendering within the physics loop | ||
| # note: checked here once to avoid multiple checks within the loop | ||
| is_rendering = bool(self.sim.settings.get("/isaaclab/visualizer")) or self.sim.settings.get( | ||
| "/isaaclab/render/rtx_sensors" | ||
| ) | ||
|
|
||
| # perform physics stepping | ||
| for _ in range(self.cfg.decimation): | ||
| self._sim_step_counter += 1 | ||
| # set actions into buffers | ||
| self.action_manager.apply_action() | ||
| # set actions into simulator | ||
| self.scene.write_data_to_sim() | ||
| # simulate | ||
| self.sim.step(render=False) | ||
| # render between steps only if the GUI or an RTX sensor needs it | ||
| # note: we assume the render interval to be the shortest accepted rendering interval. | ||
| # If a camera needs rendering at a faster frequency, this will lead to unexpected behavior. | ||
| if self._sim_step_counter % self.cfg.sim.render_interval == 0 and is_rendering: | ||
| self.sim.render() | ||
| # update buffers at sim dt | ||
| self.scene.update(dt=self.physics_dt) | ||
|
|
||
| # post-step: step interval event | ||
| if "interval" in self.event_manager.available_modes: | ||
| self.event_manager.apply(mode="interval", dt=self.step_dt) | ||
|
|
||
| # -- compute observations | ||
| self.obs_buf = self.observation_manager.compute(update_history=True) | ||
| self.recorder_manager.record_post_step() | ||
|
|
||
| # return observations and extras | ||
| return self.obs_buf, self.extras |
There was a problem hiding this comment.
Base step() bypasses ManagerCallSwitch entirely
ManagerBasedEnvWarp.step() (the base class, non-RL) calls action_manager.process_action, action_manager.apply_action, scene.write_data_to_sim, event_manager.apply, and observation_manager.compute all directly, without routing through self._manager_call_switch. This means any ManagerCallSwitch configuration (including WARP_CAPTURED mode) has no effect when this step() path is taken.
Since ManagerBasedRLEnvWarp overrides step() and does use ManagerCallSwitch, this only affects direct users of ManagerBasedEnvWarp. However, given the class name suggests Warp-mode support, a developer who instantiates it directly expecting WARP_CAPTURED behavior would be surprised to find it never captures. A comment explaining this limitation would prevent confusion:
def step(self, action: torch.Tensor) -> tuple[VecEnvObs, dict]:
"""Execute one time-step of the environment's dynamics.
Note:
This base-class step runs all manager calls eagerly and does **not**
route through :attr:`_manager_call_switch`. CUDA graph capture is only
available via :class:`ManagerBasedRLEnvWarp` which overrides this method.
...|
Hi, @ooctipus. This PR still needs some attention to get merged. |
99edd67 to
8c0df8d
Compare
There was a problem hiding this comment.
🤖 Isaac Lab Review Bot — PR #4829
Summary
Cherry-pick of Warp manager-based env infrastructure from dev/newton. This is a large PR (48 files, ~8k lines) adding experimental Warp-first manager implementations, MDP terms, utilities, and a Cartpole reference task.
Overall Assessment: ✅ Approve with minor suggestions
The PR is well-structured: new code lives entirely under isaaclab_experimental (no stable code disruption), the ManagerCallSwitch design allows per-manager fallback to stable implementations, and the WarpGraphCache warm-up-before-capture pattern is a solid improvement. The cherry-pick is clean with no conflict artifacts visible.
CI Status
- ✅
labelerpassed ⚠️ No pre-commit/lint/ruff checks found for the head commit — suggest verifying linting passes locally.
Key Findings
Architecture (positive)
- Clean separation: all experimental code in
isaaclab_experimental, stable code only gets 2 minor comment/docstring cleanups. ManagerCallSwitchwith env-var-driven mode selection (MANAGER_CALL_CONFIG) is flexible for benchmarking stable vs warp vs captured paths.WarpGraphCachenow does eager warm-up before capture — correctly handles first-call allocations outside capture context.- RL library wrappers (rsl_rl, rl_games, sb3, skrl) updated uniformly with graceful
ImportErrorhandling.
Potential Issues (see inline comments)
_reset_idxsignature mismatch inManagerBasedRLEnvWarp— the baseManagerBasedEnvWarp._reset_idxtakesenv_idsonly, but the RL env override addsenv_maskkwarg. The basereset()calls_reset_idx(env_ids)withoutenv_mask, meaning the mask codepath in the RL env is never exercised fromreset().resolve_1d_maskallocateswp.arrayfrom Python list when ids are not torch/warp — this defeats capture-safety for theslice→list(range(...))path.- Observation dim inference in
_infer_term_dim_scalaris fragile — falls back towp.to_torch(asset.data.joint_pos).shape[1]which does a warp→torch conversion at init time. episode_length_bufsetter usesself._episode_length_buf[:] = valuewhich triggers a full copy — fine for correctness but worth documenting the intent (preserve warp linkage).recorder_managernot reset with mask — in_reset_idxofManagerBasedRLEnvWarp, recorder_manager getsenv_ids(torch) while all other managers getenv_mask(warp). This inconsistency could cause issues if recorder_manager is ever captured.NoiseModelCfg.rng_state_wpis set by side-effect during_prepare_terms— this coupling between observation manager and noise config is implicit and could break if noise configs are shared across managers.
| inputs=[seed_val, self.rng_state_wp], | ||
| device=self.device, | ||
| ) | ||
|
|
There was a problem hiding this comment.
Minor: RNG seed -1 when cfg.seed is None
When cfg.seed is None, seed_val is set to -1 which is passed to wp.rand_init(seed, env_id). The wp.rand_init function may handle negative seeds differently across Warp versions. Consider defaulting to 0 or 42 instead of -1, or documenting the intent.
seed_val = int(self.cfg.seed) if self.cfg.seed is not None else -1| *, | ||
| env_mask: wp.array | None = None, | ||
| ): | ||
| """Reset environments based on specified indices. |
There was a problem hiding this comment.
Signature mismatch: _reset_idx in RL env vs base env
The base ManagerBasedEnvWarp._reset_idx(self, env_ids) has no env_mask parameter, but this override adds env_mask as a keyword argument. When the base class reset() method calls self._reset_idx(env_ids), the env_mask kwarg is never passed, so the mask-based codepath in this method is never exercised from reset(). This is fine for now (base reset() only calls from external API), but worth a # NOTE to prevent confusion.
| "mode": "reset", | ||
| "env_mask_wp": env_mask, | ||
| "global_env_step_count": self._global_env_step_count_wp, | ||
| }, |
There was a problem hiding this comment.
Inconsistency: recorder_manager.reset gets env_ids (torch) while others get env_mask (warp)
All other manager resets in this method use env_mask (warp boolean mask), but recorder_manager.reset(env_ids=env_ids) still uses torch integer indices. This is noted as intentional (recorder is still env_ids-based), but it means recorder cannot be captured. Consider adding a # TODO to track migration.
| return scratch_mask | ||
| ids_wp = ids | ||
| else: | ||
| if len(ids) == 0: |
There was a problem hiding this comment.
Capture-safety: wp.array(ids, ...) allocates during resolve_1d_mask
When ids is a Python list (e.g. from slice → list(range(...))), this line allocates a new wp.array, which is not safe during CUDA graph capture. The function comment says "No allocations happen inside this function" but that's not true for this path.
ids_wp = wp.array(ids, dtype=wp.int32, device=device)Consider either:
- Pre-allocating a scratch
idsbuffer alongsidescratch_mask, or - Documenting that the
slice→listpath is not capture-safe.
| # Guard: concat groups must use the Warp fast-path (standard concat dim, no history). | ||
| if self._group_obs_concatenate[group_name] and not can_use_group_buffer: | ||
| raise ValueError( | ||
| f"Observation group '{group_name}' is concatenated but cannot use the Warp" |
There was a problem hiding this comment.
Fragile dim inference: wp.to_torch conversion at init
The fallback wp.to_torch(asset.data.joint_pos).shape[1] performs a warp→torch conversion just to get a shape. Consider using asset.data.joint_pos.shape[1] directly (Warp arrays have a .shape attribute), which avoids the conversion overhead:
return int(asset.data.joint_pos.shape[1])| # if scale is set, check if single float or tuple | ||
| if term_cfg.scale is not None: | ||
| if not isinstance(term_cfg.scale, (float, int, tuple)): | ||
| raise TypeError( |
There was a problem hiding this comment.
History + Warp: NotImplementedError raised at init
The raise NotImplementedError("History reshaping is not implemented yet for warp.") at line 695 will prevent any observation term with history_length > 0 from being used. This is fine as a guard, but the code above it (lines 681-694) still executes and creates a CircularBuffer that will never be used. Consider moving the raise before the buffer creation to avoid wasted allocation.
| self._results[stage] = result | ||
| wp.capture_launch(self._graphs[stage]) | ||
| return self._results[stage] | ||
| if graph is not None: |
There was a problem hiding this comment.
Good improvement: eager warm-up before capture
The addition of fn(*args, **kwargs) before wp.ScopedCapture() is the right approach — it flushes first-call allocations (hasattr guards, lazy dtype casts) outside the capture context. This matches the pattern recommended by NVIDIA for CUDA graph capture.
One edge case: if fn has side effects that should only happen once (e.g. incrementing a counter), the warm-up + capture run will execute them twice. The PR description mentions this is intentional for the current use case, but worth a docstring note.
|
|
||
| @wp.kernel | ||
| def _reset_joints_by_offset_kernel( | ||
| env_mask: wp.array(dtype=wp.bool), |
There was a problem hiding this comment.
RNG state mutation: single thread per env is correct for race-freedom
Good design choice — using 1 thread per env with a sequential loop over joints avoids RNG state races. The comment documents this well. However, for articulations with many joints (e.g. humanoids with 20+ joints), this serialization could become a bottleneck. Consider noting this as a known limitation for future optimization (e.g. per-env per-joint RNG states).
| # store weighted reward rate (matches old: value/dt) | ||
| step_reward[env_id, term_idx] = weighted | ||
| val = weighted * dt | ||
| total += val |
There was a problem hiding this comment.
_reward_finalize kernel: step_reward uses weighted but not dt-scaled values
The kernel stores weighted = raw * weight into step_reward (the "reward rate"), but then val = weighted * dt goes into episode_sums and total. The comment says // store weighted reward rate (matches old: value/dt) but the stored value is actually raw * weight, not raw * weight / dt. The naming/comment could be clarified to avoid confusion — it's weighted_per_step, not a rate.
| # post-step: | ||
| # -- update env counters (used for curriculum generation) | ||
| self.episode_length_buf += 1 # step in current episode (per env) | ||
| self.common_step_counter += 1 # total step (common for all envs) |
There was a problem hiding this comment.
is_rendering check differs from base class
Base ManagerBasedEnvWarp.step() uses:
is_rendering = bool(self.sim.settings.get("/isaaclab/visualizer")) or self.sim.settings.get("/isaaclab/render/rtx_sensors")But this RL env uses:
is_rendering = self.sim.is_renderingThe commit message mentions "align warp env rendering checks with stable env: use sim.is_rendering". The base class ManagerBasedEnvWarp.step() should likely be updated too for consistency, or the difference documented.
|
@greptileai Review |
Add experimental warp-compatible manager implementations, MDP terms, utilities (buffers, modifiers, noise, warp kernels), ManagerCallSwitch for eager/captured dispatch, and manager-based env orchestration. Includes RL library wrapper updates (rsl_rl, rl_games, sb3, skrl) to accept warp env types, and minor stable fixes (settings_manager RuntimeError handling, observation_manager comment cleanup).
Add an experimental manager-based Cartpole environment using the warp manager infrastructure as a reference task for testing and benchmarking.
WarpGraphCache now runs an eager warm-up call before graph capture so that first-call initialisation (allocations, hasattr guards, dtype casts) executes outside the capture context. Also align warp env rendering checks with stable env: use sim.is_rendering for the step-loop check and cache has_rtx_sensors at init for rerender-on-reset and render-method guards.
8c0df8d to
cf951ac
Compare
…on (#4945) ## Summary * Cherry-picks [Newton] Migrate more envs and mdps to warp (#4690) onto develop * Cherry-picks [Newton] Add capture safety guards and fix WrenchComposer stale COM pose (#4779) onto develop ### Changes included - Warp-first MDP terms (observations, rewards, events, terminations, actions) for manager-based envs - Tested warp env configs: Ant, Humanoid, Cartpole, locomotion velocity (A1, AnymalB/C/D, Cassie, G1, Go1/2, H1), Franka/UR10 reach - ManagerCallSwitch max_mode cap and scene capture config - MDP kernels made graph-capturable with consolidated test infrastructure - capture_unsafe safety guards on lazy-evaluated derived properties in articulation/rigid_object data - WrenchComposer fix: use fresh COM pose buffers instead of stale cached link poses ### Dropped - G1-29-DOF warp env (Isaac-Velocity-Flat-G1-Warp-v1): removed because the stable g1_29_dofs task config does not exist on develop (only on dev/newton). Warp env PRs should only add warp frontends for envs that already exist in the stable package. ## Dependencies Must be merged **after** these PRs (in order): 1. #4905 (merged) 2. #4829 ## Validated base Validated against develop at 7588fa9. ## Test plan - [x] Run warp env training sweep across all manager-based env configs (14/14 pass, mode=2, 4096 envs, 300 iters) - [ ] Run test_mdp_warp_parity.py and test_mdp_warp_parity_new_terms.py - [ ] Run test_action_warp_parity.py - [ ] Verify WrenchComposer COM pose is fresh (not stale) during graph replay --------- Co-authored-by: Antoine Richard <antoiner@nvidia.com> Co-authored-by: Kelly Guo <kellyg@nvidia.com>
…on (isaac-sim#4945) ## Summary * Cherry-picks [Newton] Migrate more envs and mdps to warp (isaac-sim#4690) onto develop * Cherry-picks [Newton] Add capture safety guards and fix WrenchComposer stale COM pose (isaac-sim#4779) onto develop ### Changes included - Warp-first MDP terms (observations, rewards, events, terminations, actions) for manager-based envs - Tested warp env configs: Ant, Humanoid, Cartpole, locomotion velocity (A1, AnymalB/C/D, Cassie, G1, Go1/2, H1), Franka/UR10 reach - ManagerCallSwitch max_mode cap and scene capture config - MDP kernels made graph-capturable with consolidated test infrastructure - capture_unsafe safety guards on lazy-evaluated derived properties in articulation/rigid_object data - WrenchComposer fix: use fresh COM pose buffers instead of stale cached link poses ### Dropped - G1-29-DOF warp env (Isaac-Velocity-Flat-G1-Warp-v1): removed because the stable g1_29_dofs task config does not exist on develop (only on dev/newton). Warp env PRs should only add warp frontends for envs that already exist in the stable package. ## Dependencies Must be merged **after** these PRs (in order): 1. isaac-sim#4905 (merged) 2. isaac-sim#4829 ## Validated base Validated against develop at 7588fa9. ## Test plan - [x] Run warp env training sweep across all manager-based env configs (14/14 pass, mode=2, 4096 envs, 300 iters) - [ ] Run test_mdp_warp_parity.py and test_mdp_warp_parity_new_terms.py - [ ] Run test_action_warp_parity.py - [ ] Verify WrenchComposer COM pose is fresh (not stale) during graph replay --------- Co-authored-by: Antoine Richard <antoiner@nvidia.com> Co-authored-by: Kelly Guo <kellyg@nvidia.com>
…on (isaac-sim#4945) ## Summary * Cherry-picks [Newton] Migrate more envs and mdps to warp (isaac-sim#4690) onto develop * Cherry-picks [Newton] Add capture safety guards and fix WrenchComposer stale COM pose (isaac-sim#4779) onto develop ### Changes included - Warp-first MDP terms (observations, rewards, events, terminations, actions) for manager-based envs - Tested warp env configs: Ant, Humanoid, Cartpole, locomotion velocity (A1, AnymalB/C/D, Cassie, G1, Go1/2, H1), Franka/UR10 reach - ManagerCallSwitch max_mode cap and scene capture config - MDP kernels made graph-capturable with consolidated test infrastructure - capture_unsafe safety guards on lazy-evaluated derived properties in articulation/rigid_object data - WrenchComposer fix: use fresh COM pose buffers instead of stale cached link poses ### Dropped - G1-29-DOF warp env (Isaac-Velocity-Flat-G1-Warp-v1): removed because the stable g1_29_dofs task config does not exist on develop (only on dev/newton). Warp env PRs should only add warp frontends for envs that already exist in the stable package. ## Dependencies Must be merged **after** these PRs (in order): 1. isaac-sim#4905 (merged) 2. isaac-sim#4829 ## Validated base Validated against develop at 7588fa9. ## Test plan - [x] Run warp env training sweep across all manager-based env configs (14/14 pass, mode=2, 4096 envs, 300 iters) - [ ] Run test_mdp_warp_parity.py and test_mdp_warp_parity_new_terms.py - [ ] Run test_action_warp_parity.py - [ ] Verify WrenchComposer COM pose is fresh (not stale) during graph replay --------- Co-authored-by: Antoine Richard <antoiner@nvidia.com> Co-authored-by: Kelly Guo <kellyg@nvidia.com>
## Summary * Add warp environment overview doc (`warp-environments.rst`) * Add stable-to-warp migration guide (`warp-env-migration.rst`) * Align step timer setup across all 4 env base classes (stable + warp, direct + manager) ## Dependencies Must be merged **after** (validated against develop at `9720047`): 1. #4829 2. #4945 ## Status Performance comparison data included in docs.
## Summary * Add warp environment overview doc (`warp-environments.rst`) * Add stable-to-warp migration guide (`warp-env-migration.rst`) * Align step timer setup across all 4 env base classes (stable + warp, direct + manager) ## Dependencies Must be merged **after** (validated against develop at `9720047`): 1. isaac-sim#4829 2. isaac-sim#4945 ## Status Performance comparison data included in docs.
Summary
Cherry-pick of warp manager-based env infrastructure from
dev/newton, refactored fordevelop.isaaclab_experimentalActionManager,ObservationManager,EventManager,CommandManager,TerminationManager,RewardManager) with Warp kernel execution and CUDA graphcapture support.
ManagerCallSwitchutility for per-manager eager/captured dispatch, configured viaMANAGER_CALL_CONFIGenv var.ManagerBasedEnvWarpandManagerBasedRLEnvWarporchestration env classes.SceneEntityCfgwith warp joint mask/ids for kernel-level joint selection.ManagerBasefor automaticSceneEntityCfgresolution.isaaclab_tasks_experimentalIsaac-Cartpole-Warp-v0task as reference environment for warp manager-based workflow.isaaclab_rlManagerBasedRLEnvWarpandDirectRLEnvWarp.isaaclabSettingsManagerto catchRuntimeErrorwhen carb is unavailable.ObservationManager.Dependencies
Must be merged after:
Validated base
Validated against develop at
7588fa9ed5f.Known limitations
Scene_write_data_to_simcapped to mode=1 (eager) viaMAX_MODE_OVERRIDES— articulation_apply_actuator_modeluseswp.to_torch + torch indexing, not CUDA graph capture-safe.Test plan
Isaac-Cartpole-Warp-v0training (4096 envs, 300 iters, mode=2): converges (reward 4.95, ep_len 300)