[Exp] Cherry-pick manager-based warp env infrastructure from dev/newton by hujc7 · Pull Request #4829 · isaac-sim/IsaacLab

hujc7 · 2026-03-05T10:46:01Z

Summary

Cherry-pick of warp manager-based env infrastructure from dev/newton, refactored for develop.

`isaaclab_experimental`

Added warp-compatible manager implementations (ActionManager, ObservationManager, EventManager,
CommandManager, TerminationManager, RewardManager) with Warp kernel execution and CUDA graph
capture support.
Added ManagerCallSwitch utility for per-manager eager/captured dispatch, configured via
MANAGER_CALL_CONFIG env var.
Added ManagerBasedEnvWarp and ManagerBasedRLEnvWarp orchestration env classes.
Added warp MDP terms (observations, rewards, terminations, events, joint actions).
Added utility modules: buffers (circular buffer), modifiers, noise models, warp kernels/helpers.
Added experimental SceneEntityCfg with warp joint mask/ids for kernel-level joint selection.
Generalized configclass default materialization in ManagerBase for automatic SceneEntityCfg resolution.

`isaaclab_tasks_experimental`

Added Isaac-Cartpole-Warp-v0 task as reference environment for warp manager-based workflow.

`isaaclab_rl`

Updated rsl_rl, rl_games, sb3, skrl wrappers to accept ManagerBasedRLEnvWarp and DirectRLEnvWarp.

`isaaclab`

Fixed SettingsManager to catch RuntimeError when carb is unavailable.
Minor comment cleanup in ObservationManager.

Dependencies

Must be merged after:

[Exp] Cherry-pick direct warp envs from dev/newton #4905 (merged)

Validated base

Validated against develop at 7588fa9ed5f.

Known limitations

Scene_write_data_to_sim is capped to mode=1 (eager) via MAX_MODE_OVERRIDES because the articulation's
_apply_actuator_model uses wp.to_torch plus torch indexing, which is not CUDA-graph-capture-safe.

Test plan

Isaac-Cartpole-Warp-v0 training (4096 envs, 300 iters, mode=2): converges (reward 4.95, ep_len 300)

hujc7 · 2026-03-05T10:46:49Z

@greptile review

hujc7 · 2026-03-05T10:47:25Z

WIP, but opened for review so the bot can review it.

greptile-apps · 2026-03-05T10:53:20Z

Greptile Summary

This PR cherry-picks and refactors the Warp manager-based RL environment infrastructure from dev/newton into develop, introducing Warp-compatible manager implementations (ActionManager, ObservationManager, EventManager, RewardManager, TerminationManager), CUDA graph capture/replay via ManagerCallSwitch and WarpGraphCache, orchestration classes ManagerBasedEnvWarp/ManagerBasedRLEnvWarp, Warp MDP terms, and the Isaac-Cartpole-Warp-v0 reference task.

Most issues raised in prior review rounds have been addressed:

MAX_MODE_OVERRIDES now applied on both the default-config and env-var paths in _load_cfg
TIMER_ENABLED_STEP correctly gates the outer step() @Timer decorator
STABLE mode guard raises ValueError with an actionable message
WarpGraphCache performs an eager warm-up run before capture so first-call GPU execution is correct
assert → raise RuntimeError/ValueError fixes applied in events.py, terminations.py, event_manager.py, and manager_based_rl_env_warp.py
Scene added to MANAGER_NAMES so its capped mode is visible in the init printout

Two remaining issues were identified:

envs/mdp/rewards.py — joint_vel_l1 still passes asset_cfg.joint_mask directly to the Warp kernel without checking for None. The identical guard was applied to terminations.py earlier in this PR but not here.
cartpole/mdp/rewards.py — joint_pos_target_l2 uses a bare assert that is silently disabled with -O and will raise a confusing AttributeError (rather than a clear ValueError) if joint_mask is None.

Confidence Score: 4/5

Safe to merge after fixing the two joint_mask guard omissions in the MDP reward terms.
The vast majority of prior review concerns are resolved. Two P1 issues remain: a missing null guard for joint_mask in joint_vel_l1 (same pattern already fixed in terminations.py) and an unsafe assert in the cartpole task's joint_pos_target_l2 that will produce a confusing AttributeError if a wrong config is passed. Both are small, targeted fixes with clear solutions.
source/isaaclab_experimental/isaaclab_experimental/envs/mdp/rewards.py and source/isaaclab_tasks_experimental/isaaclab_tasks_experimental/manager_based/classic/cartpole/mdp/rewards.py

Important Files Changed

Filename	Overview
source/isaaclab_experimental/isaaclab_experimental/envs/mdp/rewards.py	Missing `joint_mask` null guard in `joint_vel_l1` before Warp kernel launch — the same guard applied in `terminations.py` was not applied here.
source/isaaclab_tasks_experimental/isaaclab_tasks_experimental/manager_based/classic/cartpole/mdp/rewards.py	`joint_pos_target_l2` uses a bare `assert` for runtime validation that is silently stripped with `-O` and will also crash with `AttributeError` if `joint_mask` is `None`.
source/isaaclab_experimental/isaaclab_experimental/utils/manager_call_switch.py	Per-manager call switch with STABLE/WARP_NOT_CAPTURED/WARP_CAPTURED modes; prior issues (STABLE crash guard, Scene cap visibility, default-config MAX_MODE_OVERRIDES) all addressed.
source/isaaclab_experimental/isaaclab_experimental/utils/warp_graph_cache.py	CUDA graph cache with eager warm-up + capture pattern; first-call execution issue resolved via warm-up run before capture.
source/isaaclab_experimental/isaaclab_experimental/envs/manager_based_rl_env_warp.py	RL env Warp entry point; timer `enable=TIMER_ENABLED_STEP` fix confirmed, assert→RuntimeError fix confirmed, action buffer stable-pointer design correct.
source/isaaclab_experimental/isaaclab_experimental/envs/manager_based_env_warp.py	Base Warp env; MAX_MODE_OVERRIDES application on both config paths confirmed, shared ENV_MASK latent issue acknowledged and tracked.
source/isaaclab_experimental/isaaclab_experimental/managers/reward_manager.py	Warp-compatible reward manager with single-kernel finalize; buffer layout (num_terms×num_envs for term_outs, num_envs×num_terms for step_reward) is consistent throughout.
source/isaaclab_experimental/isaaclab_experimental/managers/event_manager.py	assert→raise RuntimeError fixes confirmed in `_apply_interval` and `_apply_reset`; per-term captured event dispatch looks correct.
source/isaaclab_experimental/isaaclab_experimental/envs/mdp/terminations.py	`joint_mask` null guard and shape check correctly added via explicit `raise ValueError`; consistent with the fix pattern requested in prior review.

Sequence Diagram

sequenceDiagram
    participant RL as RL Library
    participant Env as ManagerBasedRLEnvWarp
    participant MCS as ManagerCallSwitch
    participant WGC as WarpGraphCache
    participant Mgr as Warp Managers

    RL->>Env: step(action)
    Env->>Env: wp.copy(_action_in_wp)
    Env->>MCS: call_stage("ActionManager_process_action")
    MCS->>Mgr: WARP_CAPTURED → WarpGraphCache.capture_or_replay()
    Note over WGC: 1st call: warm-up + capture<br/>2nd+ call: wp.capture_launch()

    loop decimation
        Env->>MCS: call_stage("ActionManager_apply_action")
        Env->>MCS: call_stage("Scene_write_data_to_sim")
        Note over MCS: Scene capped at WARP_NOT_CAPTURED
        Env->>Env: sim.step()
        Env->>Env: scene.update()
    end

    Env->>MCS: call_stage("TerminationManager_compute")
    Env->>MCS: call_stage("RewardManager_compute")
    Env->>Env: _reset_idx(reset_env_ids)
    Env->>MCS: call_stage("EventManager_apply_interval")
    Env->>MCS: call_stage("ObservationManager_compute_update_history")
    Env-->>RL: obs, reward, terminated, truncated, extras

Reviews (10): Last reviewed commit: "Add warm-up before CUDA graph capture an..."

greptile-apps · 2026-03-05T10:53:23Z

+    def _load_cfg(self, cfg_source: str | None) -> dict[str, int]:
+        if cfg_source is not None and not isinstance(cfg_source, str):
+            raise TypeError(f"cfg_source must be a string or None, got: {type(cfg_source)}")
+        if cfg_source is None or cfg_source.strip() == "":
+            return dict(self.DEFAULT_CONFIG)


MAX_MODE_OVERRIDES not applied for default config.

When cfg_source is None (the common case without --manager_call_config), _load_cfg returns early at line 162 without applying MAX_MODE_OVERRIDES. The PR description and class comment both state that Scene_write_data_to_sim must be capped at WARP_NOT_CAPTURED (mode 1) because _apply_actuator_model is not CUDA-graph-capture-safe. With the default config {"default": 2}, get_mode_for_manager("Scene") will return 2, bypassing the cap entirely and allowing unsafe CUDA graph capture.

The fix is to apply MAX_MODE_OVERRIDES on the default-config path:

Suggested change

-    def _load_cfg(self, cfg_source: str | None) -> dict[str, int]:
-        if cfg_source is not None and not isinstance(cfg_source, str):
-            raise TypeError(f"cfg_source must be a string or None, got: {type(cfg_source)}")
-        if cfg_source is None or cfg_source.strip() == "":
-            return dict(self.DEFAULT_CONFIG)
+    def _load_cfg(self, cfg_source: str | None) -> dict[str, int]:
+        if cfg_source is not None and not isinstance(cfg_source, str):
+            raise TypeError(f"cfg_source must be a string or None, got: {type(cfg_source)}")
+        if cfg_source is None or cfg_source.strip() == "":
+            cfg = dict(self.DEFAULT_CONFIG)
+            # Apply caps even for the default config
+            default_mode = cfg[self.DEFAULT_KEY]
+            for name, max_mode in self.MAX_MODE_OVERRIDES.items():
+                resolved = cfg.get(name, default_mode)
+                if resolved > max_mode:
+                    cfg[name] = max_mode
+            return cfg

Done. Restructured _load_cfg so both paths (default config and parsed env var) flow into the shared MAX_MODE_OVERRIDES application at the end.
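
For reference, a minimal sketch of the restructured flow, in which both the default-config path and the parsed path fall through to one shared capping step. This is not the code in the PR: the module-level names are simplified stand-ins, and the JSON encoding of the config string is an assumption made only for illustration.

```python
from __future__ import annotations

import json

DEFAULT_KEY = "default"
DEFAULT_CONFIG = {DEFAULT_KEY: 2}   # 2 = WARP_CAPTURED
MAX_MODE_OVERRIDES = {"Scene": 1}   # 1 = WARP_NOT_CAPTURED


def load_cfg(cfg_source: str | None) -> dict[str, int]:
    """Resolve the per-manager mode config, then clamp every capped manager."""
    if cfg_source is not None and not isinstance(cfg_source, str):
        raise TypeError(f"cfg_source must be a string or None, got: {type(cfg_source)}")
    if cfg_source is None or cfg_source.strip() == "":
        cfg = dict(DEFAULT_CONFIG)
    else:
        # Assumption: the string is a JSON object, e.g. '{"default": 2, "RewardManager": 1}'.
        cfg = {str(k): int(v) for k, v in json.loads(cfg_source).items()}
        cfg.setdefault(DEFAULT_KEY, DEFAULT_CONFIG[DEFAULT_KEY])
    # Shared tail: the caps apply no matter which branch produced cfg.
    default_mode = cfg[DEFAULT_KEY]
    for name, max_mode in MAX_MODE_OVERRIDES.items():
        if cfg.get(name, default_mode) > max_mode:
            cfg[name] = max_mode
    return cfg
```

With this shape, `load_cfg(None)` yields an explicit `"Scene": 1` entry, so a later lookup for the Scene stages can no longer fall back to the uncapped default.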

greptile-apps · 2026-03-05T10:53:24Z

+    @Timer(name="env_step", msg="Step took:", enable=True, format="us")
    def step(self, action: torch.Tensor) -> VecEnvStepReturn:


@Timer(enable=True) hardcoded to always-on for stable step().

The decorator is hardcoded to enable=True, meaning it will print timing output to stdout on every environment step during training. The module-level constant TIMER_ENABLED_STEP = False was clearly introduced to gate timing output globally, but this decorator bypasses it. All inner context-manager timers correctly respect TIMER_ENABLED_STEP (lines 184, 191, 209, etc.), making this inconsistency appear unintentional.

This will generate extremely verbose and undesirable output for all users of the stable environment.

Suggested change

-    @Timer(name="env_step", msg="Step took:", enable=True, format="us")
-    def step(self, action: torch.Tensor) -> VecEnvStepReturn:
+    @Timer(name="env_step", msg="Step took:", enable=TIMER_ENABLED_STEP, format="us")
+    def step(self, action: torch.Tensor) -> VecEnvStepReturn:

Not in scope — this comment is about the stable env (source/isaaclab/isaaclab/envs/manager_based_rl_env.py), which is not modified in this PR.

hujc7 · 2026-03-09T08:27:15Z

@greptileai Review

greptile-apps · 2026-03-09T08:35:28Z

+        self.reset_terminated = self.termination_manager.terminated
+        self.reset_time_outs = self.termination_manager.time_outs
+
+    @Timer(name="env_step", msg="Step took:", enable=True, format="us")


Outer step() timer always enabled, ignores TIMER_ENABLED_STEP

The @Timer decorator on step() has enable=True hardcoded, which means the outer step timer prints "Step took: …" to stdout on every environment step during training — even when neither DEBUG_TIMER_STEP nor DEBUG_TIMERS env vars are set.

The module-level constant TIMER_ENABLED_STEP was specifically introduced to gate this output (DEBUG_TIMER_STEP = os.environ.get("DEBUG_TIMER_STEP", "0") == "1"), but it is ignored here. All inner timers (lines 230, 265, 274, etc.) correctly respect TIMER_ENABLED_STEP. The inconsistency appears unintentional and will produce extremely verbose training output for all users.

Suggested change

-    @Timer(name="env_step", msg="Step took:", enable=True, format="us")
+    @Timer(name="env_step", msg="Step took:", enable=TIMER_ENABLED_STEP, format="us")

Done. Changed to enable=TIMER_ENABLED_STEP.
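
To illustrate why gating the decorator on a module-level flag matters, here is a small stand-alone sketch. The `timed` decorator is a hypothetical stand-in, not the isaaclab `Timer`, and deriving `TIMER_ENABLED_STEP` directly from `DEBUG_TIMER_STEP` is an assumption based on the description above.

```python
import functools
import os
import time

# Assumption: the flag is derived from the environment once, at import time.
TIMER_ENABLED_STEP = os.environ.get("DEBUG_TIMER_STEP", "0") == "1"


def timed(name: str, msg: str, enable: bool):
    """Hypothetical timing decorator with an ``enable`` switch."""

    def decorator(fn):
        if not enable:
            # Disabled: return the function unchanged, so there is no wrapper and no output.
            return fn

        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                elapsed_us = (time.perf_counter() - start) * 1e6
                print(f"[{name}] {msg} {elapsed_us:.1f} us")

        return wrapper

    return decorator


@timed(name="env_step", msg="Step took:", enable=TIMER_ENABLED_STEP)
def step(action):
    """Placeholder step function; it only echoes the action."""
    return action
```

Because the flag is read when the decorator is applied, a hardcoded `enable=True` prints on every call regardless of the environment, which is the behavior the comment above flags.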

greptile-apps · 2026-03-09T08:35:29Z

+        """Select call path based on mode, execute, and apply output."""
+        mode = self.get_mode_for_manager(self._manager_name_from_stage(stage))
+        if mode == ManagerCallMode.STABLE:
+            call, result = stable_call, self._run_call(stable_call)


STABLE mode (mode=0) crashes unconditionally — stable_call is never provided by call sites

When mode == ManagerCallMode.STABLE, _dispatch immediately calls self._run_call(stable_call). However, stable_call defaults to None in call_stage, and every call site in manager_based_rl_env_warp.py (e.g. ActionManager_process_action, ActionManager_apply_action, Scene_write_data_to_sim, TerminationManager_compute, RewardManager_compute, ObservationManager_compute_update_history, etc.) omits stable_call entirely.

_run_call(None) will immediately raise TypeError: 'NoneType' object is not subscriptable because it indexes call["fn"]. This means configuring any manager — or "default" — to mode 0 via MANAGER_CALL_CONFIG will crash the training loop, despite STABLE mode being documented as a supported execution path that routes to isaaclab.managers.

A guard should be added so the failure is caught early and the error message is actionable:

    if mode == ManagerCallMode.STABLE:
        if stable_call is None:
            raise ValueError(
                f"Stage '{stage}' requested STABLE mode but no stable_call was provided."
            )
        call, result = stable_call, self._run_call(stable_call)

Done. Added a guard that raises ValueError with an actionable message when stable_call is None.

hujc7 · 2026-03-09T09:01:11Z

@greptileai Review

greptile-apps · 2026-03-09T09:08:44Z

+    assert asset_cfg.joint_ids_wp is not None
+    assert env.rng_state_wp is not None


assert for runtime validation is unsafe

These assert statements are used for runtime validation, but assertions are silently skipped when Python is run with the -O (optimize) flag. If a caller passes a regular isaaclab.managers.SceneEntityCfg (which lacks joint_ids_wp), or an env without rng_state_wp, the failure will produce a cryptic AttributeError instead of a clear message.

Replace with explicit exception raises:

Suggested change

-    assert asset_cfg.joint_ids_wp is not None
-    assert env.rng_state_wp is not None
+    if asset_cfg.joint_ids_wp is None:
+        raise ValueError(
+            f"reset_joints_by_offset requires an experimental SceneEntityCfg with resolved joint_ids_wp, "
+            f"but got None for asset '{asset_cfg.name}'. "
+            "Use isaaclab_experimental.managers.SceneEntityCfg and ensure joint_names are set."
+        )
+    if not hasattr(env, "rng_state_wp") or env.rng_state_wp is None:
+        raise AttributeError(
+            "reset_joints_by_offset requires env.rng_state_wp to be initialized. "
+            "Use ManagerBasedEnvWarp or ManagerBasedRLEnvWarp as the base environment."
+        )

The same applies to reset_joints_by_scale at lines 153–154.

Done. Replaced assert with explicit raise ValueError/AttributeError with actionable messages in both reset_joints_by_offset and reset_joints_by_scale.
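
One way to keep such checks uniform across MDP terms is a small helper that survives `python -O` (unlike `assert`) and also covers config objects that lack the attribute entirely. The helper name `require_attr` and the `SimpleNamespace` stand-ins below are hypothetical; nothing like this is confirmed to exist in the PR.

```python
from types import SimpleNamespace


def require_attr(obj, attr: str, owner: str, hint: str):
    """Return ``obj.<attr>`` or raise a ValueError naming the caller and the suggested fix."""
    value = getattr(obj, attr, None)  # a missing attribute is treated the same as None
    if value is None:
        raise ValueError(f"{owner} requires '{attr}' to be set, but got None. {hint}")
    return value


# Hypothetical stand-ins for a stable config (no Warp fields) and an experimental one.
plain_cfg = SimpleNamespace(name="robot")
warp_cfg = SimpleNamespace(name="robot", joint_ids_wp=[0, 1])
```

A term would then start with `joint_ids = require_attr(asset_cfg, "joint_ids_wp", "reset_joints_by_offset", "Use the experimental SceneEntityCfg.")` and proceed to the kernel launch only when the value is present.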

greptile-apps · 2026-03-09T09:08:48Z


+    with contextlib.suppress(ImportError):
+        from isaaclab_experimental.envs import DirectRLEnvWarp, ManagerBasedRLEnvWarp


TYPE_CHECKING import inside contextlib.suppress leaves names undefined for type checkers

The contextlib.suppress(ImportError) pattern at runtime is correct, but under TYPE_CHECKING, static analysis tools (mypy, pyright) also evaluate this block literally. If the import fails in that context, DirectRLEnvWarp and ManagerBasedRLEnvWarp will be undefined when the type checker evaluates the unwrapped return annotation on line 132.

The same pattern appears in source/isaaclab_rl/isaaclab_rl/rl_games/rl_games.py, source/isaaclab_rl/isaaclab_rl/sb3.py, and source/isaaclab_rl/isaaclab_rl/skrl.py.

A cleaner alternative is to use a plain try/except ImportError that assigns stub type aliases when the import fails:

    if TYPE_CHECKING:
        try:
            from isaaclab_experimental.envs import DirectRLEnvWarp, ManagerBasedEnvWarp, ManagerBasedRLEnvWarp
        except ImportError:
            from isaaclab.envs import ManagerBasedRLEnv as DirectRLEnvWarp  # type: ignore[assignment]
            from isaaclab.envs import ManagerBasedRLEnv as ManagerBasedEnvWarp  # type: ignore[assignment]
            from isaaclab.envs import ManagerBasedRLEnv as ManagerBasedRLEnvWarp  # type: ignore[assignment]

Low priority — not fixing in this PR. The contextlib.suppress pattern is consistent with the other RL wrappers (rl_games, sb3, skrl) and works correctly at runtime.

hujc7 · 2026-03-12T06:09:36Z

Latest changes

Addressed review feedback:

Replaced assert with explicit raise ValueError/AttributeError in reset_joints_by_offset and reset_joints_by_scale (events.py)
Switched Timer import from isaaclab_experimental.utils.timer to isaaclab.utils.timer (3 files) and fixed format= → time_unit= kwarg to match stable API
Aligned manager-based Cartpole newton solver config with stable CartpolePhysicsCfg.newton (removed stale ls_iterations, ls_parallel)
Dropped redundant inhand manipulation commit (already in base PR [Exp] Cherry-pick direct warp envs from dev/newton #4905)

Verification

Cartpole Warp (manager-based), 300 iterations, 4096 envs, newton==1.0.0:

All managers running in mode 2 (WARP_CAPTURED)
Final episode length: 300.00 (solved)
Training time: 73s

hujc7 · 2026-03-12T06:14:14Z

@greptileai Review

hujc7 · 2026-03-13T07:49:08Z

@greptileai Review

greptile-apps · 2026-03-13T07:56:54Z

+        assert self._env.rng_state_wp is not None
+
+        # iterate over all the interval terms (fixed list; captured graph-friendly)
+        for i, term_cfg in enumerate(self._mode_term_cfgs["interval"]):
+            lower, upper = self._interval_term_ranges[i]
+            if self._interval_term_is_global[i]:
+                assert self._interval_global_rng_state_wp is not None


assert in hot-path methods — inconsistent with fixes applied elsewhere in this PR

_apply_interval (and _apply_reset at line 370) still use assert for internal invariant checks:

    assert self._env.rng_state_wp is not None  # line 331
    assert self._interval_global_rng_state_wp is not None  # line 337
    assert self._scratch_term_mask_wp is not None  # line 370

Python silently strips all assert statements when running with the -O (optimize) flag, which is common in production training scripts. These are the exact same patterns that were already fixed in this PR (in events.py and terminations.py) by replacing them with explicit raise. The inconsistency is easy to miss since these live in the manager implementation while the prior fixes were in MDP term functions.

The conditions themselves are always True under normal construction order (all fields are set in __init__ before _apply_* can be called), so these won't trigger in practice. However, if initialization order changes during future refactoring, the failure will be a cryptic TypeError/AttributeError deep inside a Warp kernel rather than a clear message at the assertion site.

For consistency with the rest of this PR, consider replacing with explicit RuntimeError raises — e.g.:

    if self._env.rng_state_wp is None:
        raise RuntimeError("EventManager._apply_interval: env.rng_state_wp is not initialized.")

The same applies to line 455 (_prepare_terms), which is a one-time init path but is still an assert.

Done — all asserts already replaced with raise RuntimeError in the latest push.

hujc7 · 2026-03-14T21:46:14Z

@greptileai Review

greptile-apps · 2026-03-14T21:55:52Z

+        self._cfg = self._load_cfg(cfg_source)
+        print("[INFO] ManagerCallSwitch configuration:")
+        print(f"  - {self.DEFAULT_KEY}: {self._cfg[self.DEFAULT_KEY]}")
+        for manager_name in self.MANAGER_NAMES:
+            mode = int(self.get_mode_for_manager(manager_name))
+            cap = self._max_modes.get(manager_name)
+            cap_str = f" (cap={cap})" if cap is not None else ""
+            print(f"  - {manager_name}: {mode}{cap_str}")


"Scene" stages invisible in configuration printout despite having a hard cap

MAX_MODE_OVERRIDES enforces Scene → WARP_NOT_CAPTURED (mode 1), which is a critical constraint documented in the PR description. However, "Scene" is not in MANAGER_NAMES, so neither the default mode for Scene stages nor the cap ever appears in the printed configuration block. A user reviewing logs to understand why their Scene_write_data_to_sim stages aren't captured will find no evidence of the cap.

Consider appending the overrides section to the printout:

    # After the manager-names loop:
    if self._max_modes:
        print(" [MAX_MODE_OVERRIDES]")
        for name, max_mode in self._max_modes.items():
            resolved = int(self.get_mode_for_manager(name))
            print(f" - {name}: {resolved} (cap={max_mode})")

Done. Added "Scene" to MANAGER_NAMES so the cap is visible in the init printout.

greptile-apps · 2026-03-14T21:55:52Z

+        self.recorder_manager.record_post_reset(env_ids)
+
+        # compute observations
+        self.obs_buf = self.observation_manager.compute(update_history=True)


reset() bypasses _manager_call_switch for observation compute

ManagerBasedEnvWarp.reset() (inherited by ManagerBasedRLEnvWarp) calls observation_manager.compute() directly, bypassing the ManagerCallSwitch. This means the reset-time observation compute is always eager — even when ObservationManager is configured for WARP_CAPTURED (mode 2). The step() path correctly uses call_stage, but reset() does not.

This is inconsistent and could lead to subtle differences between reset and step time observations (e.g., no CUDA-graph warmup on the first reset). Wrapping this call in call_stage (or at minimum adding a code comment explaining the intentional bypass) would prevent confusion for future maintainers.

Acknowledged — reset() calls observation_manager.compute() directly. This is the base env reset() path, called once at initialization. The RL env step() routes through ManagerCallSwitch correctly. Low risk, will align in a follow-up.

hujc7 · 2026-03-15T07:39:59Z

@greptileai Review

greptile-apps · 2026-03-15T07:49:46Z

+def joint_vel_l1(env: ManagerBasedRLEnv, out, asset_cfg: SceneEntityCfg) -> None:
+    """Penalize joint velocities on the articulation using an L1-kernel. Writes into ``out``."""
+    asset: Articulation = env.scene[asset_cfg.name]
+    wp.launch(
+        kernel=_sum_abs_masked_kernel,
+        dim=env.num_envs,
+        inputs=[asset.data.joint_vel, asset_cfg.joint_mask, out],
+        device=env.device,
+    )


Missing joint_mask null guard before Warp kernel launch

joint_vel_l1 passes asset_cfg.joint_mask directly to the Warp kernel without checking whether it is None. If a caller passes a plain isaaclab.managers.SceneEntityCfg (which lacks joint_mask), this will crash inside the Warp runtime with an unhelpful error about a null array pointer rather than a clear message at the call site.

The same pattern was already fixed in terminations.py for joint_pos_out_of_manual_limit (lines 75-83), which explicitly checks asset_cfg.joint_mask is None and raises a descriptive ValueError. The same guard is needed here for consistency:

Suggested change

-def joint_vel_l1(env: ManagerBasedRLEnv, out, asset_cfg: SceneEntityCfg) -> None:
-    """Penalize joint velocities on the articulation using an L1-kernel. Writes into ``out``."""
-    asset: Articulation = env.scene[asset_cfg.name]
-    wp.launch(
-        kernel=_sum_abs_masked_kernel,
-        dim=env.num_envs,
-        inputs=[asset.data.joint_vel, asset_cfg.joint_mask, out],
-        device=env.device,
-    )
+def joint_vel_l1(env: ManagerBasedRLEnv, out, asset_cfg: SceneEntityCfg) -> None:
+    """Penalize joint velocities on the articulation using an L1-kernel. Writes into ``out``."""
+    asset: Articulation = env.scene[asset_cfg.name]
+    if asset_cfg.joint_mask is None:
+        raise ValueError(
+            f"joint_vel_l1 requires SceneEntityCfg with resolved joint_mask, "
+            f"but got None for asset '{asset_cfg.name}'."
+        )
+    wp.launch(
+        kernel=_sum_abs_masked_kernel,
+        dim=env.num_envs,
+        inputs=[asset.data.joint_vel, asset_cfg.joint_mask, out],
+        device=env.device,
+    )
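
For testing the guarded term, a host-side reference can help. The sketch below assumes, from the kernel name and the docstring only, that `_sum_abs_masked_kernel` writes the per-environment sum of absolute joint velocities over the masked joints; the kernel body is not shown in this thread, so treat that as an assumption. Plain Python lists stand in for the Warp arrays.

```python
def joint_vel_l1_reference(joint_vel, joint_mask):
    """Per-env sum of |velocity| over joints whose mask entry is truthy.

    joint_vel:  list of rows, one row of joint velocities per environment.
    joint_mask: one boolean per joint, shared by all environments.
    """
    if joint_mask is None:
        # Same guard as the suggested fix: fail at the call site with a clear message.
        raise ValueError("joint_vel_l1 requires a resolved joint_mask, but got None.")
    out = []
    for row in joint_vel:
        if len(row) != len(joint_mask):
            raise ValueError(f"joint_mask has {len(joint_mask)} entries but a row has {len(row)} joints.")
        out.append(sum(abs(v) for v, selected in zip(row, joint_mask) if selected))
    return out
```

Comparing the Warp output buffer (copied to the host) against this function on a few random inputs would give a quick sanity check of both the mask handling and the new guard.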

greptile-apps · 2026-03-15T07:49:47Z

+    def step(self, action: torch.Tensor) -> tuple[VecEnvObs, dict]:
+        """Execute one time-step of the environment's dynamics.
+
+        The environment steps forward at a fixed time-step, while the physics simulation is
+        decimated at a lower time-step. This is to ensure that the simulation is stable. These two
+        time-steps can be configured independently using the :attr:`ManagerBasedEnvCfg.decimation` (number of
+        simulation steps per environment step) and the :attr:`ManagerBasedEnvCfg.sim.dt` (physics time-step)
+        parameters. Based on these parameters, the environment time-step is computed as the product of the two.
+
+        Args:
+            action: The actions to apply on the environment. Shape is (num_envs, action_dim).
+
+        Returns:
+            A tuple containing the observations and extras.
+        """
+        # process actions
+        action_device = action.to(self.device)
+        if action_device.dtype != torch.float32:
+            action_device = action_device.float()
+        if not action_device.is_contiguous():
+            action_device = action_device.contiguous()
+        action_wp = wp.from_torch(action_device, dtype=wp.float32)
+        self.action_manager.process_action(action_wp)
+
+        self.recorder_manager.record_pre_step()
+
+        # check if we need to do rendering within the physics loop
+        # note: checked here once to avoid multiple checks within the loop
+        is_rendering = bool(self.sim.settings.get("/isaaclab/visualizer")) or self.sim.settings.get(
+            "/isaaclab/render/rtx_sensors"
+        )
+
+        # perform physics stepping
+        for _ in range(self.cfg.decimation):
+            self._sim_step_counter += 1
+            # set actions into buffers
+            self.action_manager.apply_action()
+            # set actions into simulator
+            self.scene.write_data_to_sim()
+            # simulate
+            self.sim.step(render=False)
+            # render between steps only if the GUI or an RTX sensor needs it
+            # note: we assume the render interval to be the shortest accepted rendering interval.
+            #    If a camera needs rendering at a faster frequency, this will lead to unexpected behavior.
+            if self._sim_step_counter % self.cfg.sim.render_interval == 0 and is_rendering:
+                self.sim.render()
+            # update buffers at sim dt
+            self.scene.update(dt=self.physics_dt)
+
+        # post-step: step interval event
+        if "interval" in self.event_manager.available_modes:
+            self.event_manager.apply(mode="interval", dt=self.step_dt)
+
+        # -- compute observations
+        self.obs_buf = self.observation_manager.compute(update_history=True)
+        self.recorder_manager.record_post_step()
+
+        # return observations and extras
+        return self.obs_buf, self.extras


Base step() bypasses ManagerCallSwitch entirely

ManagerBasedEnvWarp.step() (the base class, non-RL) calls action_manager.process_action, action_manager.apply_action, scene.write_data_to_sim, event_manager.apply, and observation_manager.compute all directly, without routing through self._manager_call_switch. This means any ManagerCallSwitch configuration (including WARP_CAPTURED mode) has no effect when this step() path is taken.

Since ManagerBasedRLEnvWarp overrides step() and does use ManagerCallSwitch, this only affects direct users of ManagerBasedEnvWarp. However, given the class name suggests Warp-mode support, a developer who instantiates it directly expecting WARP_CAPTURED behavior would be surprised to find it never captures. A comment explaining this limitation would prevent confusion:

    def step(self, action: torch.Tensor) -> tuple[VecEnvObs, dict]:
        """Execute one time-step of the environment's dynamics.

        Note:
            This base-class step runs all manager calls eagerly and does **not** route
            through :attr:`_manager_call_switch`. CUDA graph capture is only available via
            :class:`ManagerBasedRLEnvWarp` which overrides this method.

        ...

hujc7 · 2026-03-30T23:57:17Z

Hi, @ooctipus. This PR still needs some attention to get merged.

isaaclab-review-bot

🤖 Isaac Lab Review Bot — PR #4829

Summary

Cherry-pick of Warp manager-based env infrastructure from dev/newton. This is a large PR (48 files, ~8k lines) adding experimental Warp-first manager implementations, MDP terms, utilities, and a Cartpole reference task.

Overall Assessment: ✅ Approve with minor suggestions

The PR is well-structured: new code lives entirely under isaaclab_experimental (no stable code disruption), the ManagerCallSwitch design allows per-manager fallback to stable implementations, and the WarpGraphCache warm-up-before-capture pattern is a solid improvement. The cherry-pick is clean with no conflict artifacts visible.

CI Status

✅ labeler passed
⚠️ No pre-commit/lint/ruff checks found for the head commit — suggest verifying linting passes locally.

Key Findings

Architecture (positive)

Clean separation: all experimental code in isaaclab_experimental, stable code only gets 2 minor comment/docstring cleanups.
ManagerCallSwitch with env-var-driven mode selection (MANAGER_CALL_CONFIG) is flexible for benchmarking stable vs warp vs captured paths.
WarpGraphCache now does eager warm-up before capture — correctly handles first-call allocations outside capture context.
RL library wrappers (rsl_rl, rl_games, sb3, skrl) updated uniformly with graceful ImportError handling.
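
The warm-up-before-capture pattern mentioned above can be sketched independently of any GPU backend. The class below is a hypothetical illustration, not the PR's `WarpGraphCache`: the `capture` and `replay` callables are injected so the control flow can be run anywhere. With Warp they would plausibly wrap `wp.ScopedCapture` and `wp.capture_launch`, but that wiring is an assumption and is not shown.

```python
class GraphCacheSketch:
    """Run a stage eagerly once, record it, and replay the recording afterwards."""

    def __init__(self, capture, replay):
        self._capture = capture  # capture(fn) -> graph handle; records fn without producing results
        self._replay = replay    # replay(graph) -> None; re-executes the recorded work
        self._graphs = {}

    def capture_or_replay(self, key, fn):
        graph = self._graphs.get(key)
        if graph is None:
            # Warm-up: the eager run lets lazy allocations happen outside the capture
            # and produces the real results for this first call.
            fn()
            # Capture: record the same work for later replays.
            self._graphs[key] = self._capture(fn)
            return
        self._replay(graph)


# Fake backend for illustration: "capturing" stores the callable, "replaying" calls it.
calls = []
cache = GraphCacheSketch(capture=lambda fn: fn, replay=lambda graph: graph())
cache.capture_or_replay("RewardManager_compute", lambda: calls.append("run"))
cache.capture_or_replay("RewardManager_compute", lambda: calls.append("run"))
```

The point of the ordering is that the first call still does real work through the eager run, while every later call with the same key goes through the replay path.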

Potential Issues (see inline comments)

_reset_idx signature mismatch in ManagerBasedRLEnvWarp — the base ManagerBasedEnvWarp._reset_idx takes env_ids only, but the RL env override adds env_mask kwarg. The base reset() calls _reset_idx(env_ids) without env_mask, meaning the mask codepath in the RL env is never exercised from reset().
resolve_1d_mask allocates wp.array from Python list when ids are not torch/warp — this defeats capture-safety for the slice → list(range(...)) path.
Observation dim inference in _infer_term_dim_scalar is fragile — falls back to wp.to_torch(asset.data.joint_pos).shape[1] which does a warp→torch conversion at init time.
episode_length_buf setter uses self._episode_length_buf[:] = value which triggers a full copy — fine for correctness but worth documenting the intent (preserve warp linkage).
recorder_manager not reset with mask — in _reset_idx of ManagerBasedRLEnvWarp, recorder_manager gets env_ids (torch) while all other managers get env_mask (warp). This inconsistency could cause issues if recorder_manager is ever captured.
NoiseModelCfg.rng_state_wp is set by side-effect during _prepare_terms — this coupling between observation manager and noise config is implicit and could break if noise configs are shared across managers.

isaaclab-review-bot · 2026-03-31T00:02:04Z

+            inputs=[seed_val, self.rng_state_wp],
+            device=self.device,
+        )
+


Minor: RNG seed -1 when cfg.seed is None

When cfg.seed is None, seed_val is set to -1 which is passed to wp.rand_init(seed, env_id). The wp.rand_init function may handle negative seeds differently across Warp versions. Consider defaulting to 0 or 42 instead of -1, or documenting the intent.

seed_val = int(self.cfg.seed) if self.cfg.seed is not None else -1

isaaclab-review-bot · 2026-03-31T00:02:04Z

+        *,
+        env_mask: wp.array | None = None,
+    ):
+        """Reset environments based on specified indices.


Signature mismatch: _reset_idx in RL env vs base env

The base ManagerBasedEnvWarp._reset_idx(self, env_ids) has no env_mask parameter, but this override adds env_mask as a keyword argument. When the base class reset() method calls self._reset_idx(env_ids), the env_mask kwarg is never passed, so the mask-based codepath in this method is never exercised from reset(). This is fine for now (the base reset() is only called from the external API), but worth a # NOTE to prevent confusion.

isaaclab-review-bot · 2026-03-31T00:02:04Z

+                        "mode": "reset",
+                        "env_mask_wp": env_mask,
+                        "global_env_step_count": self._global_env_step_count_wp,
+                    },


Inconsistency: recorder_manager.reset gets env_ids (torch) while others get env_mask (warp)

All other manager resets in this method use env_mask (warp boolean mask), but recorder_manager.reset(env_ids=env_ids) still uses torch integer indices. This is noted as intentional (recorder is still env_ids-based), but it means recorder cannot be captured. Consider adding a # TODO to track migration.
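
For reference, the two representations carry the same information. In this sketch plain Python lists stand in for the torch index tensor and the Warp boolean array, and the helper names are hypothetical.

```python
def ids_to_mask(env_ids, num_envs):
    """Convert integer env indices into a fixed-length boolean mask."""
    mask = [False] * num_envs
    for env_id in env_ids:
        mask[env_id] = True
    return mask


def mask_to_ids(env_mask):
    """Convert a boolean mask back into a list of integer env indices."""
    return [env_id for env_id, flag in enumerate(env_mask) if flag]
```

The mask always has length `num_envs`, while the index list changes length with the number of environments being reset, which is presumably why the captured managers take the mask.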

isaaclab-review-bot · 2026-03-31T00:02:04Z

+            return scratch_mask
+        ids_wp = ids
+    else:
+        if len(ids) == 0:


Capture-safety: wp.array(ids, ...) allocates during resolve_1d_mask

When ids is a Python list (e.g. from slice → list(range(...))), this line allocates a new wp.array, which is not safe during CUDA graph capture. The function comment says "No allocations happen inside this function" but that's not true for this path.

ids_wp = wp.array(ids, dtype=wp.int32, device=device)

Consider either:

- Pre-allocating a scratch ids buffer alongside scratch_mask, or
- Documenting that the slice → list path is not capture-safe.
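
The first option could look roughly like the sketch below. It uses standard-library buffers (`array`, `bytearray`) as stand-ins for `wp.array`, and the class name `MaskScratch` is hypothetical. It illustrates only the buffer reuse; whether copying host-side ids into a device buffer is itself safe during capture is not something this sketch establishes.

```python
from array import array


class MaskScratch:
    """Pre-allocated scratch buffers, sized once for `num_envs`."""

    def __init__(self, num_envs: int):
        self.ids = array("i", [0] * num_envs)  # scratch ids, allocated once
        self.mask = bytearray(num_envs)        # scratch mask, allocated once

    def resolve(self, ids) -> bytearray:
        """Write `ids` into the scratch buffers and return the reused mask."""
        if len(ids) > len(self.ids):
            raise ValueError("more ids than environments")
        # Clear the mask in place instead of creating a new one.
        for i in range(len(self.mask)):
            self.mask[i] = 0
        # Copy ids into the scratch buffer, then set the matching mask entries.
        for k, env_id in enumerate(ids):
            self.ids[k] = env_id
        for k in range(len(ids)):
            self.mask[self.ids[k]] = 1
        return self.mask
```

Every call returns the same `mask` object, so the slice → list path no longer creates a new array per call.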

isaaclab-review-bot · 2026-03-31T00:02:04Z

+            # Guard: concat groups must use the Warp fast-path (standard concat dim, no history).
+            if self._group_obs_concatenate[group_name] and not can_use_group_buffer:
+                raise ValueError(
+                    f"Observation group '{group_name}' is concatenated but cannot use the Warp"


Fragile dim inference: wp.to_torch conversion at init

The fallback wp.to_torch(asset.data.joint_pos).shape[1] performs a warp→torch conversion just to get a shape. Consider using asset.data.joint_pos.shape[1] directly (Warp arrays have a .shape attribute), which avoids the conversion overhead:

return int(asset.data.joint_pos.shape[1])

isaaclab-review-bot · 2026-03-31T00:02:04Z

+                # if scale is set, check if single float or tuple
+                if term_cfg.scale is not None:
+                    if not isinstance(term_cfg.scale, (float, int, tuple)):
+                        raise TypeError(


History + Warp: NotImplementedError raised at init

The raise NotImplementedError("History reshaping is not implemented yet for warp.") at line 695 will prevent any observation term with history_length > 0 from being used. This is fine as a guard, but the code above it (lines 681-694) still executes and creates a CircularBuffer that will never be used. Consider moving the raise before the buffer creation to avoid wasted allocation.
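
The suggested ordering is sketched below with hypothetical helpers (`make_buffer`, `prepare_term`); the real code builds a CircularBuffer, which a plain list stands in for here.

```python
allocations = []  # records every buffer this sketch creates


def make_buffer(max_len):
    """Stand-in for creating a history buffer."""
    buf = [None] * max_len
    allocations.append(buf)
    return buf


def prepare_term(history_length):
    """Reject unsupported configs before allocating anything."""
    if history_length > 0:
        raise NotImplementedError("History reshaping is not implemented yet for warp.")
    # Only supported configurations reach the allocation.
    return make_buffer(1)
```

With the guard first, a term with `history_length > 0` fails without leaving an unused buffer behind.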

isaaclab-review-bot · 2026-03-31T00:02:04Z

-            self._results[stage] = result
-        wp.capture_launch(self._graphs[stage])
-        return self._results[stage]
+        if graph is not None:


Good improvement: eager warm-up before capture

The addition of fn(*args, **kwargs) before wp.ScopedCapture() is the right approach — it flushes first-call allocations (hasattr guards, lazy dtype casts) outside the capture context. This matches the pattern recommended by NVIDIA for CUDA graph capture.

One edge case: if fn has side effects that should only happen once (e.g. incrementing a counter), the warm-up + capture run will execute them twice. The PR description mentions this is intentional for the current use case, but worth a docstring note.
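
The double execution is easy to see in a pure-Python simulation. `GraphCacheSketch` is a hypothetical stand-in for WarpGraphCache: the second call to `fn` plays the role of the run inside the capture context, and replay returns the stored result without running any Python, as a graph launch would.

```python
class GraphCacheSketch:
    """Simulates warm-up, capture, and replay without touching a GPU."""

    def __init__(self):
        self._replay = {}

    def capture_or_replay(self, stage, fn, *args, **kwargs):
        replay = self._replay.get(stage)
        if replay is not None:
            # Replay path: no Python from `fn` runs again.
            return replay()
        # Eager warm-up: first-call initialisation happens outside "capture".
        fn(*args, **kwargs)
        # Stand-in for the run recorded inside the capture context.
        result = fn(*args, **kwargs)
        self._replay[stage] = lambda: result
        return result


calls = {"count": 0}


def step_fn(x):
    """Example function with a Python-side side effect."""
    calls["count"] += 1
    return x * 2
```

After the first `capture_or_replay("step", step_fn, 3)` the counter is 2, and later calls leave it unchanged, so any once-only side effect in `fn` needs to tolerate running twice.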

isaaclab-review-bot · 2026-03-31T00:02:04Z

+
+@wp.kernel
+def _reset_joints_by_offset_kernel(
+    env_mask: wp.array(dtype=wp.bool),


RNG state mutation: single thread per env is correct for race-freedom

Good design choice — using 1 thread per env with a sequential loop over joints avoids RNG state races. The comment documents this well. However, for articulations with many joints (e.g. humanoids with 20+ joints), this serialization could become a bottleneck. Consider noting this as a known limitation for future optimization (e.g. per-env per-joint RNG states).
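
The threading layout described here can be mimicked on the CPU. In this sketch each iteration of the outer loop stands in for one kernel thread, `random.Random` stands in for the per-env Warp RNG state, and the function name and parameters are hypothetical.

```python
import random


def reset_joint_offsets(env_mask, rng_states, num_joints, low, high):
    """Sample one offset per joint for every masked env, one 'thread' per env."""
    offsets = [[0.0] * num_joints for _ in env_mask]
    for env_id, active in enumerate(env_mask):
        if not active:
            continue  # unmasked envs never touch their RNG state
        rng = rng_states[env_id]
        # Sequential loop over joints: only this 'thread' advances this state.
        for joint in range(num_joints):
            offsets[env_id][joint] = rng.uniform(low, high)
    return offsets
```

Because each RNG state is advanced by exactly one worker, results are reproducible for a given seed; the cost is that the joint loop runs serially within each env.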

isaaclab-review-bot · 2026-03-31T00:02:04Z

+            # store weighted reward rate (matches old: value/dt)
+            step_reward[env_id, term_idx] = weighted
+            val = weighted * dt
+            total += val


_reward_finalize kernel: step_reward uses weighted but not dt-scaled values

The kernel stores weighted = raw * weight into step_reward (the "reward rate"), but then val = weighted * dt goes into episode_sums and total. The comment says # store weighted reward rate (matches old: value/dt) but the stored value is actually raw * weight, not raw * weight / dt. The naming/comment could be clarified to avoid confusion: the stored quantity is a weighted per-step value (weighted_per_step), not a rate.
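
The arithmetic in the quoted lines can be written out for a single environment. This is a plain-Python sketch: the function name and the list-based buffers are assumptions, and only the `weighted` / `val` / `total` relationships come from the diff.

```python
def finalize_rewards(raw, weights, dt, episode_sums):
    """Mirror the per-term arithmetic of the quoted kernel lines for one env."""
    step_reward = []
    total = 0.0
    for term_idx, (r, w) in enumerate(zip(raw, weights)):
        weighted = r * w              # value written to step_reward
        step_reward.append(weighted)
        val = weighted * dt           # value added to episode_sums and total
        episode_sums[term_idx] += val
        total += val
    return step_reward, total
```

With raw = [2.0, 4.0], weights = [0.5, 0.25], and dt = 0.5, step_reward is [1.0, 1.0] while each episode sum grows by 0.5. Dividing the accumulated val by dt recovers the stored step_reward, which is one possible reading of the "value/dt" wording in the kernel comment.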

isaaclab-review-bot · 2026-03-31T00:02:04Z

+        # post-step:
+        # -- update env counters (used for curriculum generation)
+        self.episode_length_buf += 1  # step in current episode (per env)
+        self.common_step_counter += 1  # total step (common for all envs)


is_rendering check differs from base class

Base ManagerBasedEnvWarp.step() uses:

is_rendering = bool(self.sim.settings.get("/isaaclab/visualizer")) or self.sim.settings.get("/isaaclab/render/rtx_sensors")

But this RL env uses:

is_rendering = self.sim.is_rendering

The commit message mentions "align warp env rendering checks with stable env: use sim.is_rendering". The base class ManagerBasedEnvWarp.step() should likely be updated too for consistency, or the difference documented.

hujc7 · 2026-03-31T00:03:25Z

@greptileai Review

Add experimental warp-compatible manager implementations, MDP terms, utilities (buffers, modifiers, noise, warp kernels), ManagerCallSwitch for eager/captured dispatch, and manager-based env orchestration. Includes RL library wrapper updates (rsl_rl, rl_games, sb3, skrl) to accept warp env types, and minor stable fixes (settings_manager RuntimeError handling, observation_manager comment cleanup).

Add an experimental manager-based Cartpole environment using the warp manager infrastructure as a reference task for testing and benchmarking.

WarpGraphCache now runs an eager warm-up call before graph capture so that first-call initialisation (allocations, hasattr guards, dtype casts) executes outside the capture context. Also align warp env rendering checks with stable env: use sim.is_rendering for the step-loop check and cache has_rtx_sensors at init for rerender-on-reset and render-method guards.

…on (#4945)

## Summary

* Cherry-picks [Newton] Migrate more envs and mdps to warp (#4690) onto develop
* Cherry-picks [Newton] Add capture safety guards and fix WrenchComposer stale COM pose (#4779) onto develop

### Changes included

- Warp-first MDP terms (observations, rewards, events, terminations, actions) for manager-based envs
- Tested warp env configs: Ant, Humanoid, Cartpole, locomotion velocity (A1, AnymalB/C/D, Cassie, G1, Go1/2, H1), Franka/UR10 reach
- ManagerCallSwitch max_mode cap and scene capture config
- MDP kernels made graph-capturable with consolidated test infrastructure
- capture_unsafe safety guards on lazy-evaluated derived properties in articulation/rigid_object data
- WrenchComposer fix: use fresh COM pose buffers instead of stale cached link poses

### Dropped

- G1-29-DOF warp env (Isaac-Velocity-Flat-G1-Warp-v1): removed because the stable g1_29_dofs task config does not exist on develop (only on dev/newton). Warp env PRs should only add warp frontends for envs that already exist in the stable package.

## Dependencies

Must be merged **after** these PRs (in order):

1. #4905 (merged)
2. #4829

## Validated base

Validated against develop at 7588fa9.

## Test plan

- [x] Run warp env training sweep across all manager-based env configs (14/14 pass, mode=2, 4096 envs, 300 iters)
- [ ] Run test_mdp_warp_parity.py and test_mdp_warp_parity_new_terms.py
- [ ] Run test_action_warp_parity.py
- [ ] Verify WrenchComposer COM pose is fresh (not stale) during graph replay

---------

Co-authored-by: Antoine Richard <antoiner@nvidia.com>
Co-authored-by: Kelly Guo <kellyg@nvidia.com>

## Summary

* Add warp environment overview doc (`warp-environments.rst`)
* Add stable-to-warp migration guide (`warp-env-migration.rst`)
* Align step timer setup across all 4 env base classes (stable + warp, direct + manager)

## Dependencies

Must be merged **after** (validated against develop at `9720047`):

1. #4829
2. #4945

## Status

Performance comparison data included in docs.

hujc7 requested review from ClemensSchwarke, Mayankm96, Toni-SM, hhansen-bdai, jtigue-bdai, kellyguo11 and ooctipus as code owners March 5, 2026 10:46

github-actions Bot added the documentation (Improvements or additions to documentation) and isaac-lab (Related to Isaac Lab team) labels Mar 5, 2026

hujc7 changed the title ~~Cherry-pick manager-based warp env infrastructure from dev/newton~~ [Experimental] Cherry-pick manager-based warp env infrastructure from dev/newton Mar 5, 2026

greptile-apps Bot reviewed Mar 5, 2026

hujc7 force-pushed the develop-manager-warp-cp branch 3 times, most recently from 6d1ac95 to 7138023 Compare March 9, 2026 08:26

greptile-apps Bot reviewed Mar 9, 2026

hujc7 force-pushed the develop-manager-warp-cp branch from 7138023 to e044ddc Compare March 9, 2026 09:00

greptile-apps Bot reviewed Mar 9, 2026

hujc7 changed the title ~~[Experimental] Cherry-pick manager-based warp env infrastructure from dev/newton~~ [Exp] Cherry-pick manager-based warp env infrastructure from dev/newton Mar 11, 2026

hujc7 force-pushed the develop-manager-warp-cp branch from e044ddc to 5f3bc76 Compare March 11, 2026 15:22

hujc7 mentioned this pull request Mar 11, 2026

[Exp] Cherry-pick warp MDP migration and capture safety from dev/newton #4945

Merged

4 tasks

hujc7 force-pushed the develop-manager-warp-cp branch 3 times, most recently from 0b666f6 to 59d82f4 Compare March 12, 2026 06:05

hujc7 force-pushed the develop-manager-warp-cp branch from 3cd41f3 to 224a902 Compare March 13, 2026 07:41

greptile-apps Bot reviewed Mar 13, 2026

hujc7 mentioned this pull request Mar 13, 2026

Add warp environment docs and timer alignment #4995

Merged

hujc7 force-pushed the develop-manager-warp-cp branch 2 times, most recently from 0b71e14 to 028a7fd Compare March 14, 2026 21:38

greptile-apps Bot reviewed Mar 14, 2026

hujc7 force-pushed the develop-manager-warp-cp branch from 028a7fd to 5108ca8 Compare March 15, 2026 07:38

greptile-apps Bot reviewed Mar 15, 2026

AntoineRichard approved these changes Mar 16, 2026

hujc7 mentioned this pull request Mar 23, 2026

[Exp] Slim warp velocity envs and enable Rough variants for all robots #5088

Open

hujc7 force-pushed the develop-manager-warp-cp branch from 99edd67 to 8c0df8d Compare March 30, 2026 23:58

isaaclab-review-bot Bot reviewed Mar 31, 2026

hujc7 added 3 commits April 8, 2026 10:07

Add warp Cartpole task configuration

ef7b371

hujc7 force-pushed the develop-manager-warp-cp branch from 8c0df8d to cf951ac Compare April 8, 2026 23:40

AntoineRichard approved these changes Apr 16, 2026

AntoineRichard added 2 commits April 16, 2026 13:14

Merge branch 'develop' into develop-manager-warp-cp

14d511b

Merge branch 'develop' into develop-manager-warp-cp

8fc71f8

AntoineRichard merged commit 5dee881 into isaac-sim:develop Apr 16, 2026
1 check passed

		@Timer(name="env_step", msg="Step took:", enable=True, format="us")
		def step(self, action: torch.Tensor) -> VecEnvStepReturn:

		assert asset_cfg.joint_ids_wp is not None
		assert env.rng_state_wp is not None

-    assert asset_cfg.joint_ids_wp is not None
-    assert env.rng_state_wp is not None
+    if asset_cfg.joint_ids_wp is None:
+        raise ValueError(
+            f"reset_joints_by_offset requires an experimental SceneEntityCfg with resolved joint_ids_wp, "
+            f"but got None for asset '{asset_cfg.name}'. "
+            "Use isaaclab_experimental.managers.SceneEntityCfg and ensure joint_names are set."
+        )
+    if not hasattr(env, "rng_state_wp") or env.rng_state_wp is None:
+        raise AttributeError(
+            "reset_joints_by_offset requires env.rng_state_wp to be initialized. "
+            "Use ManagerBasedEnvWarp or ManagerBasedRLEnvWarp as the base environment."
+        )


		with contextlib.suppress(ImportError):
		    from isaaclab_experimental.envs import DirectRLEnvWarp, ManagerBasedRLEnvWarp

greptile-apps Bot commented Mar 5, 2026

Greptile Summary

Confidence Score: 4/5