[Exp] Cherry-pick direct warp envs from dev/newton#4905
Greptile Summary
This PR cherry-picks the experimental Warp-native RL environment infrastructure (
Confidence Score: 3/5
Important Files Changed
Sequence Diagram
sequenceDiagram
participant RL as RSL-RL Runner
participant Wrapper as RslRlVecEnvWrapper
participant Env as DirectRLEnvWarp
participant Cache as WarpGraphCache
participant Scene as InteractiveSceneWarp
RL->>Wrapper: step(actions)
Wrapper->>Env: step(actions)
Env->>Env: _pre_physics_step(wp.from_torch(actions))
loop decimation
Env->>Cache: capture_or_replay("action", step_warp_action)
Cache-->>Env: graph captured/replayed
Env->>Scene: write_data_to_sim() [outside graph]
Env->>Env: sim.step()
Env->>Scene: scene.update()
end
Env->>Cache: capture_or_replay("end_pre", _step_warp_end_pre)
Note over Cache,Env: Captured: add_to_env → _get_dones → _get_rewards → _reset_idx(reset_buf)
Env->>Scene: write_data_to_sim() [outside graph]
Env->>Cache: capture_or_replay("end_post", _step_warp_end_post)
Note over Cache,Env: Captured: _get_observations()
Env->>Env: _post_step_visualize() [outside graph]
Env-->>Wrapper: obs_dict, rewards, terminated, truncated, extras
Wrapper-->>RL: TensorDict(obs), rew, dones, extras
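The capture-or-replay flow in the diagram can be illustrated with a small pure-Python stand-in. This is only a sketch of the caching pattern, not the PR's `WarpGraphCache`: the class `GraphCacheSketch`, its counters, and the `step_action` stage are hypothetical, and a real implementation would record a CUDA graph rather than store a Python callable.

```python
from typing import Callable, Dict


class GraphCacheSketch:
    """Hypothetical stand-in for a capture-or-replay cache (no CUDA involved)."""

    def __init__(self) -> None:
        self._graphs: Dict[str, Callable[[], None]] = {}
        self.capture_count = 0
        self.replay_count = 0

    def capture_or_replay(self, name: str, fn: Callable[[], None]) -> None:
        if name not in self._graphs:
            # First call for this stage: run it once and remember it ("capture").
            fn()
            self._graphs[name] = fn
            self.capture_count += 1
        else:
            # Later calls reuse what was stored for this stage ("replay").
            self._graphs[name]()
            self.replay_count += 1


counter = {"steps": 0}


def step_action() -> None:
    # Stage body; it mutates a pre-allocated buffer in place.
    counter["steps"] += 1


cache = GraphCacheSketch()
for _ in range(3):
    cache.capture_or_replay("action", step_action)
```

A replayed graph operates on the buffers that existed at capture time, which is presumably why the stages in the diagram write into persistent buffers and why calls such as `write_data_to_sim()` are kept outside the captured regions.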
| joint_vel[env_index, 0] = default_joint_vel[env_index, 0] | ||
| joint_vel[env_index, 1] = default_joint_vel[env_index, 1] |
Hardcoded joint indices break generality of velocity reset
Joint positions are reset using the parameterized cart_dof_idx / pole_dof_idx, but joint velocities are reset using hardcoded indices 0 and 1. For the default cartpole this happens to be correct, but if the joint ordering ever changes (or this kernel is reused), velocities would be silently reset for the wrong degrees of freedom.
| joint_vel[env_index, 0] = default_joint_vel[env_index, 0] | |
| joint_vel[env_index, 1] = default_joint_vel[env_index, 1] | |
| joint_vel[env_index, cart_dof_idx] = default_joint_vel[env_index, cart_dof_idx] | |
| joint_vel[env_index, pole_dof_idx] = default_joint_vel[env_index, pole_dof_idx] |
Fixed — now uses cart_dof_idx/pole_dof_idx for velocity reset.
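To make the failure mode concrete, here is a minimal plain-Python sketch of the parameterized reset. It is not the Warp kernel from the PR: the function name `reset_joint_vel` and the three-DOF layout are assumptions chosen to show what happens when the cart and pole are not at indices 0 and 1.

```python
def reset_joint_vel(joint_vel, default_joint_vel, env_index, cart_dof_idx, pole_dof_idx):
    """Reset only the cart and pole velocities for one environment."""
    joint_vel[env_index][cart_dof_idx] = default_joint_vel[env_index][cart_dof_idx]
    joint_vel[env_index][pole_dof_idx] = default_joint_vel[env_index][pole_dof_idx]


# Hypothetical layout with an extra DOF at index 0, so cart=1 and pole=2.
joint_vel = [[9.0, 9.0, 9.0]]
default_joint_vel = [[0.0, 0.0, 0.0]]
reset_joint_vel(joint_vel, default_joint_vel, env_index=0, cart_dof_idx=1, pole_dof_idx=2)
```

With hardcoded indices 0 and 1, the same call would have cleared the unrelated DOF at index 0 and left the pole velocity at its stale value.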
| to_targets[env_index] = wp.transform_get_translation(root_pose[env_index]) - wp.transform_get_translation( | ||
| default_root_pose[env_index] | ||
| ) | ||
| to_targets[env_index][2] = 0.0 | ||
| potentials[env_index] = -wp.length(to_targets[env_index]) / dt |
to_targets and initial potentials in reset_root are computed incorrectly
After the root pose is set to default_root_pose + env_origins, the code computes:
to_targets = (default_root_pose + env_origins) - default_root_pose = env_origins
so to_targets holds the environment origin offset (~4–10 m), not the vector from the agent to the distant target (initialized to env_origins + (1000, 0, 0)). As a result potentials = -|env_origins| / dt instead of the expected ≈ -1000 / dt.
In practice this does not corrupt training because _reset_idx immediately calls _compute_intermediate_values() at its end, which correctly overwrites both to_targets and potentials (and sets prev_potentials from the now-correct potentials). However, the intention of this initial calculation in reset_root is clearly wrong and could confuse future readers or cause subtle bugs if the subsequent _compute_intermediate_values() call is ever removed or reordered.
The correct expression would be:
to_targets[env_index] = targets[env_index] - wp.transform_get_translation(root_pose[env_index])
to_targets[env_index][2] = wp.float32(0.0)
potentials[env_index] = -wp.length(to_targets[env_index]) / dt
(where targets would need to be added as a kernel input)
Fixed — reset_root now takes the targets array and computes to_targets = targets - root_pose correctly.
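The difference between the two expressions can be checked with a short numeric sketch. The values below (`dt`, the env origin, the default root height) are assumptions for illustration only; the target offset of (1000, 0, 0) from the env origin is taken from the review comment.

```python
import math

dt = 1.0 / 60.0                      # assumed step size
env_origin = (4.0, 0.0, 0.0)         # assumed env origin offset
default_root_pos = (0.0, 0.0, 0.5)   # assumed default root position (local frame)

# Root position after reset: default pose shifted by the env origin.
root_pos = tuple(d + o for d, o in zip(default_root_pos, env_origin))
# Distant target, initialized to env_origin + (1000, 0, 0).
target = (env_origin[0] + 1000.0, env_origin[1], env_origin[2])


def planar_potential(to_target, dt):
    # Ignore the z component, then potential = -distance / dt.
    return -math.hypot(to_target[0], to_target[1]) / dt


# Original expression: root_pos - default_root_pos, which collapses to env_origin.
wrong = tuple(r - d for r, d in zip(root_pos, default_root_pos))
# Corrected expression: target - root_pos.
right = tuple(t - r for t, r in zip(target, root_pos))

wrong_potential = planar_potential(wrong, dt)
right_potential = planar_potential(right, dt)
```

Under these assumed values the original expression yields a potential of about -240, while the corrected one yields about -60000, matching the expected -1000 / dt.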
| @abstractmethod | ||
| def _get_observations(self) -> None: | ||
| """Compute the observations for the environment. | ||
|
|
||
| Writes results into the observation buffers (e.g., ``self.obs_buf``). | ||
| """ | ||
| raise NotImplementedError(f"Please implement the '_get_observations' method for {self.__class__.__name__}.") |
_get_observations declared -> None but implementations return a dict relied on by vecenv_wrapper
The abstract signature declares a None return type and the docstring says "Writes results into the observation buffers", but every concrete implementation (CartpoleWarpEnv, LocomotionWarpEnv, etc.) returns {"policy": self.torch_obs_buf}. The updated RslRlVecEnvWrapper.get_observations() in this PR also calls _get_observations() and uses its return value directly:
obs_dict = self.unwrapped._get_observations()
return TensorDict(obs_dict, batch_size=[self.num_envs])
If a new subclass faithfully follows the declared -> None contract, get_observations() will raise a TypeError when constructing the TensorDict. The return type should be updated to make the contract explicit:
| @abstractmethod | |
| def _get_observations(self) -> None: | |
| """Compute the observations for the environment. | |
| Writes results into the observation buffers (e.g., ``self.obs_buf``). | |
| """ | |
| raise NotImplementedError(f"Please implement the '_get_observations' method for {self.__class__.__name__}.") | |
| def _get_observations(self) -> dict: |
and the docstring updated to reflect that the method both writes buffers and returns the observation dict.
Fixed — changed abstract signature to -> dict with updated docstring.
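The contract discussed above can be sketched without Warp or torch. Everything here is illustrative: `EnvSketch`, `CartpoleSketch`, and the free function `get_observations` are hypothetical stand-ins, and plain lists replace tensors. The point is only that the abstract method both fills a buffer and returns it keyed by observation group.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class EnvSketch(ABC):
    @abstractmethod
    def _get_observations(self) -> Dict[str, List[float]]:
        """Write the observation buffer and return it keyed by group name."""
        raise NotImplementedError


class CartpoleSketch(EnvSketch):
    def __init__(self) -> None:
        self.obs_buf = [0.0, 0.0, 0.0, 0.0]

    def _get_observations(self) -> Dict[str, List[float]]:
        self.obs_buf[0] = 1.0  # pretend a kernel filled the buffer
        return {"policy": self.obs_buf}


def get_observations(env: EnvSketch) -> Dict[str, List[float]]:
    # Stand-in for the wrapper: it depends on the returned dict.
    obs_dict = env._get_observations()
    if not isinstance(obs_dict, dict):
        raise TypeError("_get_observations() must return a dict of observation groups")
    return dict(obs_dict)


obs = get_observations(CartpoleSketch())
```

A subclass that returned `None` would fail at the `isinstance` check here, which mirrors the failure the review comment predicts for the real wrapper.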
| "numpy", | ||
| "prettytable==3.3.0", | ||
| "toml", | ||
| "hidapi", |
This is not needed?
Cleaned up — the Kit-only [python.pipapi] section was removed. File now has only [package], [dependencies], [core], and [[python.module]].
| "toml", | ||
| "hidapi", | ||
| "gymnasium==0.29.0", | ||
| "trimesh" |
This is not needed?
Cleaned up — the Kit-only [python.pipapi] section was removed. File now has only [package], [dependencies], [core], and [[python.module]].
| [python.pipapi] | ||
| requirements = [ | ||
| "numpy", | ||
| "prettytable==3.3.0", |
This should not be needed?
Cleaned up — removed the entire [python.pipapi] section (Kit-only, not needed for standalone package). Matched the isaaclab_newton extension.toml pattern.
| "numpy", | ||
| "prettytable", | ||
| "toml", | ||
| "hid", |
Cleaned up — the Kit-only [python.pipapi] section was removed. File now has only [package], [dependencies], [core], and [[python.module]].
| "toml", | ||
| "hid", | ||
| "gymnasium", | ||
| "trimesh" |
Cleaned up — the Kit-only [python.pipapi] section was removed. File now has only [package], [dependencies], [core], and [[python.module]].
| "numpy>2", | ||
| "warp-lang>=1.9.0.dev20250825", # TODO: update to 1.11.0 | ||
| "torch>=2.7", | ||
| "prettytable==3.3.0", |
Removed — INSTALL_REQUIRES stripped to just toml. All other deps inherited transitively from isaaclab.
| INSTALL_REQUIRES = [ | ||
| # generic | ||
| "numpy>2", | ||
| "warp-lang>=1.9.0.dev20250825", # TODO: update to 1.11.0 |
This should point to warp1.12?
Stripped all redundant deps — INSTALL_REQUIRES now only has toml (needed by setup.py itself). numpy, torch, warp, prettytable are all inherited transitively from isaaclab.
| # generic | ||
| "numpy>2", | ||
| "warp-lang>=1.9.0.dev20250825", # TODO: update to 1.11.0 | ||
| "torch>=2.7", |
Removed — deps stripped. See above.
I think we already have a modified timer in develop. Do we need that one too?
Switched direct_rl_env_warp.py to import from isaaclab.utils.timer instead. The API is compatible (msg, name, enable kwargs all match). The experimental timer file is kept in place but no longer imported by the base class.
| with contextlib.suppress(ImportError): | ||
| import isaaclab_tasks_experimental # noqa: F401 |
Yes — this triggers gym registration for experimental tasks (Isaac-Cartpole-Direct-Warp-v0, etc.). The contextlib.suppress(ImportError) makes it optional so users without isaaclab_tasks_experimental installed are unaffected.
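The optional-import pattern described in this reply can be tried in isolation. The module name below is a deliberately nonexistent placeholder, not a real package; it only demonstrates that the `with` block swallows the `ImportError` and execution continues.

```python
import contextlib
import sys

# Hypothetical module name that is assumed not to be installed.
with contextlib.suppress(ImportError):
    import isaaclab_tasks_experimental_missing_example  # noqa: F401

# The failed import leaves no entry behind, and no exception propagates.
optional_loaded = "isaaclab_tasks_experimental_missing_example" in sys.modules
```

One trade-off of this pattern is that an `ImportError` raised from inside an installed but broken package is suppressed as well, so a missing registration may go unnoticed.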
@hujc7 Can you double check that ant doesn't work after the newton update + our internal fix?
develop TOT was not working for my other PRs so I used an older commit. Will check again.
0d6b502 to 3d60b9d
cec15c7 to 495ec7a
135024c to 4e54fdd
@AntoineRichard Confirmed: all envs work now. Rebased onto latest develop, aligned solver configs to stable PresetCfg values, and referenced stable agent configs directly. All three envs (cartpole, ant, humanoid) pass 300 iterations at 4096 envs.
@greptileai Review
| goal_rot[env_id] = randomize_rotation(rand0, rand1, x_unit_vec, y_unit_vec) | ||
| reset_goal_buf[env_id] = False | ||
|
|
||
| # Warp-native addition: goal position in world frame. | ||
| goal_pos_w[env_id] = goal_pos[env_id] + env_origins[env_id] |
goal_pos_w update runs for every env on every step, not just masked ones
The assignment on line 93 is outside the if env_mask[env_id]: guard, so it executes for all environments on every invocation of _reset_target_pose. Because _reset_target_pose is called from _get_rewards (which is captured inside the CUDA graph via _step_warp_end_pre), this means goal_pos_w is recomputed for all envs every step, regardless of reset_goal_buf.
While functionally harmless today (because goal_pos and env_origins are constant), this is semantically incorrect: the intent is clearly to update goal_pos_w only for environments whose goal is being reset. Any future change that makes goal_pos per-env-mutable (e.g., curriculum) would silently overwrite the world-frame goal for non-resetting envs each step.
| goal_rot[env_id] = randomize_rotation(rand0, rand1, x_unit_vec, y_unit_vec) | |
| reset_goal_buf[env_id] = False | |
| # Warp-native addition: goal position in world frame. | |
| goal_pos_w[env_id] = goal_pos[env_id] + env_origins[env_id] | |
| env_id = wp.tid() | |
| if env_mask[env_id]: | |
| rand0 = wp.randf(rng_state[env_id], wp.float32(-1.0), wp.float32(1.0)) | |
| rng_state[env_id] += wp.uint32(1) | |
| rand1 = wp.randf(rng_state[env_id], wp.float32(-1.0), wp.float32(1.0)) | |
| rng_state[env_id] += wp.uint32(1) | |
| goal_rot[env_id] = randomize_rotation(rand0, rand1, x_unit_vec, y_unit_vec) | |
| reset_goal_buf[env_id] = False | |
| # Warp-native addition: goal position in world frame. | |
| goal_pos_w[env_id] = goal_pos[env_id] + env_origins[env_id] |
Fixed — moved goal_pos_w update inside the env_mask guard.
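The effect of moving the assignment inside the guard can be shown with a plain-Python loop standing in for the kernel threads. This is a sketch, not the Warp kernel: `reset_goal_pos_w`, the two-env setup, and the sentinel values are assumptions for illustration.

```python
def reset_goal_pos_w(env_mask, goal_pos, env_origins, goal_pos_w):
    """Update the world-frame goal only for environments selected by the mask."""
    for env_id in range(len(env_mask)):  # stands in for wp.tid()
        if env_mask[env_id]:
            goal_pos_w[env_id] = tuple(
                g + o for g, o in zip(goal_pos[env_id], env_origins[env_id])
            )


env_mask = [True, False]
goal_pos = [(0.0, 0.0, 0.5), (0.0, 0.0, 0.5)]
env_origins = [(2.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
goal_pos_w = [(-1.0, -1.0, -1.0), (-1.0, -1.0, -1.0)]  # sentinel values
reset_goal_pos_w(env_mask, goal_pos, env_origins, goal_pos_w)
```

Only the first environment receives a new world-frame goal; the second keeps its sentinel, which is the behavior the guard is meant to ensure.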
| for deformable_object in self._deformable_objects.values(): | ||
| deformable_object.reset(env_ids) | ||
| for rigid_object in self._rigid_objects.values(): | ||
| rigid_object.reset(env_ids, env_mask=env_mask) | ||
| for surface_gripper in self._surface_grippers.values(): | ||
| surface_gripper.reset(env_ids) | ||
| for rigid_object_collection in self._rigid_object_collections.values(): | ||
| rigid_object_collection.reset(env_ids, env_mask=env_mask) | ||
| # -- sensors (no env_mask support) | ||
| for sensor in self._sensors.values(): | ||
| sensor.reset(env_ids) |
env_mask-only reset silently resets all deformable objects and surface grippers
When _reset_idx is called with only env_mask (and env_ids=None), the assets that do not yet support env_mask — deformable_object.reset(env_ids) and surface_gripper.reset(env_ids) — receive env_ids=None, which typically means reset all environments. This is the exact opposite of what the caller intended.
Current environments in this PR (Cartpole, Ant, Humanoid, InHand) happen to have no deformable objects or surface grippers, so there is no observable failure. However, any future environment that uses either asset type and calls _reset_idx with a partial mask (e.g., during per-env resets inside a CUDA-graph-captured path) will silently reset every environment on every step.
Consider either:
- Converting env_mask → env_ids before delegating to these two asset types (consistent with how env_mask-unaware assets already work), or
- Adding a guard/warning to document the limitation:
if env_mask is not None and env_ids is None:
# deformable objects / surface grippers do not support env_mask;
# passing env_ids=None will reset ALL environments.
    pass  # or raise / warn
Acknowledged — latent bug. No current envs use deformable objects or surface grippers with warp env_mask path.
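The first option from the review (converting the mask to ids before delegating) might look roughly like the helper below. `mask_to_env_ids` is a hypothetical name and plain lists stand in for Warp or torch arrays; the PR does not include such a helper.

```python
from typing import List, Optional, Sequence


def mask_to_env_ids(
    env_mask: Optional[Sequence[bool]], env_ids: Optional[List[int]]
) -> Optional[List[int]]:
    """Return explicit env ids for assets that do not understand env_mask."""
    if env_ids is not None:
        return env_ids  # caller already supplied ids
    if env_mask is None:
        return None  # neither given: the caller really means "all environments"
    return [i for i, flag in enumerate(env_mask) if flag]


partial_ids = mask_to_env_ids([False, True, True], None)
all_envs = mask_to_env_ids(None, None)
```

Note that the result has a variable length, so a conversion like this may not be usable inside a captured CUDA graph and would likely have to run in an eager section.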
|
|
||
| # reset | ||
| max_cart_pos = 3.0 # the cart is reset if it exceeds that position [m] | ||
| initial_pole_angle_range = [-0.25, 0.25] # the range in which the pole angle is sampled from on reset [rad] |
Config comment says [rad] but the reset kernel multiplies the range by wp.pi
The config field is documented as radians ([rad]), but the reset kernel on lines 198–200 does:
wp.randf(state[env_index],
initial_pose_angle_range[0] * wp.pi,
initial_pose_angle_range[1] * wp.pi)
With the default [-0.25, 0.25], the actual reset range is [-0.25π, 0.25π] ≈ [-0.785, 0.785] rad, more than three times larger than the comment implies. The comment should be updated to clarify that the values are in units of π radians:
| initial_pole_angle_range = [-0.25, 0.25] # the range in which the pole angle is sampled from on reset [rad] | |
| initial_pole_angle_range = [-0.25, 0.25] # pole angle reset range in units of π [rad] (actual range ≈ ±0.785 rad) |
Fixed — comment updated to [x pi rad] to match the kernel which multiplies by pi.
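The unit conversion is easy to verify directly. The snippet uses Python's `math.pi` in place of `wp.pi`; the variable names `low_rad` and `high_rad` are illustrative.

```python
import math

# Config values from the PR, interpreted in units of pi.
initial_pole_angle_range = [-0.25, 0.25]

# The kernel multiplies each bound by pi before sampling.
low_rad = initial_pole_angle_range[0] * math.pi
high_rad = initial_pole_angle_range[1] * math.pi

# Ratio between the real bound and the bound a "[rad]" reading would imply.
scale = high_rad / initial_pole_angle_range[1]
```

The sampled range is therefore about ±0.785 rad (±45°), a factor of π wider than a plain `[rad]` reading of the config suggests.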
| newton_cfg = NewtonCfg( | ||
| solver_cfg=solver_cfg, | ||
| num_substeps=2, | ||
| debug_mode=False, | ||
| ) |
HumanoidWarpEnvCfg does not set use_cuda_graph=True unlike AntWarpEnvCfg
AntWarpEnvCfg.newton_cfg sets use_cuda_graph=True, but HumanoidWarpEnvCfg.newton_cfg omits this flag (defaults to False). Since the entire DirectRLEnvWarp infrastructure is designed around CUDA graph capture for performance, disabling Newton's own CUDA graph for Humanoid while enabling it for Ant creates an inconsistency that may hurt Humanoid training throughput.
If this is intentional (e.g., Humanoid requires update_data_interval=2 which conflicts with Newton's graph capture), the reason should be documented with a comment:
| newton_cfg = NewtonCfg( | |
| solver_cfg=solver_cfg, | |
| num_substeps=2, | |
| debug_mode=False, | |
| ) | |
| newton_cfg = NewtonCfg( | |
| solver_cfg=solver_cfg, | |
| num_substeps=2, | |
| debug_mode=False, | |
| # use_cuda_graph omitted: Newton graph capture is incompatible with update_data_interval > 1 | |
| ) |
Fixed — added use_cuda_graph=True to match stable and AntWarpEnvCfg.
Validation: Warp vs Stable (Newton) Direct Env Parity
300 iterations, 4096 envs.
Final metrics (last iteration):
Timing breakdown (last iteration):
Observations:
@greptileai Review |
4e54fdd to 03fe0e6
@greptileai Review |
Do we need this? It should not be needed anymore.
Removed — now imports directly from stable isaaclab.envs.utils.spaces.
d8bca3f to b037b52
Add isaaclab_experimental package with DirectRLEnvWarp base class, InteractiveSceneWarp, and WarpGraphCache utility.

Add direct warp environments in isaaclab_tasks_experimental:
- Cartpole, Ant, Humanoid, Locomotion (base), InHand Manipulation, Allegro Hand, with agent configs for rsl_rl, rl_games, skrl, sb3.

Adapt to develop base class API:
- find_joints 2-value return (indices, names)
- episode_length_buf as property with in-place copy for warp sync
- _ALL_ENV_MASK on base env instead of articulation
- set_joint_effort_target_mask for CUDA graph compatibility
- _get_observations returns dict for rsl_rl wrapper

Align solver configs with stable develop PresetCfg values.
Add safe_normalize to guard against NaN from wp.normalize.
Fix reset_root to_targets computation to use actual targets.
Fix cartpole reset kernel to use parameterized joint indices.
Clean up extension.toml and setup.py dependencies.
Switch timer import to isaaclab.utils.timer.
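The commit message mentions a `safe_normalize` guard but the page does not show its body. The sketch below is a plain-Python analogue under stated assumptions: the `eps` threshold and the choice to return a zero vector for degenerate input are guesses, not the PR's actual behavior.

```python
import math


def safe_normalize(v, eps=1e-9):
    """Normalize v, returning a zero vector when its length is (near) zero.

    The threshold and the zero-vector fallback are assumptions for illustration.
    """
    length = math.sqrt(sum(c * c for c in v))
    if length < eps:
        # Dividing by a zero length is what would otherwise produce NaN.
        return tuple(0.0 for _ in v)
    return tuple(c / length for c in v)


unit = safe_normalize((3.0, 4.0, 0.0))
degenerate = safe_normalize((0.0, 0.0, 0.0))
```

Returning a zero vector keeps downstream quantities finite; an implementation could equally return a fixed default direction, depending on what the caller expects.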
b037b52 to c5259f7
@greptileai Review |
Signed-off-by: Antoine RICHARD <antoiner@nvidia.com>
## Summary

Adds experimental warp infrastructure and direct warp environments from `dev/newton`, adapted for `develop`. Absorbs PR isaac-sim#4812 (inhand-cp).

### `isaaclab_experimental`

* `DirectRLEnvWarp` base class with CUDA graph capture via `WarpGraphCache`
* `InteractiveSceneWarp` with warp-native env_mask reset support
* `episode_length_buf` property with in-place copy to preserve warp/torch shared memory

### `isaaclab_tasks_experimental` (direct envs)

* **Cartpole** (`Isaac-Cartpole-Direct-Warp-v0`)
* **Ant** (`Isaac-Ant-Direct-Warp-v0`)
* **Humanoid** (`Isaac-Humanoid-Direct-Warp-v0`)
* **Locomotion** base warp env (shared by ant/humanoid)
* **InHand Manipulation** + **Allegro Hand**
* Agent configs reference stable `isaaclab_tasks.direct.<env>.agents` directly (no duplication)

### API adaptations for `develop`

* `find_joints` 2-value return (indices, names)
* `episode_length_buf` as property with in-place `copy_()` for warp/torch shared memory
* `self._ALL_ENV_MASK` from base env
* `set_joint_effort_target_mask` for CUDA graph compatibility
* `_get_observations` returns `{"policy": tensor}` dict
* `safe_normalize` to guard `wp.normalize` on zero-length vectors
* Solver configs aligned with stable develop `PresetCfg` values

### Test results (rsl_rl, 4096 envs, 300 iterations, headless, `newton==1.0.0`)

| Env | Status | Time |
|-----|--------|------|
| Cartpole | PASS | 70s |
| Ant | PASS | 98s |
| Humanoid | PASS | 172s |

## Test plan

- [x] Cartpole: 300 iteration training converges
- [x] Ant: 300 iteration training converges
- [x] Humanoid: 300 iteration training converges

---------

Signed-off-by: Antoine RICHARD <antoiner@nvidia.com>
Co-authored-by: Antoine RICHARD <antoiner@nvidia.com>
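The "in-place copy to preserve shared memory" idea for `episode_length_buf` can be illustrated with a plain-Python analogue. This is not the PR's implementation: `EnvBufSketch` is hypothetical, a list stands in for the torch tensor, and a second reference to that list stands in for the Warp view of the same memory.

```python
class EnvBufSketch:
    def __init__(self, num_envs):
        self._episode_length_buf = [0] * num_envs
        # Second handle to the same storage, standing in for a Warp view of the tensor.
        self.warp_view = self._episode_length_buf

    @property
    def episode_length_buf(self):
        return self._episode_length_buf

    @episode_length_buf.setter
    def episode_length_buf(self, value):
        # In-place write (analogous to copy_()): the storage is reused,
        # so `warp_view` keeps seeing the live data.
        self._episode_length_buf[:] = value


env = EnvBufSketch(3)
env.episode_length_buf = [5, 6, 7]
```

Had the setter rebound `self._episode_length_buf` to the new list instead, `warp_view` would still point at the old storage and silently go stale, which is the situation the property is described as avoiding.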
…on (#4829)

## Summary

Cherry-pick of warp manager-based env infrastructure from `dev/newton`, refactored for `develop`.

### `isaaclab_experimental`

* Added warp-compatible manager implementations (`ActionManager`, `ObservationManager`, `EventManager`, `CommandManager`, `TerminationManager`, `RewardManager`) with Warp kernel execution and CUDA graph capture support.
* Added `ManagerCallSwitch` utility for per-manager eager/captured dispatch, configured via `MANAGER_CALL_CONFIG` env var.
* Added `ManagerBasedEnvWarp` and `ManagerBasedRLEnvWarp` orchestration env classes.
* Added warp MDP terms (observations, rewards, terminations, events, joint actions).
* Added utility modules: buffers (circular buffer), modifiers, noise models, warp kernels/helpers.
* Added experimental `SceneEntityCfg` with warp joint mask/ids for kernel-level joint selection.
* Generalized configclass default materialization in `ManagerBase` for automatic `SceneEntityCfg` resolution.

### `isaaclab_tasks_experimental`

* Added `Isaac-Cartpole-Warp-v0` task as reference environment for warp manager-based workflow.

### `isaaclab_rl`

* Updated rsl_rl, rl_games, sb3, skrl wrappers to accept `ManagerBasedRLEnvWarp` and `DirectRLEnvWarp`.

### `isaaclab`

* Fixed `SettingsManager` to catch `RuntimeError` when carb is unavailable.
* Minor comment cleanup in `ObservationManager`.

## Dependencies

Must be merged **after**:

1. #4905 (merged)

## Validated base

Validated against develop at `7588fa9ed5f`.

## Known limitations

* `Scene_write_data_to_sim` capped to mode=1 (eager) via `MAX_MODE_OVERRIDES`: articulation `_apply_actuator_model` uses `wp.to_torch + torch indexing`, not CUDA graph capture-safe.

## Test plan

- [x] `Isaac-Cartpole-Warp-v0` training (4096 envs, 300 iters, mode=2): converges (reward 4.95, ep_len 300)

---------

Co-authored-by: Antoine RICHARD <antoiner@nvidia.com>
…on (#4945)

## Summary

* Cherry-picks [Newton] Migrate more envs and mdps to warp (#4690) onto develop
* Cherry-picks [Newton] Add capture safety guards and fix WrenchComposer stale COM pose (#4779) onto develop

### Changes included

- Warp-first MDP terms (observations, rewards, events, terminations, actions) for manager-based envs
- Tested warp env configs: Ant, Humanoid, Cartpole, locomotion velocity (A1, AnymalB/C/D, Cassie, G1, Go1/2, H1), Franka/UR10 reach
- ManagerCallSwitch max_mode cap and scene capture config
- MDP kernels made graph-capturable with consolidated test infrastructure
- capture_unsafe safety guards on lazy-evaluated derived properties in articulation/rigid_object data
- WrenchComposer fix: use fresh COM pose buffers instead of stale cached link poses

### Dropped

- G1-29-DOF warp env (Isaac-Velocity-Flat-G1-Warp-v1): removed because the stable g1_29_dofs task config does not exist on develop (only on dev/newton). Warp env PRs should only add warp frontends for envs that already exist in the stable package.

## Dependencies

Must be merged **after** these PRs (in order):

1. #4905 (merged)
2. #4829

## Validated base

Validated against develop at 7588fa9.

## Test plan

- [x] Run warp env training sweep across all manager-based env configs (14/14 pass, mode=2, 4096 envs, 300 iters)
- [ ] Run test_mdp_warp_parity.py and test_mdp_warp_parity_new_terms.py
- [ ] Run test_action_warp_parity.py
- [ ] Verify WrenchComposer COM pose is fresh (not stale) during graph replay

---------

Co-authored-by: Antoine Richard <antoiner@nvidia.com>
Co-authored-by: Kelly Guo <kellyg@nvidia.com>