[Exp] Cherry-pick direct warp envs from dev/newton by hujc7 · Pull Request #4905 · isaac-sim/IsaacLab

hujc7 · 2026-03-10T08:13:55Z

Summary

Adds experimental warp infrastructure and direct warp environments from dev/newton, adapted for develop. Absorbs PR #4812 (inhand-cp).

`isaaclab_experimental`

DirectRLEnvWarp base class with CUDA graph capture via WarpGraphCache
InteractiveSceneWarp with warp-native env_mask reset support
episode_length_buf property with in-place copy to preserve warp/torch shared memory
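
The in-place copy matters because a plain rebinding assignment would leave any other handle pointing at the old storage. Below is a minimal sketch of the property-setter pattern, assuming a plain Python list as a stand-in for the shared warp/torch buffer. `EnvSketch` and `warp_view` are hypothetical names, not the PR's code.

```python
class EnvSketch:
    """Hypothetical stand-in: a list plays the role of the shared warp/torch buffer."""

    def __init__(self, num_envs: int):
        self._episode_length_buf = [0] * num_envs
        # Second handle to the same storage, like a warp array aliasing a torch tensor.
        self.warp_view = self._episode_length_buf

    @property
    def episode_length_buf(self):
        return self._episode_length_buf

    @episode_length_buf.setter
    def episode_length_buf(self, value):
        # Copy in place (analogous to tensor.copy_()) instead of rebinding,
        # so every existing handle keeps seeing the updated values.
        if len(value) != len(self._episode_length_buf):
            raise ValueError("shape mismatch")
        self._episode_length_buf[:] = value


env = EnvSketch(num_envs=4)
env.episode_length_buf = [3, 1, 4, 1]
```

After the assignment, `env.warp_view` still refers to the same object and reflects the new values, which is the behavior the setter is meant to preserve.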

`isaaclab_tasks_experimental` (direct envs)

Cartpole (Isaac-Cartpole-Direct-Warp-v0)
Ant (Isaac-Ant-Direct-Warp-v0)
Humanoid (Isaac-Humanoid-Direct-Warp-v0)
Locomotion base warp env (shared by ant/humanoid)
InHand Manipulation + Allegro Hand
Agent configs reference stable isaaclab_tasks.direct.<env>.agents directly — no duplication

API adaptations for `develop`

find_joints 2-value return (indices, names)
episode_length_buf as property with in-place copy_() for warp/torch shared memory
self._ALL_ENV_MASK from base env
set_joint_effort_target_mask for CUDA graph compatibility
_get_observations returns {"policy": tensor} dict
safe_normalize to guard wp.normalize on zero-length vectors
Solver configs aligned with stable develop PresetCfg values
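
The PR text does not show the body of `safe_normalize`. As a rough illustration of the guard it describes, here is a pure-Python sketch that assumes the helper returns a zero vector when the input length falls below a small threshold. The `eps` parameter and the zero-vector fallback are assumptions, not the PR's implementation.

```python
import math


def safe_normalize(v, eps=1e-9):
    """Normalize v, returning a zero vector instead of dividing by ~0 (assumed behavior)."""
    length = math.sqrt(sum(c * c for c in v))
    if length < eps:
        # Avoid the 0/0 division that would otherwise produce NaN components.
        return tuple(0.0 for _ in v)
    return tuple(c / length for c in v)


unit = safe_normalize((3.0, 0.0, 4.0))
zero = safe_normalize((0.0, 0.0, 0.0))
```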

Test results (rsl_rl, 4096 envs, 300 iterations, headless, `newton==1.0.0`)

Env	Status	Time
Cartpole	PASS	70s
Ant	PASS	98s
Humanoid	PASS	172s

Test plan

Cartpole: 300 iteration training converges
Ant: 300 iteration training converges
Humanoid: 300 iteration training converges

greptile-apps · 2026-03-10T08:19:34Z

Greptile Summary

This PR cherry-picks the experimental Warp-native RL environment infrastructure (DirectRLEnvWarp, InteractiveSceneWarp, WarpGraphCache) and five direct environments (Cartpole, Ant, Humanoid, InHand Manipulation, Allegro Hand) from dev/newton into develop, adapting the API to match develop's conventions (find_joints 2-tuple return, episode_length_buf property with in-place copy, set_joint_effort_target_mask, etc.). The three tested environments (Cartpole, Ant, Humanoid) converge successfully. Several previously flagged issues have been addressed. The remaining concerns are:

AllegroHandWarpEnvCfg missing use_cuda_graph=True — unlike AntWarpEnvCfg and HumanoidWarpEnvCfg, the Allegro Hand newton config does not enable Newton's internal CUDA graph, inconsistent with the rest of the PR and likely harming throughput for the most compute-intensive environment.
InHandManipulationWarpEnv is absent from the PR test results — the most complex environment (980 lines, multi-asset, atomic reward reductions, consecutive-successes tracking) has not been validated with a training run.
Sensors reset with env_ids=None — InteractiveSceneWarp.reset passes env_ids=None to sensors (reset-all semantics) when a partial mask reset is intended, mirroring the same latent bug already acknowledged for deformable objects and surface grippers; not observable today since no current warp env uses sensors, but undocumented.
WarpGraphCache silent capture assumption — correctness of the capture-or-replay idiom depends on wp.ScopedCapture recording kernels without executing them; a comment or reference to the Warp docs would guard against future regressions.
Potential velocity component order ambiguity in locomotion_env_warp.py — the spatial_vectorf convention for root_vel_w (angular-first vs linear-first) is not documented, which could cause silent policy-incompatibility between the warp and standard direct locomotion envs.

Confidence Score: 3/5

Safe to merge for the three tested environments (Cartpole, Ant, Humanoid); InHand Manipulation should not be considered production-ready until tested.
Three of five environments are tested and converge. The infrastructure (DirectRLEnvWarp, WarpGraphCache, InteractiveSceneWarp) is well-designed and the most critical previously flagged bugs (velocity reset indices, to_targets computation, _get_observations return type, goal_pos_w guard, Humanoid use_cuda_graph) are all fixed. However, the InHand Manipulation environment has no test coverage and its config has a notable inconsistency (use_cuda_graph not enabled for Newton). The sensor.reset(None) latent bug in InteractiveSceneWarp is undocumented. For an experimental PR, these issues are acceptable to land but warrant follow-up before the env set is considered complete.
source/isaaclab_tasks_experimental/isaaclab_tasks_experimental/direct/allegro_hand/allegro_hand_warp_env_cfg.py and source/isaaclab_tasks_experimental/isaaclab_tasks_experimental/direct/inhand_manipulation/inhand_manipulation_warp_env.py need the most attention — missing Newton CUDA graph flag and no training validation.

Important Files Changed

Filename	Overview
source/isaaclab_experimental/isaaclab_experimental/envs/direct_rl_env_warp.py	New `DirectRLEnvWarp` base class implementing the direct RL env interface with CUDA graph capture via `WarpGraphCache`; well-structured with correct property-setter pattern for `episode_length_buf`, clean split between capturable (`_step_warp_end_pre/post`) and non-capturable (scene writes, visualization) phases.
source/isaaclab_experimental/isaaclab_experimental/envs/interactive_scene_warp.py	Extends `InteractiveScene` with `env_mask` support for warp-native resets; correctly delegates articulations and rigid objects but passes `env_ids=None` (reset-all) to sensors, with the same latent bug already acknowledged for deformable objects and surface grippers.
source/isaaclab_experimental/isaaclab_experimental/utils/warp_graph_cache.py	Clean CUDA graph capture-or-replay utility; correctness depends on `wp.ScopedCapture` recording without executing kernels (standard CUDA graph behavior), which should be documented to guard against future regressions.
source/isaaclab_tasks_experimental/isaaclab_tasks_experimental/direct/cartpole/cartpole_warp_env.py	Fully warp-native Cartpole env; all previously flagged issues (hardcoded velocity indices, comment unit mismatch) are addressed; CUDA graph capturable reset/action/observation kernels look correct.
source/isaaclab_tasks_experimental/isaaclab_tasks_experimental/direct/locomotion/locomotion_env_warp.py	Shared locomotion base for Ant/Humanoid; previously flagged `reset_root` `to_targets` bug is fixed; a potential velocity component ordering concern (angular vs linear placement in observation vector) is noted but doesn't block convergence.
source/isaaclab_tasks_experimental/isaaclab_tasks_experimental/direct/inhand_manipulation/inhand_manipulation_warp_env.py	Most complex env in the PR (980 lines); previously flagged `goal_pos_w` guard issue is fixed; however this environment is absent from the PR test results, and the AllegroHand config is missing `use_cuda_graph=True`, making its training performance and correctness unvalidated.
source/isaaclab_tasks_experimental/isaaclab_tasks_experimental/direct/allegro_hand/allegro_hand_warp_env_cfg.py	AllegroHand config omits `use_cuda_graph=True` in `newton_cfg`, inconsistent with Ant and Humanoid configs; this will disable Newton's internal CUDA graph for the most compute-intensive environment in the PR.
source/isaaclab_rl/isaaclab_rl/rsl_rl/vecenv_wrapper.py	Cleanly adds `DirectRLEnvWarp` to the allowed types via a try/except import guard; existing `get_observations` path (calling `_get_observations()` directly) correctly works with the warp env's `-> dict` return signature.

Sequence Diagram

sequenceDiagram
    participant RL as RSL-RL Runner
    participant Wrapper as RslRlVecEnvWrapper
    participant Env as DirectRLEnvWarp
    participant Cache as WarpGraphCache
    participant Scene as InteractiveSceneWarp

    RL->>Wrapper: step(actions)
    Wrapper->>Env: step(actions)
    Env->>Env: _pre_physics_step(wp.from_torch(actions))
    loop decimation
        Env->>Cache: capture_or_replay("action", step_warp_action)
        Cache-->>Env: graph captured/replayed
        Env->>Scene: write_data_to_sim() [outside graph]
        Env->>Env: sim.step()
        Env->>Scene: scene.update()
    end
    Env->>Cache: capture_or_replay("end_pre", _step_warp_end_pre)
    Note over Cache,Env: Captured: add_to_env → _get_dones → _get_rewards → _reset_idx(reset_buf)
    Env->>Scene: write_data_to_sim() [outside graph]
    Env->>Cache: capture_or_replay("end_post", _step_warp_end_post)
    Note over Cache,Env: Captured: _get_observations()
    Env->>Env: _post_step_visualize() [outside graph]
    Env-->>Wrapper: obs_dict, rewards, terminated, truncated, extras
    Wrapper-->>RL: TensorDict(obs), rew, dones, extras

Comments Outside Diff (5)

source/isaaclab_tasks_experimental/isaaclab_tasks_experimental/direct/allegro_hand/allegro_hand_warp_env_cfg.py, line 46-52

AllegroHandWarpEnvCfg missing use_cuda_graph=True in newton_cfg

AntWarpEnvCfg and HumanoidWarpEnvCfg (after the fix in the previous review thread) both set use_cuda_graph=True inside newton_cfg. AllegroHandWarpEnvCfg omits this flag, defaulting to False:
```
newton_cfg = NewtonCfg(
    solver_cfg=solver_cfg,
    num_substeps=2,
    debug_mode=False,
    # use_cuda_graph missing → defaults to False
)
```
DirectRLEnvWarp's entire value proposition is CUDA graph capture via WarpGraphCache. Disabling Newton's internal CUDA graph for the most complex task (Allegro Hand) while enabling it for Ant and Humanoid creates an inconsistency that will hurt training throughput. If there's a reason this must be disabled (e.g., a known compatibility issue with ls_parallel=False or the solver="newton" variant), it should be documented with a comment explaining the constraint.
source/isaaclab_experimental/isaaclab_experimental/envs/interactive_scene_warp.py, line 40-42

Sensors also receive env_ids=None (reset-all) when partial mask reset is intended

The existing PR thread already acknowledged the same latent bug for deformable_object and surface_gripper. Sensors have the same problem:
```
for sensor in self._sensors.values():
    sensor.reset(env_ids)  # env_ids is always None here
```
Since DirectRLEnvWarp._reset_idx always calls scene.reset(env_ids=None, env_mask=mask), when only a subset of environments needs resetting (e.g., during _step_warp_end_pre), sensor.reset(None) resets all sensors — the opposite of what is intended.

None of the current warp environments use sensors, so there is no observable failure today. However, since the class is designed for future extension, and sensors are very common in IsaacLab environments, this should be guarded the same way deformable objects and surface grippers are (with a comment or a mask→ids conversion). Consider adding the same acknowledgment comment as used for the deformable/gripper paths, so the limitation is consistently documented.
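
To make the failure mode concrete, here is a small sketch of the mask→ids conversion mentioned above. `SensorSketch` and `mask_to_env_ids` are hypothetical stand-ins; the only assumption taken from the thread is that `reset(None)` means "reset all environments".

```python
def mask_to_env_ids(env_mask):
    """Convert a boolean per-env mask into a list of env indices."""
    return [i for i, flag in enumerate(env_mask) if flag]


class SensorSketch:
    """Hypothetical sensor that counts how often each env was reset."""

    def __init__(self, num_envs):
        self.reset_counts = [0] * num_envs

    def reset(self, env_ids=None):
        # env_ids=None follows reset-all semantics, as described in the review.
        ids = range(len(self.reset_counts)) if env_ids is None else env_ids
        for i in ids:
            self.reset_counts[i] += 1


mask = [False, True, False, True]

buggy = SensorSketch(4)
buggy.reset(None)  # what a mask-only call currently forwards

fixed = SensorSketch(4)
fixed.reset(mask_to_env_ids(mask))  # only envs 1 and 3
```

A conversion like this produces a variable-length result, so in the real code it would probably have to run outside any CUDA-graph-captured region; that constraint is an inference, not something the PR states.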
source/isaaclab_tasks_experimental/isaaclab_tasks_experimental/direct/locomotion/locomotion_env_warp.py, line 71-84

Velocity component ordering in observations may be swapped relative to the spatial_vectorf convention

The observations kernel assigns vec_loc indices [0]–[5] to observation positions [1]–[6]. Based on the comment in inhand_manipulation_warp_env.py (line 363: "spatial_vectorf layout: [0:3]=angular, [3:6]=linear"), vec_loc[0:3] is angular velocity and vec_loc[3:6] is linear velocity:
```
observations[env_index, 1] = velocity[env_index][0]   # → angular_x (no scale)
observations[env_index, 2] = velocity[env_index][1]   # → angular_y (no scale)
observations[env_index, 3] = velocity[env_index][2]   # → angular_z (no scale)
observations[env_index, 4] = velocity[env_index][3] * angular_velocity_scale  # → linear_x * angular_velocity_scale
observations[env_index, 5] = velocity[env_index][4] * angular_velocity_scale  # → linear_y * angular_velocity_scale
observations[env_index, 6] = velocity[env_index][5] * angular_velocity_scale  # → linear_z * angular_velocity_scale
```
If this convention holds, the scale (angular_velocity_scale = 1.0 for Ant, 0.25 for Humanoid) is applied to the linear components rather than the angular ones, and the two groups are swapped compared to the standard IsaacLab direct-env observation layout (linear first, angular second). Training still converges because the policy can adapt to any consistent observation encoding. However, a policy trained with this warp env cannot be reused with the standard AntEnv/HumanoidEnv without re-training, which may be surprising to users.

If Newton's root_vel_w actually uses [linear, angular] order (opposite to the inhand comment), this is fine. Please verify the Newton spatial velocity convention and, if it does differ from the inhand convention, add a comment here (e.g. # Newton root_vel_w layout: [0:3]=linear, [3:6]=angular) to prevent future confusion.
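
The ambiguity can be illustrated with a small sketch that packs a 6-component spatial velocity into the layout the comment calls standard (linear first, then scaled angular). The function name and the `angular_first` flag are hypothetical. Which convention Newton's `root_vel_w` actually follows is exactly what the comment asks to be verified.

```python
def pack_velocity_obs(spatial_vel, angular_first, angular_velocity_scale):
    """Return [linear xyz, scaled angular xyz] from a 6-vector in either convention."""
    if angular_first:
        angular, linear = spatial_vel[0:3], spatial_vel[3:6]
    else:
        linear, angular = spatial_vel[0:3], spatial_vel[3:6]
    return list(linear) + [w * angular_velocity_scale for w in angular]


vel = [4.0, 8.0, 12.0, 10.0, 20.0, 30.0]

# Same raw vector, two interpretations: the resulting observations differ.
obs_if_angular_first = pack_velocity_obs(vel, angular_first=True, angular_velocity_scale=0.25)
obs_if_linear_first = pack_velocity_obs(vel, angular_first=False, angular_velocity_scale=0.25)
```

Making the convention an explicit argument (or a documented constant) keeps the scale attached to the angular components regardless of which ordering the backend uses.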
source/isaaclab_tasks_experimental/isaaclab_tasks_experimental/direct/inhand_manipulation/inhand_manipulation_warp_env.py, line 545-550

InHandManipulationWarpEnv is absent from the PR's test results

The PR test table covers Cartpole, Ant, and Humanoid, but InHandManipulationWarpEnv (Allegro Hand, Isaac-AllegroHand-InHandManipulation-Warp-v0) is not listed. Given that this is the most complex environment in the PR (980-line file, multi-asset scene, CUDA-graph-captured reward with atomic reductions, consecutive-successes tracking, goal-marker visualization), its absence from tested environments is a notable gap. Untested convergence behavior, the missing use_cuda_graph=True in Newton config (see separate comment in allegro_hand_warp_env_cfg.py), and the obs_nonfinite_flag sanitizer all suggest this env may still have unresolved issues. Please include at least a short training run (e.g., 500 iterations) in the test matrix before merging.
source/isaaclab_experimental/isaaclab_experimental/utils/warp_graph_cache.py, line 55-65

capture_or_replay first-call behavior depends on whether wp.ScopedCapture executes kernels

The docstring states that on the first call the function is "recorded into a CUDA graph and then immediately replayed". Under standard CUDA graph capture semantics (cudaStreamBeginCapture), kernels submitted to a captured stream are not executed — they are only recorded. The wp.capture_launch call immediately after is therefore the first actual execution, which is correct.

However, this assumption (capture-without-execution) should be validated against Warp's implementation. If Warp's ScopedCapture does execute kernels during recording (eager-capture mode), then on the very first training step every operation inside _step_warp_end_pre / _step_warp_end_post would run twice:
- add_to_env would increment episode_length_buf by 2 instead of 1
- _reset_idx would fire twice for the same reset_buf
- All reward and observation kernels would launch twice
A one-line comment confirming the no-eager-execution assumption (or a reference to the Warp docs section that guarantees it) would prevent a subtle correctness regression if Warp ever changes its capture behavior.
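
The double-execution hazard can be shown with a toy model of the capture-or-replay idiom. This is a pure-Python sketch, not `WarpGraphCache` itself: "recording" just stores the callable, and the `eager_capture` flag is a hypothetical switch that simulates a capture mode which also executes the work.

```python
class GraphCacheSketch:
    """Toy capture-or-replay cache; 'recording' stores the callable without running it."""

    def __init__(self, eager_capture=False):
        self._graphs = {}
        self._eager_capture = eager_capture

    def capture_or_replay(self, key, fn):
        if key not in self._graphs:
            if self._eager_capture:
                fn()  # hypothetical eager mode: work runs during recording
            self._graphs[key] = fn  # record
        self._graphs[key]()  # replay (first real execution in record-only mode)


def run_steps(cache, num_steps):
    state = {"episode_length": 0}

    def add_to_env():
        state["episode_length"] += 1

    for _ in range(num_steps):
        cache.capture_or_replay("end_pre", add_to_env)
    return state["episode_length"]


record_only_count = run_steps(GraphCacheSketch(eager_capture=False), 3)
eager_count = run_steps(GraphCacheSketch(eager_capture=True), 3)
```

In the record-only model the counter advances once per step; in the simulated eager model the first step advances it twice, which is the off-by-one described above.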

Last reviewed commit: c5259f7

greptile-apps · 2026-03-10T08:19:45Z

+        joint_vel[env_index, 0] = default_joint_vel[env_index, 0]
+        joint_vel[env_index, 1] = default_joint_vel[env_index, 1]


Hardcoded joint indices break generality of velocity reset

Joint positions are reset using the parameterized cart_dof_idx / pole_dof_idx, but joint velocities are reset using hardcoded indices 0 and 1. For the default cartpole this happens to be correct, but if the joint ordering ever changes (or this kernel is reused), velocities would be silently reset for the wrong degrees of freedom.

Suggested change

-        joint_vel[env_index, 0] = default_joint_vel[env_index, 0]
-        joint_vel[env_index, 1] = default_joint_vel[env_index, 1]
+        joint_vel[env_index, cart_dof_idx] = default_joint_vel[env_index, cart_dof_idx]
+        joint_vel[env_index, pole_dof_idx] = default_joint_vel[env_index, pole_dof_idx]

Fixed — now uses cart_dof_idx/pole_dof_idx for velocity reset.
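
For illustration, here is a plain-Python emulation of the parameterized reset, run on a hypothetical 3-DOF layout where the cart and pole are not at indices 0 and 1. The function name and the joint layout are invented for this sketch. The real kernel is a Warp kernel operating on 2D arrays.

```python
def reset_joint_vel(joint_vel, default_joint_vel, env_mask, cart_dof_idx, pole_dof_idx):
    """Reset cart/pole velocities for masked envs using parameterized DOF indices."""
    for env_index, needs_reset in enumerate(env_mask):
        if needs_reset:
            joint_vel[env_index][cart_dof_idx] = default_joint_vel[env_index][cart_dof_idx]
            joint_vel[env_index][pole_dof_idx] = default_joint_vel[env_index][pole_dof_idx]


# Hypothetical layout: cart at index 2, pole at index 0, unrelated joint at index 1.
joint_vel = [[5.0, 6.0, 7.0]]
default_joint_vel = [[0.0, 0.0, 0.0]]
reset_joint_vel(joint_vel, default_joint_vel, env_mask=[True], cart_dof_idx=2, pole_dof_idx=0)
```

With hardcoded indices 0 and 1, the same call would have cleared the unrelated joint and left the cart velocity at 7.0.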

greptile-apps · 2026-03-10T08:19:47Z

+        to_targets[env_index] = wp.transform_get_translation(root_pose[env_index]) - wp.transform_get_translation(
+            default_root_pose[env_index]
+        )
+        to_targets[env_index][2] = 0.0
+        potentials[env_index] = -wp.length(to_targets[env_index]) / dt


to_targets and initial potentials in reset_root are computed incorrectly

After the root pose is set to default_root_pose + env_origins, the code computes:

to_targets = (default_root_pose + env_origins) - default_root_pose = env_origins

so to_targets holds the environment origin offset (~4–10 m), not the vector from the agent to the distant target (initialized to env_origins + (1000, 0, 0)). As a result potentials = -|env_origins| / dt instead of the expected ≈ -1000 / dt.

In practice this does not corrupt training because _reset_idx immediately calls _compute_intermediate_values() at its end, which correctly overwrites both to_targets and potentials (and sets prev_potentials from the now-correct potentials). However, the intention of this initial calculation in reset_root is clearly wrong and could confuse future readers or cause subtle bugs if the subsequent _compute_intermediate_values() call is ever removed or reordered.

The correct expression would be:

    to_targets[env_index] = targets[env_index] - wp.transform_get_translation(root_pose[env_index])
    to_targets[env_index][2] = wp.float32(0.0)
    potentials[env_index] = -wp.length(to_targets[env_index]) / dt

(where targets would need to be added as a kernel input)

Fixed — reset_root now takes the targets array and computes to_targets = targets - root_pose correctly.
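
A worked example of the two expressions, using made-up numbers: the environment origin, default root height, and `dt` below are hypothetical values chosen for easy arithmetic. Only the target offset of (1000, 0, 0) comes from the review comment.

```python
import math


def add(a, b):
    return tuple(x + y for x, y in zip(a, b))


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def planar_length(v):
    # Equivalent to zeroing the z component before taking the length.
    return math.hypot(v[0], v[1])


env_origin = (8.0, 6.0, 0.0)        # hypothetical
default_root_pos = (0.0, 0.0, 0.5)  # hypothetical
dt = 1.0 / 60.0                     # hypothetical

root_pos = add(default_root_pos, env_origin)
target = add(env_origin, (1000.0, 0.0, 0.0))

buggy_distance = planar_length(sub(root_pos, default_root_pos))  # reduces to |env_origin|
fixed_distance = planar_length(sub(target, root_pos))            # distance to the target

buggy_potential = -buggy_distance / dt
fixed_potential = -fixed_distance / dt
```

The original expression collapses to the origin offset (10 m here), while the corrected one yields the 1000 m distance to the target.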

greptile-apps · 2026-03-10T08:19:49Z

+    @abstractmethod
+    def _get_observations(self) -> None:
+        """Compute the observations for the environment.
+
+        Writes results into the observation buffers (e.g., ``self.obs_buf``).
+        """
+        raise NotImplementedError(f"Please implement the '_get_observations' method for {self.__class__.__name__}.")


_get_observations declared -> None but implementations return a dict relied on by vecenv_wrapper

The abstract signature declares a None return type and the docstring says "Writes results into the observation buffers", but every concrete implementation (CartpoleWarpEnv, LocomotionWarpEnv, etc.) returns {"policy": self.torch_obs_buf}. The updated RslRlVecEnvWrapper.get_observations() in this PR also calls _get_observations() and uses its return value directly:

    obs_dict = self.unwrapped._get_observations()
    return TensorDict(obs_dict, batch_size=[self.num_envs])

If a new subclass faithfully follows the declared -> None contract, get_observations() will raise a TypeError when constructing the TensorDict. The return type should be updated to make the contract explicit:

Suggested change

-    @abstractmethod
-    def _get_observations(self) -> None:
-        """Compute the observations for the environment.
-
-        Writes results into the observation buffers (e.g., ``self.obs_buf``).
-        """
-        raise NotImplementedError(f"Please implement the '_get_observations' method for {self.__class__.__name__}.")
+    def _get_observations(self) -> dict:

and the docstring updated to reflect that the method both writes buffers and returns the observation dict.

Fixed — changed abstract signature to -> dict with updated docstring.
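
A minimal sketch of the corrected contract, assuming the method both fills a buffer and returns it under the `"policy"` key. The class names and the list-based buffer are hypothetical stand-ins for the real env classes and torch tensors.

```python
from abc import ABC, abstractmethod


class DirectEnvSketch(ABC):
    @abstractmethod
    def _get_observations(self) -> dict:
        """Write the observation buffers and return them keyed by group name."""
        raise NotImplementedError


class CartpoleSketch(DirectEnvSketch):
    def __init__(self):
        self.obs_buf = [0.0, 0.0, 0.0, 0.0]

    def _get_observations(self) -> dict:
        self.obs_buf[0] = 1.0  # placeholder for the real observation kernel
        return {"policy": self.obs_buf}


def get_observations(env: DirectEnvSketch) -> dict:
    """Wrapper-side consumer that relies on the dict return value."""
    obs_dict = env._get_observations()
    if not isinstance(obs_dict, dict):
        raise TypeError("_get_observations must return a dict of observation groups")
    return obs_dict


obs = get_observations(CartpoleSketch())
```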

AntoineRichard · 2026-03-10T13:48:00Z

+    "numpy",
+    "prettytable==3.3.0",
+    "toml",
+    "hidapi",


This is not needed?

Cleaned up — the Kit-only [python.pipapi] section was removed. File now has only [package], [dependencies], [core], and [[python.module]].

AntoineRichard · 2026-03-10T13:48:07Z

+    "toml",
+    "hidapi",
+    "gymnasium==0.29.0",
+    "trimesh"


This is not needed?

Cleaned up — the Kit-only [python.pipapi] section was removed. File now has only [package], [dependencies], [core], and [[python.module]].

AntoineRichard · 2026-03-10T13:48:14Z

+[python.pipapi]
+requirements = [
+    "numpy",
+    "prettytable==3.3.0",


This should not be needed?

Cleaned up — removed the entire [python.pipapi] section (Kit-only, not needed for standalone package). Matched the isaaclab_newton extension.toml pattern.

AntoineRichard · 2026-03-10T13:48:25Z

+    "numpy",
+    "prettytable",
+    "toml",
+    "hid",


Not needed?

Cleaned up — the Kit-only [python.pipapi] section was removed. File now has only [package], [dependencies], [core], and [[python.module]].

AntoineRichard · 2026-03-10T13:48:29Z

+    "toml",
+    "hid",
+    "gymnasium",
+    "trimesh"


Not needed?

Cleaned up — the Kit-only [python.pipapi] section was removed. File now has only [package], [dependencies], [core], and [[python.module]].

AntoineRichard · 2026-03-10T13:53:32Z

+    "numpy>2",
+    "warp-lang>=1.9.0.dev20250825",  # TODO: update to 1.11.0
+    "torch>=2.7",
+    "prettytable==3.3.0",


Do we need this?

Removed — INSTALL_REQUIRES stripped to just toml. All other deps inherited transitively from isaaclab.

AntoineRichard · 2026-03-10T13:53:39Z

+INSTALL_REQUIRES = [
+    # generic
+    "numpy>2",
+    "warp-lang>=1.9.0.dev20250825",  # TODO: update to 1.11.0


This should point to warp1.12?

Stripped all redundant deps — INSTALL_REQUIRES now only has toml (needed by setup.py itself). numpy, torch, warp, prettytable are all inherited transitively from isaaclab.

AntoineRichard · 2026-03-10T13:53:45Z

+    # generic
+    "numpy>2",
+    "warp-lang>=1.9.0.dev20250825",  # TODO: update to 1.11.0
+    "torch>=2.7",


Removed — deps stripped. See above.

AntoineRichard · 2026-03-10T13:54:50Z

I think we already have a modified timer in develop. Do we need that one too?

Switched direct_rl_env_warp.py to import from isaaclab.utils.timer instead. The API is compatible (msg, name, enable kwargs all match). The experimental timer file is kept in place but no longer imported by the base class.

removed now.

AntoineRichard · 2026-03-10T14:01:54Z

+with contextlib.suppress(ImportError):
+    import isaaclab_tasks_experimental  # noqa: F401


Do we need this?

Yes — this triggers gym registration for experimental tasks (Isaac-Cartpole-Direct-Warp-v0, etc.). The contextlib.suppress(ImportError) makes it optional so users without isaaclab_tasks_experimental installed are unaffected.
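
The optional-import idiom can be tried in isolation. The module name below is deliberately one that does not exist (a hypothetical placeholder), so the import fails and the failure is suppressed.

```python
import contextlib
import sys

# ModuleNotFoundError is a subclass of ImportError, so a missing package is silently skipped.
with contextlib.suppress(ImportError):
    import hypothetical_missing_tasks_package  # noqa: F401

tasks_available = "hypothetical_missing_tasks_package" in sys.modules
```

Execution continues past the `with` block either way. When the package is installed, importing it runs its registration side effects.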

AntoineRichard · 2026-03-10T14:02:35Z

@hujc7 Can you double check that ant doesn't work after the newton update + our internal fix?

hujc7 · 2026-03-10T15:47:21Z

@hujc7 Can you double check that ant doesn't work after the newton update + our internal fix?

develop TOT was not working for my other PRs so I used an older commit. Will check again.

hujc7 · 2026-03-11T07:22:46Z

@AntoineRichard Confirmed — all envs work now. Rebased onto latest develop, aligned solver configs to stable PresetCfg values, and referenced stable agent configs directly. All three envs (cartpole, ant, humanoid) pass 300 iterations at 4096 envs.

hujc7 · 2026-03-11T07:29:07Z

@greptileai Review

greptile-apps · 2026-03-11T07:34:04Z

+
+        goal_rot[env_id] = randomize_rotation(rand0, rand1, x_unit_vec, y_unit_vec)
+        reset_goal_buf[env_id] = False
+
+    # Warp-native addition: goal position in world frame.
+    goal_pos_w[env_id] = goal_pos[env_id] + env_origins[env_id]


goal_pos_w update runs for every env on every step, not just masked ones

The assignment on line 93 is outside the if env_mask[env_id]: guard, so it executes for all environments on every invocation of _reset_target_pose. Because _reset_target_pose is called from _get_rewards (which is captured inside the CUDA graph via _step_warp_end_pre), this means goal_pos_w is recomputed for all envs every step, regardless of reset_goal_buf.

While functionally harmless today (because goal_pos and env_origins are constant), this is semantically incorrect: the intent is clearly to update goal_pos_w only for environments whose goal is being reset. Any future change that makes goal_pos per-env-mutable (e.g., curriculum) would silently overwrite the world-frame goal for non-resetting envs each step.

Suggested change

-        goal_rot[env_id] = randomize_rotation(rand0, rand1, x_unit_vec, y_unit_vec)
-        reset_goal_buf[env_id] = False
-
-    # Warp-native addition: goal position in world frame.
-    goal_pos_w[env_id] = goal_pos[env_id] + env_origins[env_id]
+    env_id = wp.tid()
+    if env_mask[env_id]:
+        rand0 = wp.randf(rng_state[env_id], wp.float32(-1.0), wp.float32(1.0))
+        rng_state[env_id] += wp.uint32(1)
+        rand1 = wp.randf(rng_state[env_id], wp.float32(-1.0), wp.float32(1.0))
+        rng_state[env_id] += wp.uint32(1)
+        goal_rot[env_id] = randomize_rotation(rand0, rand1, x_unit_vec, y_unit_vec)
+        reset_goal_buf[env_id] = False
+        # Warp-native addition: goal position in world frame.
+        goal_pos_w[env_id] = goal_pos[env_id] + env_origins[env_id]

Fixed — moved goal_pos_w update inside the env_mask guard.
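
The effect of the guard can be emulated with a plain Python loop, one iteration per kernel thread. This is a simplified sketch that omits the rotation randomization. The function name and data are hypothetical.

```python
def reset_target_pose(env_mask, goal_pos, env_origins, goal_pos_w, reset_goal_buf):
    """Update the world-frame goal only for envs selected by env_mask."""
    for env_id in range(len(env_mask)):  # stands in for env_id = wp.tid()
        if env_mask[env_id]:
            reset_goal_buf[env_id] = False
            goal_pos_w[env_id] = tuple(p + o for p, o in zip(goal_pos[env_id], env_origins[env_id]))


goal_pos = [(0.0, 0.0, 1.0), (0.0, 0.0, 1.0)]
env_origins = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
# Env 0 carries a value that must survive because its mask is False.
goal_pos_w = [(9.0, 9.0, 9.0), (0.0, 0.0, 0.0)]
reset_goal_buf = [False, True]

reset_target_pose([False, True], goal_pos, env_origins, goal_pos_w, reset_goal_buf)
```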

greptile-apps · 2026-03-11T07:34:05Z

+        for deformable_object in self._deformable_objects.values():
+            deformable_object.reset(env_ids)
+        for rigid_object in self._rigid_objects.values():
+            rigid_object.reset(env_ids, env_mask=env_mask)
+        for surface_gripper in self._surface_grippers.values():
+            surface_gripper.reset(env_ids)
+        for rigid_object_collection in self._rigid_object_collections.values():
+            rigid_object_collection.reset(env_ids, env_mask=env_mask)
+        # -- sensors (no env_mask support)
+        for sensor in self._sensors.values():
+            sensor.reset(env_ids)


env_mask-only reset silently resets all deformable objects and surface grippers

When _reset_idx is called with only env_mask (and env_ids=None), the assets that do not yet support env_mask — deformable_object.reset(env_ids) and surface_gripper.reset(env_ids) — receive env_ids=None, which typically means reset all environments. This is the exact opposite of what the caller intended.

Current environments in this PR (Cartpole, Ant, Humanoid, InHand) happen to have no deformable objects or surface grippers, so there is no observable failure. However, any future environment that uses either asset type and calls _reset_idx with a partial mask (e.g., during per-env resets inside a CUDA-graph-captured path) will silently reset every environment on every step.

Consider either:

Converting env_mask → env_ids before delegating to these two asset types (consistent with how env_mask-unaware assets already work), or

Adding a guard/warning to document the limitation:

    if env_mask is not None and env_ids is None:
        # deformable objects / surface grippers do not support env_mask;
        # passing env_ids=None will reset ALL environments.
        pass  # or raise / warn

Acknowledged — latent bug. No current envs use deformable objects or surface grippers with warp env_mask path.

greptile-apps · 2026-03-11T07:34:06Z

+
+    # reset
+    max_cart_pos = 3.0  # the cart is reset if it exceeds that position [m]
+    initial_pole_angle_range = [-0.25, 0.25]  # the range in which the pole angle is sampled from on reset [rad]


Config comment says [rad] but the reset kernel multiplies the range by wp.pi

The config field is documented as radians ([rad]), but the reset kernel on line 198–200 does:

wp.randf(state[env_index], initial_pose_angle_range[0] * wp.pi, initial_pose_angle_range[1] * wp.pi)

With the default [-0.25, 0.25], the actual reset range is [-0.25π, 0.25π] ≈ [-0.785, 0.785] rad — more than three times larger than the comment implies. The comment should be updated to clarify that the values are in units of π radians:

Suggested change

-    initial_pole_angle_range = [-0.25, 0.25]  # the range in which the pole angle is sampled from on reset [rad]
+    initial_pole_angle_range = [-0.25, 0.25]  # pole angle reset range in units of π [rad] (actual range ≈ ±0.785 rad)

Fixed — comment updated to [x pi rad] to match the kernel which multiplies by pi.
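
The unit conversion is easy to check by hand. The snippet below reproduces the arithmetic with `math.pi` standing in for `wp.pi`.

```python
import math

initial_pole_angle_range = [-0.25, 0.25]  # config values, in units of pi

# The reset kernel multiplies both bounds by pi before sampling.
low_rad, high_rad = (bound * math.pi for bound in initial_pole_angle_range)
high_deg = math.degrees(high_rad)
```

The effective bounds are ±π/4, about ±0.785 rad or ±45°, rather than the ±0.25 rad the original `[rad]` comment implied.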

greptile-apps · 2026-03-11T07:34:07Z

+    newton_cfg = NewtonCfg(
+        solver_cfg=solver_cfg,
+        num_substeps=2,
+        debug_mode=False,
+    )


HumanoidWarpEnvCfg does not set use_cuda_graph=True unlike AntWarpEnvCfg

AntWarpEnvCfg.newton_cfg sets use_cuda_graph=True, but HumanoidWarpEnvCfg.newton_cfg omits this flag (defaults to False). Since the entire DirectRLEnvWarp infrastructure is designed around CUDA graph capture for performance, disabling Newton's own CUDA graph for Humanoid while enabling it for Ant creates an inconsistency that may hurt Humanoid training throughput.

If this is intentional (e.g., Humanoid requires update_data_interval=2 which conflicts with Newton's graph capture), the reason should be documented with a comment:

Suggested change

     newton_cfg = NewtonCfg(
         solver_cfg=solver_cfg,
         num_substeps=2,
         debug_mode=False,
+        # use_cuda_graph omitted: Newton graph capture is incompatible with update_data_interval > 1
     )

Fixed — added use_cuda_graph=True to match stable and AntWarpEnvCfg.

hujc7 · 2026-03-11T23:21:52Z

Validation: Warp vs Stable (Newton) Direct Env Parity

300 iterations, 4096 envs, presets=newton for stable envs.

Final metrics (last iteration):

Env	Version	Mean Reward	Mean Ep Length	Training Time
Cartpole	warp	296.64	299.00	68s
Cartpole	stable	297.04	299.00	76s
Ant	warp	9624.26	895.02	102s
Ant	stable	8570.36	880.11	134s
Humanoid	warp	7318.83	877.63	178s
Humanoid	stable	8120.81	877.18	210s
Allegro	warp	91.17	273.93	430s
Allegro	stable	104.48	287.64	486s

Timing breakdown (last iteration):

Env	Version	Collection	Learning	Iter Total	SPS
Cartpole	warp	0.098s	0.183s	0.280s	233,105
Cartpole	stable	0.091s	0.073s	0.160s	399,136
Ant	warp	0.134s	0.128s	0.260s	499,177
Ant	stable	0.222s	0.117s	0.340s	386,920
Humanoid	warp	0.413s	0.129s	0.540s	241,926
Humanoid	stable	0.505s	0.138s	0.640s	203,757
Allegro	warp	1.349s	0.276s	1.620s	40,338
Allegro	stable	1.463s	0.271s	1.730s	37,781

Observations:

Episode lengths (physics behavior) match closely across all 4 env pairs
Reward differences within normal RL training variance, no systematic divergence
Warp collection time is faster for larger envs (Ant +40%, Humanoid +18%), but Cartpole warp has a learning time regression (0.183s vs 0.073s) — likely a tensor transfer overhead in the warp→torch handoff for the PPO update
All 8 runs passed (4 warp + 4 stable with presets=newton)
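
The percentages quoted above can be reproduced from the collection-time column, reading "+40%" as the relative reduction in collection time versus stable. That reading is an interpretation of the comment, not something it states explicitly.

```python
# (warp, stable) collection times in seconds, copied from the timing table.
collection_s = {
    "Cartpole": (0.098, 0.091),
    "Ant": (0.134, 0.222),
    "Humanoid": (0.413, 0.505),
    "Allegro": (1.349, 1.463),
}

# Positive values mean the warp env spent less time collecting than stable.
reduction_pct = {
    env: round(100.0 * (stable - warp) / stable)
    for env, (warp, stable) in collection_s.items()
}
```

By this measure Cartpole is slightly slower in collection as well, and Allegro gains less than the two locomotion envs.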

hujc7 · 2026-03-12T06:14:15Z

@greptileai Review

hujc7 · 2026-03-12T08:38:07Z

@greptileai Review

AntoineRichard · 2026-03-12T17:19:46Z

Do we need this? It should not be needed anymore.

Removed — now imports directly from stable isaaclab.envs.utils.spaces.

Add isaaclab_experimental package with DirectRLEnvWarp base class, InteractiveSceneWarp, and WarpGraphCache utility.

Add direct warp environments in isaaclab_tasks_experimental:
- Cartpole, Ant, Humanoid, Locomotion (base), InHand Manipulation, Allegro Hand, with agent configs for rsl_rl, rl_games, skrl, sb3.

Adapt to develop base class API:
- find_joints 2-value return (indices, names)
- episode_length_buf as property with in-place copy for warp sync
- _ALL_ENV_MASK on base env instead of articulation
- set_joint_effort_target_mask for CUDA graph compatibility
- _get_observations returns dict for rsl_rl wrapper

Align solver configs with stable develop PresetCfg values.
Add safe_normalize to guard against NaN from wp.normalize.
Fix reset_root to_targets computation to use actual targets.
Fix cartpole reset kernel to use parameterized joint indices.
Clean up extension.toml and setup.py dependencies.
Switch timer import to isaaclab.utils.timer.

hujc7 · 2026-03-12T23:46:35Z

@greptileai Review

hujc7 · 2026-03-13T07:49:07Z

@greptileai Review

Signed-off-by: Antoine RICHARD <antoiner@nvidia.com>


…on (#4829)

## Summary

Cherry-pick of warp manager-based env infrastructure from `dev/newton`, refactored for `develop`.

### `isaaclab_experimental`

* Added warp-compatible manager implementations (`ActionManager`, `ObservationManager`, `EventManager`, `CommandManager`, `TerminationManager`, `RewardManager`) with Warp kernel execution and CUDA graph capture support.
* Added `ManagerCallSwitch` utility for per-manager eager/captured dispatch, configured via `MANAGER_CALL_CONFIG` env var.
* Added `ManagerBasedEnvWarp` and `ManagerBasedRLEnvWarp` orchestration env classes.
* Added warp MDP terms (observations, rewards, terminations, events, joint actions).
* Added utility modules: buffers (circular buffer), modifiers, noise models, warp kernels/helpers.
* Added experimental `SceneEntityCfg` with warp joint mask/ids for kernel-level joint selection.
* Generalized configclass default materialization in `ManagerBase` for automatic `SceneEntityCfg` resolution.

### `isaaclab_tasks_experimental`

* Added `Isaac-Cartpole-Warp-v0` task as reference environment for warp manager-based workflow.

### `isaaclab_rl`

* Updated rsl_rl, rl_games, sb3, skrl wrappers to accept `ManagerBasedRLEnvWarp` and `DirectRLEnvWarp`.

### `isaaclab`

* Fixed `SettingsManager` to catch `RuntimeError` when carb is unavailable.
* Minor comment cleanup in `ObservationManager`.

## Dependencies

Must be merged **after**:

1. #4905 (merged)

## Validated base

Validated against develop at `7588fa9ed5f`.

## Known limitations

* `Scene_write_data_to_sim` capped to mode=1 (eager) via `MAX_MODE_OVERRIDES`: articulation `_apply_actuator_model` uses `wp.to_torch + torch indexing`, which is not CUDA graph capture-safe.

## Test plan

- [x] `Isaac-Cartpole-Warp-v0` training (4096 envs, 300 iters, mode=2): converges (reward 4.95, ep_len 300)

---------

Co-authored-by: Antoine RICHARD <antoiner@nvidia.com>
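The mode cap described under "Known limitations" can be pictured as a per-call ceiling on the requested dispatch mode. The sketch below is an assumption-heavy illustration, not the `ManagerCallSwitch` implementation: only the names `MAX_MODE_OVERRIDES` and `Scene_write_data_to_sim` and the fact that mode 1 is eager come from the text above; the dictionary layout, the helper `effective_mode`, and the call name `ObservationManager_compute` are hypothetical.

```python
# Assumed layout: call name -> highest mode that call is allowed to run in.
MAX_MODE_OVERRIDES = {"Scene_write_data_to_sim": 1}  # 1 = eager, per the PR text


def effective_mode(call_name, requested_mode):
    """Clamp the requested mode to the per-call cap, if one is registered (hypothetical helper)."""
    cap = MAX_MODE_OVERRIDES.get(call_name)
    if cap is None:
        return requested_mode
    return min(requested_mode, cap)


# A run requesting mode=2 everywhere still executes the capped call eagerly.
capped = effective_mode("Scene_write_data_to_sim", 2)
uncapped = effective_mode("ObservationManager_compute", 2)  # hypothetical call name
```

A clamp of this kind lets a single global setting request the fastest mode while calls known to be capture-unsafe quietly stay on the eager path.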

…on (#4945)

## Summary

* Cherry-picks [Newton] Migrate more envs and mdps to warp (#4690) onto develop
* Cherry-picks [Newton] Add capture safety guards and fix WrenchComposer stale COM pose (#4779) onto develop

### Changes included

- Warp-first MDP terms (observations, rewards, events, terminations, actions) for manager-based envs
- Tested warp env configs: Ant, Humanoid, Cartpole, locomotion velocity (A1, AnymalB/C/D, Cassie, G1, Go1/2, H1), Franka/UR10 reach
- ManagerCallSwitch max_mode cap and scene capture config
- MDP kernels made graph-capturable with consolidated test infrastructure
- capture_unsafe safety guards on lazy-evaluated derived properties in articulation/rigid_object data
- WrenchComposer fix: use fresh COM pose buffers instead of stale cached link poses

### Dropped

- G1-29-DOF warp env (Isaac-Velocity-Flat-G1-Warp-v1): removed because the stable g1_29_dofs task config does not exist on develop (only on dev/newton). Warp env PRs should only add warp frontends for envs that already exist in the stable package.

## Dependencies

Must be merged **after** these PRs (in order):

1. #4905 (merged)
2. #4829

## Validated base

Validated against develop at 7588fa9.

## Test plan

- [x] Run warp env training sweep across all manager-based env configs (14/14 pass, mode=2, 4096 envs, 300 iters)
- [ ] Run test_mdp_warp_parity.py and test_mdp_warp_parity_new_terms.py
- [ ] Run test_action_warp_parity.py
- [ ] Verify WrenchComposer COM pose is fresh (not stale) during graph replay

---------

Co-authored-by: Antoine Richard <antoiner@nvidia.com>
Co-authored-by: Kelly Guo <kellyg@nvidia.com>

hujc7 requested review from ClemensSchwarke, Mayankm96 and ooctipus as code owners March 10, 2026 08:13

github-actions Bot added the documentation and isaac-lab labels Mar 10, 2026

greptile-apps Bot reviewed Mar 10, 2026

AntoineRichard requested changes Mar 10, 2026

hujc7 force-pushed the jichuanh/direct-warp-envs branch from 0d6b502 to 3d60b9d Compare March 11, 2026 04:03

hujc7 requested a review from hhansen-bdai as a code owner March 11, 2026 04:03

github-actions Bot added the infrastructure label Mar 11, 2026

hujc7 force-pushed the jichuanh/direct-warp-envs branch 2 times, most recently from cec15c7 to 495ec7a Compare March 11, 2026 06:10

hujc7 mentioned this pull request Mar 11, 2026

Adds inhand manipulation warp env #4812

Closed

5 tasks

hujc7 force-pushed the jichuanh/direct-warp-envs branch 3 times, most recently from 135024c to 4e54fdd Compare March 11, 2026 06:51

greptile-apps Bot reviewed Mar 11, 2026

hujc7 changed the title ~~Cherry-pick direct warp envs from dev/newton~~ [EXP] Cherry-pick direct warp envs from dev/newton Mar 11, 2026

hujc7 changed the title ~~[EXP] Cherry-pick direct warp envs from dev/newton~~ [Exp] Cherry-pick direct warp envs from dev/newton Mar 11, 2026

hujc7 mentioned this pull request Mar 11, 2026

[Exp] Cherry-pick warp MDP migration and capture safety from dev/newton #4945

Merged

4 tasks

hujc7 mentioned this pull request Mar 12, 2026

[Exp] Cherry-pick manager-based warp env infrastructure from dev/newton #4829

Merged

1 task

hujc7 force-pushed the jichuanh/direct-warp-envs branch from 4e54fdd to 03fe0e6 Compare March 12, 2026 08:03

AntoineRichard approved these changes Mar 12, 2026

AntoineRichard requested changes Mar 12, 2026

hujc7 force-pushed the jichuanh/direct-warp-envs branch from d8bca3f to b037b52 Compare March 12, 2026 22:21

hujc7 force-pushed the jichuanh/direct-warp-envs branch from b037b52 to c5259f7 Compare March 12, 2026 23:27

hujc7 mentioned this pull request Mar 13, 2026

[WIP] Add warp environment docs and timer alignment #4995

Open

AntoineRichard reviewed Mar 13, 2026

Comment thread source/isaaclab_experimental/setup.py

AntoineRichard added 3 commits March 13, 2026 13:23

Merge branch 'develop' into jichuanh/direct-warp-envs

1769bbe

Update source/isaaclab_experimental/setup.py

7eaaa3f

Signed-off-by: Antoine RICHARD <antoiner@nvidia.com>

Merge branch 'develop' into jichuanh/direct-warp-envs

2ed1ac9

AntoineRichard approved these changes Mar 13, 2026

AntoineRichard merged commit 92145f5 into isaac-sim:develop Mar 13, 2026
9 of 10 checks passed

		joint_vel[env_index, 0] = default_joint_vel[env_index, 0]
		joint_vel[env_index, 1] = default_joint_vel[env_index, 1]

		with contextlib.suppress(ImportError):
			import isaaclab_tasks_experimental  # noqa: F401
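The optional-import pattern in the fragment above can be tried in isolation. This sketch uses a deliberately nonexistent module name (`hypothetical_optional_tasks_pkg`, an assumption chosen for illustration) to show that `contextlib.suppress(ImportError)` lets execution continue when an optional package is not installed.

```python
import contextlib
import importlib
import sys

# Hypothetical package name, chosen so that the import is expected to fail.
OPTIONAL_MODULE = "hypothetical_optional_tasks_pkg"

with contextlib.suppress(ImportError):
    # ModuleNotFoundError is a subclass of ImportError, so it is suppressed too.
    importlib.import_module(OPTIONAL_MODULE)

# Execution continues here whether or not the optional package was found.
registered = OPTIONAL_MODULE in sys.modules
```

The trade-off is that a genuine import error inside the optional package is swallowed as well, so the pattern suits packages whose absence is an expected configuration.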

	initial_pole_angle_range = [-0.25, 0.25] # the range in which the pole angle is sampled from on reset [rad]
	initial_pole_angle_range = [-0.25, 0.25] # pole angle reset range in units of π [rad] (actual range ≈ ±0.785 rad)
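The arithmetic behind the revised comment can be checked directly. The sketch assumes, as the revised comment implies, that the sampled value is multiplied by π before being used as the pole angle; the helper `sample_pole_angle` is hypothetical and only illustrates that scaling.

```python
import math
import random

initial_pole_angle_range = [-0.25, 0.25]  # in units of π, per the revised comment


def sample_pole_angle(rng):
    """Sample a pole angle in radians, assuming the range is scaled by π (hypothetical helper)."""
    low, high = initial_pole_angle_range
    return rng.uniform(low, high) * math.pi


rng = random.Random(0)
angles = [sample_pole_angle(rng) for _ in range(1000)]
max_abs = 0.25 * math.pi  # largest magnitude the sampled angle can reach, in radians
```

Under that assumption the effective reset range is about ±0.785 rad (±45°), which is why describing the raw values as radians would understate it by a factor of π.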

Conversation

greptile-apps Bot commented Mar 10, 2026 • edited

Greptile Summary

Confidence Score: 3/5

Important Files Changed

Sequence Diagram

Comments Outside Diff (5)

AntoineRichard commented Mar 10, 2026

hujc7 commented Mar 10, 2026 • edited
