[Exp] Cherry-pick direct warp envs from dev/newton#4905

Merged
AntoineRichard merged 4 commits into isaac-sim:develop from hujc7:jichuanh/direct-warp-envs
Mar 13, 2026

Conversation

@hujc7
Collaborator

@hujc7 hujc7 commented Mar 10, 2026

Summary

Adds experimental warp infrastructure and direct warp environments from dev/newton, adapted for develop. Absorbs PR #4812 (inhand-cp).

isaaclab_experimental

  • DirectRLEnvWarp base class with CUDA graph capture via WarpGraphCache
  • InteractiveSceneWarp with warp-native env_mask reset support
  • episode_length_buf property with in-place copy to preserve warp/torch shared memory
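
The in-place copy matters because a setter that rebinds the attribute would leave every other view of the buffer pointing at stale storage. A minimal pure-Python sketch of the idea follows, using `array.array` and `memoryview` as stand-ins for the warp array and its torch view. The class and attribute names are illustrative assumptions, not the PR's code.

```python
from array import array


class EpisodeCounter:
    """Toy stand-in for an env whose buffer is observed through two views."""

    def __init__(self, num_envs: int) -> None:
        self._buf = array("i", [0] * num_envs)  # plays the role of the warp array
        self.view = memoryview(self._buf)       # plays the role of the torch view

    @property
    def episode_length_buf(self) -> array:
        return self._buf

    @episode_length_buf.setter
    def episode_length_buf(self, value) -> None:
        # Write into the existing storage element by element instead of
        # rebinding self._buf, so existing views keep seeing the new values.
        for i, v in enumerate(value):
            self._buf[i] = int(v)


env = EpisodeCounter(4)
env.episode_length_buf = [3, 1, 4, 1]
shared = list(env.view)  # read back through the second view
```

Had the setter done `self._buf = array("i", value)` instead, `env.view` would still expose the original zeros.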

isaaclab_tasks_experimental (direct envs)

  • Cartpole (Isaac-Cartpole-Direct-Warp-v0)
  • Ant (Isaac-Ant-Direct-Warp-v0)
  • Humanoid (Isaac-Humanoid-Direct-Warp-v0)
  • Locomotion base warp env (shared by ant/humanoid)
  • InHand Manipulation + Allegro Hand
  • Agent configs reference stable isaaclab_tasks.direct.<env>.agents directly — no duplication

API adaptations for develop

  • find_joints 2-value return (indices, names)
  • episode_length_buf as property with in-place copy_() for warp/torch shared memory
  • self._ALL_ENV_MASK from base env
  • set_joint_effort_target_mask for CUDA graph compatibility
  • _get_observations returns {"policy": tensor} dict
  • safe_normalize to guard wp.normalize on zero-length vectors
  • Solver configs aligned with stable develop PresetCfg values
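
The PR text does not show how safe_normalize is implemented, so the following is only a sketch of the general guard in plain Python. The zero-vector fallback and the `eps` threshold are assumptions chosen for illustration.

```python
import math


def safe_normalize(v, eps=1e-9):
    """Return v / |v|, or the zero vector when |v| is at most eps."""
    length = math.sqrt(sum(c * c for c in v))
    if length <= eps:
        # Dividing by a zero length would produce NaN or inf components.
        return (0.0, 0.0, 0.0)
    return tuple(c / length for c in v)


unit = safe_normalize((3.0, 0.0, 4.0))
zero = safe_normalize((0.0, 0.0, 0.0))
```

The guarded version returns a finite result for the zero-length case, which keeps non-finite values out of downstream observation and reward buffers.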

Test results (rsl_rl, 4096 envs, 300 iterations, headless, newton==1.0.0)

| Env | Status | Time |
| --- | --- | --- |
| Cartpole | PASS | 70s |
| Ant | PASS | 98s |
| Humanoid | PASS | 172s |

Test plan

  • Cartpole: 300 iteration training converges
  • Ant: 300 iteration training converges
  • Humanoid: 300 iteration training converges

@github-actions github-actions Bot added the documentation (Improvements or additions to documentation) and isaac-lab (Related to Isaac Lab team) labels Mar 10, 2026
@greptile-apps
Contributor

greptile-apps Bot commented Mar 10, 2026

Greptile Summary

This PR cherry-picks the experimental Warp-native RL environment infrastructure (DirectRLEnvWarp, InteractiveSceneWarp, WarpGraphCache) and five direct environments (Cartpole, Ant, Humanoid, InHand Manipulation, Allegro Hand) from dev/newton into develop, adapting the API to match develop's conventions (find_joints 2-tuple return, episode_length_buf property with in-place copy, set_joint_effort_target_mask, etc.). The three tested environments (Cartpole, Ant, Humanoid) converge successfully. Several previously flagged issues have been addressed. The remaining concerns are:

  • AllegroHandWarpEnvCfg missing use_cuda_graph=True — unlike AntWarpEnvCfg and HumanoidWarpEnvCfg, the Allegro Hand newton config does not enable Newton's internal CUDA graph, inconsistent with the rest of the PR and likely harming throughput for the most compute-intensive environment.
  • InHandManipulationWarpEnv is absent from the PR test results — the most complex environment (980 lines, multi-asset, atomic reward reductions, consecutive-successes tracking) has not been validated with a training run.
  • Sensors reset with env_ids=None: InteractiveSceneWarp.reset passes env_ids=None to sensors (reset-all semantics) when a partial mask reset is intended, mirroring the same latent bug already acknowledged for deformable objects and surface grippers. This is not observable today since no current warp env uses sensors, but it is undocumented.
  • WarpGraphCache silent capture assumption — correctness of the capture-or-replay idiom depends on wp.ScopedCapture recording kernels without executing them; a comment or reference to the Warp docs would guard against future regressions.
  • Potential velocity component order ambiguity in locomotion_env_warp.py — the spatial_vectorf convention for root_vel_w (angular-first vs linear-first) is not documented, which could cause silent policy-incompatibility between the warp and standard direct locomotion envs.

Confidence Score: 3/5

  • Safe to merge for the three tested environments (Cartpole, Ant, Humanoid); InHand Manipulation should not be considered production-ready until tested.
  • Three of five environments are tested and converge. The infrastructure (DirectRLEnvWarp, WarpGraphCache, InteractiveSceneWarp) is well-designed and the most critical previously flagged bugs (velocity reset indices, to_targets computation, _get_observations return type, goal_pos_w guard, Humanoid use_cuda_graph) are all fixed. However, the InHand Manipulation environment has no test coverage and its config has a notable inconsistency (use_cuda_graph not enabled for Newton). The sensor.reset(None) latent bug in InteractiveSceneWarp is undocumented. For an experimental PR, these issues are acceptable to land but warrant follow-up before the env set is considered complete.
  • source/isaaclab_tasks_experimental/isaaclab_tasks_experimental/direct/allegro_hand/allegro_hand_warp_env_cfg.py and source/isaaclab_tasks_experimental/isaaclab_tasks_experimental/direct/inhand_manipulation/inhand_manipulation_warp_env.py need the most attention — missing Newton CUDA graph flag and no training validation.

Important Files Changed

| Filename | Overview |
| --- | --- |
| source/isaaclab_experimental/isaaclab_experimental/envs/direct_rl_env_warp.py | New DirectRLEnvWarp base class implementing the direct RL env interface with CUDA graph capture via WarpGraphCache; well-structured with correct property-setter pattern for episode_length_buf, clean split between capturable (_step_warp_end_pre/post) and non-capturable (scene writes, visualization) phases. |
| source/isaaclab_experimental/isaaclab_experimental/envs/interactive_scene_warp.py | Extends InteractiveScene with env_mask support for warp-native resets; correctly delegates articulations and rigid objects but passes env_ids=None (reset-all) to sensors, with the same latent bug already acknowledged for deformable objects and surface grippers. |
| source/isaaclab_experimental/isaaclab_experimental/utils/warp_graph_cache.py | Clean CUDA graph capture-or-replay utility; correctness depends on wp.ScopedCapture recording without executing kernels (standard CUDA graph behavior), which should be documented to guard against future regressions. |
| source/isaaclab_tasks_experimental/isaaclab_tasks_experimental/direct/cartpole/cartpole_warp_env.py | Fully warp-native Cartpole env; all previously flagged issues (hardcoded velocity indices, comment unit mismatch) are addressed; CUDA graph capturable reset/action/observation kernels look correct. |
| source/isaaclab_tasks_experimental/isaaclab_tasks_experimental/direct/locomotion/locomotion_env_warp.py | Shared locomotion base for Ant/Humanoid; previously flagged reset_root to_targets bug is fixed; a potential velocity component ordering concern (angular vs linear placement in observation vector) is noted but doesn't block convergence. |
| source/isaaclab_tasks_experimental/isaaclab_tasks_experimental/direct/inhand_manipulation/inhand_manipulation_warp_env.py | Most complex env in the PR (980 lines); previously flagged goal_pos_w guard issue is fixed; however this environment is absent from the PR test results, and the AllegroHand config is missing use_cuda_graph=True, making its training performance and correctness unvalidated. |
| source/isaaclab_tasks_experimental/isaaclab_tasks_experimental/direct/allegro_hand/allegro_hand_warp_env_cfg.py | AllegroHand config omits use_cuda_graph=True in newton_cfg, inconsistent with Ant and Humanoid configs; this will disable Newton's internal CUDA graph for the most compute-intensive environment in the PR. |
| source/isaaclab_rl/isaaclab_rl/rsl_rl/vecenv_wrapper.py | Cleanly adds DirectRLEnvWarp to the allowed types via a try/except import guard; existing get_observations path (calling _get_observations() directly) correctly works with the warp env's -> dict return signature. |

Sequence Diagram

sequenceDiagram
    participant RL as RSL-RL Runner
    participant Wrapper as RslRlVecEnvWrapper
    participant Env as DirectRLEnvWarp
    participant Cache as WarpGraphCache
    participant Scene as InteractiveSceneWarp

    RL->>Wrapper: step(actions)
    Wrapper->>Env: step(actions)
    Env->>Env: _pre_physics_step(wp.from_torch(actions))
    loop decimation
        Env->>Cache: capture_or_replay("action", step_warp_action)
        Cache-->>Env: graph captured/replayed
        Env->>Scene: write_data_to_sim() [outside graph]
        Env->>Env: sim.step()
        Env->>Scene: scene.update()
    end
    Env->>Cache: capture_or_replay("end_pre", _step_warp_end_pre)
    Note over Cache,Env: Captured: add_to_env → _get_dones → _get_rewards → _reset_idx(reset_buf)
    Env->>Scene: write_data_to_sim() [outside graph]
    Env->>Cache: capture_or_replay("end_post", _step_warp_end_post)
    Note over Cache,Env: Captured: _get_observations()
    Env->>Env: _post_step_visualize() [outside graph]
    Env-->>Wrapper: obs_dict, rewards, terminated, truncated, extras
    Wrapper-->>RL: TensorDict(obs), rew, dones, extras

Comments Outside Diff (5)

  1. source/isaaclab_tasks_experimental/isaaclab_tasks_experimental/direct/allegro_hand/allegro_hand_warp_env_cfg.py, line 46-52 (link)

    AllegroHandWarpEnvCfg missing use_cuda_graph=True in newton_cfg

    AntWarpEnvCfg and HumanoidWarpEnvCfg (after the fix in the previous review thread) both set use_cuda_graph=True inside newton_cfg. AllegroHandWarpEnvCfg omits this flag, defaulting to False:

    newton_cfg = NewtonCfg(
        solver_cfg=solver_cfg,
        num_substeps=2,
        debug_mode=False,
        # use_cuda_graph missing → defaults to False
    )

    DirectRLEnvWarp's entire value proposition is CUDA graph capture via WarpGraphCache. Disabling Newton's internal CUDA graph for the most complex task (Allegro Hand) while enabling it for Ant and Humanoid creates an inconsistency that will hurt training throughput. If there's a reason this must be disabled (e.g., a known compatibility issue with ls_parallel=False or the solver="newton" variant), it should be documented with a comment explaining the constraint.

  2. source/isaaclab_experimental/isaaclab_experimental/envs/interactive_scene_warp.py, line 40-42 (link)

    Sensors also receive env_ids=None (reset-all) when partial mask reset is intended

    The existing PR thread already acknowledged the same latent bug for deformable_object and surface_gripper. Sensors have the same problem:

    for sensor in self._sensors.values():
        sensor.reset(env_ids)  # env_ids is always None here

    Since DirectRLEnvWarp._reset_idx always calls scene.reset(env_ids=None, env_mask=mask), when only a subset of environments needs resetting (e.g., during _step_warp_end_pre), sensor.reset(None) resets all sensors — the opposite of what is intended.

    None of the current warp environments use sensors, so there is no observable failure today. However, since the class is designed for future extension, and sensors are very common in IsaacLab environments, this should be guarded the same way deformable objects and surface grippers are (with a comment or a mask→ids conversion). Consider adding the same acknowledgment comment as used for the deformable/gripper paths, so the limitation is consistently documented.
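
One way to picture the suggested mask-to-ids conversion is the plain-Python sketch below. `CountingSensor`, `mask_to_ids`, and `reset_sensor` are hypothetical names invented for illustration; they are not part of IsaacLab, and the real sensor API may differ.

```python
from typing import Optional, Sequence


def mask_to_ids(env_mask: Sequence[bool]) -> list:
    """Indices of the environments whose mask entry is set."""
    return [i for i, flag in enumerate(env_mask) if flag]


class CountingSensor:
    """Hypothetical mask-unaware sensor; env_ids=None means reset everything."""

    def __init__(self, num_envs: int) -> None:
        self.num_envs = num_envs
        self.reset_counts = [0] * num_envs

    def reset(self, env_ids: Optional[Sequence[int]] = None) -> None:
        ids = range(self.num_envs) if env_ids is None else env_ids
        for i in ids:
            self.reset_counts[i] += 1


def reset_sensor(sensor, env_ids=None, env_mask=None) -> None:
    # When only a mask is supplied, derive explicit ids so that a
    # mask-unaware asset does not fall back to reset-all semantics.
    if env_ids is None and env_mask is not None:
        env_ids = mask_to_ids(env_mask)
    sensor.reset(env_ids)


sensor = CountingSensor(4)
reset_sensor(sensor, env_mask=[False, True, False, True])
```

A conversion like this produces a variable-length result on the host, so in a real environment it would probably have to run outside the CUDA-graph-captured region.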

  3. source/isaaclab_tasks_experimental/isaaclab_tasks_experimental/direct/locomotion/locomotion_env_warp.py, line 71-84 (link)

    Velocity component ordering in observations may be swapped relative to the spatial_vectorf convention

    The observations kernel assigns vec_loc indices [0] through [5] to observation positions [1] through [6]. Based on the comment in inhand_manipulation_warp_env.py (line 363: "spatial_vectorf layout: [0:3]=angular, [3:6]=linear"), vec_loc[0:3] is angular velocity and vec_loc[3:6] is linear velocity:

    observations[env_index, 1] = velocity[env_index][0]   # → angular_x (no scale)
    observations[env_index, 2] = velocity[env_index][1]   # → angular_y (no scale)
    observations[env_index, 3] = velocity[env_index][2]   # → angular_z (no scale)
    observations[env_index, 4] = velocity[env_index][3] * angular_velocity_scale  # → linear_x * angular_velocity_scale
    observations[env_index, 5] = velocity[env_index][4] * angular_velocity_scale  # → linear_y * angular_velocity_scale
    observations[env_index, 6] = velocity[env_index][5] * angular_velocity_scale  # → linear_z * angular_velocity_scale

    If this convention holds, the scale (angular_velocity_scale = 1.0 for Ant, 0.25 for Humanoid) is applied to the linear components rather than the angular ones, and the two groups are swapped compared to the standard IsaacLab direct-env observation layout (linear first, angular second). Training still converges because the policy can adapt to any consistent observation encoding. However, a policy trained with this warp env cannot be reused with the standard AntEnv/HumanoidEnv without re-training, which may be surprising to users.

    If Newton's root_vel_w actually uses [linear, angular] order (opposite to the inhand comment), this is fine. Please verify the Newton spatial velocity convention and, if it does differ from the inhand convention, add a comment here (e.g. # Newton root_vel_w layout: [0:3]=linear, [3:6]=angular) to prevent future confusion.
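
To make the ambiguity concrete, the sketch below builds the velocity part of the observation under both candidate layouts. It is plain Python with made-up numbers; the layout constants and function names are assumptions for illustration and say nothing about which convention Newton actually uses.

```python
ANGULAR_FIRST = "angular_first"  # [0:3]=angular, [3:6]=linear
LINEAR_FIRST = "linear_first"    # [0:3]=linear, [3:6]=angular


def split_velocity(vel6, layout):
    """Return (linear, angular) 3-vectors from a 6-component spatial velocity."""
    first, second = list(vel6[0:3]), list(vel6[3:6])
    if layout == ANGULAR_FIRST:
        return second, first
    if layout == LINEAR_FIRST:
        return first, second
    raise ValueError(f"unknown layout: {layout}")


def velocity_observation(vel6, layout, angular_velocity_scale):
    """Linear velocity unscaled, followed by scaled angular velocity."""
    linear, angular = split_velocity(vel6, layout)
    return linear + [w * angular_velocity_scale for w in angular]


vel = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
obs_linear_first = velocity_observation(vel, LINEAR_FIRST, 0.25)
obs_angular_first = velocity_observation(vel, ANGULAR_FIRST, 0.25)
```

The two results differ in both ordering and scaling, which is why a policy trained against one layout would not transfer to an env that uses the other.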

  4. source/isaaclab_tasks_experimental/isaaclab_tasks_experimental/direct/inhand_manipulation/inhand_manipulation_warp_env.py, line 545-550 (link)

    InHandManipulationWarpEnv is absent from the PR's test results

    The PR test table covers Cartpole, Ant, and Humanoid, but InHandManipulationWarpEnv (Allegro Hand, Isaac-AllegroHand-InHandManipulation-Warp-v0) is not listed. Given that this is the most complex environment in the PR (980-line file, multi-asset scene, CUDA-graph-captured reward with atomic reductions, consecutive-successes tracking, goal-marker visualization), its absence from tested environments is a notable gap. Untested convergence behavior, the missing use_cuda_graph=True in Newton config (see separate comment in allegro_hand_warp_env_cfg.py), and the obs_nonfinite_flag sanitizer all suggest this env may still have unresolved issues. Please include at least a short training run (e.g., 500 iterations) in the test matrix before merging.

  5. source/isaaclab_experimental/isaaclab_experimental/utils/warp_graph_cache.py, line 55-65 (link)

    capture_or_replay first-call behavior depends on whether wp.ScopedCapture executes kernels

    The docstring states that on the first call the function is "recorded into a CUDA graph and then immediately replayed". Under standard CUDA graph capture semantics (cudaStreamBeginCapture), kernels submitted to a captured stream are not executed — they are only recorded. The wp.capture_launch call immediately after is therefore the first actual execution, which is correct.

    However, this assumption (capture-without-execution) should be validated against Warp's implementation. If Warp's ScopedCapture does execute kernels during recording (eager-capture mode), then on the very first training step every operation inside _step_warp_end_pre / _step_warp_end_post would run twice:

    • add_to_env would increment episode_length_buf by 2 instead of 1
    • _reset_idx would fire twice for the same reset_buf
    • All reward and observation kernels would launch twice

    A one-line comment confirming the no-eager-execution assumption (or a reference to the Warp docs section that guarantees it) would prevent a subtle correctness regression if Warp ever changes its capture behavior.
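
The capture-without-execution assumption can be modelled in a few lines of plain Python. The sketch below is a toy model of standard CUDA graph semantics, not Warp's implementation; `RecordingStream` and `GraphCache` are hypothetical names.

```python
class RecordingStream:
    """Toy model of graph capture: launches are recorded, not executed."""

    def __init__(self) -> None:
        self.capturing = False
        self.recorded = []

    def launch(self, fn, *args) -> None:
        if self.capturing:
            self.recorded.append((fn, args))  # record only, do not run
        else:
            fn(*args)  # eager execution outside capture


class GraphCache:
    """Hypothetical capture-or-replay cache keyed by stage name."""

    def __init__(self, stream: RecordingStream) -> None:
        self.stream = stream
        self.graphs = {}

    def capture_or_replay(self, name, build) -> None:
        if name not in self.graphs:
            self.stream.capturing = True
            self.stream.recorded = []
            try:
                build()  # launches are recorded here, nothing runs
            finally:
                self.stream.capturing = False
            self.graphs[name] = list(self.stream.recorded)
        # Both the first and later calls execute exactly once, here.
        for fn, args in self.graphs[name]:
            fn(*args)


def add_one(buf):
    buf[0] += 1


episode_length = [0]
stream = RecordingStream()
cache = GraphCache(stream)


def step():
    stream.launch(add_one, episode_length)


cache.capture_or_replay("end_pre", step)
after_first = episode_length[0]
cache.capture_or_replay("end_pre", step)
after_second = episode_length[0]
```

If `launch` also executed while capturing, the counter would read 2 after the first call, which is the double-increment failure mode described above.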

Last reviewed commit: c5259f7

Comment on lines +202 to +203
joint_vel[env_index, 0] = default_joint_vel[env_index, 0]
joint_vel[env_index, 1] = default_joint_vel[env_index, 1]
Contributor

Hardcoded joint indices break generality of velocity reset

Joint positions are reset using the parameterized cart_dof_idx / pole_dof_idx, but joint velocities are reset using hardcoded indices 0 and 1. For the default cartpole this happens to be correct, but if the joint ordering ever changes (or this kernel is reused), velocities would be silently reset for the wrong degrees of freedom.

Suggested change
- joint_vel[env_index, 0] = default_joint_vel[env_index, 0]
- joint_vel[env_index, 1] = default_joint_vel[env_index, 1]
+ joint_vel[env_index, cart_dof_idx] = default_joint_vel[env_index, cart_dof_idx]
+ joint_vel[env_index, pole_dof_idx] = default_joint_vel[env_index, pole_dof_idx]

Collaborator Author

Fixed — now uses cart_dof_idx/pole_dof_idx for velocity reset.

Comment on lines +127 to +131
to_targets[env_index] = wp.transform_get_translation(root_pose[env_index]) - wp.transform_get_translation(
default_root_pose[env_index]
)
to_targets[env_index][2] = 0.0
potentials[env_index] = -wp.length(to_targets[env_index]) / dt
Contributor

to_targets and initial potentials in reset_root are computed incorrectly

After the root pose is set to default_root_pose + env_origins, the code computes:

to_targets = (default_root_pose + env_origins) - default_root_pose = env_origins

so to_targets holds the environment origin offset (~4–10 m), not the vector from the agent to the distant target (initialized to env_origins + (1000, 0, 0)). As a result potentials = -|env_origins| / dt instead of the expected ≈ -1000 / dt.

In practice this does not corrupt training because _reset_idx immediately calls _compute_intermediate_values() at its end, which correctly overwrites both to_targets and potentials (and sets prev_potentials from the now-correct potentials). However, the intention of this initial calculation in reset_root is clearly wrong and could confuse future readers or cause subtle bugs if the subsequent _compute_intermediate_values() call is ever removed or reordered.

The correct expression would be:

to_targets[env_index] = targets[env_index] - wp.transform_get_translation(root_pose[env_index])
to_targets[env_index][2] = wp.float32(0.0)
potentials[env_index] = -wp.length(to_targets[env_index]) / dt

(where targets would need to be added as a kernel input)
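
The difference between the two expressions is easy to check numerically. The plain-Python sketch below uses assumed values for `dt`, the env origin, and the default root position, chosen only to keep the arithmetic exact; they are not taken from the PR.

```python
import math


def potential(target, root_pos, dt):
    """-(planar distance to target) / dt, with the z component dropped."""
    dx, dy = target[0] - root_pos[0], target[1] - root_pos[1]
    return -math.hypot(dx, dy) / dt


dt = 0.5                        # assumed step size
env_origin = (4.0, 0.0, 0.0)    # assumed env origin offset
default_root = (0.0, 0.0, 0.5)  # assumed default root position (local frame)
root_pos = tuple(d + o for d, o in zip(default_root, env_origin))
target = (env_origin[0] + 1000.0, env_origin[1], env_origin[2])

# Original expression: root_pos - default_root, which collapses to env_origin.
buggy_vec = tuple(r - d for r, d in zip(root_pos, default_root))
buggy_potential = -math.hypot(buggy_vec[0], buggy_vec[1]) / dt

# Corrected expression: target - root_pos.
correct_potential = potential(target, root_pos, dt)
```

With these numbers the original expression gives a potential driven by the 4 m origin offset, while the corrected one reflects the 1000 m distance to the target.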

Collaborator Author

Fixed — reset_root now takes the targets array and computes to_targets = targets - root_pose correctly.

Comment on lines +775 to +781
@abstractmethod
def _get_observations(self) -> None:
    """Compute the observations for the environment.

    Writes results into the observation buffers (e.g., ``self.obs_buf``).
    """
    raise NotImplementedError(f"Please implement the '_get_observations' method for {self.__class__.__name__}.")
Contributor

_get_observations declared -> None but implementations return a dict relied on by vecenv_wrapper

The abstract signature declares a None return type and the docstring says "Writes results into the observation buffers", but every concrete implementation (CartpoleWarpEnv, LocomotionWarpEnv, etc.) returns {"policy": self.torch_obs_buf}. The updated RslRlVecEnvWrapper.get_observations() in this PR also calls _get_observations() and uses its return value directly:

obs_dict = self.unwrapped._get_observations()
return TensorDict(obs_dict, batch_size=[self.num_envs])

If a new subclass faithfully follows the declared -> None contract, get_observations() will raise a TypeError when constructing the TensorDict. The return type should be updated to make the contract explicit:

Suggested change
- @abstractmethod
- def _get_observations(self) -> None:
-     """Compute the observations for the environment.
-     Writes results into the observation buffers (e.g., ``self.obs_buf``).
-     """
-     raise NotImplementedError(f"Please implement the '_get_observations' method for {self.__class__.__name__}.")
+ def _get_observations(self) -> dict:

and the docstring updated to reflect that the method both writes buffers and returns the observation dict.
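
The failure mode can be reproduced without torch or tensordict. In the sketch below, `dict(...)` stands in for the TensorDict construction, and all class names are hypothetical stand-ins rather than the PR's classes.

```python
from abc import ABC, abstractmethod


class WarpEnvBase(ABC):
    """Hypothetical stand-in for the env base class."""

    @abstractmethod
    def _get_observations(self) -> dict:
        """Write the observation buffers and return them keyed by group name."""
        raise NotImplementedError


class GoodEnv(WarpEnvBase):
    def __init__(self) -> None:
        self.obs_buf = [0.0, 0.0]

    def _get_observations(self) -> dict:
        self.obs_buf[0] = 1.0
        return {"policy": self.obs_buf}


class BufferOnlyEnv(WarpEnvBase):
    """Follows the old '-> None' contract: writes buffers, returns nothing."""

    def __init__(self) -> None:
        self.obs_buf = [0.0, 0.0]

    def _get_observations(self) -> None:
        self.obs_buf[0] = 1.0


def get_observations(env: WarpEnvBase) -> dict:
    """Stand-in for the wrapper, which needs a mapping to build its output."""
    obs = env._get_observations()
    return dict(obs)  # dict(None) raises TypeError


good = get_observations(GoodEnv())
try:
    get_observations(BufferOnlyEnv())
    failed = False
except TypeError:
    failed = True
```

Declaring `-> dict` on the abstract method lets type checkers flag `BufferOnlyEnv` before it fails at runtime.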

Collaborator Author

Fixed — changed abstract signature to -> dict with updated docstring.

"numpy",
"prettytable==3.3.0",
"toml",
"hidapi",
Collaborator

This is not needed?

Collaborator Author

Cleaned up — the Kit-only [python.pipapi] section was removed. File now has only [package], [dependencies], [core], and [[python.module]].

"toml",
"hidapi",
"gymnasium==0.29.0",
"trimesh"
Collaborator

This is not needed?

Collaborator Author

Cleaned up — the Kit-only [python.pipapi] section was removed. File now has only [package], [dependencies], [core], and [[python.module]].

[python.pipapi]
requirements = [
"numpy",
"prettytable==3.3.0",
Collaborator

This should not be needed?

Collaborator Author

Cleaned up — removed the entire [python.pipapi] section (Kit-only, not needed for standalone package). Matched the isaaclab_newton extension.toml pattern.

"numpy",
"prettytable",
"toml",
"hid",
Collaborator

Not needed?

Collaborator Author

Cleaned up — the Kit-only [python.pipapi] section was removed. File now has only [package], [dependencies], [core], and [[python.module]].

"toml",
"hid",
"gymnasium",
"trimesh"
Collaborator

Not needed?

Collaborator Author

Cleaned up — the Kit-only [python.pipapi] section was removed. File now has only [package], [dependencies], [core], and [[python.module]].

Comment thread source/isaaclab_experimental/setup.py Outdated
"numpy>2",
"warp-lang>=1.9.0.dev20250825", # TODO: update to 1.11.0
"torch>=2.7",
"prettytable==3.3.0",
Collaborator

Do we need this?

Collaborator Author

Removed — INSTALL_REQUIRES stripped to just toml. All other deps inherited transitively from isaaclab.

Comment thread source/isaaclab_experimental/setup.py Outdated
INSTALL_REQUIRES = [
# generic
"numpy>2",
"warp-lang>=1.9.0.dev20250825", # TODO: update to 1.11.0
Collaborator

This should point to Warp 1.12?

Collaborator Author

Stripped all redundant deps — INSTALL_REQUIRES now only has toml (needed by setup.py itself). numpy, torch, warp, prettytable are all inherited transitively from isaaclab.

Comment thread source/isaaclab_experimental/setup.py Outdated
# generic
"numpy>2",
"warp-lang>=1.9.0.dev20250825", # TODO: update to 1.11.0
"torch>=2.7",
Collaborator

Torch 2.10

Collaborator Author

Removed — deps stripped. See above.

Collaborator

I think we already have a modified timer in develop. Do we need that one too?

Collaborator Author

Switched direct_rl_env_warp.py to import from isaaclab.utils.timer instead. The API is compatible (msg, name, enable kwargs all match). The experimental timer file is kept in place but no longer imported by the base class.

Collaborator Author

Removed now.

Comment on lines +38 to +39
with contextlib.suppress(ImportError):
    import isaaclab_tasks_experimental  # noqa: F401
Collaborator

Do we need this?

Collaborator Author

Yes — this triggers gym registration for experimental tasks (Isaac-Cartpole-Direct-Warp-v0, etc.). The contextlib.suppress(ImportError) makes it optional so users without isaaclab_tasks_experimental installed are unaffected.
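
The optional-import idiom can be tried in isolation. The sketch below uses a deliberately nonexistent module name (an assumption for illustration) to show that the import failure is swallowed and execution continues.

```python
import contextlib
import sys

# The module name below is hypothetical and intentionally does not exist.
with contextlib.suppress(ImportError):
    import hypothetical_missing_tasks_package  # noqa: F401

# Execution reaches this point whether or not the import succeeded.
registered = "hypothetical_missing_tasks_package" in sys.modules
```

One trade-off of this pattern is that it also hides an ImportError raised from inside the optional package itself, for example when one of its own dependencies is missing.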

@AntoineRichard
Collaborator

@hujc7 Can you double check that ant doesn't work after the newton update + our internal fix?

@hujc7
Collaborator Author

hujc7 commented Mar 10, 2026

> @hujc7 Can you double check that ant doesn't work after the newton update + our internal fix?

develop TOT was not working for my other PRs so I used an older commit. Will check again.

@hujc7 hujc7 force-pushed the jichuanh/direct-warp-envs branch from 0d6b502 to 3d60b9d Compare March 11, 2026 04:03
@hujc7 hujc7 requested a review from hhansen-bdai as a code owner March 11, 2026 04:03
@hujc7 hujc7 force-pushed the jichuanh/direct-warp-envs branch 2 times, most recently from cec15c7 to 495ec7a Compare March 11, 2026 06:10
@hujc7 hujc7 mentioned this pull request Mar 11, 2026
5 tasks
@hujc7 hujc7 force-pushed the jichuanh/direct-warp-envs branch 3 times, most recently from 135024c to 4e54fdd Compare March 11, 2026 06:51
@hujc7
Collaborator Author

hujc7 commented Mar 11, 2026

@AntoineRichard Confirmed — all envs work now. Rebased onto latest develop, aligned solver configs to stable PresetCfg values, and referenced stable agent configs directly. All three envs (cartpole, ant, humanoid) pass 300 iterations at 4096 envs.

@hujc7
Collaborator Author

hujc7 commented Mar 11, 2026

@greptileai Review

Comment on lines +88 to +93

goal_rot[env_id] = randomize_rotation(rand0, rand1, x_unit_vec, y_unit_vec)
reset_goal_buf[env_id] = False

# Warp-native addition: goal position in world frame.
goal_pos_w[env_id] = goal_pos[env_id] + env_origins[env_id]
Contributor

goal_pos_w update runs for every env on every step, not just masked ones

The assignment on line 93 is outside the if env_mask[env_id]: guard, so it executes for all environments on every invocation of _reset_target_pose. Because _reset_target_pose is called from _get_rewards (which is captured inside the CUDA graph via _step_warp_end_pre), this means goal_pos_w is recomputed for all envs every step, regardless of reset_goal_buf.

While functionally harmless today (because goal_pos and env_origins are constant), this is semantically incorrect: the intent is clearly to update goal_pos_w only for environments whose goal is being reset. Any future change that makes goal_pos per-env-mutable (e.g., curriculum) would silently overwrite the world-frame goal for non-resetting envs each step.

Suggested change
- goal_rot[env_id] = randomize_rotation(rand0, rand1, x_unit_vec, y_unit_vec)
- reset_goal_buf[env_id] = False
- # Warp-native addition: goal position in world frame.
- goal_pos_w[env_id] = goal_pos[env_id] + env_origins[env_id]
+ env_id = wp.tid()
+ if env_mask[env_id]:
+     rand0 = wp.randf(rng_state[env_id], wp.float32(-1.0), wp.float32(1.0))
+     rng_state[env_id] += wp.uint32(1)
+     rand1 = wp.randf(rng_state[env_id], wp.float32(-1.0), wp.float32(1.0))
+     rng_state[env_id] += wp.uint32(1)
+     goal_rot[env_id] = randomize_rotation(rand0, rand1, x_unit_vec, y_unit_vec)
+     reset_goal_buf[env_id] = False
+     # Warp-native addition: goal position in world frame.
+     goal_pos_w[env_id] = goal_pos[env_id] + env_origins[env_id]
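
The effect of the guard can be emulated on the host with a per-env loop standing in for the kernel threads. This is a plain-Python sketch with made-up values; the rotation randomization is omitted and the function signature is an assumption.

```python
def reset_target_pose(env_mask, goal_pos, env_origins, goal_pos_w, reset_goal_buf):
    """Per-env loop standing in for the kernel's wp.tid() threads."""
    for env_id in range(len(env_mask)):
        if env_mask[env_id]:
            reset_goal_buf[env_id] = False
            # Guarded: only resetting envs refresh their world-frame goal.
            goal_pos_w[env_id] = tuple(
                g + o for g, o in zip(goal_pos[env_id], env_origins[env_id])
            )


goal_pos = [(0.0, 0.0, 1.0), (0.0, 0.0, 2.0)]
env_origins = [(1.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
goal_pos_w = [(9.0, 9.0, 9.0), (9.0, 9.0, 9.0)]  # sentinel values
reset_goal_buf = [True, True]

reset_target_pose([True, False], goal_pos, env_origins, goal_pos_w, reset_goal_buf)
```

Only the masked environment has its world-frame goal and reset flag updated; the other keeps its sentinel values.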

Collaborator Author

Fixed — moved goal_pos_w update inside the env_mask guard.

Comment on lines +34 to +44
for deformable_object in self._deformable_objects.values():
    deformable_object.reset(env_ids)
for rigid_object in self._rigid_objects.values():
    rigid_object.reset(env_ids, env_mask=env_mask)
for surface_gripper in self._surface_grippers.values():
    surface_gripper.reset(env_ids)
for rigid_object_collection in self._rigid_object_collections.values():
    rigid_object_collection.reset(env_ids, env_mask=env_mask)
# -- sensors (no env_mask support)
for sensor in self._sensors.values():
    sensor.reset(env_ids)
Contributor

env_mask-only reset silently resets all deformable objects and surface grippers

When _reset_idx is called with only env_mask (and env_ids=None), the assets that do not yet support env_mask, namely deformable_object.reset(env_ids) and surface_gripper.reset(env_ids), receive env_ids=None, which typically means reset all environments. This is the exact opposite of what the caller intended.

Current environments in this PR (Cartpole, Ant, Humanoid, InHand) happen to have no deformable objects or surface grippers, so there is no observable failure. However, any future environment that uses either asset type and calls _reset_idx with a partial mask (e.g., during per-env resets inside a CUDA-graph-captured path) will silently reset every environment on every step.

Consider either:

  • Converting env_mask → env_ids before delegating to these two asset types (consistent with how env_mask-unaware assets already work), or
  • Adding a guard/warning to document the limitation:
if env_mask is not None and env_ids is None:
    # deformable objects / surface grippers do not support env_mask;
    # passing env_ids=None will reset ALL environments.
    pass  # or raise / warn
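
The first option could look roughly like the sketch below. It is an illustration under stated assumptions, not the scene's actual code: `mask_to_env_ids`, `MaskUnawareAsset`, and `reset_mask_unaware` are hypothetical names, and the mask is a plain Python list rather than a Warp array.

```python
import warnings


def mask_to_env_ids(env_mask):
    """Return the indices of the environments whose mask entry is True."""
    return [i for i, selected in enumerate(env_mask) if selected]


class MaskUnawareAsset:
    """Hypothetical asset whose reset() accepts env_ids only; None means 'all'."""

    def __init__(self, num_envs):
        self.num_envs = num_envs
        self.reset_log = []

    def reset(self, env_ids=None):
        ids = list(range(self.num_envs)) if env_ids is None else list(env_ids)
        self.reset_log.append(ids)


def reset_mask_unaware(assets, env_ids=None, env_mask=None):
    """Reset mask-unaware assets without widening a partial mask to all envs."""
    if env_ids is None and env_mask is not None:
        warnings.warn("env_mask converted to env_ids for mask-unaware assets")
        env_ids = mask_to_env_ids(env_mask)
    for asset in assets:
        asset.reset(env_ids)


gripper = MaskUnawareAsset(num_envs=4)
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    reset_mask_unaware([gripper], env_mask=[False, True, False, True])
```

Here only environments 1 and 3 are reset. Note that building a variable-length index list from a mask is likely unsuitable inside a CUDA-graph-captured path, so a conversion like this would probably have to run eagerly.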

Collaborator Author

Acknowledged — latent bug. No current envs use deformable objects or surface grippers with warp env_mask path.


# reset
max_cart_pos = 3.0 # the cart is reset if it exceeds that position [m]
initial_pole_angle_range = [-0.25, 0.25] # the range in which the pole angle is sampled from on reset [rad]
Contributor

Config comment says [rad] but the reset kernel multiplies the range by wp.pi

The config field is documented as radians ([rad]), but the reset kernel on line 198–200 does:

wp.randf(state[env_index],
         initial_pose_angle_range[0] * wp.pi,
         initial_pose_angle_range[1] * wp.pi)

With the default [-0.25, 0.25], the actual reset range is [-0.25π, 0.25π] ≈ [-0.785, 0.785] rad — more than three times larger than the comment implies. The comment should be updated to clarify that the values are in units of π radians:

Suggested change
- initial_pole_angle_range = [-0.25, 0.25] # the range in which the pole angle is sampled from on reset [rad]
+ initial_pole_angle_range = [-0.25, 0.25] # pole angle reset range in units of π [rad] (actual range ≈ ±0.785 rad)
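
The arithmetic behind the ±0.785 rad figure can be checked with a short sketch. It uses Python's `math.pi` in place of `wp.pi` and reuses the config field name purely for illustration:

```python
import math

initial_pole_angle_range = [-0.25, 0.25]  # config values, in multiples of pi

# The reset kernel scales both bounds by pi before sampling.
low_rad, high_rad = (bound * math.pi for bound in initial_pole_angle_range)
print(f"{low_rad:.3f} rad to {high_rad:.3f} rad")  # -0.785 rad to 0.785 rad
```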

Collaborator Author

Fixed — comment updated to [x pi rad] to match the kernel which multiplies by pi.

Comment on lines +41 to +45
newton_cfg = NewtonCfg(
solver_cfg=solver_cfg,
num_substeps=2,
debug_mode=False,
)
Contributor

HumanoidWarpEnvCfg does not set use_cuda_graph=True unlike AntWarpEnvCfg

AntWarpEnvCfg.newton_cfg sets use_cuda_graph=True, but HumanoidWarpEnvCfg.newton_cfg omits this flag (defaults to False). Since the entire DirectRLEnvWarp infrastructure is designed around CUDA graph capture for performance, disabling Newton's own CUDA graph for Humanoid while enabling it for Ant creates an inconsistency that may hurt Humanoid training throughput.

If this is intentional (e.g., Humanoid requires update_data_interval=2 which conflicts with Newton's graph capture), the reason should be documented with a comment:

Suggested change
- newton_cfg = NewtonCfg(
-     solver_cfg=solver_cfg,
-     num_substeps=2,
-     debug_mode=False,
- )
+ newton_cfg = NewtonCfg(
+     solver_cfg=solver_cfg,
+     num_substeps=2,
+     debug_mode=False,
+     # use_cuda_graph omitted: Newton graph capture is incompatible with update_data_interval > 1
+ )

Collaborator Author

Fixed — added use_cuda_graph=True to match stable and AntWarpEnvCfg.

@hujc7 hujc7 changed the title Cherry-pick direct warp envs from dev/newton [EXP] Cherry-pick direct warp envs from dev/newton Mar 11, 2026
@hujc7 hujc7 changed the title [EXP] Cherry-pick direct warp envs from dev/newton [Exp] Cherry-pick direct warp envs from dev/newton Mar 11, 2026
@hujc7
Collaborator Author

hujc7 commented Mar 11, 2026

Validation: Warp vs Stable (Newton) Direct Env Parity

300 iterations, 4096 envs, presets=newton for stable envs.

Final metrics (last iteration):

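| Env | Version | Mean Reward | Mean Ep Length | Training Time |
|-----|---------|-------------|----------------|---------------|
| Cartpole | warp | 296.64 | 299.00 | 68s |
| Cartpole | stable | 297.04 | 299.00 | 76s |
| Ant | warp | 9624.26 | 895.02 | 102s |
| Ant | stable | 8570.36 | 880.11 | 134s |
| Humanoid | warp | 7318.83 | 877.63 | 178s |
| Humanoid | stable | 8120.81 | 877.18 | 210s |
| Allegro | warp | 91.17 | 273.93 | 430s |
| Allegro | stable | 104.48 | 287.64 | 486s |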

Timing breakdown (last iteration):

| Env | Version | Collection | Learning | Iter Total | SPS |
|-----|---------|------------|----------|------------|-----|
| Cartpole | warp | 0.098s | 0.183s | 0.280s | 233,105 |
| Cartpole | stable | 0.091s | 0.073s | 0.160s | 399,136 |
| Ant | warp | 0.134s | 0.128s | 0.260s | 499,177 |
| Ant | stable | 0.222s | 0.117s | 0.340s | 386,920 |
| Humanoid | warp | 0.413s | 0.129s | 0.540s | 241,926 |
| Humanoid | stable | 0.505s | 0.138s | 0.640s | 203,757 |
| Allegro | warp | 1.349s | 0.276s | 1.620s | 40,338 |
| Allegro | stable | 1.463s | 0.271s | 1.730s | 37,781 |

Observations:

  • Episode lengths (physics behavior) match closely across all 4 env pairs
  • Reward differences within normal RL training variance, no systematic divergence
  • Warp collection time is faster for larger envs (Ant +40%, Humanoid +18%), but Cartpole warp has a learning time regression (0.183s vs 0.073s) — likely a tensor transfer overhead in the warp→torch handoff for the PPO update
  • All 8 runs passed (4 warp + 4 stable with presets=newton)
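
The percentages quoted for collection time appear to be relative reductions versus the stable run. That reading is an assumption, since the comment does not define them. A short sketch that recomputes them from the timing table:

```python
# Collection times (seconds, last iteration) copied from the timing table above.
collection_s = {
    "Cartpole": {"warp": 0.098, "stable": 0.091},
    "Ant": {"warp": 0.134, "stable": 0.222},
    "Humanoid": {"warp": 0.413, "stable": 0.505},
    "Allegro": {"warp": 1.349, "stable": 1.463},
}


def collection_reduction_pct(times):
    """Percentage by which warp collection time is lower than stable (negative = slower)."""
    return 100.0 * (times["stable"] - times["warp"]) / times["stable"]


reduction = {env: collection_reduction_pct(t) for env, t in collection_s.items()}
for env, pct in reduction.items():
    print(f"{env}: {pct:+.1f}%")
```

Under that definition Ant and Humanoid round to 40% and 18%, matching the figures above, while Cartpole comes out slightly negative (warp collection marginally slower).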

@hujc7
Collaborator Author

hujc7 commented Mar 12, 2026

@greptileai Review

@hujc7 hujc7 force-pushed the jichuanh/direct-warp-envs branch from 4e54fdd to 03fe0e6 Compare March 12, 2026 08:03
@hujc7
Collaborator Author

hujc7 commented Mar 12, 2026

@greptileai Review

Collaborator

Do we need this? It should not be needed anymore.

Collaborator Author

Removed — now imports directly from stable isaaclab.envs.utils.spaces.

@hujc7 hujc7 force-pushed the jichuanh/direct-warp-envs branch from d8bca3f to b037b52 Compare March 12, 2026 22:21
Add isaaclab_experimental package with DirectRLEnvWarp base class,
InteractiveSceneWarp, and WarpGraphCache utility.

Add direct warp environments in isaaclab_tasks_experimental:
- Cartpole, Ant, Humanoid, Locomotion (base), InHand Manipulation,
  Allegro Hand, with agent configs for rsl_rl, rl_games, skrl, sb3.

Adapt to develop base class API:
- find_joints 2-value return (indices, names)
- episode_length_buf as property with in-place copy for warp sync
- _ALL_ENV_MASK on base env instead of articulation
- set_joint_effort_target_mask for CUDA graph compatibility
- _get_observations returns dict for rsl_rl wrapper

Align solver configs with stable develop PresetCfg values.
Add safe_normalize to guard against NaN from wp.normalize.
Fix reset_root to_targets computation to use actual targets.
Fix cartpole reset kernel to use parameterized joint indices.
Clean up extension.toml and setup.py dependencies.
Switch timer import to isaaclab.utils.timer.
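
The in-place copy for `episode_length_buf` matters because rebinding the attribute would leave any other view of the buffer pointing at stale data. Below is a minimal stdlib sketch of that idea, assuming `array` and `memoryview` as stand-ins for the Warp array and the torch tensor that share memory. `EnvSketch` is hypothetical and is not the `DirectRLEnvWarp` implementation.

```python
from array import array


class EnvSketch:
    """Hypothetical stand-in for an env that shares one buffer between two views."""

    def __init__(self, num_envs: int):
        # Backing storage, playing the role of the Warp array.
        self._storage = array("i", [0] * num_envs)
        # Second view onto the same memory, playing the role of the torch tensor.
        self._view = memoryview(self._storage)

    @property
    def episode_length_buf(self) -> memoryview:
        return self._view

    @episode_length_buf.setter
    def episode_length_buf(self, values) -> None:
        # Write into the existing buffer instead of rebinding the attribute,
        # so every alias of the storage observes the new values.
        self._view[:] = array("i", values)


env = EnvSketch(num_envs=4)
alias = env.episode_length_buf  # e.g. a reference captured earlier
env.episode_length_buf = [5, 6, 7, 8]
print(list(alias), list(env._storage))
```

Both the earlier alias and the backing storage see the new values, which is the property the in-place `copy_()` is meant to preserve.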
@hujc7 hujc7 force-pushed the jichuanh/direct-warp-envs branch from b037b52 to c5259f7 Compare March 12, 2026 23:27
@hujc7
Collaborator Author

hujc7 commented Mar 12, 2026

@greptileai Review

@hujc7
Collaborator Author

hujc7 commented Mar 13, 2026

@greptileai Review

Comment thread source/isaaclab_experimental/setup.py
@AntoineRichard AntoineRichard merged commit 92145f5 into isaac-sim:develop Mar 13, 2026
9 of 10 checks passed
daniela-hase pushed a commit to daniela-hase/IsaacLab that referenced this pull request Mar 30, 2026
## Summary

Adds experimental warp infrastructure and direct warp environments from
`dev/newton`, adapted for `develop`. Absorbs PR isaac-sim#4812 (inhand-cp).

### `isaaclab_experimental`
* `DirectRLEnvWarp` base class with CUDA graph capture via
`WarpGraphCache`
* `InteractiveSceneWarp` with warp-native env_mask reset support
* `episode_length_buf` property with in-place copy to preserve
warp/torch shared memory

### `isaaclab_tasks_experimental` (direct envs)
* **Cartpole** (`Isaac-Cartpole-Direct-Warp-v0`)
* **Ant** (`Isaac-Ant-Direct-Warp-v0`)
* **Humanoid** (`Isaac-Humanoid-Direct-Warp-v0`)
* **Locomotion** base warp env (shared by ant/humanoid)
* **InHand Manipulation** + **Allegro Hand**
* Agent configs reference stable `isaaclab_tasks.direct.<env>.agents`
directly — no duplication

### API adaptations for `develop`
* `find_joints` 2-value return (indices, names)
* `episode_length_buf` as property with in-place `copy_()` for
warp/torch shared memory
* `self._ALL_ENV_MASK` from base env
* `set_joint_effort_target_mask` for CUDA graph compatibility
* `_get_observations` returns `{"policy": tensor}` dict
* `safe_normalize` to guard `wp.normalize` on zero-length vectors
* Solver configs aligned with stable develop `PresetCfg` values
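
A minimal sketch of the zero-length guard behind `safe_normalize`, written in plain Python rather than as a Warp function. The epsilon threshold and the zero-vector fallback are assumptions; the thread does not show what the real helper returns.

```python
import math


def safe_normalize(v, eps=1e-9):
    """Normalize v, returning a zero vector when its length is below eps (assumed fallback)."""
    norm = math.sqrt(sum(c * c for c in v))
    if norm < eps:
        # Dividing by a (near-)zero length is what would otherwise yield NaN/inf.
        return tuple(0.0 for _ in v)
    return tuple(c / norm for c in v)


unit = safe_normalize((3.0, 4.0, 0.0))
zero = safe_normalize((0.0, 0.0, 0.0))
```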

### Test results (rsl_rl, 4096 envs, 300 iterations, headless,
`newton==1.0.0`)
| Env | Status | Time |
|-----|--------|------|
| Cartpole | PASS | 70s |
| Ant | PASS | 98s |
| Humanoid | PASS | 172s |

## Test plan
- [x] Cartpole: 300 iteration training converges
- [x] Ant: 300 iteration training converges
- [x] Humanoid: 300 iteration training converges

---------

Signed-off-by: Antoine RICHARD <antoiner@nvidia.com>
Co-authored-by: Antoine RICHARD <antoiner@nvidia.com>
AntoineRichard added a commit that referenced this pull request Apr 16, 2026
…on (#4829)

## Summary

Cherry-pick of warp manager-based env infrastructure from `dev/newton`,
refactored for `develop`.

### `isaaclab_experimental`
* Added warp-compatible manager implementations (`ActionManager`, `ObservationManager`, `EventManager`, `CommandManager`, `TerminationManager`, `RewardManager`) with Warp kernel execution and CUDA graph capture support.
* Added `ManagerCallSwitch` utility for per-manager eager/captured dispatch, configured via `MANAGER_CALL_CONFIG` env var.
* Added `ManagerBasedEnvWarp` and `ManagerBasedRLEnvWarp` orchestration env classes.
* Added warp MDP terms (observations, rewards, terminations, events, joint actions).
* Added utility modules: buffers (circular buffer), modifiers, noise models, warp kernels/helpers.
* Added experimental `SceneEntityCfg` with warp joint mask/ids for kernel-level joint selection.
* Generalized configclass default materialization in `ManagerBase` for automatic `SceneEntityCfg` resolution.

### `isaaclab_tasks_experimental`
* Added `Isaac-Cartpole-Warp-v0` task as reference environment for warp
manager-based workflow.

### `isaaclab_rl`
* Updated rsl_rl, rl_games, sb3, skrl wrappers to accept
`ManagerBasedRLEnvWarp` and `DirectRLEnvWarp`.

### `isaaclab`
* Fixed `SettingsManager` to catch `RuntimeError` when carb is
unavailable.
* Minor comment cleanup in `ObservationManager`.

## Dependencies

Must be merged **after**:
1. #4905 (merged)

## Validated base

Validated against develop at `7588fa9ed5f`.

## Known limitations
* `Scene_write_data_to_sim` capped to mode=1 (eager) via `MAX_MODE_OVERRIDES`, because articulation `_apply_actuator_model` uses `wp.to_torch + torch indexing`, which is not CUDA graph capture-safe.

## Test plan
- [x] `Isaac-Cartpole-Warp-v0` training (4096 envs, 300 iters, mode=2):
converges (reward 4.95, ep_len 300)

---------

Co-authored-by: Antoine RICHARD <antoiner@nvidia.com>
kellyguo11 added a commit that referenced this pull request Apr 25, 2026
…on (#4945)

## Summary

* Cherry-picks [Newton] Migrate more envs and mdps to warp (#4690) onto develop
* Cherry-picks [Newton] Add capture safety guards and fix WrenchComposer stale COM pose (#4779) onto develop

### Changes included
- Warp-first MDP terms (observations, rewards, events, terminations,
actions) for manager-based envs
- Tested warp env configs: Ant, Humanoid, Cartpole, locomotion velocity
(A1, AnymalB/C/D, Cassie, G1, Go1/2, H1), Franka/UR10 reach
- ManagerCallSwitch max_mode cap and scene capture config
- MDP kernels made graph-capturable with consolidated test
infrastructure
- capture_unsafe safety guards on lazy-evaluated derived properties in
articulation/rigid_object data
- WrenchComposer fix: use fresh COM pose buffers instead of stale cached
link poses

### Dropped
- G1-29-DOF warp env (Isaac-Velocity-Flat-G1-Warp-v1): removed because
the stable g1_29_dofs task config does not exist on develop (only on
dev/newton). Warp env PRs should only add warp frontends for envs that
already exist in the stable package.

## Dependencies

Must be merged **after** these PRs (in order):
1. #4905 (merged)
2. #4829

## Validated base

Validated against develop at 7588fa9.

## Test plan

- [x] Run warp env training sweep across all manager-based env configs
(14/14 pass, mode=2, 4096 envs, 300 iters)
- [ ] Run test_mdp_warp_parity.py and test_mdp_warp_parity_new_terms.py
- [ ] Run test_action_warp_parity.py
- [ ] Verify WrenchComposer COM pose is fresh (not stale) during graph
replay

---------

Co-authored-by: Antoine Richard <antoiner@nvidia.com>
Co-authored-by: Kelly Guo <kellyg@nvidia.com>
matthewtrepte pushed a commit to matthewtrepte/IsaacLab that referenced this pull request Apr 26, 2026
…on (isaac-sim#4945)
mmichelis pushed a commit to mmichelis/IsaacLab that referenced this pull request Apr 29, 2026
…on (isaac-sim#4945)
Labels

documentation Improvements or additions to documentation infrastructure isaac-lab Related to Isaac Lab team

2 participants