Skip to content

Adds inhand manipulation warp env#4812

Closed
hujc7 wants to merge 1 commit into
isaac-sim:developfrom
hujc7:develop-inhand-cp
Closed

Adds inhand manipulation warp env#4812
hujc7 wants to merge 1 commit into
isaac-sim:developfrom
hujc7:develop-inhand-cp

Conversation

@hujc7
Copy link
Copy Markdown
Collaborator

@hujc7 hujc7 commented Mar 4, 2026

Summary

isaaclab_experimental

  • Add DirectRLEnvWarp base class with CUDA graph capture support
  • Add InteractiveSceneWarp — extends InteractiveScene with warp-native env_mask for selective resets without modifying the base class
  • Add WarpGraphCache utility — centralizes warp CUDA graph capture-or-replay pattern into a single capture_or_replay(name, fn) call
  • Add Timer utility with DEBUG_TIMER_STEP / DEBUG_TIMERS env var toggles
  • Add spaces utilities for warp-native observation/action spaces

isaaclab_newton

  • Update Newton dependency to 1.0.0rc3

isaaclab_rl

  • Add DirectRLEnvWarp to RslRlVecEnvWrapper isinstance check

isaaclab_tasks_experimental

  • Add inhand manipulation warp environment (Allegro hand + cube repose)
  • Allegro cfg aligned with stable newton preset (decimation=4, dt=1/120, njmax=80, nconmax=70)
  • Fix object velocity observation ordering to match torch reference (linear first, then angular)

scripts

  • Import isaaclab_tasks_experimental in train.py

Notes

  • This PR is intentionally based on develop at 5002c0ccb4a (before #4818 "Brings Newton assets integration tests"), which introduced a body_inertia reshape incompatible with Newton RC3 (Reshaping non-contiguous arrays is unsupported).

Known Problems / TODOs

  • Actuator pipeline not yet warp-native: _apply_actuator_model still uses torch. Runs outside CUDA graph capture scope as a workaround.
  • njmax/nconmax tuning: reverted to stable cfg defaults

Test plan

  • CUDA graph capture succeeds
  • Training runs without crashes (4096 envs, 300 iters)
  • Full training: ~73ms/step, ~43k steps/s, reward ~100+
  • Newton RC3 compatibility verified

@github-actions github-actions Bot added documentation Improvements or additions to documentation isaac-lab Related to Isaac Lab team labels Mar 4, 2026
@hujc7 hujc7 force-pushed the develop-inhand-cp branch from 0c9f268 to 7c8e21d Compare March 5, 2026 07:54
@hujc7 hujc7 changed the title Adds inhand manipulation warp env and warp-native actuator pipeline Adds inhand manipulation warp env Mar 5, 2026
@hujc7 hujc7 marked this pull request as ready for review March 5, 2026 07:56
@hujc7
Copy link
Copy Markdown
Collaborator Author

hujc7 commented Mar 5, 2026

@greptileai review

@hujc7
Copy link
Copy Markdown
Collaborator Author

hujc7 commented Mar 5, 2026

@AntoineRichard I am wondering if this might be related to the not training issue you are fixing, njmax and nconmax update compared to dev/newton seems to be the correct fix here, but it hits performance really badly.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Mar 5, 2026

Greptile Summary

This PR introduces a Warp-native inhand manipulation environment (InHandManipulationWarpEnv) with CUDA graph capture support, backed by new DirectRLEnvWarp and InteractiveSceneWarp base classes, a WarpGraphCache utility, and a Timer utility — all under new isaaclab_experimental and isaaclab_tasks_experimental packages. Previously-raised issues regarding optional import guarding, find_packages() discovery, obs_type validation, velocity observation ordering, and None-mask handling have all been addressed.

One remaining concern:

  • InteractiveSceneWarp.reset partial-mask inconsistency: Deformable objects and surface grippers are reset with env_ids=None (all envs) even when a partial env_mask is supplied, which will silently over-reset environments in any future scene that uses those asset types.

Confidence Score: 4/5

  • Safe to merge for the current inhand manipulation task; future scene extensions using deformable objects or surface grippers may experience unintended full-reset behavior when partial masks are used.
  • Training from scratch works as confirmed by the test plan (reward ~100+, 43k steps/s, 300 iter runs without crashes). The core Warp environment, CUDA graph capture, and integration with RSL-RL are all functioning correctly. One remaining issue: InteractiveSceneWarp.reset() has inconsistent mask handling for deformable objects and surface grippers, which is a correctness concern for future subclasses but not a blocker for the current task that uses only articulations and rigid objects.
  • source/isaaclab_experimental/isaaclab_experimental/envs/interactive_scene_warp.py (deformable object and surface gripper reset mask handling).

Last reviewed commit: 0949f42

logger = logging.getLogger(__name__)

# PLACEHOLDER: Extension template (do not remove this comment)
import isaaclab_tasks_experimental # noqa: F401
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unconditional import breaks existing training workflows. The module-level import isaaclab_tasks_experimental will cause an ImportError for any user who does not have the experimental package installed, breaking all existing training scripts that use rsl_rl/train.py for unrelated tasks.

Wrap the import in a try/except (following the "Extension template" pattern referenced in the comment above):

Suggested change
import isaaclab_tasks_experimental # noqa: F401
try:
import isaaclab_tasks_experimental # noqa: F401
except ImportError:
pass

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines 48 to 50
from isaaclab_experimental.envs import DirectRLEnvWarp

from isaaclab.envs import DirectRLEnv, ManagerBasedEnv, ManagerBasedRLEnv
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hard import of optional package breaks all RslRlVecEnvWrapper usage. The import from isaaclab_experimental.envs import DirectRLEnvWarp is placed inside __init__ with no try/except guard. Every call to RslRlVecEnvWrapper(...) will fail with ImportError if isaaclab_experimental is not installed, which is a breaking change for all existing RSL-RL users.

Guard the import to keep the package optional:

Suggested change
from isaaclab_experimental.envs import DirectRLEnvWarp
from isaaclab.envs import DirectRLEnv, ManagerBasedEnv, ManagerBasedRLEnv
try:
from isaaclab_experimental.envs import DirectRLEnvWarp as _DirectRLEnvWarp
_warp_types = (_DirectRLEnvWarp,)
except ImportError:
_warp_types = ()
if not isinstance(env.unwrapped, (ManagerBasedRLEnv, ManagerBasedEnv, DirectRLEnv, *_warp_types)):
raise ValueError(
"The environment must be inherited from ManagerBasedRLEnv or DirectRLEnv. Environment type:"
f" {type(env)}"
)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment thread source/isaaclab_experimental/setup.py Outdated
python_requires=">=3.10",
install_requires=INSTALL_REQUIRES,
dependency_links=PYTORCH_INDEX_URL,
packages=["isaaclab_experimental"],
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sub-packages excluded from installation — all imports will fail after pip install. The packages list only includes the top-level isaaclab_experimental package, but does not include the sub-packages isaaclab_experimental.envs and isaaclab_experimental.envs.utils. After a normal pip install ., the import from isaaclab_experimental.envs import DirectRLEnvWarp will raise ModuleNotFoundError.

Use find_packages() to automatically discover all sub-packages:

Suggested change
packages=["isaaclab_experimental"],
from setuptools import find_packages, setup
setup(
# ... existing fields ...
packages=find_packages(),
# ... rest of setup ...
)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines +707 to +714
wp.launch(
zero_mask_int32,
dim=self.num_envs,
inputs=[
mask,
self._episode_length_buf_wp,
],
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_reset_idx(None) will crash the zero_mask_int32 kernel. The method accepts mask: wp.array | torch.Tensor | None = None, but the wp.launch() call at line 707 passes mask directly to the kernel without a guard for None. The kernel will immediately attempt to read mask[env_index], which will fail if mask is None.

Although InHandManipulationWarpEnv._reset_idx adds a None → _ALL_ENV_MASK guard before calling super()._reset_idx(), the base-class signature implies None should be handled at the base level. Future subclasses that call super()._reset_idx() without this guard will crash silently.

Add the guard in the base class:

Suggested change
wp.launch(
zero_mask_int32,
dim=self.num_envs,
inputs=[
mask,
self._episode_length_buf_wp,
],
)
def _reset_idx(self, mask: wp.array | torch.Tensor | None = None):
"""Reset environments based on specified indices.
Args:
mask: A warp array or torch tensor indicating which environments to reset.
"""
if mask is None:
mask = self._ALL_ENV_MASK
self.scene.reset(env_ids=None, env_mask=mask)
# ... rest of method ...

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line 712 has if mask is None: mask = self._ALL_ENV_MASK — None never reaches the kernel.

Comment on lines +733 to +791
def _pre_physics_step(self, actions: torch.Tensor):
"""Pre-process actions before stepping through the physics.

This function is responsible for pre-processing the actions before stepping through the physics.
It is called before the physics stepping (which is decimated).

Args:
actions: The actions to apply on the environment. Shape is (num_envs, action_dim).
"""
raise NotImplementedError(f"Please implement the '_pre_physics_step' method for {self.__class__.__name__}.")

@abstractmethod
def _apply_action(self):
"""Apply actions to the simulator.

This function is responsible for applying the actions to the simulator. It is called at each
physics time-step.
"""
raise NotImplementedError(f"Please implement the '_apply_action' method for {self.__class__.__name__}.")

@abstractmethod
def _get_observations(self) -> VecEnvObs:
"""Compute and return the observations for the environment.

Returns:
The observations for the environment.
"""
raise NotImplementedError(f"Please implement the '_get_observations' method for {self.__class__.__name__}.")

def _get_states(self) -> VecEnvObs | None:
"""Compute and return the states for the environment.

The state-space is used for asymmetric actor-critic architectures. It is configured
using the :attr:`DirectRLEnvCfg.state_space` parameter.

Returns:
The states for the environment. If the environment does not have a state-space, the function
returns a None.
"""
return None # noqa: R501

@abstractmethod
def _get_rewards(self) -> torch.Tensor:
"""Compute and return the rewards for the environment.

Returns:
The rewards for the environment. Shape is (num_envs,).
"""
raise NotImplementedError(f"Please implement the '_get_rewards' method for {self.__class__.__name__}.")

@abstractmethod
def _get_dones(self) -> tuple[torch.Tensor, torch.Tensor]:
"""Compute and return the done flags for the environment.

Returns:
A tuple containing the done flags for termination and time-out.
Shape of individual tensors is (num_envs,).
"""
raise NotImplementedError(f"Please implement the '_get_dones' method for {self.__class__.__name__}.")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Abstract method type hints contradict the Warp-based implementation contract. The base-class abstract methods declare:

  • _pre_physics_step(actions: torch.Tensor) — but implementation passes wp.from_torch(action) and receives wp.array
  • _get_rewards() -> torch.Tensor — but InHandManipulationWarpEnv._get_rewards returns None and writes to Warp buffers
  • _get_dones() -> tuple[torch.Tensor, torch.Tensor] — but implementation returns None and writes to Warp buffers

Mismatched abstract signatures mislead subclass authors and type checkers. Update to match the actual contract:

Suggested change
def _pre_physics_step(self, actions: torch.Tensor):
"""Pre-process actions before stepping through the physics.
This function is responsible for pre-processing the actions before stepping through the physics.
It is called before the physics stepping (which is decimated).
Args:
actions: The actions to apply on the environment. Shape is (num_envs, action_dim).
"""
raise NotImplementedError(f"Please implement the '_pre_physics_step' method for {self.__class__.__name__}.")
@abstractmethod
def _apply_action(self):
"""Apply actions to the simulator.
This function is responsible for applying the actions to the simulator. It is called at each
physics time-step.
"""
raise NotImplementedError(f"Please implement the '_apply_action' method for {self.__class__.__name__}.")
@abstractmethod
def _get_observations(self) -> VecEnvObs:
"""Compute and return the observations for the environment.
Returns:
The observations for the environment.
"""
raise NotImplementedError(f"Please implement the '_get_observations' method for {self.__class__.__name__}.")
def _get_states(self) -> VecEnvObs | None:
"""Compute and return the states for the environment.
The state-space is used for asymmetric actor-critic architectures. It is configured
using the :attr:`DirectRLEnvCfg.state_space` parameter.
Returns:
The states for the environment. If the environment does not have a state-space, the function
returns a None.
"""
return None # noqa: R501
@abstractmethod
def _get_rewards(self) -> torch.Tensor:
"""Compute and return the rewards for the environment.
Returns:
The rewards for the environment. Shape is (num_envs,).
"""
raise NotImplementedError(f"Please implement the '_get_rewards' method for {self.__class__.__name__}.")
@abstractmethod
def _get_dones(self) -> tuple[torch.Tensor, torch.Tensor]:
"""Compute and return the done flags for the environment.
Returns:
A tuple containing the done flags for termination and time-out.
Shape of individual tensors is (num_envs,).
"""
raise NotImplementedError(f"Please implement the '_get_dones' method for {self.__class__.__name__}.")
@abstractmethod
def _pre_physics_step(self, actions: wp.array) -> None:
"""Pre-process actions before stepping through the physics."""
raise NotImplementedError(...)
@abstractmethod
def _get_rewards(self) -> None:
"""Compute and store rewards in Warp buffers."""
raise NotImplementedError(...)
@abstractmethod
def _get_dones(self) -> None:
"""Compute and store done flags in Warp buffers."""
raise NotImplementedError(...)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Type hints already use wp.array and -> None.

@hujc7 hujc7 force-pushed the develop-inhand-cp branch from 7c8e21d to e200453 Compare March 5, 2026 09:09
@hujc7
Copy link
Copy Markdown
Collaborator Author

hujc7 commented Mar 5, 2026

@greptileai Review

description=EXTENSION_TOML_DATA["package"]["description"],
keywords=EXTENSION_TOML_DATA["package"]["keywords"],
include_package_data=True,
python_requires=">=3.10",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sub-packages excluded from installation

packages=["isaaclab_tasks_experimental"] only includes the top-level package. After pip install ., the sub-packages isaaclab_tasks_experimental.direct, isaaclab_tasks_experimental.direct.allegro_hand, and isaaclab_tasks_experimental.direct.inhand_manipulation will not be installed. Any import isaaclab_tasks_experimental.direct.allegro_hand will raise ModuleNotFoundError.

The companion isaaclab_experimental/setup.py already uses find_packages() (correctly) — the same fix should be applied here:

Suggested change
python_requires=">=3.10",
from setuptools import find_packages, setup

and change:

    packages=find_packages(),

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines +26 to +31

from isaaclab_tasks.utils import import_packages

# The blacklist is used to prevent importing configs from sub-packages
_BLACKLIST_PKGS = ["utils", ".mdp"]
# Import all configs in this package
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Undeclared isaaclab_tasks dependency

import_packages is imported from isaaclab_tasks.utils, but isaaclab_tasks is not listed as a dependency anywhere — neither in extension.toml (which only declares isaaclab and isaaclab_assets) nor in setup.py (install_requires=[]). If a user has isaaclab_tasks_experimental installed without isaaclab_tasks, importing this package will fail with an ImportError.

Either add isaaclab_tasks to the dependency declarations, or make the import conditional:

Suggested change
from isaaclab_tasks.utils import import_packages
# The blacklist is used to prevent importing configs from sub-packages
_BLACKLIST_PKGS = ["utils", ".mdp"]
# Import all configs in this package
try:
from isaaclab_tasks.utils import import_packages
import_packages(__name__, _BLACKLIST_PKGS)
except ImportError:
pass

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

):
env_id = wp.tid()
if env_mask[env_id]:
rand0 = wp.randf(rng_state[env_id], wp.float32(-1.0), wp.float32(1.0))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

goal_pos_w updated unconditionally for all environments

The last line of the reset_target_pose kernel:

goal_pos_w[env_id] = goal_pos[env_id] + env_origins[env_id]

…runs outside the if env_mask[env_id]: guard. Every time this kernel is launched — including the per-step call from _get_rewards() inside the CUDA graph (which may only need to reset goal-reaching envs) — all num_envs entries of goal_pos_w are unconditionally rewritten.

Because goal_pos and env_origins are constants, the result is always the same and the behavior is correct. However, the unconditional write inside a captured CUDA graph adds unnecessary write traffic on every step at scale (e.g. 8 192 environments). Consider moving this initialisation to a one-time setup step, or placing it inside the if env_mask[env_id]: branch if goal_pos_w only needs updating when a goal is reset.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Idempotent — goal_pos and env_origins are unchanged for non-masked envs. No correctness issue.

@hujc7
Copy link
Copy Markdown
Collaborator Author

hujc7 commented Mar 6, 2026

@greptileai Review

@hujc7 hujc7 force-pushed the develop-inhand-cp branch from 3270f85 to 3e6d953 Compare March 6, 2026 23:59
Comment on lines +164 to +167
delta_max = upper_limits[env_id, dof_id] - default_joint_pos[env_id, dof_id]
delta_min = lower_limits[env_id, dof_id] - default_joint_pos[env_id, dof_id]
rand_delta = delta_min + (delta_max - delta_min) * 0.5 * dof_pos_noise
pos = default_joint_pos[env_id, dof_id] + reset_dof_pos_noise * rand_delta
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Biased reset distribution in reset_hand kernel

The rand_delta formula produces a distribution that is not uniform over [delta_min, delta_max]. With dof_pos_noise ~ U[-1, 1]:

# Current (incorrect):
rand_delta = delta_min + (delta_max - delta_min) * 0.5 * dof_pos_noise
# => rand_delta ~ U[delta_min - 0.5*range, delta_min + 0.5*range]
# => the distribution is centred at delta_min, NOT at the midpoint

For example, if delta_min = -0.2 and delta_max = 0.2:

  • Current: rand_delta ~ U[-0.4, 0.0] — all reset positions are shifted downward.
  • Correct: rand_delta ~ U[-0.2, 0.2]

The standard pattern used by the Torch counterpart (and the original IsaacGym implementation) first maps the uniform sample to [0, 1]:

rand_delta = delta_min + (delta_max - delta_min) * wp.float32(0.5) * (wp.float32(1.0) + dof_pos_noise)

Without this fix the Warp env resets hand joints to systematically lower (or higher) positions than intended, which can negatively affect training.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Matches the torch reference implementation's behavior.

Comment on lines +712 to +718
if self.cfg.obs_type == "openai":
self.compute_reduced_observations()
elif self.cfg.obs_type == "full":
self.compute_full_observations()
else:
print("Unknown observations type!")
return {"policy": self.torch_obs_buf}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unknown obs_type silently uses stale observations

When cfg.obs_type is neither "openai" nor "full", the observation buffer is never written and the policy trains on whatever garbage was left from the previous step. print() is not an adequate safeguard here — a misconfigured obs_type will cause silent training corruption rather than an immediate, actionable failure.

Consider raising a ValueError instead:

Suggested change
if self.cfg.obs_type == "openai":
self.compute_reduced_observations()
elif self.cfg.obs_type == "full":
self.compute_full_observations()
else:
print("Unknown observations type!")
return {"policy": self.torch_obs_buf}
if self.cfg.obs_type == "openai":
self.compute_reduced_observations()
elif self.cfg.obs_type == "full":
self.compute_full_observations()
else:
raise ValueError(f"Unknown observations type: '{self.cfg.obs_type}'. Expected 'openai' or 'full'.")

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines 404 to 405

def write_data_to_sim(self):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

env_mask kwarg may break sensors that don't support it

sensor.reset(env_ids, env_mask=env_mask) is called unconditionally for every sensor in the scene. If any sensor type's reset() signature does not accept an env_mask keyword argument, this will raise a TypeError at runtime — even for environments that never pass a mask.

The rigid_object.reset and articulation.reset calls are presumably updated to accept the new parameter, but sensors are heterogeneous (cameras, contact sensors, etc.) and may not all have been updated. At minimum, a guard is needed:

import inspect

for sensor in self._sensors.values():
    if "env_mask" in inspect.signature(sensor.reset).parameters:
        sensor.reset(env_ids, env_mask=env_mask)
    else:
        sensor.reset(env_ids)

Or, if env_mask is now part of the base sensor reset() contract, each sensor subclass should be audited to confirm it accepts the kwarg.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Moved env_mask support to InteractiveSceneWarp in isaaclab_experimental, reverted base class.

Comment on lines +707 to +719
def _get_observations(self) -> dict:
# if self.cfg.asymmetric_obs:
# self.fingertip_force_sensors = self.hand.root_physx_view.get_link_incoming_joint_force()[
# :, self.finger_bodies
# ]
if self.cfg.obs_type == "openai":
self.compute_reduced_observations()
elif self.cfg.obs_type == "full":
self.compute_full_observations()
else:
print("Unknown observations type!")
return {"policy": self.torch_obs_buf}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

observation_space mismatch when using obs_type = "openai"

observation_space = 124 is sized for "full" observations, but the "openai" (reduced) variant only writes 35 elements (12 fingertip positions + 3 object pos + 4 relative rotation + 16 actions). If a subclass or experiment sets obs_type = "openai" without also changing observation_space, the remaining 89 elements of the observation buffer will contain uninitialised data that is silently fed to the policy.

Consider documenting the expected observation_space for each obs_type variant (e.g., 124 for "full", 35 for "openai") in a class docstring or config comment, and adding a runtime assertion:

_OBS_SIZES = {"full": 124, "openai": 35}

def _get_observations(self) -> dict:
    expected = _OBS_SIZES.get(self.cfg.obs_type)
    if expected is not None and self.cfg.observation_space != expected:
        raise ValueError(
            f"obs_type='{self.cfg.obs_type}' requires observation_space={expected}, "
            f"but cfg.observation_space={self.cfg.observation_space}"
        )
    ...

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acknowledged — openai obs_type is not the default and would need observation_space override.

@hujc7 hujc7 force-pushed the develop-inhand-cp branch from 3e6d953 to eb5c25e Compare March 7, 2026 19:26
@hujc7
Copy link
Copy Markdown
Collaborator Author

hujc7 commented Mar 7, 2026

@greptileai Review

Comment on lines +357 to +365
observations[env_id, offset + 0] = object_vels[env_id][0]
observations[env_id, offset + 1] = object_vels[env_id][1]
observations[env_id, offset + 2] = object_vels[env_id][2]
offset += 3

observations[env_id, offset + 0] = vel_obs_scale * object_vels[env_id][3]
observations[env_id, offset + 1] = vel_obs_scale * object_vels[env_id][4]
observations[env_id, offset + 2] = vel_obs_scale * object_vels[env_id][5]
offset += 3
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Object velocity ordering and scaling inverted vs. Torch reference

In Newton/Warp, wp.spatial_vectorf stores (angular_x, angular_y, angular_z, linear_x, linear_y, linear_z) — angular velocity first. The current code therefore writes unscaled angular velocity at offset+0..+2 and scaled linear velocity at offset+3..+5.

The reference Torch InHandManipulationEnv observation is built as:

object_linvel,                   # linear velocity, unscaled — first
vel_obs_scale * object_angvel,   # angular velocity, scaled — second

This produces both a component-order swap and a scaling assignment swap. The trained observation vector will not match the Torch reference, which will break cross-environment policy transfer and produce a different observation distribution than intended.

The fix should swap the two blocks and their scaling:

# linear velocity (Warp indices 3–5) — unscaled, first
observations[env_id, offset + 0] = object_vels[env_id][3]
observations[env_id, offset + 1] = object_vels[env_id][4]
observations[env_id, offset + 2] = object_vels[env_id][5]
offset += 3

# angular velocity (Warp indices 0–2) — scaled, second
observations[env_id, offset + 0] = vel_obs_scale * object_vels[env_id][0]
observations[env_id, offset + 1] = vel_obs_scale * object_vels[env_id][1]
observations[env_id, offset + 2] = vel_obs_scale * object_vels[env_id][2]
offset += 3

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

- Add WarpGraphCache capture-or-replay utility to isaaclab_experimental/utils
- Refactor direct_rl_env_warp to use WarpGraphCache, split step_warp_end
  into pre/post to run write_data_to_sim uncaptured between graph segments
@hujc7 hujc7 force-pushed the develop-inhand-cp branch from eb5c25e to 0949f42 Compare March 7, 2026 20:02
@hujc7
Copy link
Copy Markdown
Collaborator Author

hujc7 commented Mar 7, 2026

@greptileai Review

Comment on lines +34 to +39
for deformable_object in self._deformable_objects.values():
deformable_object.reset(env_ids)
for rigid_object in self._rigid_objects.values():
rigid_object.reset(env_ids, env_mask=env_mask)
for surface_gripper in self._surface_grippers.values():
surface_gripper.reset(env_ids)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deformable objects and surface grippers ignore the env_mask

When reset(env_ids=None, env_mask=some_partial_mask) is called, deformable objects and surface grippers both receive reset(env_ids=None), which falls through to their default behaviour of resetting all environments. This is semantically inconsistent: partial-reset callers expect only the masked envs to be touched, but any scene that contains deformable objects or surface grippers will silently over-reset.

While the current InHandManipulationWarpEnv doesn't use either asset type, this silent correctness hole will trip any future subclass that does. Consider converting env_mask to env_ids when calling reset on these assets, or at minimum adding a prominent comment noting the limitation:

for deformable_object in self._deformable_objects.values():
    # NOTE: env_mask not yet supported; falls back to full reset.
    deformable_object.reset(env_ids)
for surface_gripper in self._surface_grippers.values():
    # NOTE: same caveat as above.
    surface_gripper.reset(env_ids)

@hujc7
Copy link
Copy Markdown
Collaborator Author

hujc7 commented Mar 11, 2026

Absorbed into #4905 which combines the inhand-cp and direct warp env changes.

@hujc7 hujc7 closed this Mar 11, 2026
AntoineRichard added a commit that referenced this pull request Mar 13, 2026
## Summary

Adds experimental warp infrastructure and direct warp environments from
`dev/newton`, adapted for `develop`. Absorbs PR #4812 (inhand-cp).

### `isaaclab_experimental`
* `DirectRLEnvWarp` base class with CUDA graph capture via
`WarpGraphCache`
* `InteractiveSceneWarp` with warp-native env_mask reset support
* `episode_length_buf` property with in-place copy to preserve
warp/torch shared memory

### `isaaclab_tasks_experimental` (direct envs)
* **Cartpole** (`Isaac-Cartpole-Direct-Warp-v0`)
* **Ant** (`Isaac-Ant-Direct-Warp-v0`)
* **Humanoid** (`Isaac-Humanoid-Direct-Warp-v0`)
* **Locomotion** base warp env (shared by ant/humanoid)
* **InHand Manipulation** + **Allegro Hand**
* Agent configs reference stable `isaaclab_tasks.direct.<env>.agents`
directly — no duplication

### API adaptations for `develop`
* `find_joints` 2-value return (indices, names)
* `episode_length_buf` as property with in-place `copy_()` for
warp/torch shared memory
* `self._ALL_ENV_MASK` from base env
* `set_joint_effort_target_mask` for CUDA graph compatibility
* `_get_observations` returns `{"policy": tensor}` dict
* `safe_normalize` to guard `wp.normalize` on zero-length vectors
* Solver configs aligned with stable develop `PresetCfg` values

### Test results (rsl_rl, 4096 envs, 300 iterations, headless,
`newton==1.0.0`)
| Env | Status | Time |
|-----|--------|------|
| Cartpole | PASS | 70s |
| Ant | PASS | 98s |
| Humanoid | PASS | 172s |

## Test plan
- [x] Cartpole: 300 iteration training converges
- [x] Ant: 300 iteration training converges
- [x] Humanoid: 300 iteration training converges

---------

Signed-off-by: Antoine RICHARD <antoiner@nvidia.com>
Co-authored-by: Antoine RICHARD <antoiner@nvidia.com>
daniela-hase pushed a commit to daniela-hase/IsaacLab that referenced this pull request Mar 30, 2026
## Summary

Adds experimental warp infrastructure and direct warp environments from
`dev/newton`, adapted for `develop`. Absorbs PR isaac-sim#4812 (inhand-cp).

### `isaaclab_experimental`
* `DirectRLEnvWarp` base class with CUDA graph capture via
`WarpGraphCache`
* `InteractiveSceneWarp` with warp-native env_mask reset support
* `episode_length_buf` property with in-place copy to preserve
warp/torch shared memory

### `isaaclab_tasks_experimental` (direct envs)
* **Cartpole** (`Isaac-Cartpole-Direct-Warp-v0`)
* **Ant** (`Isaac-Ant-Direct-Warp-v0`)
* **Humanoid** (`Isaac-Humanoid-Direct-Warp-v0`)
* **Locomotion** base warp env (shared by ant/humanoid)
* **InHand Manipulation** + **Allegro Hand**
* Agent configs reference stable `isaaclab_tasks.direct.<env>.agents`
directly — no duplication

### API adaptations for `develop`
* `find_joints` 2-value return (indices, names)
* `episode_length_buf` as property with in-place `copy_()` for
warp/torch shared memory
* `self._ALL_ENV_MASK` from base env
* `set_joint_effort_target_mask` for CUDA graph compatibility
* `_get_observations` returns `{"policy": tensor}` dict
* `safe_normalize` to guard `wp.normalize` on zero-length vectors
* Solver configs aligned with stable develop `PresetCfg` values

### Test results (rsl_rl, 4096 envs, 300 iterations, headless,
`newton==1.0.0`)
| Env | Status | Time |
|-----|--------|------|
| Cartpole | PASS | 70s |
| Ant | PASS | 98s |
| Humanoid | PASS | 172s |

## Test plan
- [x] Cartpole: 300 iteration training converges
- [x] Ant: 300 iteration training converges
- [x] Humanoid: 300 iteration training converges

---------

Signed-off-by: Antoine RICHARD <antoiner@nvidia.com>
Co-authored-by: Antoine RICHARD <antoiner@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation isaac-lab Related to Isaac Lab team

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant