Adds inhand manipulation warp env #4812
@greptileai review
@AntoineRichard I am wondering whether this is related to the not-training issue you are fixing. Updating njmax and nconmax relative to dev/newton seems to be the correct fix here, but it hurts performance badly.
Greptile Summary
This PR introduces a Warp-native inhand manipulation environment ( One remaining concern:
Confidence Score: 4/5
Last reviewed commit: 0949f42
logger = logging.getLogger(__name__)

# PLACEHOLDER: Extension template (do not remove this comment)
import isaaclab_tasks_experimental  # noqa: F401
Unconditional import breaks existing training workflows. The module-level import isaaclab_tasks_experimental will cause an ImportError for any user who does not have the experimental package installed, breaking all existing training scripts that use rsl_rl/train.py for unrelated tasks.
Wrap the import in a try/except (following the "Extension template" pattern referenced in the comment above):
try:
    import isaaclab_tasks_experimental  # noqa: F401
except ImportError:
    pass
from isaaclab_experimental.envs import DirectRLEnvWarp

from isaaclab.envs import DirectRLEnv, ManagerBasedEnv, ManagerBasedRLEnv
Hard import of optional package breaks all RslRlVecEnvWrapper usage. The import from isaaclab_experimental.envs import DirectRLEnvWarp is placed inside __init__ with no try/except guard. Every call to RslRlVecEnvWrapper(...) will fail with ImportError if isaaclab_experimental is not installed, which is a breaking change for all existing RSL-RL users.
Guard the import to keep the package optional:
from isaaclab.envs import DirectRLEnv, ManagerBasedEnv, ManagerBasedRLEnv

try:
    from isaaclab_experimental.envs import DirectRLEnvWarp as _DirectRLEnvWarp
    _warp_types = (_DirectRLEnvWarp,)
except ImportError:
    _warp_types = ()

if not isinstance(env.unwrapped, (ManagerBasedRLEnv, ManagerBasedEnv, DirectRLEnv, *_warp_types)):
    raise ValueError(
        "The environment must be inherited from ManagerBasedRLEnv or DirectRLEnv. Environment type:"
        f" {type(env)}"
    )
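The optional-import guard suggested above can be tried in isolation. The following is a minimal sketch, not code from the PR: `optional_types` is a hypothetical helper, `package_that_is_not_installed` is a deliberately missing module name, and the local `DirectRLEnv` class is only a stand-in for the real base class.

```python
import importlib


def optional_types(module_name: str, *class_names: str) -> tuple[type, ...]:
    """Return the named classes from module_name, or () if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Optional dependency is absent: contribute no extra accepted types.
        return ()
    return tuple(getattr(module, name) for name in class_names)


class DirectRLEnv:
    """Stand-in for the always-available environment base class."""


# Hypothetical module name, used only to exercise the fallback branch.
missing_types = optional_types("package_that_is_not_installed", "DirectRLEnvWarp")
# A stdlib module, used only to exercise the success branch.
present_types = optional_types("collections", "OrderedDict")

# The accepted tuple grows only when the optional package is importable.
accepted_types = (DirectRLEnv, *missing_types)
env_is_accepted = isinstance(DirectRLEnv(), accepted_types)
```

With this shape, the isinstance check keeps working for existing users whether or not the optional package is present.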
python_requires=">=3.10",
install_requires=INSTALL_REQUIRES,
dependency_links=PYTORCH_INDEX_URL,
packages=["isaaclab_experimental"],
Sub-packages excluded from installation — all imports will fail after pip install. The packages list only includes the top-level isaaclab_experimental package, but does not include the sub-packages isaaclab_experimental.envs and isaaclab_experimental.envs.utils. After a normal pip install ., the import from isaaclab_experimental.envs import DirectRLEnvWarp will raise ModuleNotFoundError.
Use find_packages() to automatically discover all sub-packages:
from setuptools import find_packages, setup

setup(
    # ... existing fields ...
    packages=find_packages(),
    # ... rest of setup ...
)
wp.launch(
    zero_mask_int32,
    dim=self.num_envs,
    inputs=[
        mask,
        self._episode_length_buf_wp,
    ],
)
_reset_idx(None) will crash the zero_mask_int32 kernel. The method accepts mask: wp.array | torch.Tensor | None = None, but the wp.launch() call at line 707 passes mask directly to the kernel without a guard for None. The kernel will immediately attempt to read mask[env_index], which will fail if mask is None.
Although InHandManipulationWarpEnv._reset_idx adds a None → _ALL_ENV_MASK guard before calling super()._reset_idx(), the base-class signature implies None should be handled at the base level. Future subclasses that call super()._reset_idx() without this guard will crash silently.
Add the guard in the base class:
def _reset_idx(self, mask: wp.array | torch.Tensor | None = None):
    """Reset environments based on specified indices.

    Args:
        mask: A warp array or torch tensor indicating which environments to reset.
    """
    if mask is None:
        mask = self._ALL_ENV_MASK
    self.scene.reset(env_ids=None, env_mask=mask)
    # ... rest of method ...
Line 712 has if mask is None: mask = self._ALL_ENV_MASK — None never reaches the kernel.
def _pre_physics_step(self, actions: torch.Tensor):
    """Pre-process actions before stepping through the physics.

    This function is responsible for pre-processing the actions before stepping through the physics.
    It is called before the physics stepping (which is decimated).

    Args:
        actions: The actions to apply on the environment. Shape is (num_envs, action_dim).
    """
    raise NotImplementedError(f"Please implement the '_pre_physics_step' method for {self.__class__.__name__}.")

@abstractmethod
def _apply_action(self):
    """Apply actions to the simulator.

    This function is responsible for applying the actions to the simulator. It is called at each
    physics time-step.
    """
    raise NotImplementedError(f"Please implement the '_apply_action' method for {self.__class__.__name__}.")

@abstractmethod
def _get_observations(self) -> VecEnvObs:
    """Compute and return the observations for the environment.

    Returns:
        The observations for the environment.
    """
    raise NotImplementedError(f"Please implement the '_get_observations' method for {self.__class__.__name__}.")

def _get_states(self) -> VecEnvObs | None:
    """Compute and return the states for the environment.

    The state-space is used for asymmetric actor-critic architectures. It is configured
    using the :attr:`DirectRLEnvCfg.state_space` parameter.

    Returns:
        The states for the environment. If the environment does not have a state-space, the function
        returns a None.
    """
    return None  # noqa: R501

@abstractmethod
def _get_rewards(self) -> torch.Tensor:
    """Compute and return the rewards for the environment.

    Returns:
        The rewards for the environment. Shape is (num_envs,).
    """
    raise NotImplementedError(f"Please implement the '_get_rewards' method for {self.__class__.__name__}.")

@abstractmethod
def _get_dones(self) -> tuple[torch.Tensor, torch.Tensor]:
    """Compute and return the done flags for the environment.

    Returns:
        A tuple containing the done flags for termination and time-out.
        Shape of individual tensors is (num_envs,).
    """
    raise NotImplementedError(f"Please implement the '_get_dones' method for {self.__class__.__name__}.")
Abstract method type hints contradict the Warp-based implementation contract. The base-class abstract methods declare:
- _pre_physics_step(actions: torch.Tensor), but the implementation passes wp.from_torch(action) and receives wp.array
- _get_rewards() -> torch.Tensor, but InHandManipulationWarpEnv._get_rewards returns None and writes to Warp buffers
- _get_dones() -> tuple[torch.Tensor, torch.Tensor], but the implementation returns None and writes to Warp buffers
Mismatched abstract signatures mislead subclass authors and type checkers. Update to match the actual contract:
@abstractmethod
def _pre_physics_step(self, actions: wp.array) -> None:
    """Pre-process actions before stepping through the physics."""
    raise NotImplementedError(...)

@abstractmethod
def _get_rewards(self) -> None:
    """Compute and store rewards in Warp buffers."""
    raise NotImplementedError(...)

@abstractmethod
def _get_dones(self) -> None:
    """Compute and store done flags in Warp buffers."""
    raise NotImplementedError(...)
Type hints already use wp.array and -> None.
@greptileai Review
description=EXTENSION_TOML_DATA["package"]["description"],
keywords=EXTENSION_TOML_DATA["package"]["keywords"],
include_package_data=True,
python_requires=">=3.10",
Sub-packages excluded from installation
packages=["isaaclab_tasks_experimental"] only includes the top-level package. After pip install ., the sub-packages isaaclab_tasks_experimental.direct, isaaclab_tasks_experimental.direct.allegro_hand, and isaaclab_tasks_experimental.direct.inhand_manipulation will not be installed. Any import isaaclab_tasks_experimental.direct.allegro_hand will raise ModuleNotFoundError.
The companion isaaclab_experimental/setup.py already uses find_packages() (correctly) — the same fix should be applied here:
from setuptools import find_packages, setup
and change:
packages=find_packages(),
from isaaclab_tasks.utils import import_packages

# The blacklist is used to prevent importing configs from sub-packages
_BLACKLIST_PKGS = ["utils", ".mdp"]
# Import all configs in this package
Undeclared isaaclab_tasks dependency
import_packages is imported from isaaclab_tasks.utils, but isaaclab_tasks is not listed as a dependency anywhere — neither in extension.toml (which only declares isaaclab and isaaclab_assets) nor in setup.py (install_requires=[]). If a user has isaaclab_tasks_experimental installed without isaaclab_tasks, importing this package will fail with an ImportError.
Either add isaaclab_tasks to the dependency declarations, or make the import conditional:
# The blacklist is used to prevent importing configs from sub-packages
_BLACKLIST_PKGS = ["utils", ".mdp"]
# Import all configs in this package
try:
    from isaaclab_tasks.utils import import_packages
    import_packages(__name__, _BLACKLIST_PKGS)
except ImportError:
    pass
):
    env_id = wp.tid()
    if env_mask[env_id]:
        rand0 = wp.randf(rng_state[env_id], wp.float32(-1.0), wp.float32(1.0))
goal_pos_w updated unconditionally for all environments
The last line of the reset_target_pose kernel:
goal_pos_w[env_id] = goal_pos[env_id] + env_origins[env_id]
...runs outside the if env_mask[env_id]: guard. Every time this kernel is launched (including the per-step call from _get_rewards() inside the CUDA graph, which may only need to reset goal-reaching envs), all num_envs entries of goal_pos_w are unconditionally rewritten.
Because goal_pos and env_origins are constants, the result is always the same and the behavior is correct. However, the unconditional write inside a captured CUDA graph adds unnecessary write traffic on every step at scale (e.g. 8 192 environments). Consider moving this initialisation to a one-time setup step, or placing it inside the if env_mask[env_id]: branch if goal_pos_w only needs updating when a goal is reset.
Idempotent — goal_pos and env_origins are unchanged for non-masked envs. No correctness issue.
@greptileai Review
delta_max = upper_limits[env_id, dof_id] - default_joint_pos[env_id, dof_id]
delta_min = lower_limits[env_id, dof_id] - default_joint_pos[env_id, dof_id]
rand_delta = delta_min + (delta_max - delta_min) * 0.5 * dof_pos_noise
pos = default_joint_pos[env_id, dof_id] + reset_dof_pos_noise * rand_delta
Biased reset distribution in reset_hand kernel
The rand_delta formula produces a distribution that is not uniform over [delta_min, delta_max]. With dof_pos_noise ~ U[-1, 1]:
# Current (incorrect):
rand_delta = delta_min + (delta_max - delta_min) * 0.5 * dof_pos_noise
# => rand_delta ~ U[delta_min - 0.5*range, delta_min + 0.5*range]
# => the distribution is centred at delta_min, NOT at the midpoint
For example, if delta_min = -0.2 and delta_max = 0.2:
- Current: rand_delta ~ U[-0.4, 0.0], so all reset positions are shifted downward.
- Correct: rand_delta ~ U[-0.2, 0.2]
The standard pattern used by the Torch counterpart (and the original IsaacGym implementation) first maps the uniform sample to [0, 1]:
rand_delta = delta_min + (delta_max - delta_min) * wp.float32(0.5) * (wp.float32(1.0) + dof_pos_noise)
Without this fix the Warp env resets hand joints to systematically lower (or higher) positions than intended, which can negatively affect training.
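The ranges quoted in this comment can be checked by evaluating both formulas at the extremes of the noise sample. This is a plain-Python sketch of the arithmetic only: the function names are hypothetical, and it does not model the Warp kernel or the later scaling by reset_dof_pos_noise.

```python
def current_delta(delta_min: float, delta_max: float, noise: float) -> float:
    """Formula as written in the kernel: noise in [-1, 1] is used directly."""
    return delta_min + (delta_max - delta_min) * 0.5 * noise


def rescaled_delta(delta_min: float, delta_max: float, noise: float) -> float:
    """Formula from the comment: noise is first mapped from [-1, 1] to [0, 1]."""
    return delta_min + (delta_max - delta_min) * 0.5 * (1.0 + noise)


delta_min, delta_max = -0.2, 0.2

# Both formulas are linear in the noise, so the endpoints bound the output range.
current_range = (
    current_delta(delta_min, delta_max, -1.0),
    current_delta(delta_min, delta_max, 1.0),
)
rescaled_range = (
    rescaled_delta(delta_min, delta_max, -1.0),
    rescaled_delta(delta_min, delta_max, 1.0),
)
```

Because both expressions are linear in the noise term, a uniform noise sample gives a uniform rand_delta over the computed interval in each case. Which interval is the intended one is a separate question that the arithmetic does not settle.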
Matches the torch reference implementation's behavior.
if self.cfg.obs_type == "openai":
    self.compute_reduced_observations()
elif self.cfg.obs_type == "full":
    self.compute_full_observations()
else:
    print("Unknown observations type!")
return {"policy": self.torch_obs_buf}
Unknown obs_type silently uses stale observations
When cfg.obs_type is neither "openai" nor "full", the observation buffer is never written and the policy trains on whatever garbage was left from the previous step. print() is not an adequate safeguard here — a misconfigured obs_type will cause silent training corruption rather than an immediate, actionable failure.
Consider raising a ValueError instead:
if self.cfg.obs_type == "openai":
    self.compute_reduced_observations()
elif self.cfg.obs_type == "full":
    self.compute_full_observations()
else:
    raise ValueError(f"Unknown observations type: '{self.cfg.obs_type}'. Expected 'openai' or 'full'.")
return {"policy": self.torch_obs_buf}
def write_data_to_sim(self):
env_mask kwarg may break sensors that don't support it
sensor.reset(env_ids, env_mask=env_mask) is called unconditionally for every sensor in the scene. If any sensor type's reset() signature does not accept an env_mask keyword argument, this will raise a TypeError at runtime — even for environments that never pass a mask.
The rigid_object.reset and articulation.reset calls are presumably updated to accept the new parameter, but sensors are heterogeneous (cameras, contact sensors, etc.) and may not all have been updated. At minimum, a guard is needed:
import inspect

for sensor in self._sensors.values():
    if "env_mask" in inspect.signature(sensor.reset).parameters:
        sensor.reset(env_ids, env_mask=env_mask)
    else:
        sensor.reset(env_ids)
Or, if env_mask is now part of the base sensor reset() contract, each sensor subclass should be audited to confirm it accepts the kwarg.
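The signature check proposed here can be exercised without the simulator. The sketch below uses two hypothetical stand-in sensor classes (`MaskAwareSensor` and `LegacySensor`) that only record how they were called. It illustrates the dispatch pattern and is not the scene implementation.

```python
import inspect


class MaskAwareSensor:
    """Stand-in sensor whose reset() accepts the env_mask keyword."""

    def __init__(self):
        self.calls = []

    def reset(self, env_ids=None, env_mask=None):
        self.calls.append((env_ids, env_mask))


class LegacySensor:
    """Stand-in sensor whose reset() only accepts env_ids."""

    def __init__(self):
        self.calls = []

    def reset(self, env_ids=None):
        self.calls.append(env_ids)


def reset_sensors(sensors, env_ids=None, env_mask=None):
    """Forward env_mask only to sensors whose reset() declares that parameter."""
    for sensor in sensors:
        # For a bound method, inspect.signature() omits the implicit self.
        if "env_mask" in inspect.signature(sensor.reset).parameters:
            sensor.reset(env_ids, env_mask=env_mask)
        else:
            sensor.reset(env_ids)


mask_aware, legacy = MaskAwareSensor(), LegacySensor()
reset_sensors([mask_aware, legacy], env_ids=None, env_mask=[True, False])
```

One limitation of this check: a reset() that takes only `**kwargs` would not list `env_mask` by name, so it would be routed to the fallback branch even though it could accept the keyword.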
Done. Moved env_mask support to InteractiveSceneWarp in isaaclab_experimental, reverted base class.
def _get_observations(self) -> dict:
    # if self.cfg.asymmetric_obs:
    #     self.fingertip_force_sensors = self.hand.root_physx_view.get_link_incoming_joint_force()[
    #         :, self.finger_bodies
    #     ]
    if self.cfg.obs_type == "openai":
        self.compute_reduced_observations()
    elif self.cfg.obs_type == "full":
        self.compute_full_observations()
    else:
        print("Unknown observations type!")
    return {"policy": self.torch_obs_buf}
observation_space mismatch when using obs_type = "openai"
observation_space = 124 is sized for "full" observations, but the "openai" (reduced) variant only writes 35 elements (12 fingertip positions + 3 object pos + 4 relative rotation + 16 actions). If a subclass or experiment sets obs_type = "openai" without also changing observation_space, the remaining 89 elements of the observation buffer will contain uninitialised data that is silently fed to the policy.
Consider documenting the expected observation_space for each obs_type variant (e.g., 124 for "full", 35 for "openai") in a class docstring or config comment, and adding a runtime assertion:
_OBS_SIZES = {"full": 124, "openai": 35}

def _get_observations(self) -> dict:
    expected = _OBS_SIZES.get(self.cfg.obs_type)
    if expected is not None and self.cfg.observation_space != expected:
        raise ValueError(
            f"obs_type='{self.cfg.obs_type}' requires observation_space={expected}, "
            f"but cfg.observation_space={self.cfg.observation_space}"
        )
    ...
Acknowledged — openai obs_type is not the default and would need observation_space override.
@greptileai Review
observations[env_id, offset + 0] = object_vels[env_id][0]
observations[env_id, offset + 1] = object_vels[env_id][1]
observations[env_id, offset + 2] = object_vels[env_id][2]
offset += 3

observations[env_id, offset + 0] = vel_obs_scale * object_vels[env_id][3]
observations[env_id, offset + 1] = vel_obs_scale * object_vels[env_id][4]
observations[env_id, offset + 2] = vel_obs_scale * object_vels[env_id][5]
offset += 3
Object velocity ordering and scaling inverted vs. Torch reference
In Newton/Warp, wp.spatial_vectorf stores (angular_x, angular_y, angular_z, linear_x, linear_y, linear_z) — angular velocity first. The current code therefore writes unscaled angular velocity at offset+0..+2 and scaled linear velocity at offset+3..+5.
The reference Torch InHandManipulationEnv observation is built as:
object_linvel,                  # linear velocity, unscaled, first
vel_obs_scale * object_angvel,  # angular velocity, scaled, second
This produces both a component-order swap and a scaling assignment swap. The trained observation vector will not match the Torch reference, which will break cross-environment policy transfer and produce a different observation distribution than intended.
The fix should swap the two blocks and their scaling:
# linear velocity (Warp indices 3–5), unscaled, first
observations[env_id, offset + 0] = object_vels[env_id][3]
observations[env_id, offset + 1] = object_vels[env_id][4]
observations[env_id, offset + 2] = object_vels[env_id][5]
offset += 3
# angular velocity (Warp indices 0–2), scaled, second
observations[env_id, offset + 0] = vel_obs_scale * object_vels[env_id][0]
observations[env_id, offset + 1] = vel_obs_scale * object_vels[env_id][1]
observations[env_id, offset + 2] = vel_obs_scale * object_vels[env_id][2]
offset += 3
- Add WarpGraphCache capture-or-replay utility to isaaclab_experimental/utils
- Refactor direct_rl_env_warp to use WarpGraphCache, split step_warp_end into pre/post to run write_data_to_sim uncaptured between graph segments
@greptileai Review
for deformable_object in self._deformable_objects.values():
    deformable_object.reset(env_ids)
for rigid_object in self._rigid_objects.values():
    rigid_object.reset(env_ids, env_mask=env_mask)
for surface_gripper in self._surface_grippers.values():
    surface_gripper.reset(env_ids)
Deformable objects and surface grippers ignore the env_mask
When reset(env_ids=None, env_mask=some_partial_mask) is called, deformable objects and surface grippers both receive reset(env_ids=None), which falls through to their default behaviour of resetting all environments. This is semantically inconsistent: partial-reset callers expect only the masked envs to be touched, but any scene that contains deformable objects or surface grippers will silently over-reset.
While the current InHandManipulationWarpEnv doesn't use either asset type, this silent correctness hole will trip any future subclass that does. Consider converting env_mask to env_ids when calling reset on these assets, or at minimum adding a prominent comment noting the limitation:
for deformable_object in self._deformable_objects.values():
    # NOTE: env_mask not yet supported; falls back to full reset.
    deformable_object.reset(env_ids)
for surface_gripper in self._surface_grippers.values():
    # NOTE: same caveat as above.
    surface_gripper.reset(env_ids)
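The other option mentioned in this comment, converting env_mask to env_ids for assets that only understand indices, could look roughly like the sketch below. It uses plain Python lists and a hypothetical `IdsOnlyAsset` stand-in. Real code would operate on Warp or torch arrays, and a variable-length index list may not fit inside a captured CUDA graph segment, so treat this as an illustration of the semantics only.

```python
def mask_to_env_ids(env_mask):
    """Convert a per-environment boolean mask into a list of environment indices."""
    return [index for index, selected in enumerate(env_mask) if selected]


class IdsOnlyAsset:
    """Hypothetical stand-in for an asset whose reset() only understands env_ids."""

    def __init__(self, num_envs):
        self.num_envs = num_envs
        self.reset_log = []

    def reset(self, env_ids=None):
        # None means "all environments", which is the over-reset described above.
        ids = list(range(self.num_envs)) if env_ids is None else list(env_ids)
        self.reset_log.append(ids)


def reset_with_mask(asset, env_mask):
    """Reset only the masked environments; skip the call when none are selected."""
    env_ids = mask_to_env_ids(env_mask)
    if env_ids:
        asset.reset(env_ids)


gripper = IdsOnlyAsset(num_envs=4)
reset_with_mask(gripper, [False, True, False, True])
reset_with_mask(gripper, [False, False, False, False])
```

Skipping the call for an empty selection matters in this sketch because passing None would reset every environment.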
Absorbed into #4905 which combines the inhand-cp and direct warp env changes.
## Summary

Adds experimental warp infrastructure and direct warp environments from `dev/newton`, adapted for `develop`. Absorbs PR #4812 (inhand-cp).

### `isaaclab_experimental`

* `DirectRLEnvWarp` base class with CUDA graph capture via `WarpGraphCache`
* `InteractiveSceneWarp` with warp-native env_mask reset support
* `episode_length_buf` property with in-place copy to preserve warp/torch shared memory

### `isaaclab_tasks_experimental` (direct envs)

* **Cartpole** (`Isaac-Cartpole-Direct-Warp-v0`)
* **Ant** (`Isaac-Ant-Direct-Warp-v0`)
* **Humanoid** (`Isaac-Humanoid-Direct-Warp-v0`)
* **Locomotion** base warp env (shared by ant/humanoid)
* **InHand Manipulation** + **Allegro Hand**
* Agent configs reference stable `isaaclab_tasks.direct.<env>.agents` directly, with no duplication

### API adaptations for `develop`

* `find_joints` 2-value return (indices, names)
* `episode_length_buf` as property with in-place `copy_()` for warp/torch shared memory
* `self._ALL_ENV_MASK` from base env
* `set_joint_effort_target_mask` for CUDA graph compatibility
* `_get_observations` returns `{"policy": tensor}` dict
* `safe_normalize` to guard `wp.normalize` on zero-length vectors
* Solver configs aligned with stable develop `PresetCfg` values

### Test results (rsl_rl, 4096 envs, 300 iterations, headless, `newton==1.0.0`)

| Env | Status | Time |
|-----|--------|------|
| Cartpole | PASS | 70s |
| Ant | PASS | 98s |
| Humanoid | PASS | 172s |

## Test plan

- [x] Cartpole: 300 iteration training converges
- [x] Ant: 300 iteration training converges
- [x] Humanoid: 300 iteration training converges

---------

Signed-off-by: Antoine RICHARD <antoiner@nvidia.com>
Co-authored-by: Antoine RICHARD <antoiner@nvidia.com>
Summary
isaaclab_experimental
- DirectRLEnvWarp base class with CUDA graph capture support
- InteractiveSceneWarp: extends InteractiveScene with warp-native env_mask for selective resets without modifying the base class
- WarpGraphCache utility: centralizes warp CUDA graph capture-or-replay pattern into a single capture_or_replay(name, fn) call
- Timer utility with DEBUG_TIMER_STEP / DEBUG_TIMERS env var toggles
- spaces utilities for warp-native observation/action spaces
isaaclab_newton
- 1.0.0rc3
isaaclab_rl
- DirectRLEnvWarp to RslRlVecEnvWrapper isinstance check
isaaclab_tasks_experimental
- decimation=4, dt=1/120, njmax=80, nconmax=70)
scripts
- isaaclab_tasks_experimental in train.py
Notes
- develop at 5002c0ccb4a (before #4818 "Brings Newton assets integration tests"), which introduced a body_inertia reshape incompatible with Newton RC3 (Reshaping non-contiguous arrays is unsupported).
Known Problems / TODOs
- _apply_actuator_model still uses torch. Runs outside CUDA graph capture scope as a workaround.
- njmax/nconmax tuning: reverted to stable cfg defaults
Test plan