Skip to content

[Exp] Slim warp velocity envs and enable Rough variants for all robots#5088

Open
hujc7 wants to merge 2 commits into
isaac-sim:developfrom
hujc7:dev-velocity-rouge
Open

[Exp] Slim warp velocity envs and enable Rough variants for all robots#5088
hujc7 wants to merge 2 commits into
isaac-sim:developfrom
hujc7:dev-velocity-rouge

Conversation

@hujc7
Copy link
Copy Markdown
Collaborator

@hujc7 hujc7 commented Mar 23, 2026

Summary

  • Slim warp locomotion-velocity configs to mirror the stable structure (single source of truth for Newton tuning).
  • Enable Isaac-Velocity-Rough-{Robot}-Warp-v0 and -Play-v0 for all 9 warp velocity robots; was Flat-only or commented out.
  • Result: 36 warp velocity envs registered (9 robots × Flat/Rough × regular/Play), matching the stable env's set.

Background

This PR was originally scoped to "Add Anymal-D rough terrain warp velocity environment". The supporting infrastructure has since merged via:

The remaining gap was on the warp side: per-robot Rough envs were not registered (commented out for Anymal-D/G1/H1 pending isaaclab_physx, and missing entirely for the other six quadrupeds + Cassie). Configs were also drifting from stable (different layout, dead height_scanner setters, redundant manager_call_max_mode overrides).

Changes

velocity_env_cfg.py

  • Mirror stable's nested ObservationsCfg.PolicyCfg layout.
  • Add EventsCfg.physics_material / add_base_mass / base_com slots, all None until warp MDP gets randomize_rigid_body_material / randomize_rigid_body_mass and a capture-safe randomize_rigid_body_com. Slot names match stable so per-robot overrides can introspect them.
  • Replace the hand-rolled ContactSensorCfg with a Newton sensor preset built locally; values mirror stable's VelocityEnvContactSensorCfg.newton.
  • Set sim.physics to a local ROUGH_NEWTON_CFG mirroring stable's RoughPhysicsCfg.newton (njmax=200, nconmax=100, pyramidal cone, 2.5M triangle pairs, 1cm shape margin).

Per-robot flat_env_cfg.py

For all 9 robots (a1, anymal_b/c/d, cassie, g1, go1/go2, h1):

  • Replace inline NewtonCfg with a local _FLAT_NEWTON_CFG whose values mirror the matching stable PhysicsCfg.newton.
  • Drop dead self.scene.height_scanner = None / self.observations.policy.height_scan = None setters (warp env has no RayCaster).

Per-robot rough_env_cfg.py

  • anymal_c, go1: drop redundant self.manager_call_max_mode = {'Scene': 1} overrides; the cap is already enforced via ManagerCallSwitch.MAX_MODE_OVERRIDES.

Per-robot __init__.py

Add Isaac-Velocity-Rough-{Robot}-Warp-v0 and -Play-v0 registrations for every robot, reusing the matching stable rsl_rl / skrl / rl_games / rsl_rl_with_symmetry agent config entry points.

Why inline the Newton settings rather than import stable's preset?

hydra_task_config resolves env cfgs before SimulationApp initialises Kit. Importing stable's RoughPhysicsCfg / per-robot PhysicsCfg chains through isaaclab_physx which from pxr import … at module load. Early pxr loading leaves USD wrappers half-initialised and segfaults Kit during AppLauncher startup. Each inlined block has a comment naming the constraint and asking future edits to keep values in sync with stable.

Verification

All 36 warp velocity env cfgs construct cleanly (gym.spec(...) + cls() for every Flat/Rough × regular/Play × 9 robots). Pre-commit (./isaaclab.sh -f) passes.

Parity vs torch newton (stable env with presets=newton)

4096 envs, L40, headless. Each pair: warp run → stable+newton run on the same GPU.

Task Iter Warp (reward / ep_len / terrain) Stable + Newton (reward / ep_len / terrain)
Anymal-D Flat 300 22.57 / 1000 / — 21.88 / 997 / —
G1 Flat 300 2.15 / 998 / — 3.78 / 994 / —
Anymal-D Rough 1500 14.47 / 881 / 5.32 crash at startup — see "Stable + Newton rough crash"
Cassie Rough 1500 NaN at iter ≈500 (terrain 0.88) crash at startup — see "Stable + Newton rough crash"
  • Anymal-D Flat: parity at 300 iter. 22.57 vs 21.88 (~3 % delta), ep_len identical at 1000-step max; warp ~20 % faster wall time (149 s vs 187 s).
  • G1 Flat: warp behind newton at 300 iter. Episode length identical (~997 — robot survives); the gap is velocity-tracking reward and is consistent with the missing domain-randomisation events on the warp side. Default agent cfg is 1500 iter, so 300 iter is short for a biped.
  • Anymal-D Rough: warp converges; stable+newton crash blocks the side-by-side number (root cause below).
  • Cassie Rough: warp NaN'd at iter ≈500 (biped+rough is the hardest combination); stable+newton hits the same crash.

Stable + Newton rough crash — known upstream bug

Stable + Newton rough Anymal-D / Cassie crash with Warp CUDA error 700: illegal memory access in wp_memcpy_d2h during SimulationApp setup at both 4096 and 1024 envs, before any training step. Not introduced by this PR.

Root cause is newton-physics/newton#2675 (open, filed 2026-05-02): newton PR #2393 added a fast-path tid → cid cache in SolverMuJoCo (use_mujoco_contacts=False) that wasn't invalidated for BODY_INERTIAL_PROPERTIES notifies. Stable's EventsCfg.add_base_mass (randomize_rigid_body_mass) fires at startup and triggers exactly this notify; the next solver step writes via stale cid values, corrupting GPU memory and surfacing as the CUDA 700 on the next D2H sync. Fix: newton-physics/newton#2678 (open).

The warp envs in this PR happen to dodge the bug because warp MDP doesn't have randomize_rigid_body_mass yet (EventsCfg.add_base_mass = None), so the corruption-triggering notify never fires.

Once newton#2678 lands or warp gets randomize_rigid_body_mass (which itself depends on the upstream fix), real warp ↔ stable+newton parity numbers can be produced for Rough envs.

All 9 Rough warp envs trained (300 iter, 4096 envs, L40)

Demonstrates each registration is end-to-end functional. No NaN, no crash for any of the 9.

Robot Reward Ep len Terrain lvl Timeout Base contact Train time Notes
Unitree A1 0.52 21.0 1.02 55 % 11 % 187 s Stuck early; 300 iter too short for A1
Anymal-B 13.21 897.8 2.25 81 % 19 % 191 s
Anymal-C 17.03 950.8 2.82 92 % 8 % 193 s
Anymal-D 15.58 906.6 2.28 90 % 10 % 210 s (also 1500 iter: reward 14.47, terrain_lvl 5.32, success 85 %)
Cassie 10.66 946.3 0.003 80 % 20 % 151 s Survives but doesn't advance terrain at 300 iter
G1 -4.07 912.7 0.14 85 % 15 % 222 s Default agent cfg is 1500-3000 for biped
Unitree Go1 18.59 911.6 2.18 85 % 15 % 215 s
Unitree Go2 16.68 962.6 1.00 95 % 5 % 220 s Most stable (lowest base contact)
H1 20.26 964.6 3.36 96 % 4 % 173 s Best terrain advance + reward

Aggregated log: /home/horde/workspaces/data/pr5088/all_rough_train.txt
Per-task logs: /home/horde/workspaces/data/pr5088/logs/all_rough_*.log

Out of scope (follow-ups)

  1. Warp MDP doesn't yet have randomize_rigid_body_material, randomize_rigid_body_mass, reset_root_state_from_terrain, or height_scan. The cfg slots are wired in (physics_material, add_base_mass, base_com, plus the height_scannerheight_scan pair) so dropping in implementations later is mechanical. Once randomize_rigid_body_mass lands on warp, both backends will hit newton#2675 until newton#2678 (or equivalent) is merged.
  2. Stable + Newton rough crash is tracked upstream (newton-physics/newton#2675 / newton-physics/newton#2678); not addressed here. Once that fix lands, the parity-table rows currently marked "crash" can be filled in.
  3. Cassie-rough warp NaN at ≈500 iter and G1-rough negative reward at 300 iter — biped+rough stability work (incl. the add_base_mass asymmetric (1.0, 1.25) override that PR [Rough Locomotion] Part 2: H1/Cassie bipeds on Newton #5298 introduced for stable Cassie) is gated on item 1.
  4. A1-rough did not advance terrain at 300 iter; likely a per-robot reward-shape / armature issue, not warp-specific. Worth comparing against stable A1 in a follow-up.

Type of change

  • Bug fix (env-cfg drift between warp and stable backends; redundant overrides)
  • New feature (Rough warp registrations for all 9 robots)

@github-actions github-actions Bot added the isaac-lab Related to Isaac Lab team label Mar 23, 2026
@hujc7
Copy link
Copy Markdown
Collaborator Author

hujc7 commented Mar 23, 2026

WIP but set to review for bot review

Copy link
Copy Markdown

@isaaclab-review-bot isaaclab-review-bot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 Isaac Lab Review Bot — PR #5088

"[Exp] Add Anymal-D rough terrain warp velocity environment" by @hujc7


Summary

This PR adds Anymal-D rough terrain locomotion on Newton/Warp with the Newton collision pipeline. It is stacked on two prior PRs (#4829 warp manager infra, #4945 warp MDP migration) and introduces the top commit(s) for rough terrain support. The PR spans 118 files across several packages:

  1. Experimental warp-first manager infrastructure (isaaclab_experimental): Complete manager-based env orchestration with Warp-native managers (Action, Observation, Event, Reward, Termination, Command), MDP terms, utilities (buffers, modifiers, noise, warp kernels), and ManagerCallSwitch for eager/captured dispatch.

  2. Anymal-D rough terrain env config (isaaclab_tasks_experimental): LocomotionVelocityBaseEnvCfgLocomotionVelocityRoughEnvCfgAnymalDRoughEnvCfg hierarchy with Newton solver settings and terrain-aware spawning.

  3. Newton physics fixes: Zero contact margin fix, collision pipeline buffer overflow fix (max_triangle_pairs 1M→2M), capture_unsafe decorator on lazy ArticulationData properties, and state kernel helpers.

  4. Stable Newton preset: RoughPhysicsCfg added to the stable Anymal-D env with pyramidal cone + Newton collision pipeline settings.

  5. Dependency bumps: mujoco/mujoco-warp 3.5→3.6.0, Newton pinned to 141baffff9.

  6. RL wrapper updates: rsl_rl, rl_games, sb3, skrl wrappers accept ManagerBasedRLEnvWarp types.


Design Assessment

Architecture: Well-structured for experimental iteration. The env config hierarchy (Base → Rough/Flat) cleanly separates shared Newton solver settings from terrain-specific overrides (buffer sizes, friction cone, CCD iterations). The ManagerCallSwitch provides a clean 3-mode dispatch (stable/warp-eager/warp-captured) per manager, which is a pragmatic approach for incremental migration.

Warp-first MDP terms follow a consistent pattern. Observation/reward/termination terms all follow func(env, out, **params) -> None writing into pre-allocated Warp arrays. The decorator system (@generic_io_descriptor_warp) mirrors the stable IO descriptor system while handling the key difference that Warp-first terms cannot produce output during inspection.

The upstream workarounds are well-documented. The PR body clearly explains the MuJoCo Warp float32 divergence, zero contact margin, and buffer overflow issues with links to upstream bug reports. The code workarounds are appropriately placed (Newton manager for contact margin and pipeline buffer size).


Architecture Impact

  • Low risk to stable code. The stable changes are minimal: a Newton physics preset for Anymal-D rough env, capture_unsafe decorators on ArticulationData lazy properties, zero margin fix in newton_manager.py, and RL wrapper type acceptance. The bulk of the PR lives in isaaclab_experimental and isaaclab_tasks_experimental.

  • Dependency pinning concern: Newton is pinned to a specific commit hash (141baffff9), not a release tag. This is acceptable for experimental work but should be tracked for eventual release stabilization.


Implementation Verdict

APPROVE with advisory findings. The architecture is sound, the upstream workarounds are well-motivated, and the experimental env follows existing patterns correctly. The findings below are advisory improvements, not blockers.


Test Coverage

The PR includes comprehensive parity tests in isaaclab_experimental/test/envs/mdp/:

  • test_actions_warp_parity.py (227 lines)
  • test_events_warp_parity.py (340 lines)
  • test_observations_warp_parity.py (458 lines)
  • test_rewards_warp_parity.py (567 lines)
  • test_terminations_warp_parity.py (349 lines)

Tests verify warp-first implementations match stable torch baselines across three execution modes (stable, warp uncaptured, warp CUDA-graph captured). Shared test fixtures are extracted into parity_helpers.py.

Training validation: 500 iterations at 4096 envs with zero NaN (reward 10.71, episode length 824). Ablation tests confirm contact margin fix and buffer size increase are essential.


CI Status

labeler — pass (4s)

Note: Only the labeler check ran. No lint, build, or test CI checks are configured for this PR.


Findings

1. [Medium] Function-level state caching on reset_root_state_from_terrain is fragile

File: isaaclab_experimental/envs/mdp/events.py

The function caches Warp arrays and parsed range values as attributes on the function object itself (reset_root_state_from_terrain._scratch_pose, etc.). This pattern has several issues:

  • Multiple environments: If two environments use this function with different terrains/configs, they share the same cached state. The first caller's terrain data wins.
  • Re-creation: If the env is destroyed and recreated, the stale function-level state persists because the function object is module-level.

Recommendation: Move cached state to the environment instance (e.g., env._reset_terrain_cache) or use a per-asset-cfg cache key. Same issue applies to generated_commands, track_lin_vel_xy_exp, and track_ang_vel_z_exp in observations and rewards.

2. [Medium] spatial_vectorf component ordering assumes [lin, ang] but Warp uses [ang, lin]

File: isaaclab_experimental/envs/mdp/events.py_reset_root_state_from_terrain_kernel

The kernel writes velocity as:

vel_out[env_id] = wp.spatial_vectorf(
    default_vel[0] + wp.randf(state, vel_lin_lo[0], vel_lin_hi[0]),  # lin_x
    default_vel[1] + wp.randf(state, vel_lin_lo[1], vel_lin_hi[1]),  # lin_y
    default_vel[2] + wp.randf(state, vel_lin_lo[2], vel_lin_hi[2]),  # lin_z
    default_vel[3] + wp.randf(state, vel_ang_lo[0], vel_ang_hi[0]),  # ang_x
    default_vel[4] + wp.randf(state, vel_ang_lo[1], vel_ang_hi[1]),  # ang_y
    default_vel[5] + wp.randf(state, vel_ang_lo[2], vel_ang_hi[2]),  # ang_z
)

Warp's spatial_vectorf convention is [angular(0:3), linear(3:6)] per wp.spatial_top (angular) / wp.spatial_bottom (linear). If default_root_vel follows [lin, ang] ordering from ArticulationData, the random perturbation would be applied to the wrong components. Verify that the convention matches what write_root_velocity_to_sim_mask expects. If the asset data is stored as [lin, ang] then the raw default_vel[0..5] access is correct for a raw float array, but the vel_lin_lo/vel_ang_lo labeling is misleading.

3. [Low] COM randomization disabled with a None assignment instead of removal

File: isaaclab_tasks_experimental/manager_based/locomotion/velocity/velocity_env_cfg.py

base_com = None  # FIXME(warp-migration): ...causes NaNs

The commented-out FIXME is good documentation, but setting base_com = None in the config means that any subclass trying to override this field will need to know about this workaround. Consider adding a tracking issue reference.

4. [Low] Pre-commit checks not run

The PR body explicitly notes [ ] Run pre-commit checks is unchecked. Since the only CI check that ran is the labeler, lint/formatting issues may exist.


Summary

A well-structured, large PR that brings Anymal-D rough terrain locomotion to the Newton/Warp experimental pipeline. The upstream physics workarounds are properly documented and justified. The experimental manager infrastructure is cleanly separated from stable code. The function-level state caching pattern (Finding 1) and spatial vector component ordering (Finding 2) warrant attention but are not blockers for an experimental feature.

@hujc7 hujc7 force-pushed the dev-velocity-rouge branch from 7589193 to 4f4e142 Compare March 30, 2026 23:58
@hujc7
Copy link
Copy Markdown
Collaborator Author

hujc7 commented Mar 31, 2026

@greptileai Review

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Mar 31, 2026

Greptile Summary

This PR adds an experimental Anymal-D rough terrain locomotion environment running on Newton/MuJoCo Warp, along with the supporting Warp-first infrastructure (managers, MDP terms, CUDA graph capture, warm-up run) and several Newton-level workarounds for mesh-terrain stability issues (contact margin fix, increased buffer sizes, pyramidal-cone solver). It also bumps mujoco/mujoco-warp to 3.6.0 and pins Newton to a specific commit, and extends all major RL framework wrappers (RSL-RL, RL-Games, SB3, SKRL) to accept the new ManagerBasedRLEnvWarp family.

Key changes:

  • NewtonManager.start_simulation() now enforces a minimum 0.01 contact margin on all zero-margin shapes before finalize(), and _initialize_contacts() sets max_triangle_pairs=2_000_000 for mesh terrain — both are essential workarounds confirmed by ablation.
  • WarpGraphCache adds an eager warm-up run before CUDA graph capture so that first-call allocations (scratch buffers, dtype-conversion caches) happen outside the capture context.
  • reset_root_state_from_terrain is a new Warp kernel that samples spawn positions from pre-computed flat terrain patches; however, the cached _terrain_levels_wp / _terrain_types_wp buffers are only populated at warm-up time and never updated when the terrain curriculum changes level assignments, meaning spawning does not actually follow the curriculum (see inline comment).
  • apply_external_force_torque applies forces to all robot bodies regardless of the body_names field in asset_cfg, diverging from the stable implementation. This is latent (current configs use zero ranges) but would be incorrect with non-zero perturbation ranges.
  • Event function state is stored at function-object level (fn._is_warmed_up, scratch buffers), which is not safe if more than one environment instance exists in a process.

Confidence Score: 4/5

  • Safe to merge as an experimental module, but the terrain-curriculum correctness issue should be addressed or explicitly documented before wider adoption.
  • One P1 finding: the terrain level/type warp buffers in reset_root_state_from_terrain are cached at initialization and never refreshed, so the terrain curriculum has no effect on spawn positions after the first warm-up — contradicting the PR's stated goal of enabling curriculum progression. This is worth fixing or at minimum clearly documenting. The remaining findings are P2 style/design issues (body-name filtering, function-level state, commit-hash pin) that do not block experimental use.
  • source/isaaclab_experimental/isaaclab_experimental/envs/mdp/events.py — terrain level caching and body-name filtering bugs.

Important Files Changed

Filename Overview
source/isaaclab_experimental/isaaclab_experimental/envs/mdp/events.py New Warp-first event functions for locomotion training; contains two bugs: (1) terrain levels are cached at warm-up and never refreshed from curriculum updates, undermining terrain-aware spawning; (2) apply_external_force_torque ignores body_names and affects all bodies.
source/isaaclab_experimental/isaaclab_experimental/utils/warp_graph_cache.py Adds warm-up run before CUDA graph capture; design is correct — warm-up flushes first-call allocations so only steady-state kernel launches are captured.
source/isaaclab_newton/isaaclab_newton/physics/newton_manager.py Adds zero-contact-margin fix before finalize() and increases max_triangle_pairs to 2M for mesh terrain; logic is correct and well-documented with upstream issue references.
source/isaaclab_experimental/isaaclab_experimental/utils/manager_call_switch.py New manager routing utility that dispatches stable/warp/captured execution paths; logic is sound with proper max-mode capping, JSON config loading, and dynamic capturability registration.
source/isaaclab_tasks_experimental/isaaclab_tasks_experimental/manager_based/locomotion/velocity/velocity_env_cfg.py New shared base env config with rough/flat subclasses; correctly overrides contact solver parameters for mesh terrain and wires up flat-patch sampling for terrain-aware spawning.
source/isaaclab_tasks/isaaclab_tasks/manager_based/locomotion/velocity/config/anymal_d/rough_env_cfg.py Adds RoughPhysicsCfg Newton preset with pyramidal cone, correct buffer sizes, and Newton collision pipeline; well-commented with upstream issue references.
source/isaaclab_newton/setup.py Bumps mujoco/mujoco-warp to 3.6.0 and pins newton to an unreleased commit SHA; version bumps are well-motivated but the commit-hash pin is not semantically versioned.
source/isaaclab_tasks_experimental/isaaclab_tasks_experimental/manager_based/locomotion/velocity/config/anymal_d/rough_env_cfg.py Minimal experimental Anymal-D rough env config; correctly inherits from LocomotionVelocityRoughEnvCfg and caps Scene manager to WARP_NOT_CAPTURED.

Sequence Diagram

sequenceDiagram
    participant RL as RL Training Loop
    participant Env as ManagerBasedRLEnvWarp
    participant MCS as ManagerCallSwitch
    participant WGC as WarpGraphCache
    participant EM as EventManager (Warp)
    participant Kernel as CUDA Kernel
    participant Terrain as TerrainImporter

    RL->>Env: step(actions)
    Env->>MCS: call_stage("EventManager_reset", ...)
    MCS->>MCS: get_mode_for_manager → WARP_CAPTURED (2)
    MCS->>WGC: capture_or_replay("EventManager_reset", fn)

    alt First call (warm-up + capture)
        WGC->>EM: fn() [warm-up — eager]
        EM->>Terrain: read terrain_levels (int64→int32 copy)
        EM->>Kernel: wp.launch reset_root_state_from_terrain
        WGC->>EM: fn() [capture — inside ScopedCapture]
        EM->>Kernel: wp.launch (recorded into CUDA graph)
        WGC->>WGC: cache graph + result
    else Subsequent calls
        WGC->>WGC: wp.capture_launch(cached_graph)
        Note over WGC,Terrain: ⚠ terrain_levels_wp NOT updated<br/>Curriculum changes to terrain.terrain_levels<br/>are invisible to the captured kernel
    end

    RL->>Env: curriculum update
    Env->>Terrain: update_env_origins(env_ids, move_up, move_down)
    Terrain->>Terrain: terrain_levels[env_ids] updated in-place
    Note over Terrain,Kernel: Warp buffers still hold stale initial values
Loading

Reviews (1): Last reviewed commit: "Standardize _is_warmed_up flag for reset..." | Re-trigger Greptile

Comment on lines +698 to +723
reset_root_state_from_terrain._terrain_levels_wp = wp.zeros(n_envs, dtype=wp.int32, device=env.device)
reset_root_state_from_terrain._terrain_types_wp = wp.zeros(n_envs, dtype=wp.int32, device=env.device)
levels_i32 = terrain.terrain_levels.to(dtype=torch.int32, copy=True).contiguous()
types_i32 = terrain.terrain_types.to(dtype=torch.int32, copy=True).contiguous()
wp.copy(reset_root_state_from_terrain._terrain_levels_wp, wp.from_torch(levels_i32, dtype=wp.int32))
wp.copy(reset_root_state_from_terrain._terrain_types_wp, wp.from_torch(types_i32, dtype=wp.int32))
reset_root_state_from_terrain._num_patches = valid_positions.shape[2]

# Scratch buffers
reset_root_state_from_terrain._scratch_pose = wp.zeros((env.num_envs,), dtype=wp.transformf, device=env.device)
reset_root_state_from_terrain._scratch_vel = wp.zeros(
(env.num_envs,), dtype=wp.spatial_vectorf, device=env.device
)

# Pre-parse orientation range (roll, pitch, yaw)
r = [pose_range.get(key, (0.0, 0.0)) for key in ["roll", "pitch", "yaw"]]
reset_root_state_from_terrain._rot_lo = wp.vec3f(r[0][0], r[1][0], r[2][0])
reset_root_state_from_terrain._rot_hi = wp.vec3f(r[0][1], r[1][1], r[2][1])

# Pre-parse velocity range
v = [velocity_range.get(key, (0.0, 0.0)) for key in ["x", "y", "z", "roll", "pitch", "yaw"]]
reset_root_state_from_terrain._vel_lin_lo = wp.vec3f(v[0][0], v[1][0], v[2][0])
reset_root_state_from_terrain._vel_lin_hi = wp.vec3f(v[0][1], v[1][1], v[2][1])
reset_root_state_from_terrain._vel_ang_lo = wp.vec3f(v[3][0], v[4][0], v[5][0])
reset_root_state_from_terrain._vel_ang_hi = wp.vec3f(v[3][1], v[4][1], v[5][1])
reset_root_state_from_terrain._is_warmed_up = True
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Terrain levels cached once, never updated by curriculum

_terrain_levels_wp and _terrain_types_wp are initialized from terrain.terrain_levels / terrain.terrain_types during the first warm-up call and then the _is_warmed_up guard prevents any subsequent re-initialization. The terrain curriculum (terrain_levels_vel) regularly calls terrain.update_env_origins(env_ids, move_up, move_down) which updates the torch tensors in-place, but the warp buffers are separate GPU allocations (created via to(dtype=torch.int32, copy=True)) and are never refreshed afterwards.

The result is that reset_root_state_from_terrain always spawns each robot at its initial terrain level assignment, regardless of how the curriculum has since promoted or demoted that environment. The PR's stated goal — "correctly spawns on the assigned level, enabling curriculum progression" — is not achieved: the apparent "curriculum level 4.51" in the training results reflects the initial distribution of terrain level assignments, not dynamic curriculum progression.

A straightforward fix would be to update _terrain_levels_wp / _terrain_types_wp unconditionally on every call (outside the capture region, if CUDA graphs are in use). Since the torch tensors are already on device, a wp.copy of the int32-cast view is cheap and can be done before launching the kernel:

# After _is_warmed_up block, before wp.launch:
levels_i32 = terrain.terrain_levels.to(dtype=torch.int32).contiguous()
types_i32  = terrain.terrain_types.to(dtype=torch.int32).contiguous()
wp.copy(reset_root_state_from_terrain._terrain_levels_wp, wp.from_torch(levels_i32, dtype=wp.int32))
wp.copy(reset_root_state_from_terrain._terrain_types_wp,  wp.from_torch(types_i32,  dtype=wp.int32))

If this function runs inside a CUDA graph, the copy would need to happen in a non-captured scope (the warm-up or a pre-step hook).

Comment on lines +159 to +188
if not getattr(apply_external_force_torque, "_is_warmed_up", False):
apply_external_force_torque._scratch_forces = wp.zeros(
(env.num_envs, asset.num_bodies), dtype=wp.vec3f, device=env.device
)
apply_external_force_torque._scratch_torques = wp.zeros(
(env.num_envs, asset.num_bodies), dtype=wp.vec3f, device=env.device
)
apply_external_force_torque._is_warmed_up = True

wp.launch(
kernel=_apply_external_force_torque_kernel,
dim=env.num_envs,
inputs=[
env_mask,
env.rng_state_wp,
apply_external_force_torque._scratch_forces,
apply_external_force_torque._scratch_torques,
force_range[0],
force_range[1],
torque_range[0],
torque_range[1],
],
device=env.device,
)

asset.permanent_wrench_composer.set_forces_and_torques_mask(
forces=apply_external_force_torque._scratch_forces,
torques=apply_external_force_torque._scratch_torques,
env_mask=env_mask,
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 apply_external_force_torque ignores configured body names

The stable isaaclab.envs.mdp.events.apply_external_force_torque uses asset_cfg.body_ids to restrict which bodies receive forces. This warp version allocates scratch buffers sized (num_envs, asset.num_bodies) and the kernel iterates for b in range(force_out.shape[1]), applying random forces to all bodies regardless of asset_cfg.body_names.

This is currently harmless in the locomotion configs because force_range=(0.0, 0.0) and torque_range=(-0.0, 0.0) produce zero forces for all bodies. However, if a downstream config sets non-zero ranges (e.g. to randomize perturbation on the base only), the behavior would silently diverge from the stable implementation — applying forces to every link instead of just the configured ones.

Consider adding a body_ids parameter (similar to joint_ids in reset_joints_by_offset) so the kernel can be restricted to the configured body subset.

Comment on lines +81 to +86
if not getattr(fn, "_is_warmed_up", False) or fn._asset_name != asset_cfg.name:
fn._asset_name = asset_cfg.name
r = [com_range.get(key, (0.0, 0.0)) for key in ["x", "y", "z"]]
fn._com_lo = wp.vec3f(r[0][0], r[1][0], r[2][0])
fn._com_hi = wp.vec3f(r[0][1], r[1][1], r[2][1])
fn._is_warmed_up = True
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Function-level state is shared across all callers of the same function

All event functions in this file store their "warm-up" state as attributes directly on the function object (e.g. fn._is_warmed_up, fn._scratch_forces, fn._terrain_levels_wp). This is not instance-scoped: if two environments are created in the same process, or if the same function is registered to two different event terms with different asset configs / num_envs, the second caller will silently reuse stale/wrong-sized buffers from the first.

For example, randomize_rigid_body_com checks fn._asset_name != asset_cfg.name to partially guard against this, but that guard does not cover num_envs changes. The other functions (apply_external_force_torque, push_by_setting_velocity, reset_root_state_from_terrain, etc.) do not have even this partial check.

This design is reasonable for the single-env experimental use-case described in this PR, but it would be worth adding a comment explicitly documenting the single-environment assumption to prevent surprising failures in future multi-env or serialized-training scenarios.

Comment thread source/isaaclab_newton/setup.py Outdated
"mujoco-warp==3.6.0",
"PyOpenGL-accelerate==3.1.10",
"newton==1.0.0",
"newton @ git+https://github.com/newton-physics/newton.git@141baffff9",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Newton pinned to an unreleased commit hash

"newton @ git+https://github.com/newton-physics/newton.git@141baffff9",

Pinning to a raw, unreleased commit SHA is fragile: the commit is not immutable in the same way a signed tag or package release would be — history can be force-pushed, the SHA is not semantically versioned, and it is difficult to audit what changed between revisions. It also makes the dependency invisible to security-scanning tooling that knows about released packages.

Consider whether Newton will publish a versioned release that includes 141baffff9 so the dependency can be expressed as a proper version pin (e.g., newton>=x.y.z), or at minimum document the expected upstream release that will supersede this commit in a code comment.

@hujc7 hujc7 force-pushed the dev-velocity-rouge branch from 4f4e142 to 7d68d05 Compare May 4, 2026 08:05
@hujc7 hujc7 changed the title [Exp] Add Anymal-D rough terrain warp velocity environment [Exp] Slim warp velocity envs and enable Rough variants for all robots May 4, 2026
hujc7 added 2 commits May 4, 2026 08:08
Bring the warp locomotion velocity configs structurally in line with
their stable counterparts so the two backends can share Newton tuning
and avoid drift.

velocity_env_cfg.py:
- Mirror stable's nested ObservationsCfg.PolicyCfg layout.
- Add EventsCfg.physics_material/add_base_mass/base_com slots (None
  until warp MDP gets randomize_rigid_body_material/_mass and a
  capture-safe randomize_rigid_body_com); slot names match stable so
  per-robot overrides can introspect them.
- Replace the hand-rolled ContactSensorCfg with a Newton sensor preset
  built locally; values mirror stable's VelocityEnvContactSensorCfg.newton.
- Set sim physics to a local ROUGH_NEWTON_CFG that mirrors stable's
  RoughPhysicsCfg.newton (njmax=200, nconmax=100, pyramidal cone,
  2.5M triangle pairs, 1cm shape margin).

Per-robot flat_env_cfg.py (a1, anymal_b/c/d, cassie, g1, go1/go2, h1):
- Replace inline NewtonCfg with a local _FLAT_NEWTON_CFG whose values
  mirror the matching stable PhysicsCfg.newton.
- Drop dead height_scanner=None / observations.policy.height_scan=None
  setters (warp env has no RayCaster).

Per-robot rough_env_cfg.py (anymal_b/c, anymal_d, go1):
- Drop redundant manager_call_max_mode={'Scene': 1} overrides; the
  cap is already enforced via ManagerCallSwitch.MAX_MODE_OVERRIDES.

Why inlined instead of imported from stable: hydra_task_config resolves
env cfgs before SimulationApp initialises Kit. Importing stable's
RoughPhysicsCfg / per-robot PhysicsCfg chains through isaaclab_physx
which 'from pxr import ...' at module load; early pxr loading leaves
USD wrappers half-initialised and segfaults Kit during AppLauncher.
Comments in each file flag the constraint and ask future edits to
keep values in sync with stable.
Register `Isaac-Velocity-Rough-{Robot}-Warp-v0` and `-Play-v0` for all
9 warp velocity robots. Rough was previously commented out for
anymal_d/g1/h1 (pending isaaclab_physx availability, which has since
landed) and missing entirely for the other 6 quadrupeds + Cassie.

After this change there are 36 warp velocity envs registered (9 robots
× Flat/Rough × regular/Play), matching the stable env's set.

Each Rough registration reuses the corresponding stable rsl_rl /
skrl / rl_games / rsl_rl_with_symmetry agent config entry points where
they exist; warp envs share agent configs with stable so symmetry and
network architecture stay in sync.
@hujc7 hujc7 force-pushed the dev-velocity-rouge branch from 7d68d05 to 1ecf2e9 Compare May 4, 2026 08:09
Copy link
Copy Markdown

@isaaclab-review-bot isaaclab-review-bot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 Isaac Lab Review Bot

Summary

This follow-up PR refactors warp velocity environment configs to mirror stable's structure (nested ObservationsCfg.PolicyCfg, inlined Newton presets with sync-comments) and enables Rough terrain variants for all 9 warp velocity robots. The changes are config-only—no logic changes to MDP terms or managers.

Architecture Impact

Self-contained. The changes affect only isaaclab_tasks_experimental config files. The inlining of Newton physics presets (instead of importing from stable) is a deliberate workaround for the hydra_task_config / pxr early-load issue, well-documented in comments.

Previous Findings Status

  • Finding 1 (function-level state caching): Not addressed in this PR, but that code lives in isaaclab_experimental/envs/mdp/events.py which is not touched here. Still valid for future work.
  • Finding 2 (spatial_vectorf ordering): Same as above—not in scope for this config-only PR.
  • Finding 3 (COM randomization None assignment): The base_com = None pattern is now consistently documented across all event configs with proper docstrings explaining the warp MDP gap. Improved.
  • Finding 4 (pre-commit): ✅ CI shows pre-commit: success.

Implementation Verdict

Ship it. Clean config refactoring with consistent structure across all 9 robots.

Test Coverage

Config construction tests and smoke training (2 iterations, 64 envs) verified for all 36 env combinations per PR description. No new MDP logic to unit test.

CI Status

pre-commit: success
Build Wheel: success
⏳ Other checks pending but non-blocking for config changes.

Findings

No new issues. The inlined _FLAT_NEWTON_CFG / ROUGH_NEWTON_CFG presets with "keep values in sync with stable" comments are an acceptable pattern given the pxr import constraint. The removal of dead height_scanner = None setters and redundant manager_call_max_mode overrides is correct cleanup.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

isaac-lab Related to Isaac Lab team

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant