[Exp] Slim warp velocity envs and enable Rough variants for all robots#5088
[Exp] Slim warp velocity envs and enable Rough variants for all robots#5088hujc7 wants to merge 2 commits into
Conversation
|
WIP but set to review for bot review |
There was a problem hiding this comment.
🤖 Isaac Lab Review Bot — PR #5088
"[Exp] Add Anymal-D rough terrain warp velocity environment" by @hujc7
Summary
This PR adds Anymal-D rough terrain locomotion on Newton/Warp with the Newton collision pipeline. It is stacked on two prior PRs (#4829 warp manager infra, #4945 warp MDP migration) and introduces the top commit(s) for rough terrain support. The PR spans 118 files across several packages:
-
Experimental warp-first manager infrastructure (
isaaclab_experimental): Complete manager-based env orchestration with Warp-native managers (Action, Observation, Event, Reward, Termination, Command), MDP terms, utilities (buffers, modifiers, noise, warp kernels), andManagerCallSwitchfor eager/captured dispatch. -
Anymal-D rough terrain env config (
isaaclab_tasks_experimental):LocomotionVelocityBaseEnvCfg→LocomotionVelocityRoughEnvCfg→AnymalDRoughEnvCfghierarchy with Newton solver settings and terrain-aware spawning. -
Newton physics fixes: Zero contact margin fix, collision pipeline buffer overflow fix (
max_triangle_pairs1M→2M),capture_unsafedecorator on lazy ArticulationData properties, and state kernel helpers. -
Stable Newton preset:
RoughPhysicsCfgadded to the stable Anymal-D env with pyramidal cone + Newton collision pipeline settings. -
Dependency bumps: mujoco/mujoco-warp 3.5→3.6.0, Newton pinned to
141baffff9. -
RL wrapper updates: rsl_rl, rl_games, sb3, skrl wrappers accept
ManagerBasedRLEnvWarptypes.
Design Assessment
Architecture: Well-structured for experimental iteration. The env config hierarchy (Base → Rough/Flat) cleanly separates shared Newton solver settings from terrain-specific overrides (buffer sizes, friction cone, CCD iterations). The ManagerCallSwitch provides a clean 3-mode dispatch (stable/warp-eager/warp-captured) per manager, which is a pragmatic approach for incremental migration.
Warp-first MDP terms follow a consistent pattern. Observation/reward/termination terms all follow func(env, out, **params) -> None writing into pre-allocated Warp arrays. The decorator system (@generic_io_descriptor_warp) mirrors the stable IO descriptor system while handling the key difference that Warp-first terms cannot produce output during inspection.
The upstream workarounds are well-documented. The PR body clearly explains the MuJoCo Warp float32 divergence, zero contact margin, and buffer overflow issues with links to upstream bug reports. The code workarounds are appropriately placed (Newton manager for contact margin and pipeline buffer size).
Architecture Impact
-
Low risk to stable code. The stable changes are minimal: a Newton physics preset for Anymal-D rough env,
capture_unsafedecorators on ArticulationData lazy properties, zero margin fix innewton_manager.py, and RL wrapper type acceptance. The bulk of the PR lives inisaaclab_experimentalandisaaclab_tasks_experimental. -
Dependency pinning concern: Newton is pinned to a specific commit hash (
141baffff9), not a release tag. This is acceptable for experimental work but should be tracked for eventual release stabilization.
Implementation Verdict
APPROVE with advisory findings. The architecture is sound, the upstream workarounds are well-motivated, and the experimental env follows existing patterns correctly. The findings below are advisory improvements, not blockers.
Test Coverage
The PR includes comprehensive parity tests in isaaclab_experimental/test/envs/mdp/:
test_actions_warp_parity.py(227 lines)test_events_warp_parity.py(340 lines)test_observations_warp_parity.py(458 lines)test_rewards_warp_parity.py(567 lines)test_terminations_warp_parity.py(349 lines)
Tests verify warp-first implementations match stable torch baselines across three execution modes (stable, warp uncaptured, warp CUDA-graph captured). Shared test fixtures are extracted into parity_helpers.py.
Training validation: 500 iterations at 4096 envs with zero NaN (reward 10.71, episode length 824). Ablation tests confirm contact margin fix and buffer size increase are essential.
CI Status
✅ labeler — pass (4s)
Note: Only the labeler check ran. No lint, build, or test CI checks are configured for this PR.
Findings
1. [Medium] Function-level state caching on reset_root_state_from_terrain is fragile
File: isaaclab_experimental/envs/mdp/events.py
The function caches Warp arrays and parsed range values as attributes on the function object itself (reset_root_state_from_terrain._scratch_pose, etc.). This pattern has several issues:
- Multiple environments: If two environments use this function with different terrains/configs, they share the same cached state. The first caller's terrain data wins.
- Re-creation: If the env is destroyed and recreated, the stale function-level state persists because the function object is module-level.
Recommendation: Move cached state to the environment instance (e.g., env._reset_terrain_cache) or use a per-asset-cfg cache key. Same issue applies to generated_commands, track_lin_vel_xy_exp, and track_ang_vel_z_exp in observations and rewards.
2. [Medium] spatial_vectorf component ordering assumes [lin, ang] but Warp uses [ang, lin]
File: isaaclab_experimental/envs/mdp/events.py — _reset_root_state_from_terrain_kernel
The kernel writes velocity as:
vel_out[env_id] = wp.spatial_vectorf(
default_vel[0] + wp.randf(state, vel_lin_lo[0], vel_lin_hi[0]), # lin_x
default_vel[1] + wp.randf(state, vel_lin_lo[1], vel_lin_hi[1]), # lin_y
default_vel[2] + wp.randf(state, vel_lin_lo[2], vel_lin_hi[2]), # lin_z
default_vel[3] + wp.randf(state, vel_ang_lo[0], vel_ang_hi[0]), # ang_x
default_vel[4] + wp.randf(state, vel_ang_lo[1], vel_ang_hi[1]), # ang_y
default_vel[5] + wp.randf(state, vel_ang_lo[2], vel_ang_hi[2]), # ang_z
)Warp's spatial_vectorf convention is [angular(0:3), linear(3:6)] per wp.spatial_top (angular) / wp.spatial_bottom (linear). If default_root_vel follows [lin, ang] ordering from ArticulationData, the random perturbation would be applied to the wrong components. Verify that the convention matches what write_root_velocity_to_sim_mask expects. If the asset data is stored as [lin, ang] then the raw default_vel[0..5] access is correct for a raw float array, but the vel_lin_lo/vel_ang_lo labeling is misleading.
3. [Low] COM randomization disabled with a None assignment instead of removal
File: isaaclab_tasks_experimental/manager_based/locomotion/velocity/velocity_env_cfg.py
base_com = None # FIXME(warp-migration): ...causes NaNsThe commented-out FIXME is good documentation, but setting base_com = None in the config means that any subclass trying to override this field will need to know about this workaround. Consider adding a tracking issue reference.
4. [Low] Pre-commit checks not run
The PR body explicitly notes [ ] Run pre-commit checks is unchecked. Since the only CI check that ran is the labeler, lint/formatting issues may exist.
Summary
A well-structured, large PR that brings Anymal-D rough terrain locomotion to the Newton/Warp experimental pipeline. The upstream physics workarounds are properly documented and justified. The experimental manager infrastructure is cleanly separated from stable code. The function-level state caching pattern (Finding 1) and spatial vector component ordering (Finding 2) warrant attention but are not blockers for an experimental feature.
7589193 to
4f4e142
Compare
|
@greptileai Review |
Greptile SummaryThis PR adds an experimental Anymal-D rough terrain locomotion environment running on Newton/MuJoCo Warp, along with the supporting Warp-first infrastructure (managers, MDP terms, CUDA graph capture, warm-up run) and several Newton-level workarounds for mesh-terrain stability issues (contact margin fix, increased buffer sizes, pyramidal-cone solver). It also bumps Key changes:
Confidence Score: 4/5
Important Files Changed
Sequence DiagramsequenceDiagram
participant RL as RL Training Loop
participant Env as ManagerBasedRLEnvWarp
participant MCS as ManagerCallSwitch
participant WGC as WarpGraphCache
participant EM as EventManager (Warp)
participant Kernel as CUDA Kernel
participant Terrain as TerrainImporter
RL->>Env: step(actions)
Env->>MCS: call_stage("EventManager_reset", ...)
MCS->>MCS: get_mode_for_manager → WARP_CAPTURED (2)
MCS->>WGC: capture_or_replay("EventManager_reset", fn)
alt First call (warm-up + capture)
WGC->>EM: fn() [warm-up — eager]
EM->>Terrain: read terrain_levels (int64→int32 copy)
EM->>Kernel: wp.launch reset_root_state_from_terrain
WGC->>EM: fn() [capture — inside ScopedCapture]
EM->>Kernel: wp.launch (recorded into CUDA graph)
WGC->>WGC: cache graph + result
else Subsequent calls
WGC->>WGC: wp.capture_launch(cached_graph)
Note over WGC,Terrain: ⚠ terrain_levels_wp NOT updated<br/>Curriculum changes to terrain.terrain_levels<br/>are invisible to the captured kernel
end
RL->>Env: curriculum update
Env->>Terrain: update_env_origins(env_ids, move_up, move_down)
Terrain->>Terrain: terrain_levels[env_ids] updated in-place
Note over Terrain,Kernel: Warp buffers still hold stale initial values
Reviews (1): Last reviewed commit: "Standardize _is_warmed_up flag for reset..." | Re-trigger Greptile |
| reset_root_state_from_terrain._terrain_levels_wp = wp.zeros(n_envs, dtype=wp.int32, device=env.device) | ||
| reset_root_state_from_terrain._terrain_types_wp = wp.zeros(n_envs, dtype=wp.int32, device=env.device) | ||
| levels_i32 = terrain.terrain_levels.to(dtype=torch.int32, copy=True).contiguous() | ||
| types_i32 = terrain.terrain_types.to(dtype=torch.int32, copy=True).contiguous() | ||
| wp.copy(reset_root_state_from_terrain._terrain_levels_wp, wp.from_torch(levels_i32, dtype=wp.int32)) | ||
| wp.copy(reset_root_state_from_terrain._terrain_types_wp, wp.from_torch(types_i32, dtype=wp.int32)) | ||
| reset_root_state_from_terrain._num_patches = valid_positions.shape[2] | ||
|
|
||
| # Scratch buffers | ||
| reset_root_state_from_terrain._scratch_pose = wp.zeros((env.num_envs,), dtype=wp.transformf, device=env.device) | ||
| reset_root_state_from_terrain._scratch_vel = wp.zeros( | ||
| (env.num_envs,), dtype=wp.spatial_vectorf, device=env.device | ||
| ) | ||
|
|
||
| # Pre-parse orientation range (roll, pitch, yaw) | ||
| r = [pose_range.get(key, (0.0, 0.0)) for key in ["roll", "pitch", "yaw"]] | ||
| reset_root_state_from_terrain._rot_lo = wp.vec3f(r[0][0], r[1][0], r[2][0]) | ||
| reset_root_state_from_terrain._rot_hi = wp.vec3f(r[0][1], r[1][1], r[2][1]) | ||
|
|
||
| # Pre-parse velocity range | ||
| v = [velocity_range.get(key, (0.0, 0.0)) for key in ["x", "y", "z", "roll", "pitch", "yaw"]] | ||
| reset_root_state_from_terrain._vel_lin_lo = wp.vec3f(v[0][0], v[1][0], v[2][0]) | ||
| reset_root_state_from_terrain._vel_lin_hi = wp.vec3f(v[0][1], v[1][1], v[2][1]) | ||
| reset_root_state_from_terrain._vel_ang_lo = wp.vec3f(v[3][0], v[4][0], v[5][0]) | ||
| reset_root_state_from_terrain._vel_ang_hi = wp.vec3f(v[3][1], v[4][1], v[5][1]) | ||
| reset_root_state_from_terrain._is_warmed_up = True |
There was a problem hiding this comment.
Terrain levels cached once, never updated by curriculum
_terrain_levels_wp and _terrain_types_wp are initialized from terrain.terrain_levels / terrain.terrain_types during the first warm-up call and then the _is_warmed_up guard prevents any subsequent re-initialization. The terrain curriculum (terrain_levels_vel) regularly calls terrain.update_env_origins(env_ids, move_up, move_down) which updates the torch tensors in-place, but the warp buffers are separate GPU allocations (created via to(dtype=torch.int32, copy=True)) and are never refreshed afterwards.
The result is that reset_root_state_from_terrain always spawns each robot at its initial terrain level assignment, regardless of how the curriculum has since promoted or demoted that environment. The PR's stated goal — "correctly spawns on the assigned level, enabling curriculum progression" — is not achieved: the apparent "curriculum level 4.51" in the training results reflects the initial distribution of terrain level assignments, not dynamic curriculum progression.
A straightforward fix would be to update _terrain_levels_wp / _terrain_types_wp unconditionally on every call (outside the capture region, if CUDA graphs are in use). Since the torch tensors are already on device, a wp.copy of the int32-cast view is cheap and can be done before launching the kernel:
# After _is_warmed_up block, before wp.launch:
levels_i32 = terrain.terrain_levels.to(dtype=torch.int32).contiguous()
types_i32 = terrain.terrain_types.to(dtype=torch.int32).contiguous()
wp.copy(reset_root_state_from_terrain._terrain_levels_wp, wp.from_torch(levels_i32, dtype=wp.int32))
wp.copy(reset_root_state_from_terrain._terrain_types_wp, wp.from_torch(types_i32, dtype=wp.int32))If this function runs inside a CUDA graph, the copy would need to happen in a non-captured scope (the warm-up or a pre-step hook).
| if not getattr(apply_external_force_torque, "_is_warmed_up", False): | ||
| apply_external_force_torque._scratch_forces = wp.zeros( | ||
| (env.num_envs, asset.num_bodies), dtype=wp.vec3f, device=env.device | ||
| ) | ||
| apply_external_force_torque._scratch_torques = wp.zeros( | ||
| (env.num_envs, asset.num_bodies), dtype=wp.vec3f, device=env.device | ||
| ) | ||
| apply_external_force_torque._is_warmed_up = True | ||
|
|
||
| wp.launch( | ||
| kernel=_apply_external_force_torque_kernel, | ||
| dim=env.num_envs, | ||
| inputs=[ | ||
| env_mask, | ||
| env.rng_state_wp, | ||
| apply_external_force_torque._scratch_forces, | ||
| apply_external_force_torque._scratch_torques, | ||
| force_range[0], | ||
| force_range[1], | ||
| torque_range[0], | ||
| torque_range[1], | ||
| ], | ||
| device=env.device, | ||
| ) | ||
|
|
||
| asset.permanent_wrench_composer.set_forces_and_torques_mask( | ||
| forces=apply_external_force_torque._scratch_forces, | ||
| torques=apply_external_force_torque._scratch_torques, | ||
| env_mask=env_mask, | ||
| ) |
There was a problem hiding this comment.
apply_external_force_torque ignores configured body names
The stable isaaclab.envs.mdp.events.apply_external_force_torque uses asset_cfg.body_ids to restrict which bodies receive forces. This warp version allocates scratch buffers sized (num_envs, asset.num_bodies) and the kernel iterates for b in range(force_out.shape[1]), applying random forces to all bodies regardless of asset_cfg.body_names.
This is currently harmless in the locomotion configs because force_range=(0.0, 0.0) and torque_range=(-0.0, 0.0) produce zero forces for all bodies. However, if a downstream config sets non-zero ranges (e.g. to randomize perturbation on the base only), the behavior would silently diverge from the stable implementation — applying forces to every link instead of just the configured ones.
Consider adding a body_ids parameter (similar to joint_ids in reset_joints_by_offset) so the kernel can be restricted to the configured body subset.
| if not getattr(fn, "_is_warmed_up", False) or fn._asset_name != asset_cfg.name: | ||
| fn._asset_name = asset_cfg.name | ||
| r = [com_range.get(key, (0.0, 0.0)) for key in ["x", "y", "z"]] | ||
| fn._com_lo = wp.vec3f(r[0][0], r[1][0], r[2][0]) | ||
| fn._com_hi = wp.vec3f(r[0][1], r[1][1], r[2][1]) | ||
| fn._is_warmed_up = True |
There was a problem hiding this comment.
Function-level state is shared across all callers of the same function
All event functions in this file store their "warm-up" state as attributes directly on the function object (e.g. fn._is_warmed_up, fn._scratch_forces, fn._terrain_levels_wp). This is not instance-scoped: if two environments are created in the same process, or if the same function is registered to two different event terms with different asset configs / num_envs, the second caller will silently reuse stale/wrong-sized buffers from the first.
For example, randomize_rigid_body_com checks fn._asset_name != asset_cfg.name to partially guard against this, but that guard does not cover num_envs changes. The other functions (apply_external_force_torque, push_by_setting_velocity, reset_root_state_from_terrain, etc.) do not have even this partial check.
This design is reasonable for the single-env experimental use-case described in this PR, but it would be worth adding a comment explicitly documenting the single-environment assumption to prevent surprising failures in future multi-env or serialized-training scenarios.
| "mujoco-warp==3.6.0", | ||
| "PyOpenGL-accelerate==3.1.10", | ||
| "newton==1.0.0", | ||
| "newton @ git+https://github.com/newton-physics/newton.git@141baffff9", |
There was a problem hiding this comment.
Newton pinned to an unreleased commit hash
"newton @ git+https://github.com/newton-physics/newton.git@141baffff9",Pinning to a raw, unreleased commit SHA is fragile: the commit is not immutable in the same way a signed tag or package release would be — history can be force-pushed, the SHA is not semantically versioned, and it is difficult to audit what changed between revisions. It also makes the dependency invisible to security-scanning tooling that knows about released packages.
Consider whether Newton will publish a versioned release that includes 141baffff9 so the dependency can be expressed as a proper version pin (e.g., newton>=x.y.z), or at minimum document the expected upstream release that will supersede this commit in a code comment.
Bring the warp locomotion velocity configs structurally in line with
their stable counterparts so the two backends can share Newton tuning
and avoid drift.
velocity_env_cfg.py:
- Mirror stable's nested ObservationsCfg.PolicyCfg layout.
- Add EventsCfg.physics_material/add_base_mass/base_com slots (None
until warp MDP gets randomize_rigid_body_material/_mass and a
capture-safe randomize_rigid_body_com); slot names match stable so
per-robot overrides can introspect them.
- Replace the hand-rolled ContactSensorCfg with a Newton sensor preset
built locally; values mirror stable's VelocityEnvContactSensorCfg.newton.
- Set sim physics to a local ROUGH_NEWTON_CFG that mirrors stable's
RoughPhysicsCfg.newton (njmax=200, nconmax=100, pyramidal cone,
2.5M triangle pairs, 1cm shape margin).
Per-robot flat_env_cfg.py (a1, anymal_b/c/d, cassie, g1, go1/go2, h1):
- Replace inline NewtonCfg with a local _FLAT_NEWTON_CFG whose values
mirror the matching stable PhysicsCfg.newton.
- Drop dead height_scanner=None / observations.policy.height_scan=None
setters (warp env has no RayCaster).
Per-robot rough_env_cfg.py (anymal_b/c, anymal_d, go1):
- Drop redundant manager_call_max_mode={'Scene': 1} overrides; the
cap is already enforced via ManagerCallSwitch.MAX_MODE_OVERRIDES.
Why inlined instead of imported from stable: hydra_task_config resolves
env cfgs before SimulationApp initialises Kit. Importing stable's
RoughPhysicsCfg / per-robot PhysicsCfg chains through isaaclab_physx
which 'from pxr import ...' at module load; early pxr loading leaves
USD wrappers half-initialised and segfaults Kit during AppLauncher.
Comments in each file flag the constraint and ask future edits to
keep values in sync with stable.
Register `Isaac-Velocity-Rough-{Robot}-Warp-v0` and `-Play-v0` for all
9 warp velocity robots. Rough was previously commented out for
anymal_d/g1/h1 (pending isaaclab_physx availability, which has since
landed) and missing entirely for the other 6 quadrupeds + Cassie.
After this change there are 36 warp velocity envs registered (9 robots
× Flat/Rough × regular/Play), matching the stable env's set.
Each Rough registration reuses the corresponding stable rsl_rl /
skrl / rl_games / rsl_rl_with_symmetry agent config entry points where
they exist; warp envs share agent configs with stable so symmetry and
network architecture stay in sync.
There was a problem hiding this comment.
🤖 Isaac Lab Review Bot
Summary
This follow-up PR refactors warp velocity environment configs to mirror stable's structure (nested ObservationsCfg.PolicyCfg, inlined Newton presets with sync-comments) and enables Rough terrain variants for all 9 warp velocity robots. The changes are config-only—no logic changes to MDP terms or managers.
Architecture Impact
Self-contained. The changes affect only isaaclab_tasks_experimental config files. The inlining of Newton physics presets (instead of importing from stable) is a deliberate workaround for the hydra_task_config / pxr early-load issue, well-documented in comments.
Previous Findings Status
- Finding 1 (function-level state caching): Not addressed in this PR, but that code lives in
isaaclab_experimental/envs/mdp/events.pywhich is not touched here. Still valid for future work. - Finding 2 (spatial_vectorf ordering): Same as above—not in scope for this config-only PR.
- Finding 3 (COM randomization None assignment): The
base_com = Nonepattern is now consistently documented across all event configs with proper docstrings explaining the warp MDP gap. Improved. - Finding 4 (pre-commit): ✅ CI shows
pre-commit: success.
Implementation Verdict
Ship it. Clean config refactoring with consistent structure across all 9 robots.
Test Coverage
Config construction tests and smoke training (2 iterations, 64 envs) verified for all 36 env combinations per PR description. No new MDP logic to unit test.
CI Status
✅ pre-commit: success
✅ Build Wheel: success
⏳ Other checks pending but non-blocking for config changes.
Findings
No new issues. The inlined _FLAT_NEWTON_CFG / ROUGH_NEWTON_CFG presets with "keep values in sync with stable" comments are an acceptable pattern given the pxr import constraint. The removal of dead height_scanner = None setters and redundant manager_call_max_mode overrides is correct cleanup.
Summary
Isaac-Velocity-Rough-{Robot}-Warp-v0and-Play-v0for all 9 warp velocity robots; was Flat-only or commented out.Background
This PR was originally scoped to "Add Anymal-D rough terrain warp velocity environment". The supporting infrastructure has since merged via:
The remaining gap was on the warp side: per-robot Rough envs were not registered (commented out for Anymal-D/G1/H1 pending
isaaclab_physx, and missing entirely for the other six quadrupeds + Cassie). Configs were also drifting from stable (different layout, deadheight_scannersetters, redundantmanager_call_max_modeoverrides).Changes
velocity_env_cfg.pyObservationsCfg.PolicyCfglayout.EventsCfg.physics_material/add_base_mass/base_comslots, allNoneuntil warp MDP getsrandomize_rigid_body_material/randomize_rigid_body_massand a capture-saferandomize_rigid_body_com. Slot names match stable so per-robot overrides can introspect them.ContactSensorCfgwith a Newton sensor preset built locally; values mirror stable'sVelocityEnvContactSensorCfg.newton.sim.physicsto a localROUGH_NEWTON_CFGmirroring stable'sRoughPhysicsCfg.newton(njmax=200, nconmax=100, pyramidal cone, 2.5M triangle pairs, 1cm shape margin).Per-robot
flat_env_cfg.pyFor all 9 robots (a1, anymal_b/c/d, cassie, g1, go1/go2, h1):
NewtonCfgwith a local_FLAT_NEWTON_CFGwhose values mirror the matching stablePhysicsCfg.newton.self.scene.height_scanner = None/self.observations.policy.height_scan = Nonesetters (warp env has no RayCaster).Per-robot
rough_env_cfg.pyself.manager_call_max_mode = {'Scene': 1}overrides; the cap is already enforced viaManagerCallSwitch.MAX_MODE_OVERRIDES.Per-robot
__init__.pyAdd
Isaac-Velocity-Rough-{Robot}-Warp-v0and-Play-v0registrations for every robot, reusing the matching stable rsl_rl / skrl / rl_games /rsl_rl_with_symmetryagent config entry points.Why inline the Newton settings rather than import stable's preset?
hydra_task_configresolves env cfgs beforeSimulationAppinitialises Kit. Importing stable'sRoughPhysicsCfg/ per-robotPhysicsCfgchains throughisaaclab_physxwhichfrom pxr import …at module load. Earlypxrloading leaves USD wrappers half-initialised and segfaults Kit during AppLauncher startup. Each inlined block has a comment naming the constraint and asking future edits to keep values in sync with stable.Verification
All 36 warp velocity env cfgs construct cleanly (
gym.spec(...)+cls()for every Flat/Rough × regular/Play × 9 robots). Pre-commit (./isaaclab.sh -f) passes.Parity vs
torch newton(stable env withpresets=newton)4096 envs, L40, headless. Each pair: warp run → stable+newton run on the same GPU.
Stable + Newton rough crash — known upstream bug
Stable + Newton rough Anymal-D / Cassie crash with
Warp CUDA error 700: illegal memory accessinwp_memcpy_d2hduringSimulationAppsetup at both 4096 and 1024 envs, before any training step. Not introduced by this PR.Root cause is
newton-physics/newton#2675(open, filed 2026-05-02): newton PR #2393 added a fast-pathtid → cidcache inSolverMuJoCo(use_mujoco_contacts=False) that wasn't invalidated forBODY_INERTIAL_PROPERTIESnotifies. Stable'sEventsCfg.add_base_mass(randomize_rigid_body_mass) fires at startup and triggers exactly this notify; the next solver step writes via stalecidvalues, corrupting GPU memory and surfacing as the CUDA 700 on the next D2H sync. Fix:newton-physics/newton#2678(open).The warp envs in this PR happen to dodge the bug because warp MDP doesn't have
randomize_rigid_body_massyet (EventsCfg.add_base_mass = None), so the corruption-triggering notify never fires.Once
newton#2678lands or warp getsrandomize_rigid_body_mass(which itself depends on the upstream fix), real warp ↔ stable+newton parity numbers can be produced for Rough envs.All 9 Rough warp envs trained (300 iter, 4096 envs, L40)
Demonstrates each registration is end-to-end functional. No NaN, no crash for any of the 9.
Aggregated log:
/home/horde/workspaces/data/pr5088/all_rough_train.txtPer-task logs:
/home/horde/workspaces/data/pr5088/logs/all_rough_*.logOut of scope (follow-ups)
randomize_rigid_body_material,randomize_rigid_body_mass,reset_root_state_from_terrain, orheight_scan. The cfg slots are wired in (physics_material,add_base_mass,base_com, plus theheight_scanner⟷height_scanpair) so dropping in implementations later is mechanical. Oncerandomize_rigid_body_masslands on warp, both backends will hitnewton#2675untilnewton#2678(or equivalent) is merged.newton-physics/newton#2675/newton-physics/newton#2678); not addressed here. Once that fix lands, the parity-table rows currently marked "crash" can be filled in.add_base_massasymmetric(1.0, 1.25)override that PR [Rough Locomotion] Part 2: H1/Cassie bipeds on Newton #5298 introduced for stable Cassie) is gated on item 1.Type of change