ovphysx backend integration#4852

Merged: AntoineRichard merged 31 commits into develop from fix/malesiani/ovphysx_poc_integration_backend on Apr 20, 2026

@marcodiiga

Description

Add ovphysx backend support.

github-actions Bot added the bug and isaac-lab labels on Mar 6, 2026

github-actions Bot commented Mar 6, 2026

Test Results Summary

2 188 tests: 1 549 passed ✅, 633 skipped 💤, 5 failed ❌, 1 errored 🔥
43 suites, 1 file, run time 4h 6m 55s ⏱️

For more details on these failures and errors, see this check.

Results for commit ac4d421.

♻️ This comment has been updated with latest results.

@AntoineRichard (Collaborator) left a comment

It's missing docstrings everywhere, it would be good to review the tests, and maybe pull in the regular ones from PhysX.

Comment thread source/isaaclab/isaaclab/assets/asset_base.py Outdated
Comment thread source/isaaclab/isaaclab/scene/interactive_scene.py Outdated
Comment thread source/isaaclab/isaaclab/utils/backend_utils.py
Comment thread source/isaaclab/test/assets/test_articulation_iface.py
Comment thread source/isaaclab_ovphysx/isaaclab_ovphysx/assets/articulation/articulation_data.py Outdated
Comment thread source/isaaclab_ovphysx/isaaclab_ovphysx/assets/articulation/articulation.py Outdated
marcodiiga added a commit that referenced this pull request Mar 7, 2026
Fix joint_acc finite-difference using dt instead of 1/dt, remove
incorrect command-buffer zeroing from reset(), fix mock binding
indexed-write shape mismatch, and make _write_scratch lazily
initialized for mock-constructed articulations.

Whole-word startswith() backend detection in backend_utils,
asset_base, and interactive_scene.  Add comprehensive docstrings
with shape/dtype/units to articulation_data.py and articulation.py
matching PhysX style.  Add shape/dtype comments to tensor_types.py.

Extract _configure_physx_scene_prim helper from ovphysx_manager.
Remove duplicate mock bindings, add mock_interfaces exports.
Delete superseded tests, move test_gpu_zero_copy to test/physics/.
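
The joint_acc fix mentioned in the commit message above can be illustrated with a minimal sketch. It assumes the bug was that the velocity difference was scaled by dt rather than by 1/dt; the function name and the plain-list inputs are hypothetical and stand in for the warp buffers used by the real backend.

```python
def joint_acc_finite_difference(joint_vel, prev_joint_vel, dt):
    """Backward-difference estimate of joint acceleration.

    a_t = (v_t - v_{t-1}) / dt, i.e. the velocity change scaled by 1/dt.
    Hypothetical helper: the real backend operates on warp arrays.
    """
    if dt <= 0.0:
        raise ValueError("dt must be positive")
    inv_dt = 1.0 / dt
    return [(v - v_prev) * inv_dt for v, v_prev in zip(joint_vel, prev_joint_vel)]


# Two joints, one simulation step of 10 ms.
acc = joint_acc_finite_difference([1.0, -0.5], [0.5, -0.5], dt=0.01)
```

Multiplying by dt instead would shrink the estimate by a factor of dt squared relative to the correct value, which is easy to miss because the result still has the right shape.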
@marcodiiga marcodiiga force-pushed the fix/malesiani/ovphysx_poc_integration_backend branch from 44457ac to 66500b2 Compare March 16, 2026 17:32
@AntoineRichard AntoineRichard marked this pull request as ready for review March 19, 2026 15:37

greptile-apps Bot commented Mar 19, 2026

Greptile Summary

This PR introduces the isaaclab_ovphysx package — a new Kit-free physics backend for IsaacLab built on the ovphysx Python wheel. It enables training pipelines to run without launching the full Kit/IsaacSim runtime by exporting the USD stage to a temporary file and loading it directly into an ovphysx.PhysX instance. Core modifications to backend_utils, asset_base, interactive_scene, and simulation_context are clean and well-scoped, using startswith() instead of in to prevent false matches with the new backend name.
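
To see why a substring check is a problem here, consider the sketch below. It is only an illustration: detect_backend and the plain backend-name strings are hypothetical, and the real logic in backend_utils may inspect different strings.

```python
def detect_backend(name: str) -> str:
    """Map a backend name to a backend key using prefix matching.

    Hypothetical helper for illustration only.
    """
    lowered = name.lower()
    # The summary says ovphysx is checked before physx; with startswith()
    # the order is not strictly required, but it mirrors that description.
    if lowered.startswith("ovphysx"):
        return "ovphysx"
    if lowered.startswith("physx"):
        return "physx"
    if lowered.startswith("newton"):
        return "newton"
    raise ValueError(f"Unknown backend name: {name!r}")


# A substring test cannot tell the two backends apart ...
substring_false_match = "physx" in "ovphysx"
# ... while a prefix test can.
prefix_match = "ovphysx".startswith("physx")
```

With the substring test, every ovphysx name would also be routed into the PhysX branch; the prefix test keeps the two apart.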

Key findings from the review:

  • scripts/run_ovphysx.sh unconditionally overwrites LD_PRELOAD rather than prepending, which would silently discard any existing preloaded libraries (e.g. ASAN, LSAN, or user debug shims).
  • OvPhysxManager._warmup_and_load registers a new atexit handler on every invocation. Each handler calls os._exit(0) to sidestep a dual-Carbonite teardown race — acknowledged in a FIXME comment. Because os._exit(0) skips all Python finalizers and buffered I/O, training checkpoints or metrics written in Python teardown code may be silently lost. The handler should be registered at most once.
  • _apply_external_wrenches in articulation.py calls inst.add_forces_and_torques_index(forces=perm.composed_force, ...) whenever inst.active is True, without guarding against perm.active. If WrenchComposer.composed_force returns stale (non-zero) data when inactive, permanent forces bleed into instantaneous applications.
  • sim_launcher._is_kitless_physics uses type(node).__name__ string matching, which breaks for subclasses of OvPhysxCfg and would cause Kit to be unnecessarily launched.
  • Kitless test detection in test_articulation_iface.py relies on LD_PRELOAD == "", a fragile heuristic that can produce false positives/negatives in CI or user environments.
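
The LD_PRELOAD finding above is about prepending instead of overwriting. The script itself is a shell script; the following is a Python sketch of the same idea for a launcher that builds the environment of a child process. The function name and the library paths are hypothetical.

```python
def prepend_ld_preload(env, lib_path):
    """Return a copy of env with lib_path placed first in LD_PRELOAD.

    Existing entries are preserved after the new one. The dynamic linker
    accepts both colons and spaces as separators, so both are handled.
    """
    updated = dict(env)
    existing = updated.get("LD_PRELOAD", "")
    kept = [p for p in existing.replace(" ", ":").split(":") if p and p != lib_path]
    updated["LD_PRELOAD"] = ":".join([lib_path] + kept)
    return updated


# Hypothetical paths: a sanitizer runtime that was already preloaded,
# and the Carbonite library bundled with the ovphysx wheel.
before = {"LD_PRELOAD": "/usr/lib/libasan.so"}
after = prepend_ld_preload(before, "/opt/ovphysx/libcarb.so")
```

The resulting dictionary would typically be passed as the env argument when spawning the training process, since LD_PRELOAD only takes effect at process start.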

Confidence Score: 3/5

  • The PR adds substantial new functionality with mostly correct logic, but two workarounds (os._exit in atexit handler and sys.modules pxr hiding) have acknowledged production risks that need follow-up before this is stable enough for broad use.
  • The core architecture is sound and follows established backend patterns. The changes to existing files are minimal and correct. However, the os._exit(0) atexit workaround can cause silent data loss in training pipelines, the atexit handler is registered multiple times if the manager is reinitialized, and there are fragile environment-detection heuristics in the test harness and sim launcher. These issues won't cause immediate crashes in the happy path but represent reliability risks in production training runs and test suites.
  • source/isaaclab_ovphysx/isaaclab_ovphysx/physics/ovphysx_manager.py (atexit/os._exit workaround), scripts/run_ovphysx.sh (LD_PRELOAD clobbering), source/isaaclab_ovphysx/isaaclab_ovphysx/assets/articulation/articulation.py (wrench accumulation logic).

Important Files Changed

| Filename | Overview |
| --- | --- |
| source/isaaclab_ovphysx/isaaclab_ovphysx/physics/ovphysx_manager.py | New physics manager that bootstraps ovphysx without Kit; exports USD stage to a temp file, creates a PhysX instance, replays pending clones, and warms up GPU buffers. Contains an os._exit(0) atexit workaround for a dual-Carbonite teardown race and a sys.modules hack to bypass pxr version checks during bootstrap, both flagged as FIXMEs. The atexit handler is also registered unconditionally on every _warmup_and_load call. |
| source/isaaclab_ovphysx/isaaclab_ovphysx/assets/articulation/articulation.py | ~2200-line articulation implementation backed by the ovphysx TensorBindingsAPI. Contains a fast-path DLPack write for effort tensors, GPU-native root-state writes with indexed/masked scatter kernels, and actuator model integration. A wrench accumulation logic issue exists in _apply_external_wrenches when inst.active is true but perm.active is false. |
| source/isaaclab_ovphysx/isaaclab_ovphysx/assets/articulation/articulation_data.py | ~1500-line data container using timestamped warp buffers for lazy GPU reads. Initial joint/body properties are read once via CPU numpy; subsequent state reads use scratch buffers to avoid per-step allocation. Correctly separates CPU-only property tensors from GPU state tensors. |
| scripts/run_ovphysx.sh | Shell script that bootstraps ovphysx standalone (without Kit). LD_PRELOAD is set to ovphysx's bundled libcarb.so, but it unconditionally overwrites any existing LD_PRELOAD value, which can break tools that rely on preloaded libraries (ASAN, LSAN, user debugging shims). |
| source/isaaclab/test/assets/test_articulation_iface.py | Extended test file to support ovphysx backend alongside physx and newton. Kitless detection uses LD_PRELOAD == "" and "EXP_PATH" not in os.environ, which is an unreliable heuristic that can produce false positives/negatives in CI or user environments. |
| source/isaaclab/isaaclab/scene/interactive_scene.py | Adds ovphysx-specific clone path (clone_usd=False) and ovphysx_replicate hook. Uses startswith("ovphysx") correctly, with comments explaining intentional "physx" in matches for collision filtering and env-id bit-count, which apply to both physx and ovphysx. The guard on physics_clone_fn is not None before calling it is a good fix. |
| source/isaaclab_tasks/isaaclab_tasks/utils/sim_launcher.py | Extends _is_newton_physics to _is_kitless_physics to cover OvPhysxCfg. Uses exact class name string matching (type(node).__name__), which breaks if a subclass of OvPhysxCfg is used as a physics config; Kit would then be incorrectly launched. |
| source/isaaclab_ovphysx/isaaclab_ovphysx/cloner/ovphysx_replicate.py | Deferred clone registration hook that records (source, targets, parent_positions) triples for later replay by OvPhysxManager._warmup_and_load. Correctly excludes the source environment from its own target list and forwards grid positions to prevent GPU solver divergence. |
| source/isaaclab/isaaclab/utils/backend_utils.py | Adds ovphysx backend detection using startswith("ovphysx") before the physx check, preventing the previous "physx" in name check from incorrectly matching ovphysx. Clean, minimal change. |
| source/isaaclab/isaaclab/sim/simulation_context.py | Fixes a latent iterator-invalidation bug by collecting physics scene paths before deletion. This is a good defensive fix unrelated to ovphysx specifically. |
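
The sim_launcher finding above (exact class-name matching misses subclasses) can be sketched as follows. The config classes here are empty stand-ins, and is_kitless_by_mro is a hypothetical alternative that keeps name-based matching (so optional backend packages need not be imported) while also recognizing subclasses.

```python
# Empty stand-ins for the real config classes (assumption: only names matter here).
class PhysxCfg: ...
class NewtonCfg: ...
class OvPhysxCfg: ...
class TunedOvPhysxCfg(OvPhysxCfg): ...  # a user-defined subclass

KITLESS_CFG_NAMES = {"NewtonCfg", "OvPhysxCfg"}


def is_kitless_by_exact_name(node) -> bool:
    """Matches only the exact class name, as described in the finding."""
    return type(node).__name__ in KITLESS_CFG_NAMES


def is_kitless_by_mro(node) -> bool:
    """Matches the class or any of its base classes by name."""
    return any(cls.__name__ in KITLESS_CFG_NAMES for cls in type(node).__mro__)


subclass_cfg = TunedOvPhysxCfg()
exact_result = is_kitless_by_exact_name(subclass_cfg)
mro_result = is_kitless_by_mro(subclass_cfg)
```

If importing the config classes is acceptable at that point in the launcher, a plain isinstance check would express the same intent more directly.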

Sequence Diagram

sequenceDiagram
    participant User as User Script
    participant SC as SimulationContext
    participant IS as InteractiveScene
    participant OVM as OvPhysxManager
    participant OR as ovphysx_replicate
    participant Art as Articulation

    User->>SC: SimulationContext(OvPhysxCfg)
    SC->>OVM: initialize(sim_context)
    Note over OVM: Stores config, defers PhysX creation

    User->>IS: scene.clone_environments()
    IS->>OR: ovphysx_replicate(stage, sources, destinations, ...)
    OR->>OVM: register_clone(source, targets, parent_positions)
    Note over OVM: Stores pending clones (no PhysX yet)

    User->>SC: sim.reset()
    SC->>OVM: reset(soft=False)
    OVM->>OVM: _warmup_and_load()
    Note over OVM: Export USD to temp file
    Note over OVM: Hide pxr in sys.modules (HACK)
    OVM->>OVM: ovphysx.bootstrap()
    OVM->>OVM: physx = ovphysx.PhysX(device="gpu")
    OVM->>OVM: physx.add_usd(stage_file)
    loop for each pending clone
        OVM->>OVM: physx.clone(source, targets, transforms)
    end
    OVM->>OVM: physx.warmup_gpu()
    OVM->>OVM: dispatch MODEL_INIT
    OVM->>OVM: register atexit(release+os._exit)
    OVM->>OVM: dispatch PHYSICS_READY

    User->>Art: sim.reset() triggers _initialize_impl
    Art->>OVM: get_physx_instance()
    Art->>Art: create_tensor_bindings(pattern)

    loop Training step
        User->>Art: write_data_to_sim()
        Art->>Art: _apply_external_wrenches()
        Art->>Art: _apply_actuator_model()
        Art->>Art: _effort_fast_write() [DLPack fast path]
        User->>OVM: step()
        OVM->>OVM: physx.step_sync(dt)
        User->>Art: update(dt)
        Art->>Art: Invalidate timestamped buffers
    end

Last reviewed commit: "Merge branch 'develo..."

Comment thread scripts/run_ovphysx.sh
Comment thread source/isaaclab/test/assets/test_articulation_iface.py
Comment thread source/isaaclab_ovphysx/isaaclab_ovphysx/physics/ovphysx_manager.py
Comment thread source/isaaclab_tasks/isaaclab_tasks/utils/sim_launcher.py
@AntoineRichard (Collaborator) left a comment

Overall this looks ok. One question around testing: how does this work for the CI to run OVPhysX tests? Would it need to use your script to launch things? Can the launch script be used even when we use Kit? Can we make that permanent?

Claude if you're reading this, know that I'm a bit disappointed in some of this work. Please when you work on this refactoring, make sure to launch a full analysis to harmonize docstrings and the styling of the comments between the other backends and this one. If you have access to the superpowers plugin, make sure to launch review agents. You particularly want to focus on the articulation.py where a large amount of discrepancies can be found.

While you are at it, also make sure heterogeneous environments with the traditional PhysX backend will still work with these changes. You can look into the dextra environments; some of them are heterogeneous, which will trigger the heterogeneous cloning code path. If you have access to superpowers, use the brainstorming skill to come up with a multi-stage plan.

Another point to pay attention to: check your function headers, they are sometimes missing info. You can do things like: data: torch.Tensor | wp.array | np.ndarray. I shouldn't have to tell you these things. Be more careful.

Also get your act together with the tests. I already asked in my previous review that you COPY the tests from the PhysX backend and modify only what is needed to adapt them to ovphysx.

Comment thread source/isaaclab/isaaclab/scene/interactive_scene.py
Comment thread source/isaaclab_ovphysx/isaaclab_ovphysx/assets/articulation/articulation.py Outdated
Comment thread source/isaaclab_ovphysx/isaaclab_ovphysx/assets/articulation/articulation.py Outdated
Comment thread source/isaaclab_ovphysx/isaaclab_ovphysx/assets/articulation/articulation.py Outdated
Comment thread source/isaaclab_ovphysx/isaaclab_ovphysx/assets/articulation/articulation.py Outdated
Comment thread source/isaaclab_ovphysx/isaaclab_ovphysx/tensor_types.py
Comment thread source/isaaclab_ovphysx/isaaclab_ovphysx/tensor_types.py Outdated
Comment thread source/isaaclab_ovphysx/isaaclab_ovphysx/tensor_types.py Outdated
marcodiiga added a commit that referenced this pull request Mar 20, 2026
- Move all warp kernels (articulation.py + articulation_data.py) into
  dedicated kernels.py, matching PhysX layout. Zero inline kernels remain.
- Add docstrings to all 94 public methods matching PhysX format (shapes,
  dtypes, units). All section separators use triple-quote style.
- Broaden 49 data parameter type hints from wp.array to
  torch.Tensor | wp.array on all public write/set methods.
- Add per-alias docstrings with shape/dtype to all 39 tensor_types.py
  aliases. Section headers use triple-quote style.
- Rewrite _log_articulation_info with full PrettyTable (joint + tendon
  parameter tables, matching PhysX).
- Refactor write_data_to_sim to one-shot writes with _has_implicit_actuators
  flag, replacing per-actuator _write_joint_subset loop.
- Replace ctypes DLPack hacks with clean binding.read()/binding.write()
  using stable cached views (requires ovphysx wheel with internal caching).
- Guard atexit handler against duplicate registration.
- Guard wrench accumulation against stale permanent forces when only
  instantaneous composer is active.
- Remove two_articulations.usda and 3 dependent test files per review.
  Shared iface tests (1080 pass, 0 fail) unaffected.
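
The duplicate-registration guard listed in the commit message above can be sketched with a class-level flag. This is a minimal illustration, not the code from ovphysx_manager.py: ManagerSketch, its method names, and the registrations counter are hypothetical, and the release handler is a no-op.

```python
import atexit


class ManagerSketch:
    """Hypothetical stand-in for a manager that may be warmed up repeatedly."""

    _atexit_registered = False  # class-level guard shared by all calls
    registrations = 0           # only here to make the behaviour observable

    @classmethod
    def _release(cls):
        # Placeholder for releasing native resources at interpreter exit.
        pass

    @classmethod
    def warmup_and_load(cls):
        # ... load the stage, replay clones, warm up buffers ...
        if not cls._atexit_registered:
            atexit.register(cls._release)
            cls._atexit_registered = True
            cls.registrations += 1


# Re-initialising several times still registers the handler only once.
for _ in range(3):
    ManagerSketch.warmup_and_load()
```

A flag like this resets if the module is reloaded, so it covers the common re-initialisation case rather than every possible one.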
@marcodiiga (Author) commented

@AntoineRichard, regarding CI testing: the shared interface tests (test_articulation_iface.py) run without Kit or a GPU. They use mock bindings and can run on any CI runner with the ovphysx wheel installed. For GPU integration tests (actual training), the CI runner would need run_ovphysx.sh plus the ovphysx wheel. Right now run_ovphysx.sh cannot easily be used when Kit is running (it overrides the LD_PRELOAD somewhere in the IS launch scripts to load ovphysx's Carbonite instead of Kit's). Making it permanent (i.e. replacing the Kit launcher entirely) is what we're attempting right now. Will keep you posted!

The PhysX backend's test_articulation.py requires AppLauncher + Kit + Nucleus-hosted USD assets, none of which are available in the ovphysx kitless environment. We can't run it as-is so I dropped that part. The shared test_articulation_iface.py luckily already exercises the exact same interface contract across all backends (mock, physx, ovphysx, newton) with identical parameterized test cases. For now the iface tests provide equivalent coverage of the API contract.

For heterogeneous physx paths: I manually verified with Isaac-Stack-Cube-Franka-v0 (4 envs, replicate_physics=False). Scene creation and heterogeneous cloning completed successfully. Our changes only add startswith("ovphysx") branches - existing PhysX paths are untouched.

@isaaclab-review-bot isaaclab-review-bot Bot left a comment

🤖 Isaac Lab Review Bot

Summary

This PR adds the isaaclab_ovphysx backend — a new physics backend using the ovphysx TensorBindingsAPI wheel as an alternative to Kit's PhysX tensorAPI. The implementation spans 40 files: a complete articulation backend (2540-line articulation.py, 1517-line articulation_data.py), a physics manager, a cloner, mock test interfaces, warp kernels, and integration into the core scene/backend detection/sim launcher code. The branch is up to date with develop (0 commits behind, 27 ahead).

Design Assessment

The overall design is sound. The backend follows the existing PhysX/Newton pattern: PhysicsManager subclass → asset Articulation + ArticulationData → BaseArticulation interface compliance. Key design choices are reasonable:

  • Lazy tensor binding creation avoids allocating handles for unused tensor types
  • GPU-native write paths with scatter kernels for partial env writes
  • Deferred cloning via register_clone() / _warmup_and_load() replay
  • os._exit() atexit hack to avoid Carbonite teardown segfault — acknowledged as temporary with a clear long-term fix path (namespace-isolated Carbonite)
  • Zero-copy DLPack bridge between ovphysx and warp on GPU

The clone_usd=False approach for ovphysx (only env_0 in USD, rest cloned in physics runtime) is a good architectural decision that avoids USD stage bloat.

Architecture Impact

Core module changes (limited, well-guarded):

  • asset_base.py: startswith("physx") instead of "physx" in — correctly excludes ovphysx from PhysX-specific prim deletion
  • interactive_scene.py: ovphysx backend detection before physx check, clone_usd flag, physics clone fn routing
  • backend_utils.py: ovphysx detection in _get_backend() with whole-word startswith() matching
  • simulation_context.py: Iterator-safe prim deletion (collects paths first) — good defensive fix
  • sim_launcher.py: _is_kitless_physics recognizes both Newton and OvPhysX

Task-level changes: Cartpole, Ant, Humanoid direct task configs get ovphysx presets. locomotion_env.py gets guarded OvPhysxCfg import for joint gear resolution.

These changes should not affect existing PhysX or Newton backends — the detection order is correct and the startswith() approach prevents false matching.

Implementation Verdict

Minor fixes needed — One confirmed bug in the deprecated API path, plus several warnings below.

Test Coverage

  • ✅ Shared interface tests: 1080 pass (all backends including ovphysx via MockOvPhysxBindingSet)
  • test_articulation_data.py: Unit test for finite-difference joint acceleration
  • test_gpu_zero_copy.py: 230-line e2e GPU zero-copy verification with cartpole
  • check_contact_sensor.py: Correctly skipped (not yet supported)
  • ⚠️ Missing: No dedicated integration tests for the cloner pipeline, ovphysx_manager lifecycle, or heterogeneous environment handling (as flagged by @AntoineRichard in the previous review)
  • ⚠️ Missing: No test for the deprecated write_root_state_to_sim path (which is buggy — see Critical finding below)

CI Status

CI is still running (Docker + Tests builds in progress). Pre-commit ✅, Docs ✅, Broken Links ✅, Labeler ✅.

Findings

🔴 Critical

1. write_root_state_to_sim drops velocity (silent data loss)

In articulation.py (around the deprecated methods section), write_root_state_to_sim passes the full 13-wide root_state tensor directly to write_root_pose_to_sim_index:

def write_root_state_to_sim(self, root_state, env_ids=None):
    self.write_root_pose_to_sim_index(root_pose=root_state, env_ids=env_ids)

But write_root_pose_to_sim_index expects a 7-wide wp.transformf tensor (position + quaternion). The 6 velocity components are silently dropped. The same bug exists in write_root_com_state_to_sim and write_root_link_state_to_sim.

Fix: Split the state tensor and write both pose and velocity:

def write_root_state_to_sim(self, root_state, env_ids=None):
    self.write_root_pose_to_sim_index(root_pose=root_state[:, :7], env_ids=env_ids)
    self.write_root_velocity_to_sim_index(root_velocity=root_state[:, 7:], env_ids=env_ids)

Note: Even though these are deprecated methods, they are still called by existing environments during reset(). The PhysX backend's deprecated wrappers correctly handle both pose and velocity — the ovphysx backend should too for drop-in compatibility.
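
As a worked illustration of the split described in this finding, the sketch below separates a 13-wide root state into its 7-wide pose and 6-wide velocity parts. It uses plain Python lists instead of tensors, and split_root_state is a hypothetical helper, not part of the backend.

```python
POSE_WIDTH = 7      # position (3) + quaternion (4)
VELOCITY_WIDTH = 6  # the 6 velocity components
STATE_WIDTH = POSE_WIDTH + VELOCITY_WIDTH


def split_root_state(root_state):
    """Split rows of width 13 into (pose rows, velocity rows).

    Raises instead of silently truncating when a row has the wrong width.
    """
    poses, velocities = [], []
    for row in root_state:
        if len(row) != STATE_WIDTH:
            raise ValueError(f"expected {STATE_WIDTH} values per row, got {len(row)}")
        poses.append(row[:POSE_WIDTH])
        velocities.append(row[POSE_WIDTH:])
    return poses, velocities


# One environment: the values are just 0..12 so the split is easy to read.
state = [[float(i) for i in range(13)]]
poses, velocities = split_root_state(state)
```

Each half would then go to its own write call, which is what the suggested fix does with the root_state[:, :7] and root_state[:, 7:] slices.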

🟡 Warning

2. assert_shape_and_dtype may fail for deprecated callers — Since the deprecated write_root_state_to_sim passes a 13-wide tensor to write_root_pose_to_sim_index, the assert_shape_and_dtype(root_pose, (n,), wp.transformf, "root_pose") check should catch this. But if the assertion is only checking shape[0] and not the full structured dtype, the mismatch could slip through. Verify that the assertion catches this case.

3. _compose_root_com_pose kernel order: wp.transform_multiply(link_pose[i], com_pose_b[i, 0]) computes link_pose * com_offset. Verify that warp's transform_multiply convention matches the intended "world = link * body_offset" composition. If warp uses A * B = B_in_A_frame, this is correct.

4. body_com_vel_w returns link velocity as approximation — The property explicitly notes "This is currently approximated using the link velocity" but this could cause subtle errors for robots with significant COM offsets. Consider documenting this limitation more prominently or tracking it as a known issue.

5. _write_flat_tensor CPU fallback for all column-scattered writes — Any write with joint_ids != None falls back to CPU numpy, even for GPU bindings. This could be a performance bottleneck for per-actuator writes in multi-env setups. The write_data_to_sim one-shot write path mitigates this for the hot loop, but other call sites (e.g., _process_actuators_cfg) still hit this path.

6. Friction writes always go through CPU: _write_friction_column and _write_friction_column_mask always read-modify-write via numpy. This is correct since DOF_FRICTION_PROPERTIES is in _CPU_ONLY_TYPES, but worth noting as a potential optimization target.

🔵 Improvement

7. find_fixed_tendons / find_spatial_tendons ignore tendon_subsets parameter — The parameter is accepted but never used in the method body. Either implement subset filtering or remove the parameter.

8. No rigid body or deformable object backend — This PR only implements the articulation backend. Rigid body and deformable object support will presumably follow in separate PRs, but this should be documented.

9. _atexit_registered as ClassVar — If the module is reloaded (hot-reload in development), the atexit handler could be registered multiple times since the guard flag resets. The if not cls._atexit_registered check handles the common case but the comment acknowledges this is a temporary workaround.

10. setup.py missing find_packages() — The packages=["isaaclab_ovphysx"] in setup.py only lists the top-level package. Sub-packages like isaaclab_ovphysx.assets, isaaclab_ovphysx.physics, etc. won't be installed unless explicitly listed or find_packages() is used.

@isaaclab-review-bot isaaclab-review-bot Bot left a comment

🤖 Isaac Lab Review Bot

Summary

This PR introduces the isaaclab_ovphysx backend — a kitless (no Omniverse Kit dependency) physics integration using the ovphysx Python wheel. It adds a new physics manager, articulation implementation, cloner, and supporting utilities to enable running Isaac Lab's GPU-accelerated PhysX simulation without the full IsaacSim runtime. The implementation is substantial (~4k lines of new code) and well-structured, following Isaac Lab's backend abstraction patterns.

Design Assessment

Problem: Enable Isaac Lab to run GPU PhysX simulations without IsaacSim/Kit, reducing deployment complexity and enabling headless training on systems without the full Omniverse stack.

Design is sound. The architecture correctly:

  • Follows the existing backend factory pattern (FactoryBase in backend_utils.py)
  • Separates physics management (OvPhysxManager) from asset implementation (Articulation, ArticulationData)
  • Uses lazy initialization to defer physics creation until the USD stage is populated
  • Integrates with the cloning pipeline via the existing clone_from_template mechanism

Alternative considered: Modifying the existing PhysX backend to support kitless mode. This was correctly rejected — the Kit-based PhysX backend is deeply entangled with omni.physx, Fabric, and the Kit timeline. A clean separation via a new backend is the right call.

One concern: The os._exit(0) atexit handler (lines 237-241 of ovphysx_manager.py) is a heavy-handed workaround for Carbonite symbol collision. While well-documented, this prevents normal process cleanup. Consider documenting this behavior prominently in user-facing docs and investigating namespace isolation in ovphysx longer-term.

Architecture Impact

Core framework changes are minimal and correct:

  • backend_utils.py: Adds ovphysx to the backend resolution order (correctly using startswith to avoid matching "ovphysx" as "physx")
  • interactive_scene.py: Routes to ovphysx_replicate and skips USD cloning for env_1..N (correct — physics-only cloning)
  • asset_base.py: Uses startswith("physx") to exclude ovphysx from Kit-only prim deletion callbacks
  • simulation_context.py: Minor fix to collect prim paths before deletion to avoid iterator invalidation

Task changes are additive: Just adding OvPhysxCfg to physics presets and handling it in locomotion_env.py.

Implementation Verdict

Significant concerns — 1 critical bug in the deprecated method implementations, plus minor issues.

Test Coverage

🟡 Partial test coverage. The PR adds:

  • test_articulation_data.py: Tests finite-difference acceleration
  • test_articulation_iface.py: Extends existing interface tests to ovphysx backend
  • test_gpu_zero_copy.py: Tests GPU buffer handling
  • Mock binding infrastructure for unit testing

Missing: Integration tests that exercise the full ovphysx lifecycle (warmup → step → state read/write) with real physics. The current tests use mocks extensively, which is appropriate for unit testing but doesn't catch binding contract violations.

CI Status

Pre-commit: ✅ Passed
Other checks: In progress (Build Base Docker Image, license-check, etc.)

Branch Status

Branch is up-to-date with develop. Mergeable: ✅

Findings

🔴 Critical: articulation.py:1463-1487 — Deprecated write_root_*_state_to_sim methods silently drop velocity

The base class write_root_state_to_sim splits root_state[:, :7] (pose) and root_state[:, 7:] (velocity) and writes both. The ovphysx implementation only passes the full tensor to write_root_pose_to_sim_index, which:

  1. Will fail assert_shape_and_dtype when assertions are enabled (shape mismatch: (N, 13) vs expected (N, 7))
  2. When assertions are disabled, silently drops velocity data and may corrupt the pose binding

This breaks backward compatibility for any user code calling these deprecated methods.

🟡 Warning: ovphysx_manager.py:237-241 — os._exit(0) in atexit handler prevents normal cleanup

The atexit handler calls os._exit(0) to avoid Carbonite destructor conflicts. This:

  • Prevents Python's normal cleanup (other atexit handlers, __del__ methods, context manager __exit__)
  • Could leave temporary files, GPU memory, or other resources leaked in crash scenarios
  • Makes debugging harder (no stack traces for late errors)

The workaround is documented, but consider:

  1. Using atexit._run_exitfuncs() before os._exit() to run registered handlers
  2. Documenting this prominently in user-facing docs
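
The cleanup concern in this warning can be reproduced in isolation. The sketch below is not taken from the PR: it spawns two short child interpreters, one that exits normally and one where a later-registered atexit handler calls os._exit(0), and captures what the earlier-registered cleanup handler prints.

```python
import subprocess
import sys
import textwrap

CHILD = textwrap.dedent('''
    import atexit, os, sys

    def cleanup():
        print("cleanup ran", flush=True)

    atexit.register(cleanup)
    if sys.argv[1] == "hard":
        # Registered last, so it runs first (atexit is last-in, first-out)
        # and ends the process before cleanup() gets a turn.
        atexit.register(lambda: os._exit(0))
''')


def run_child(mode: str) -> str:
    """Run the child script in a fresh interpreter and return its stdout."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD, mode],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


normal_output = run_child("normal")  # cleanup handler runs
hard_output = run_child("hard")      # cleanup handler is skipped
```

Both children exit with status 0, so the skipped cleanup is not visible from the exit code alone, which is why the review asks for it to be documented.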

🔵 Improvement: articulation_data.py:118-127 — _previous_joint_vel check is redundant

_previous_joint_vel is allocated in _create_buffers() and is never set to None afterward, so the if self._previous_joint_vel is not None check always passes. Either remove the check (preferred — cleaner) or document when it could be None.

🔵 Improvement: Consider adding CHANGELOG.rst entry

A new backend is a significant feature. Add an entry under the appropriate version section.

New IsaacLab physics backend (isaaclab_ovphysx) that uses the ovphysx
TensorBindingsAPI wheel instead of Kit's PhysX tensorAPI, with DLPack
zero-copy between ovphysx and warp on GPU.

All 1080 interface shape-check UTs pass (test_articulation_iface.py)
along with 50 standalone tests covering raw bindings, physics
correctness, e2e cartpole RL loop, and GPU zero-copy verification.
Connect the 34 previously-stubbed methods to real ovphysx tensor
bindings, replacing shape-validation-only stubs with actual sim writes.

- Joint limit writes (position/velocity/effort) via types 37/38/39
- Friction coefficient writes via read-modify-write on column 0 of
  DOF_FRICTION_PROPERTIES [N,D,3]
- Body property writes (mass/COM/inertia) via types 60/61/62
- Fixed tendon setters (12) buffer into internal data, flush via
  write_fixed_tendon_properties_to_sim using types 80-85
- Spatial tendon setters (8) buffer + flush via types 90-93
- _process_tendons() reads counts from binding metadata and walks
  the exported USD stage for tendon names
- WrenchComposer wired with body-to-world rotation kernel and
  LINK_WRENCH [N,L,9] write in write_data_to_sim
- Extract all 36 tensor type constants into tensor_types.py module
- Update mock binding set with tendon counts, write-only guard for
  LINK_WRENCH, and all new tensor types
Copy two_articulations.usda into source/isaaclab_ovphysx/test/data/
and replace ~/physics_backup/... absolute paths with __file__-relative
paths in 3 test files so the standalone tests run on any machine.
…e paths, integration tests

- tensor_types.py: replace bare ints with `from ovphysx.types import TensorType` and
  short backward-compat aliases; _CPU_ONLY_TYPES uses real TensorType members
- articulation.py: fix broken OVPHYSX_TENSOR_* constants (removed in new ovphysx);
  GPU-native zero-copy write helpers (_write_root_state, _write_flat_tensor,
  _write_flat_tensor_mask) with scatter kernel for partial-env writes; fix
  effort-write device mismatch; fix _process_cfg / _resolve_joint_values GPU
  copy-back; fix _to_flat_f32 structured dtype view (strides[0] not capacity);
  fix _write_flat_tensor_mask joint_mask path for GPU bindings
- articulation_data.py: fix _get_read_scratch CPU-only routing to avoid device
  mismatch; fix _read_transform_binding / _read_spatial_vector_binding to use
  actual buffer device; remove dead _get_ovphysx helper and _ovphysx_mod field
- tests: migrate OVPHYSX_TENSOR_*_F32 constants to TensorType.*; add
  `_ = ovphysx.PhysX` to force native bootstrap before pxr is restored in e2e
  tests; add 32-test integration suite (test_articulation_integration.py)
- tasks: wire ovphysx preset to cartpole, ant, and humanoid direct task configs
…er, GPU buffers

Core fixes to run Isaac-Humanoid-Direct-v0 with the ovphysx backend:

**Articulation root discovery** (`articulation.py`)
- `PhysicsArticulationRootAPI` is on a child prim (e.g. `torso`), not on the
  top-level robot prim.  `_initialize_impl` now walks the USD subtree to find
  the correct anchor and extends the tensor-binding pattern accordingly,
  mirroring the PhysX backend's logic.

**`_ALL_INDICES` + CUDA-safe index conversion** (`articulation.py`)
- Added `self._ALL_INDICES` warp array in `_create_buffers()`; required by
  locomotion env's `_reset_idx()`.
- All index paths now use `_to_cpu_indices()` which handles CUDA torch tensors
  via `.detach().cpu().numpy()`.
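
A sketch of what such an index conversion might look like, assuming the goal is a flat host-side integer array. The function below is illustrative (the real `_to_cpu_indices()` may differ); the `int32` dtype is an assumption, and duck typing is used so the example runs without torch installed.

```python
import numpy as np


def to_cpu_indices(indices):
    """Return env indices as a flat host-side int32 array (illustrative sketch)."""
    if indices is None:
        return None
    if hasattr(indices, "detach"):
        # torch.Tensor path: detach from autograd, copy to host, then view as NumPy.
        indices = indices.detach().cpu().numpy()
    elif hasattr(indices, "numpy"):
        # Array types exposing .numpy() directly, such as warp arrays.
        indices = indices.numpy()
    return np.asarray(indices, dtype=np.int32).reshape(-1)
```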

**OvPhysX cloner** (`cloner/ovphysx_replicate.py`)
- New `ovphysx_replicate()` function that records pending clones on
  `OvPhysxManager` instead of immediately calling `physx.clone()`.
- Clones are replayed in `_warmup_and_load()` after `add_usd()`, so env_1..N
  are created in the physics runtime without modifying the USD stage.
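
The record-then-replay idea can be sketched as a small queue. The class and method names here are hypothetical, and `clone_fn` stands in for whatever call performs the runtime clone, since its real signature is not shown in this PR text.

```python
class PendingCloneQueue:
    """Collect clone requests and replay them later (hypothetical sketch)."""

    def __init__(self):
        self._pending = []

    def record(self, source, targets):
        # Nothing touches the physics runtime yet; just remember the request.
        self._pending.append((source, list(targets)))

    def replay(self, clone_fn):
        # Called once the stage is loaded: issue every recorded clone in order.
        for source, targets in self._pending:
            clone_fn(source, targets)
        count = len(self._pending)
        self._pending.clear()
        return count
```

Deferring the calls this way lets the scene be described before the runtime has loaded the USD stage.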

**InteractiveScene wiring** (`interactive_scene.py`)
- Routes `ovphysx` backend to `ovphysx_replicate` (checked before `physx`).
- Skips `usd_replicate` for ovphysx: only env_0 needs physics prims in USD;
  env_1..N are created by `physx.clone()` at warmup time.

**GPU buffer capacities** (`ovphysx_manager.py`, `ovphysx_manager_cfg.py`)
- `OvPhysxCfg` exposes `gpu_found_lost_aggregate_pairs_capacity` (512k) and
  `gpu_total_aggregate_pairs_capacity` (256k); both applied to the exported
  PhysicsScene prim.  Eliminates PhysX "needs to increase capacity" errors
  at 64+ humanoid envs.
- Fixed `cls._cfg` shadowing bug: now reads from `PhysicsManager._cfg`.

**sim_launcher.py** — `_is_newton_physics` → `_is_kitless_physics`; also
  recognises `OvPhysxCfg` so the ovphysx preset skips IsaacSim Kit launch.

**run_ovphysx.sh** — adds all isaaclab_* source packages to PYTHONPATH.

**Tests** — updated bootstrap calls (`ovphysx.PhysX` → `ovphysx.bootstrap()`);
  new `test_humanoid_smoke.py` runs 100 RL steps with the humanoid task.

Result: 82 tests pass; humanoid trains at ~2100 steps/s with 64 envs.

At process exit, two Carbonite (libcarb.so) instances are in memory:
  1. ovphysx's bundled libcarb.so (RPATH $ORIGIN/../plugins/)
  2. kit's libcarb.so, pulled in via LD_LIBRARY_PATH when `import pxr`
     loads Fabric infrastructure (omni.physx.fabric.plugin,
     usdrt.population.plugin) from kit's plugin directories

Note: AppLauncher always starts the full Kit runtime — even headless=True
loads Kit.  "Kitless" means AppLauncher is not used, but pxr is still
imported from IsaacSim's Kit USD build, which triggers the Fabric plugins.

Both Carbonite instances register C++ static destructors that race at
process exit, causing a consistent SIGSEGV (exit 139) after training
completes.

Workaround: register an atexit handler that calls physx.release() (frees
GPU resources cleanly while Python is still up) and then os._exit(0) to
terminate without running C++ static destructors.
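
A minimal sketch of that workaround, assuming a `release_fn` callable that wraps `physx.release()`. The helper names are hypothetical, and `exit_fn` is injectable only so the handler can be exercised without killing the interpreter.

```python
import atexit
import os


def make_exit_handler(release_fn, exit_fn=os._exit):
    """Build an exit handler that releases resources, then hard-exits (sketch)."""

    def _handler():
        try:
            # Free GPU resources while the Python runtime is still alive.
            release_fn()
        finally:
            # os._exit terminates immediately, skipping C++ static destructors.
            exit_fn(0)

    return _handler


def install_exit_handler(release_fn):
    # Register the handler so it runs during normal interpreter shutdown.
    atexit.register(make_exit_handler(release_fn))
```

Note that `os._exit` also skips stdio flushing and any atexit handlers that have not yet run, so output should be flushed before the handler fires.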

Long-term fix: ovphysx ships a namespace-isolated Carbonite (different
soname / hidden symbol visibility) so its instance never collides with
kit's.

- Forward grid positions from ovphysx_replicate through to the C++ clone
  plugin so cloned environments are placed at their correct world locations
  instead of piling up at env_0's position.

- Invalidate TimestampedBuffer caches after binding writes (joint pos/vel,
  root pose/vel) so subsequent reads return the freshly written values
  instead of stale GPU data from before the write.  Without this, reset()
  writes zeros to DOF velocities but the next observation read returns
  the warmup-step garbage (~8M rad/s), producing -9M reward.

- Call set_clone_env_root() before add_usd() to tell the clone plugin
  which hierarchy to exclude from eager attachStage parsing.  Derived
  automatically from the first pending clone source path.

- Write actuator drive gains (stiffness/damping/limits) to PhysX during
  _process_actuators_cfg so the GPU solver uses the actuator config
  values, not whatever was authored in the USD file.

- Increase GPU buffer capacity defaults to handle multi-env articulated
  simulations without "aggregate pairs overflow" errors.
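
The stale-read problem described in the cache-invalidation item can be illustrated with a small timestamped cache. This is a sketch with hypothetical names, not the actual TimestampedBuffer code: a read is served from the cache unless its timestamp is older than the current sim time, so a write at the same sim time must invalidate it explicitly.

```python
from dataclasses import dataclass


@dataclass
class TimestampedCache:
    """Cached value plus the sim time at which it was fetched (sketch)."""

    data: object = None
    timestamp: float = -1.0

    def invalidate(self):
        # Force the next read to fetch again, even at the same sim time.
        self.timestamp = -1.0


def read_cached(cache, sim_time, fetch_fn):
    # Only fetch when the cached copy is older than the current sim time.
    if cache.timestamp < sim_time:
        cache.data = fetch_fn()
        cache.timestamp = sim_time
    return cache.data
```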

Fix the joint_acc finite difference, which scaled by dt instead of
1/dt; remove incorrect command-buffer zeroing from reset(); fix the
mock binding indexed-write shape mismatch; and make _write_scratch
lazily initialized for mock-constructed articulations.
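
For reference, the backward finite difference for joint acceleration divides the velocity change by the step, i.e. a = (v_t - v_{t-1}) / dt. A minimal NumPy sketch, with an illustrative function name:

```python
import numpy as np


def finite_difference_acc(vel, prev_vel, dt):
    """Backward finite-difference acceleration: (v_t - v_{t-1}) / dt."""
    # Multiplying by dt here would shrink the result by a factor of dt**2.
    return (np.asarray(vel, dtype=np.float64) - np.asarray(prev_vel, dtype=np.float64)) / dt
```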

Use whole-word startswith() backend detection in backend_utils,
asset_base, and interactive_scene.  Add comprehensive docstrings
with shape/dtype/units to articulation_data.py and articulation.py,
matching the PhysX style.  Add shape/dtype comments to tensor_types.py.

Extract _configure_physx_scene_prim helper from ovphysx_manager.
Remove duplicate mock bindings, add mock_interfaces exports.
Delete superseded tests, move test_gpu_zero_copy to test/physics/.
@AntoineRichard
Collaborator

For a follow up PR, we need to rework the tests. We need to support other assets. We need to share the sensor with PhysX if possible?

@marcodiiga changed the title from "Fix/malesiani/ovphysx poc integration backend" to "ovphysx backend integration" on Apr 16, 2026
AntoineRichard and others added 2 commits April 17, 2026 09:00
Restore the Kit runtime env setup in run_ovphysx.sh so ovphysx
training works again when using Kit Python with the public wheel
installed there.

Also update the scene test mock for clone_usd and remove the
broken GPU zero-copy test, which depended on an unavailable USD
asset and could not run reliably.
@kellyguo11 kellyguo11 moved this to In review in Isaac Lab Apr 17, 2026
Contributor

@kellyguo11 kellyguo11 left a comment

This is awesome, looking great! do you plan on extending this to include rigid bodies and sensors next?

Comment thread source/isaaclab/isaaclab/scene/interactive_scene.py
Comment thread source/isaaclab_ovphysx/isaaclab_ovphysx/assets/articulation/articulation.py Outdated
Comment thread source/isaaclab_ovphysx/isaaclab_ovphysx/assets/articulation/articulation.py Outdated
Comment thread source/isaaclab_ovphysx/isaaclab_ovphysx/assets/articulation/articulation.py Outdated
Comment thread source/isaaclab_ovphysx/isaaclab_ovphysx/assets/articulation/articulation.py Outdated
Comment thread source/isaaclab_ovphysx/isaaclab_ovphysx/assets/articulation/articulation.py Outdated
Comment thread source/isaaclab_tasks/isaaclab_tasks/direct/locomotion/locomotion_env.py Outdated
@marcodiiga
Author

marcodiiga commented Apr 20, 2026

> This is awesome, looking great! do you plan on extending this to include rigid bodies and sensors next?

Thanks Kelly!
Current scope for this PR is the articulation backend. Rigid bodies and sensors would be follow-up work once this path lands and is stable.

> For a follow up PR, we need to rework the tests. We need to support other assets. We need to share the sensor with PhysX if possible?

will look into this as well, probably for a follow-up

marcodiiga and others added 2 commits April 20, 2026 12:00
Address the scoped ovphysx review items needed to keep the backend
merge-ready without expanding the PR into broader backend cleanup.
Narrow tendon discovery to the active articulation subtree, align
soft-limit computation and tendon typing with the other backends,
and align locomotion task gear resolution with the existing direct
`OvPhysxCfg` import pattern while dropping the task-only helper test.
Labels

bug Something isn't working isaac-lab Related to Isaac Lab team

Projects

Status: Done

3 participants