Skip to content

Split RayCaster into backend-specific implementations#5510

Merged
ooctipus merged 8 commits into
isaac-sim:developfrom
ooctipus:octi/raycaster-backend-split
May 17, 2026
Merged

Split RayCaster into backend-specific implementations#5510
ooctipus merged 8 commits into
isaac-sim:developfrom
ooctipus:octi/raycaster-backend-split

Conversation

@ooctipus
Copy link
Copy Markdown
Collaborator

@ooctipus ooctipus commented May 6, 2026

Description

Refactors RayCaster and RayCasterCamera into a backend-agnostic base + per-backend implementations under isaaclab_physx.sensors.ray_caster and isaaclab_newton.sensors.ray_caster, mirroring the iconic split pattern already used by Pva, FrameTransformer, and ContactSensor.

The PhysX backend tracks the parent rigid body directly via RigidObjectView instead of going through FabricFrameView, which fixes the staleness regression from #5179: sensors parented under an articulation / rigid body were returning their spawn-time pose forever during headless training, silently freezing height-scan observations in rough-terrain locomotion (and any similar IMU / camera path that read through FrameView).

The Newton backend uses the site-based pattern from Pva / FrameTransformer: walk USD to the rigid-body ancestor, register a body-attached site via NewtonManager.cl_register_site, and read per-step transforms off a SensorFrameTransform against a shared world-origin reference. Static parents bypass the site machinery (a single body=-1 global site can't represent per-env world origins) and serve a cached per-env wp.transformf array.

MultiMeshRayCaster / MultiMeshRayCasterCamera re-parent onto the new base but keep their FrameView-backed body tracker, so the staleness behavior persists there. Tracked as xfail in test_ray_caster_sensor.py — extending the backend split to MultiMesh is a follow-up.

The cfg surface, class_type strings, and runtime semantics are unchanged for callers; existing user code does not need to migrate.

Fixes #5476 (the FabricFrameView contract regression-test PR — the bug it documents is fixed for the single-mesh path here).

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)

Screenshots

N/A — backend refactor, no UI changes.

Checklist

  • I have read and understood the contribution guidelines
  • I have run the `pre-commit` checks with `./isaaclab.sh --format`
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • I have added a changelog fragment under `source//changelog.d/` for every touched package (do not edit `CHANGELOG.rst` or bump `extension.toml` — CI handles that)
  • I have added my name to the `CONTRIBUTORS.md` or my name already exists there

@ooctipus ooctipus requested a review from pascal-roth as a code owner May 6, 2026 05:00
@github-actions github-actions Bot added the isaac-lab Related to Isaac Lab team label May 6, 2026
Copy link
Copy Markdown

@isaaclab-review-bot isaaclab-review-bot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 Isaac Lab Review Bot

Summary

This PR refactors RayCaster and RayCasterCamera into backend-agnostic base classes (BaseRayCaster, BaseRayCasterCamera) with PhysX and Newton-specific implementations. The core motivation is to fix a staleness regression from #5179 where sensors parented under rigid bodies returned their spawn-time pose forever during headless training. The PhysX backend now tracks the parent rigid body directly via RigidObjectView instead of going through FabricFrameView. The implementation is architecturally sound and follows the established pattern used by Pva, FrameTransformer, and ContactSensor.

Architecture Impact

Cross-module impact is significant but well-contained:

  • RayCaster and RayCasterCamera become FactoryBase shims dispatching to backend-specific implementations
  • MultiMeshRayCaster and MultiMeshRayCasterCamera re-parent onto BaseRayCaster/BaseRayCasterCamera but retain FrameView-backed tracking (documented as xfail)
  • New packages: isaaclab_physx.sensors.ray_caster and isaaclab_newton.sensors.ray_caster
  • All existing user-facing APIs remain unchanged; cfg surface is preserved

What calls the changed code:

  • Any RL environment using height-scan observations via RayCaster
  • Camera sensors using RayCasterCamera for fast depth rendering
  • Task configs referencing RayCasterCfg or RayCasterCameraCfg

Implementation Verdict

Minor fixes needed — The core architecture is correct and the bug fix is validated by comprehensive tests. However, there are a few issues that should be addressed.

Test Coverage

Strong overall. The PR adds:

  • test_sensor_pose_tracks_falling_rigid_parent covering all four sensor variants with xfail for MultiMesh
  • Alignment mode tests (world, base, yaw)
  • Drift resampling test
  • Integration tests for ArticulationView/RigidBodyView paths

Minor gap: No explicit test for the Newton backend's static-parent path (_is_static_parent=True), though the PhysX static path is implicitly tested.

CI Status

No CI checks available yet — cannot verify test pass status.

Findings

🔴 Critical: source/isaaclab_newton/isaaclab_newton/sensors/ray_caster/ray_caster.py:162 — Missing re-registration of world-origin site in _invalidate_body_tracker

The _invalidate_body_tracker method only re-registers sites via self._register_body_tracker() for non-static parents. However, _register_body_tracker registers both _world_origin_label and _site_label. On STOP→reinit, the docstring claims sites survive close/reinit, but NewtonManager.close() clears _cl_pending_sites. If _site_args_cached is True but the site labels are stale, the subsequent _initialize_body_tracker will fail when looking up labels in site_map.

def _invalidate_body_tracker(self) -> None:
    self._newton_transforms = None
    self._sensor_index = None
    if not self._is_static_parent:
        self._register_body_tracker()  # Only re-registers if not static

The logic seems correct for the rigid-body branch, but verify that _site_args_cached=True doesn't prevent re-registration when sites were actually cleared.

🟡 Warning: source/isaaclab/isaaclab/sensors/ray_caster/base_ray_caster.py:366-368 — _zero_data_buffers calls .zero_() on warp arrays via attributes

The method calls .zero_() on self._data._pos_w, self._data._quat_w, and self._data._ray_hits_w. These are wp.array objects, and while wp.array has a zero_() method, this pattern differs from the rest of the codebase which typically uses wp.launch with fill kernels for masked zeroing. If the physics event fires mid-step, this could race with kernel writes.

def _zero_data_buffers(self) -> None:
    self._data._pos_w.zero_()
    self._data._quat_w.zero_()
    self._data._ray_hits_w.zero_()

🟡 Warning: source/isaaclab/isaaclab/sensors/ray_caster/base_ray_caster_camera.py:152-153 — reset clones poses before combine_frame_transforms but indices may be tensor

pos_w, quat_w = self._get_sensor_world_poses(env_ids)
pos_w, quat_w = math_utils.combine_frame_transforms(
    pos_w.clone(), quat_w.clone(), self._offset_pos[env_ids], self._offset_quat[env_ids]
)

When env_ids is a tensor, self._offset_pos[env_ids] and self._offset_quat[env_ids] return views. This is fine, but the .clone() calls suggest concern about in-place modification. The combine function doesn't modify inputs in-place, so the clones are defensive but unnecessary overhead.

🟡 Warning: source/isaaclab_physx/isaaclab_physx/sensors/ray_caster/ray_caster.py:93-96 — Offset composition mutates warp-backed torch views in-place

if self._fixed_pos_b is not None:
    offset_pos = wp.to_torch(self._offset_pos_wp)
    offset_quat = wp.to_torch(self._offset_quat_wp)
    cur_pos, cur_quat = offset_pos.clone(), offset_quat.clone()
    offset_pos[:] = self._fixed_pos_b + math_utils.quat_apply(self._fixed_quat_b, cur_pos)
    offset_quat[:] = math_utils.quat_mul(self._fixed_quat_b, cur_quat)

This correctly clones before reading to avoid self-aliasing, but the composition formula new_offset = T_body_to_xform * existing_offset assumes the existing offset is identity for the base RayCaster case. The camera path writes cfg.offset into these buffers in _initialize_rays_impl before _initialize_body_tracker runs. Verify that the camera's _initialize_rays_impl runs before _initialize_body_tracker in the MRO chain — it does, per line 135-137 in base_ray_caster.py.

🔵 Improvement: source/isaaclab/isaaclab/sensors/ray_caster/multi_mesh_ray_caster.py:352-358 — Duplicate buffer allocation pattern

self._ray_distance_w = wp.zeros((self._num_envs, self.num_rays), dtype=wp.float32, device=self._device)

This is allocated in MultiMeshRayCaster._initialize_rays_impl after calling super()._initialize_rays_impl(). The base class already allocates _dummy_ray_distance but with size (1,1). Consider documenting why MultiMesh needs the full-sized buffer (atomic_min across meshes) vs the dummy in the base.

🔵 Improvement: source/isaaclab/isaaclab/sensors/ray_caster/base_ray_caster_camera.py:509-525 — Deprecated functions still present

The deprecated _compute_view_world_poses and _compute_camera_world_poses methods emit warnings but remain. Consider adding a # TODO: Remove in v3.0 comment with the deprecation version for tracking.

🔵 Improvement: source/isaaclab/test/sensors/test_ray_caster_sensor.py:377-378 — Test uses hardcoded ground-truth for PhysX body view

body_z = _parent_body_z_ground_truth(sensor)
if body_z is not None:
    assert abs(z_after - body_z) < 1e-4, ...

This tight assertion (1e-4) is appropriate for PhysX but the function returns None for Newton. Consider adding an equivalent ground-truth check for Newton using SensorFrameTransform.transforms if available.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 6, 2026

Greptile Summary

This PR refactors RayCaster and RayCasterCamera into a backend-agnostic base (BaseRayCaster / BaseRayCasterCamera) with per-backend implementations in isaaclab_physx and isaaclab_newton, fixing a stale-pose regression caused by FabricFrameView being used instead of a live rigid-body tracker on the PhysX path.

  • PhysX backend (_PhysXRayCasterMixin) replaces FrameView with a direct RigidObjectView, reading live body transforms each step; static (non-physics) prims are cached once at init.
  • Newton backend (_NewtonRayCasterMixin) registers USD-walked rigid-body sites via NewtonManager.cl_register_site before cloning and resolves them to per-env site indices at physics-ready time; the mixin's _update_ray_infos override still references stale attribute names from the pre-refactor class.
  • MultiMeshRayCaster / MultiMeshRayCasterCamera are rebased onto the new base but intentionally keep their FrameView-backed body tracker (tracked as xfail tests).

Confidence Score: 3/5

Safe to merge for the PhysX path (the primary bug fix), but the Newton backend is non-functional as-is.

The Newton mixin's _update_ray_infos references four attributes that were renamed when BaseRayCaster switched to ProxyArray wrappers, making both Newton.RayCaster and Newton.RayCasterCamera raise AttributeError at runtime on every data access and reset call.

source/isaaclab_newton/isaaclab_newton/sensors/ray_caster/ray_caster.py — _NewtonRayCasterMixin._update_ray_infos uses stale attribute names that no longer exist in BaseRayCaster

Important Files Changed

Filename Overview
source/isaaclab_newton/isaaclab_newton/sensors/ray_caster/ray_caster.py _NewtonRayCasterMixin._update_ray_infos references _drift, _ray_cast_drift, _ray_starts_local, _ray_directions_local — all removed in the new BaseRayCaster refactor — causing AttributeError at runtime for both Newton.RayCaster and Newton.RayCasterCamera
source/isaaclab/isaaclab/sensors/ray_caster/base_ray_caster.py New base class cleanly separates mesh loading, ray initialization, and update logic using ProxyArray-based drift and ray buffers.
source/isaaclab/isaaclab/sensors/ray_caster/base_ray_caster_camera.py Camera base cleanly initializes _ray_starts_local/_ray_directions_local and offset buffers; set_world_poses and reset paths look correct for PhysX backend.
source/isaaclab_physx/isaaclab_physx/sensors/ray_caster/ray_caster.py PhysX mixin correctly overrides _initialize_pose_tracking and _get_view_transforms_wp; relies on base-class _update_ray_infos which uses the new ProxyArray attribute names.
source/isaaclab/isaaclab/sensors/ray_caster/kernels.py New kernels are well-structured with correct masking patterns.
source/isaaclab/isaaclab/sensors/ray_caster/base_multi_mesh_ray_caster.py Extracts multi-mesh logic into a backend-agnostic base; _create_tracked_target_view is abstract-by-convention.
source/isaaclab/test/sensors/test_ray_caster_sensor.py New tests for physics-body parent tracking and drift resampling; xfail markers for known multi-mesh staleness.

Comments Outside Diff (1)

  1. source/isaaclab_newton/isaaclab_newton/sensors/ray_caster/ray_caster.py, line 1172-1201 (link)

    P1 Newton _update_ray_infos references stale attribute names

    self._drift, self._ray_cast_drift, self._ray_starts_local, and self._ray_directions_local no longer exist. The new BaseRayCaster._initialize_rays_impl uses ProxyArray wrappers named self.drift, self.ray_cast_drift, self.ray_starts, and self.ray_directions. Accessing self._drift on a Newton.RayCaster instance raises AttributeError on the first call to .data (via _update_buffers_impl) and from reset() in the camera subclass. self._ray_starts_local / self._ray_directions_local are set only by BaseRayCasterCamera._initialize_rays_impl, not by BaseRayCaster._initialize_rays_impl, so the plain non-camera path fails on those as well.

    Fix: replace self._drift with self.drift.warp, self._ray_cast_drift with self.ray_cast_drift.warp, and for the non-camera path use self.ray_starts.warp / self.ray_directions.warp (or add a hasattr fallback to _ray_starts_local).

Reviews (3): Last reviewed commit: "remove unnecessary changes" | Re-trigger Greptile

Comment on lines +118 to +128
# apply a single offset to the body pose. Skipped when ancestor==prim
# (no fixed offset to compose). ``wp.to_torch`` returns zero-copy alias
# views; slice-writes update warp memory the kernel reads.
if self._fixed_pos_b is not None:
offset_pos = wp.to_torch(self._offset_pos_wp)
offset_quat = wp.to_torch(self._offset_quat_wp)
cur_pos, cur_quat = offset_pos.clone(), offset_quat.clone()
offset_pos[:] = self._fixed_pos_b + math_utils.quat_apply(self._fixed_quat_b, cur_pos)
offset_quat[:] = math_utils.quat_mul(self._fixed_quat_b, cur_quat)

def _get_sensor_transforms_wp(self) -> wp.array:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Offset baking corrupts camera reset() / set_world_poses()

_offset_pos_wp / _offset_quat_wp are warp-owned memory, and _offset_pos / _offset_quat (created in BaseRayCasterCamera._initialize_rays_impl) are zero-copy torch aliases of the same memory. The baking on lines 127–128 overwrites those warp arrays with T_body_camera so the raycasting kernel sees a single composed offset — correct. However, BaseRayCasterCamera.reset() reads self._offset_pos[env_ids] expecting T_xform_camera (the config-level camera offset), so it computes T_world_xform * T_body_camera instead of T_world_xform * T_xform_camera = T_world_camera, storing a wrong pose in _data.pos_w / quat_w_world.

set_world_poses() is even more fragile: it writes a new T_xform_camera back into _offset_pos (and therefore into _offset_pos_wp), silently overwriting the baked T_body_camera, so the kernel subsequently applies body_pose * T_xform_camera without the body-to-xform offset — a wrong ray origin for every subsequent step.

This regression is specific to RayCasterCamera (PhysX) when the sensor xform is parented below a rigid body (ancestor != prim). The fix is to preserve T_xform_camera in a separate, non-shared buffer so reset() and set_world_poses() apply it correctly, while only _offset_pos_wp/_offset_quat_wp (kernel path) hold the baked T_body_camera.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — this is a real correctness bug (and my regression tests did not exercise it because they use cfg.offset = identity and a sensor Xform with identity USD-local translation, so T_body_camera == cfg.offset).

Fixed in 32d9109 with option (a): drop the bake entirely and compose body_pose * T_body_to_xform per step via a tiny wp.kernel into a pre-allocated _sensor_transforms_wp buffer. _offset_pos_wp / _offset_quat_wp (and their torch aliases on the camera path) stay holding cfg.offset, so reset() and set_world_poses() see the right value.

Added two regression tests in test_ray_caster_sensor.py:

  • test_camera_offset_buffer_survives_body_tracker_init asserts _offset_pos still equals cfg.offset after _initialize_body_tracker under a rigid parent. (Direct invariant — would have caught the original bake.)
  • test_camera_set_world_poses_under_rigid_parent exercises the user-visible failure: kinematic rigid parent at z=5, sensor Xform at (0, 0.1, 0) USD-local, set_world_poses(positions=[(1, 2, 6)]), then assert data.pos_w[0] ≈ (1, 2, 6) within 1 cm. Pre-fix path would land at (1, 1.9, 6) (off by the y=0.1 body-to-Xform offset).

Per-step cost is one extra wp.launch of a transform-multiply kernel ((num_envs,) threads). Negligible vs the existing ray-cast launch.

Copy link
Copy Markdown

@isaaclab-review-bot isaaclab-review-bot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 Isaac Lab Review Bot

Follow-up Review

The new commits appear to be minor (diff shows no substantive changes to the code logic). Reviewing against previous findings:

  1. Previous Critical (Newton site re-registration): Still present at the same location. The _invalidate_body_tracker logic remains unchanged — the concern about _site_args_cached=True potentially preventing re-registration when sites were cleared by NewtonManager.close() still applies. However, looking more carefully at the code flow: _register_body_tracker checks if not self._site_args_cached before the USD walk, but it always calls cl_register_site for non-static parents (lines 89-91). So re-registration does happen. This previous concern was overstated — the code is correct.

  2. Previous Warnings: All remain unchanged but are minor/defensive-coding issues, not bugs.

  3. Previous Improvements: Documentation suggestions still applicable but non-blocking.

CI is still pending — recommend waiting for CI to pass before merging.

Verdict

Ship it once CI passes. The architecture is sound, tests are comprehensive, and the bug fix is well-validated. No new issues introduced.

@ooctipus
Copy link
Copy Markdown
Collaborator Author

ooctipus commented May 6, 2026

Thanks for the review! Triage on each item:

🔴 Newton _invalidate_body_tracker — push back. The cached flag (_site_args_cached=True) only short-circuits the USD walk — the cached (body_pattern, site_xform) pair. The two cl_register_site calls always run on the rigid-body branch (lines 104–105 in ray_caster.py), so both the world-origin label and the source label get re-issued every invalidate cycle. Mirrors the same pattern in isaaclab_newton.sensors.frame_transformer.FrameTransformer._invalidate_initialize_callback.

🟡 _zero_data_buffers .zero_() vs wp.launch / mid-step race — push back. The .zero_() pattern is the established one for this lifecycle event: see isaaclab_newton.sensors.pva.Pva._invalidate_initialize_callback (lines 240–252) which uses the identical for buf in [...]: buf.zero_() loop. Race-with-kernel-writes isn't a concern: PhysicsEvent.STOP is dispatched synchronously from SimulationContext.stop() after the sim loop has exited — there's no concurrent kernel running at that point.

🟡 Unnecessary .clone() in BaseRayCasterCamera.reset — punt. Correct observation but pre-existing — those clones predate this PR (verbatim from the original RayCasterCamera). Removing them is a separate cleanup; perf-irrelevant since reset / set_world_poses are cold paths.

🟡 PhysX offset composition / camera ordering — already validated in the same comment ("does, per line 135-137"). No action.

🔵 Multi-mesh _ray_distance_w documentation — addressed in ebcc2b9. Comment now contrasts the full-size atomic_min target with the base's (1, 1) placeholder dummy.

🔵 Deprecated method TODO version comments — push back. _compute_view_world_poses / _compute_camera_world_poses are pre-existing methods I just renamed-through (the rename was _get_view_world_poses_get_sensor_world_poses upstream of these). Their existing .. deprecated:: v2.3.1 Sphinx markers carry the version metadata. Adding # TODO: Remove in v3.0 would be a guess at a target the project hasn't publicly committed to.

🔵 Newton ground-truth in regression test — push back. A real Newton ground-truth would need to read state.body_q[body_idx] after resolving the parent body's index from NewtonManager._model.body_label — that's nontrivial test plumbing. Comparing against sensor._newton_transforms would be tautological (it is the sensor pose post-composition, not an independent body pose). The change-detection assertion (drop > 1 cm) already pins the bug class on Newton; the tight ground-truth on PhysX is the bonus that comes for free because _body_view is the right shape there.

So: 1 fix landed, 6 push-backs above. Happy to reconsider any of them — let me know if I'm misreading something.

Copy link
Copy Markdown

@isaaclab-review-bot isaaclab-review-bot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 Isaac Lab Review Bot

Follow-up Review

The new commits show no substantive changes to the code logic — the diff appears identical to what was previously reviewed. The previous bot review correctly identified that the Newton site re-registration concern was overstated (the code does re-register sites correctly).

CI is still pending — all test jobs show ⏳ status. Recommend waiting for CI to pass before merging.

Verdict

Ship it once CI passes. No new issues introduced.

Copy link
Copy Markdown

@isaaclab-review-bot isaaclab-review-bot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 Isaac Lab Review Bot

Follow-up Review

No code changes since the previous review — the diff is identical. The previous review correctly identified this as ready to ship.

CI Status Update: Most jobs are still pending (⏳), but isaaclab_ov and isaaclab_assets have passed (✅). The core isaaclab test jobs that would exercise the ray-caster changes are still running.

Verdict

Ship it once CI completes. No new concerns.

Copy link
Copy Markdown

@isaaclab-review-bot isaaclab-review-bot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 Isaac Lab Review Bot

Follow-up Review

No code changes since the previous review — the diff is identical to the last review cycle.

CI Status Update: More jobs have completed. Key results:

  • isaaclab_ov, isaaclab_assets, isaaclab_teleop passed
  • pre-commit, Build Wheel, Check changelog fragments passed
  • ⏳ Core test jobs (isaaclab (core), isaaclab_physx, isaaclab_newton, isaaclab_tasks) still pending

Verdict

Ship it once the pending CI jobs (especially isaaclab (core) and backend-specific tests) complete successfully. No new concerns.

@ooctipus ooctipus force-pushed the octi/raycaster-backend-split branch from 155b06e to 7a17c3d Compare May 7, 2026 06:14
@github-actions github-actions Bot added the documentation Improvements or additions to documentation label May 7, 2026
@ooctipus ooctipus force-pushed the octi/raycaster-backend-split branch 3 times, most recently from 14d3e5b to e746b7b Compare May 7, 2026 22:15
@kellyguo11 kellyguo11 moved this to In review in Isaac Lab May 7, 2026
@ooctipus
Copy link
Copy Markdown
Collaborator Author

ooctipus commented May 8, 2026

@greptile-apps review again

@ooctipus ooctipus force-pushed the octi/raycaster-backend-split branch from e746b7b to ef4f340 Compare May 8, 2026 02:27
@kellyguo11 kellyguo11 assigned StafaH and unassigned AntoineRichard May 11, 2026
@ooctipus ooctipus force-pushed the octi/raycaster-backend-split branch from 3385e58 to ca933c6 Compare May 11, 2026 19:05
Copy link
Copy Markdown

@isaaclab-review-bot isaaclab-review-bot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review Summary

This PR refactors RayCaster and RayCasterCamera (plus MultiMesh variants) into a backend-agnostic base with per-backend implementations for PhysX and Newton. The primary goal is fixing the staleness regression from #5179 where sensors parented under articulations/rigid bodies returned spawn-time poses.

✅ Strengths

  1. Clean architecture: Base classes contain backend-agnostic logic while PhysX/Newton mixins handle only pose tracking. Consistent with the split pattern used by Pva, FrameTransformer, and ContactSensor.

  2. Proper factory dispatch: FactoryBase shims correctly dispatch to backend implementations.

  3. Critical bug fix: Directly addresses the staleness issue where sensors returned spawn-time poses forever during headless training.

  4. Good test coverage: New tests validate physics body parent motion tracking and heterogeneous ClonePlan consumption.

🔍 Observations

  1. Rectangular mesh table padding (Acceptable): Heterogeneous scenes pad shorter environments with a dummy mesh at (1e9, 1e9, 1e9) to maintain kernel ABI compatibility. The PR notes a follow-up is planned for a new kernel.

  2. Newton site registration timing: Sites are registered in __init__ before cloning occurs, which is correct and matches existing patterns.

  3. MultiMesh staleness persists for FrameView-backed targets (Known limitation): Tracked as xfail with a planned follow-up.

📋 Minor Suggestions

  1. Changelog fragment CI failure: The Check changelog fragments check is failing - may need a formatting adjustment.

  2. Type hints in mixins: Some methods use self: Any due to mixin patterns. Consider adding a brief docstring explaining the expected MRO chain for maintainability.

  3. PhysX body view cleanup: _physx_body_view is created but not explicitly released in __del__. This is likely fine since views are cleaned up with simulation, but worth noting.

✅ Verdict

Well-structured refactoring that correctly addresses the staleness regression while maintaining API compatibility. The architecture follows established patterns and includes proper testing for the new behavior.


Files reviewed
  • base_ray_caster.py, base_ray_caster_camera.py, base_multi_mesh_ray_caster.py, base_multi_mesh_ray_caster_camera.py
  • isaaclab_physx/sensors/ray_caster/*.py
  • isaaclab_newton/sensors/ray_caster/*.py
  • Test files: test_ray_caster_sensor.py, test_ray_caster_integration.py, test_multi_mesh_ray_caster_camera.py
  • Changelog fragments

📝 Update (2026-05-13)

Reviewed new commits: 85021dcb19965ef (merge from develop)

The new commits bring in develop branch updates (version bumps, CI image pin, contact sensor metadata fix, dexsuite Newton support) but contain no changes to the RayCaster files in this PR.

Outstanding issue: The offset baking concern in my inline comment on isaaclab_physx/sensors/ray_caster/ray_caster.py remains unaddressed. The issue affects RayCasterCamera.reset() and set_world_poses() when the sensor xform is parented below a rigid body.


Update (commit b19965e, 2026-05-13): Reviewed incremental changes from 85021dcb19965ef. The new commits are a merge from develop containing:

  • Version bumps (isaaclab 5.1.1, isaaclab_newton 0.8.1, etc.)
  • Newton dependency update (newton==1.2.0rc3, mujoco-warp 3.8.0.2)
  • Contact sensor metadata fix for Newton 1.1 migration
  • Dexsuite config consolidation (shared PhysicsCfg, Mesh* spawners)
  • CI/test infrastructure updates

No changes to RayCaster files. The offset baking concern from the previous review remains outstanding but is non-blocking (affects edge case of RayCasterCamera.reset()/set_world_poses() when sensor xform is parented below a rigid body).

✅ Ready to merge once CI passes.


📝 Update (2026-05-16, commit 9afb19a)

Reviewed incremental changes: b19965ef9afb19a6

✅ Offset baking concern addressed: The architecture refactoring resolves the offset corruption issue I raised previously:

  • BaseRayCasterCamera._initialize_rays_impl() now creates fresh _offset_pos_wp/_offset_quat_wp warp arrays initialized from config offsets
  • Creates torch views _offset_pos/_offset_quat pointing to these buffers
  • reset() and set_world_poses() correctly use the config-level torch views
  • PhysX mixin's _get_view_transforms_wp() returns body transforms from _physx_body_view.get_transforms()
  • Offsets are applied in the kernel via update_ray_caster_kernel

New test coverage: test_ray_caster_tracks_physics_body_parent_motion validates that sensors follow their physics body parent during simulation.

Other changes in this increment:

  • CI/infrastructure updates (docker, workflows)
  • Documentation updates
  • ProxyArray data access pattern (.torch accessor)
  • ClonePlan consumption for heterogeneous multi-mesh scenes

✅ All previous concerns addressed. Ready to merge once CI passes.


📝 Update (2026-05-16, commit 1ccdbfd)

Reviewed incremental changes: 9afb19a61ccdbfd4

Major refactoring in this increment:

  1. FactoryBase dispatch pattern: All four ray-caster classes (RayCaster, RayCasterCamera, MultiMeshRayCaster, MultiMeshRayCasterCamera) now use a common FactoryBase shim that dispatches to backend-specific implementations via _backend_class_names mapping.

  2. ProxyArray data access: Camera outputs (intrinsic_matrices, output[...], pos_w, quat_w_world, image_mesh_ids) now return ProxyArray wrappers requiring explicit .torch accessor for tensor operations. All tests updated accordingly.

  3. New heterogeneous scene test: test_depth_output_equal_to_usd_camera_heterogeneous_scene validates ray-caster behavior with alternating Spot/ANYmal robots and cube/sphere objects across 16 cloned environments.

  4. ClonePlan-backed geometry: _create_tracked_target_view and _get_mesh_prims_for_target methods now consume ClonePlan to select source geometry for heterogeneous scenes while using backend views for per-env object poses.

  5. Kernel consolidation: New fill_ray_hits_distance_inf_kernel combines hit position and distance buffer initialization into single kernel launch.

✅ No new issues identified. The refactoring maintains API compatibility while improving code organization and Newton backend support.

✅ Ready to merge once CI passes.


📝 Update (2026-05-16, commit 105637f)

Reviewed incremental changes: 1ccdbfd4105637f1

Single change: Fixed __str__ output in BaseMultiMeshRayCasterCamera:

  • Before: number of meshes: {len(BaseMultiMeshRayCaster.meshes)} (class-level, incorrect)
  • After: number of meshes: {self._num_envs} x {sum(self._num_meshes_per_env.values())} (instance-level, correct)

This is a minor cosmetic fix for debug output formatting.

✅ No new issues. Ready to merge once CI passes.


📝 Update (2026-05-16, commit 7bdaa83)

Reviewed incremental changes: 105637f17bdaa830

Changes in this increment:

  1. base_multi_mesh_ray_caster.py — Minor loop consolidation in __init__: combined the processing of mesh_prim_paths into a single loop iteration (same functionality, cleaner code).

  2. Removed test_physx_scene_data_provider_visualizer_contract.py — Deleted 199-line test file for PhysxSceneDataProvider Newton visualizer prebuild contracts. This appears to be cleanup of tests that were moved or are no longer needed with the new backend split architecture.

✅ No new issues. Ready to merge once CI passes.


📝 Update (2026-05-16, commit 49adeff)

Reviewed incremental changes: 105637f149adefff

Significant API change — Camera data returns raw torch.Tensor instead of ProxyArray:

  1. CameraData outputs are now torch-native: All camera .data.output[...], .data.intrinsic_matrices, etc. now return raw torch.Tensor directly, removing the need for .torch accessor. This reverts the ProxyArray migration for camera sensors specifically.

  2. Test updates across camera tests:

    • test_camera.py: Extensive updates removing .torch accessors (~50+ occurrences)
    • test_first_frame_textured_rendering.py: Removed .torch from camera output access
    • test_multi_tiled_camera.py: Same pattern
  3. test_scripts_torcharray_patterns.py updated: Docstrings now reflect that CameraData returns raw torch.Tensor, and the regex scanner excludes data.output[...] from ProxyArray lint checks.

  4. Contrib sensor updated: visuotactile_sensor.py removed .torch accessor from camera data access.

  5. Cleanup:

    • Removed test_physx_scene_data_provider_visualizer_contract.py (199 lines)
    • Removed changelog fragments (.skip and .rst files)
    • Minor formatting in interactive_scene.py (ClonePlan single-line)
    • Test simplification in test_interactive_scene.py (identity checks vs torch.testing.assert_close)

Assessment: The camera torch-native decision makes sense — cameras deal with image data that is naturally torch tensor (no warp kernel interaction), unlike other sensors that may benefit from ProxyArray's warp/torch dual representation.

✅ No new issues. Ready to merge once CI passes.


Update (e5d00a5): Reviewed incremental changes since 49adeff. Key updates in this push:

  • Kernel fusion: Merged depth-clipping and copy-to-image operations into single kernels (compute_distance_to_image_plane_to_image_masked_kernel, copy_float2d_to_image1_depth_clipped_masked_kernel) reducing kernel launch overhead
  • Code cleanup: Removed apply_depth_clipping_masked_kernel and fill_float2d_masked_kernel in favor of fused operations
  • Test cleanup: Removed unnecessary AppLauncher from tests that don't require it (part of merged #5646)
  • Linter fixes: Minor formatting/string improvements

No new concerns introduced. The architecture established in the initial review remains clean and the incremental changes improve performance and code quality.

Copy link
Copy Markdown

@StafaH StafaH left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice PR! Left some comments.

Overall there's alot of switching between torch and warp ops, which could be cleaned up to use warp only. Also we could use ProxyArrays in all the base classes to clean things up even more. Nothing critical for this PR though, can merge it in and clean up later

mesh_orientations: list[list[tuple[float, float, float, float]]] = []

# Pack all targets into the rectangular table used by the Warp kernel.
for env_id in range(self._num_envs):
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be vectorized with numpy. Also seems the dummy record is only used to extend the list. Can we not allocate the array in advance?

Even better it seems you could populate it with a warp kernel, since the end result is a warp kernel below?

)
self._mesh_positions_w_torch = wp.to_torch(self._mesh_positions_w)
self._mesh_orientations_w_torch = wp.to_torch(self._mesh_orientations_w)
self._mesh_positions_w_torch[:] = torch.tensor(mesh_positions, dtype=torch.float32, device=self.device)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use proxy array here?

self._mesh_positions_w_torch[:] = torch.tensor(mesh_positions, dtype=torch.float32, device=self.device)
self._mesh_orientations_w_torch[:] = torch.tensor(mesh_orientations, dtype=torch.float32, device=self.device)

def _build_mesh_records(self, target_cfg: _RaycastTargetCfg, plan: ClonePlan | None, dummy_mesh_id: int | None):
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why use an alias for _RaycastTargetCfg?


# Prefer ClonePlan data for env-scoped targets; destination USD prims may not exist.
if plan is not None:
target_path = re.sub(r"env_\.\*", "env_0", target_cfg.prim_expr)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we not use target_cfg.prim_expr.replace here?

meshes: ClassVar[dict[tuple[str, str], wp.Mesh]] = {}
"""A dictionary to store warp meshes for raycasting, shared across all instances.

The keys are ``(prim_path, device)`` tuples and values are the corresponding warp Mesh objects.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are warp meshes being created on both devices? Should it not just be on the device currently set globally?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't fully understand how original code was written this way, seems like its trying to make sure gpu to cross each other to raycast onto other devices mesh?


# read prims to ray-cast
for mesh_prim_path in self.cfg.mesh_prim_paths:
mesh_key = (mesh_prim_path, self._device)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as previous comment, seems _device will not change? Then we don't need to key with tuple


def _initialize_rays_impl(self):
# Create all indices buffer
self._ALL_INDICES = torch.arange(self._view.count, device=self._device, dtype=torch.long)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this does not need to be caps

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think _ALL_INDICES is a common thing many files, I agree, but I will rename every one of them in one go for consistency


logger = logging.getLogger(__name__)

_MeshRecord = tuple[int, tuple[float, float, float], tuple[float, float, float, float]]
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In alot of places this tuple exists for vec3 and vec4/quat, maybe we can alias the type somewhere so it's a bit easier to read. If we convert more of the code to warp we can use wp.vec3 and wp.quat which would be alot cleaner

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good thoughts , I removed these typing

def __str__(self) -> str:
"""Returns: A string containing information about the instance."""
return (
f"Ray-caster @ '{self.cfg.prim_path}': \n"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BaseRayCaster has the same str overload, can we just call that one?

@ooctipus ooctipus force-pushed the octi/raycaster-backend-split branch 4 times, most recently from 9afb19a to 9eaddcb Compare May 16, 2026 11:18
@ooctipus
Copy link
Copy Markdown
Collaborator Author

@greptile-apps review

@ooctipus ooctipus requested a review from StafaH May 16, 2026 11:45
Copy link
Copy Markdown

@StafaH StafaH left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

if isinstance(target, str):
target_cfg = cfg.RaycastTargetCfg(prim_expr=target, track_mesh_transforms=False)
else:
target_cfg = target
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we allow users to pass in the config? Why not allow only paths? The variable is called mesh_prim_paths

mesh = create_trimesh_from_geom_shape(mesh_prim)
mesh.apply_scale(sim_utils.resolve_prim_scale(mesh_prim))
relative_pos, relative_quat = sim_utils.resolve_prim_pose(mesh_prim, reference_prim)
relative_pos = torch.tensor(relative_pos, dtype=torch.float32)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could probably just use numpy, since we end up with warp arrays in the end, and all of this is being done on CPU

self._ray_distance_w = wp.zeros((self._view_count, self.num_rays), dtype=wp.float32, device=self._device)
if self.cfg.update_mesh_ids:
self._ray_mesh_id_w = wp.zeros((self._view_count, self.num_rays), dtype=wp.int16, device=self._device)
self._data.ray_mesh_ids = ProxyArray(self._ray_mesh_id_w)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here and elsewhere, we can just directly do ProxyArray(wp.empty(xxxx)), no need to store a second reference to the warp array, since the ProxyArray reference can be used in the same way.

self._ray_mesh_id_w = wp.empty((1, 1), dtype=wp.int16, device=self._device)
# Persistent dummy buffers for unused kernel outputs; allocated once to avoid per-step allocations.
self._dummy_normal_w = wp.empty((1, 1), dtype=wp.vec3, device=self._device)
self._dummy_face_id_w = wp.empty((1, 1), dtype=wp.int32, device=self._device)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For warp arrays let's suffix with _wp, since _w means world frame in many other parts of the repo


# Camera-specific bookkeeping buffers
self._frame_wp = wp.zeros(self._view.count, dtype=wp.int64, device=self._device)
self._frame = wp.to_torch(self._frame_wp)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In another PR I changed this to be ProxyArray so depending on which ones gets merged first this might need to be updated

)

# World-frame ray buffers.
self._ray_starts_w = wp.zeros((self._view.count, self.num_rays), dtype=wp.vec3f, device=self._device)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here and elsewhere, use wp.empty instead of wp.zeros, it's much faster (zero-ing out an array takes some time). Only need wp.zeros when you specifically need a zero valued array

inputs=[env_mask, self.ray_hits_w.warp, self._ray_distance_cam_w],
device=self._device,
)
if return_normal:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Combine with above and pass return_normal as argument to the kernel?

self._newton_pose_w = wp.zeros(self._view_count, dtype=wp.transformf, device=self._device)
self._newton_pos_w = wp.zeros(self._view_count, dtype=wp.vec3f, device=self._device)
self._newton_quat_w = wp.zeros(self._view_count, dtype=wp.quatf, device=self._device)
self._newton_pos_w_proxy = ProxyArray(self._newton_pos_w)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Combine with above initialization? ProxyArray(wp.empty(...))

@ooctipus ooctipus force-pushed the octi/raycaster-backend-split branch from b894dc8 to 215bcc5 Compare May 17, 2026 02:33
@ooctipus ooctipus merged commit 0dd2412 into isaac-sim:develop May 17, 2026
38 checks passed
@github-project-automation github-project-automation Bot moved this from In review to Done in Isaac Lab May 17, 2026
@ooctipus ooctipus deleted the octi/raycaster-backend-split branch May 17, 2026 03:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation isaac-lab Related to Isaac Lab team

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

5 participants