pref: Enable mgpu in FrameView#5514
Conversation
Greptile SummaryThis PR lifts the
Confidence Score: 5/5Safe to merge; the device-guard removal is well-scoped, the write-path refactor correctly eliminates the double PrepareForReuse, and the CI pre-flight ensures multi-GPU coverage cannot silently regress. The core change (passing self._device directly to SelectPrims) is minimal and the surrounding refactor is correctness-improving. The only outstanding note is a stale one-line docstring that still advertises the old device restriction; everything else in the diff is either a genuine fix or well-tested new coverage. The init docstring in fabric_frame_view.py still lists the old 'cpu' or 'cuda:0' restriction and should be updated to match the new behavior. Important Files Changed
Reviews (4): Last reviewed commit: "Address review feedback on the multi-GPU..." | Re-trigger Greptile |
a6cd73e to
2c619fe
Compare
There was a problem hiding this comment.
Review Summary
Clean, well-motivated PR that removes an artificial device restriction. The refactoring of set_world_poses and set_scales into a single _compose_fabric_transform helper is a nice improvement — it ensures PrepareForReuse is only called once per logical update regardless of which components are being set.
Strengths
- Single kernel launch for combined updates —
_sync_fabric_from_usd_oncenow does pos+orient+scale in one pass instead of two separate calls - assert → RuntimeError — Topology guard won't be optimized away with
python -O - Thorough multi-GPU test coverage with proper skip/fail semantics depending on CI vs local
- CI workflow integration — dedicated
test-fabric-multi-gpujob triggered by relevant file changes - sys.argv stripping — prevents Kit segfault from pytest flags
Minor Notes
See inline comments.
There was a problem hiding this comment.
🔍 Code Review for PR #5514 (Updated Review)
Incremental review: changes from c1254cd → 1c2e02d
✅ Updated Assessment
The PR has grown significantly since the last review, now including:
- OVPhysX RigidObject data layer (new
rigid_object_data.py) - PhysX Jacobian/mass matrix/gravity compensation properties
- OvPhysxManager device locking and reuse improvements
- IK/OSC task-space controller migration
🎯 Key Changes Since Last Review
1. OVPhysX RigidObject Implementation ✅
A comprehensive BaseRigidObjectData subclass for OVPhysX with:
- Root/body pose and velocity properties using
TimestampedBuffer - Proper COM→link frame transformations via Warp kernels
ProxyArraywrappers for consistent Torch/Warp interop- Deprecated state-concat properties preserved for backward compatibility
@property
def root_com_ang_vel_b(self) -> ProxyArray:
if self._root_com_ang_vel_b.timestamp < self._sim_timestamp:
wp.launch(shared_kernels.quat_apply_inverse_1D_kernel, ...)
self._root_com_ang_vel_b.timestamp = self._sim_timestamp
# ... lazy ProxyArray init2. OvPhysxManager Device Locking ✅
Process-global device lock prevents mid-session device switches:
_locked_device: ClassVar[str | None] = None
if cls._locked_device is not None and ovphysx_device != cls._locked_device:
raise RuntimeError(
f"OvPhysxManager is locked to device {cls._locked_device!r}..."
)The physx.reset() approach for soft-reset (vs dropping reference) avoids the Carbonite static destructor race.
3. PhysX Task-Space Properties ✅
New ArticulationData properties with proper refresh gating:
body_link_jacobian_w— COM→origin shift kernel appliedbody_com_jacobian_w— passthrough to PhysXmass_matrix— passthrough to PhysXgravity_compensation_forces— passthrough to PhysX
The shift_jacobian_com_to_origin kernel correctly handles the frame mismatch.
4. IK/OSC Test Coverage ✅
Comprehensive tests pinning cross-backend parity:
test_get_jacobians_link_origin_contract— J·q̇ ≈ body velocitytest_franka_ik_tracking_accuracy— IK converges to mm precisiontest_franka_osc_tracking_accuracy— OSC tracking with impedancetest_get_gravity_compensation_forces_static_equilibrium— τ_gc holds arm
💡 Observations
-
CPU Scene Config Fix: The
_configure_physx_scene_primnow correctly gates GPU-specific attributes ondevice == "gpu". Previously, CPU mode would set GPU broadphase/capacity attributes incorrectly. -
Tensor Types Extension: New
RIGID_BODY_*tensor types with propertry/exceptguards for optional wheel bindings (RIGID_BODY_ACCELERATION,RIGID_BODY_INV_MASS,RIGID_BODY_INV_INERTIA). -
Test Timeout Adjustment:
TIMEOUT_RETRIES = 0inconftest.pyand new timeout fortest_contact_sensor.py— reasonable CI tuning.
🔒 Safety Checklist
- Device lock prevents silent corruption from device mismatch
- Soft-reset avoids Carbonite destructor race
- New properties gated by timestamp for lazy refresh
- Write paths invalidate dependent caches
- Tests cover both backends where applicable
📋 Summary
| Aspect | Status |
|---|---|
| OVPhysX RigidObject | ✅ Complete |
| PhysX Task-Space | ✅ Complete |
| Device Locking | ✅ Correct |
| Test Coverage | ✅ Thorough |
| Breaking Changes | ✅ None |
Recommendation: PR is ready for merge. Changes since last review are substantial but well-tested.
Update (8de9a39d): Significant additional changes since last review:
🆕 New Features
-
URDF Converter Refactor — New
UrdfConverterCfgwith robot type support, asset transformation, multi-physics conversion, and debug mode. Oldurdf_utils.py(fixed joint merging) removed. -
RT2 Path Tracing Settings —
RenderCfggainsmax_bounces,split_glass,split_clearcoat,split_rough_reflection,ambient_light_intensity,ambient_occlusion_denoiser_mode,view_tile_limitfor advanced rendering control. -
Warp Math Kernels — New
warp_math.pywith GPU-native camera orientation conversion and depth clamping kernels (avoids torch round-trips on hot path). -
ProxyArray Attribute Forwarding —
__getattr__bridge allows tensor methods (float(),clone(),cpu()) on ProxyArray with deprecation warning. -
Stacked Image Observations — New
stacked_imageMDP term with ring-buffer frame stacking for visual RL (tested via mock camera). -
Install CLI Refactor —
setup.pyextras restructured; new entry points (isaaclab,play,train); comprehensive install CI tests for uv/conda workflows.
🔧 Refactoring
SceneDataProvidermoved fromisaaclab.physicstoisaaclab.scene.scene_data_providerSimulationContext.update_scene_data_provider()removed; physics manager now handles forward calls- Import cleanup:
from isaaclab.utils.configclass import configclass(explicit module path) - Math safety:
quat_from_matrixandcreate_rotation_matrix_from_viewreturn NaN for invalid/singular inputs
✅ Test Coverage
- Heterogeneous cloned scene test for multi-mesh ray caster camera
- Install workflow tests (uv, conda, none/newton/rl/visualizer/mimic extras)
- Stacked image observation unit tests
- CI startup time limit adjusted for GitHub Actions
No blocking issues found. PR continues to be well-tested and ready for merge.
ada2608 to
c1254cd
Compare
🤖 Isaac Lab Review Bot — Follow-up Review (c1254cd)All previously raised concerns have been addressed ✅ Resolved Issues from Previous Review:
Summary of Additional Changes:Version bumps:
Dependency updates:
Bug fixes:
New features:
|
c1254cd to
1c2e02d
Compare
Remove _fabric_supported_devices allowlist and the device fallback check. USDRT SelectPrims supports any CUDA device index, so Fabric acceleration now runs on cuda:0, cuda:1, etc. This unblocks distributed training where each rank is pinned to a non-primary GPU. - Remove _fabric_supported_devices constant and device check in __init__ - Remove assertion in _initialize_fabric - Update docstrings for multi-GPU support - Add cuda:1 test cases (skipped on single-GPU runners) - Add sys.argv strip to prevent Kit segfault from pytest flags - Add dedicated multi-GPU CI workflow
1c2e02d to
8de9a39
Compare
Description
Removes the
cuda:0-only restriction inFabricFrameView. USDRTSelectPrimsnow accepts any CUDA device index, so Fabric acceleration runs on the simulation device (e.g.,cuda:1) instead of silently falling back to the slower USD path. This unblocks distributed training where each process is pinned to a specific GPU.The user-facing surface change is small (drops the device guard in
__init__, the assertion in_initialize_fabric, and the_fabric_supported_devicesallowlist). The remainder of the diff comprises:set_world_poses,set_scales, and the initial USD→Fabric sync now share a single_compose_fabric_transformhelper. The sync collapses to one combined kernel launch with onePrepareForReuse, eliminating the previous "doublePrepareForReuse" code smell where a non-idempotent second call could mask a topology-change signal._rebuild_fabric_arraysnow raisesRuntimeErrorinstead of usingassert, so the check survivespython -O. Previously, an-Orun with a real prim-count mismatch would silently dispatch with a stale_view_to_fabricmapping and produce wrong poses or out-of-bounds kernel indices.cuda:1-parameterized tests guarded by a newmulti_gpupytest marker (registered inpyproject.toml), plus a dedicated CI job on the existing multi-GPU runner so regressions show up automatically on PRs that touchFabricFrameViewor its test file.The skip-vs-fail logic in
_skip_if_unavailableis intentional and works in concert with the CI workflow:cuda:1testspytest.skipso local runs stay green.nvidia-smi+torch.cuda.device_count()check at the top oftest-fabric-multi-gpu. If the runner regresses to a single GPU, the pre-flight emits::error::and exits non-zero before pytest is even invoked, so a misconfigured runner cannot silently green-light a PR.The test helper used to special-case
GITHUB_ACTIONS=trueand callpytest.fail, but that branch was removed: the workflow pre-flight catches the same failure mode without coupling the test file toos.environ.Type of change
cuda:0continues to work exactly as before;cuda:1+ now also works instead of silently falling back to USD. No public API surface changed.Checklist
pre-commitchecks with./isaaclab.sh --formatconfig/extension.tomlfileCONTRIBUTORS.mdor my name already exists thereTest plan
Three new tests, all marked
@pytest.mark.multi_gpuand parameterized with["cuda:1"]:test_fabric_cuda1_world_pose_roundtrip—set_world_poses→get_world_posesreturns the same values on a non-primary CUDA device.test_fabric_cuda1_no_usd_writeback— Fabric writes oncuda:1do not write back to USD (atol=0.0— equality, not approximate).test_fabric_cuda1_scales_roundtrip— covers theset_scaleswrite path oncuda:1, since both Fabric write paths now run onself._device.A new CI job,
test-fabric-multi-gpu, runs in.github/workflows/test-multi-gpu.yamlon the existing[self-hosted, linux, x64, gpu, multi-gpu]runner. The job pre-flights withnvidia-smiand./isaaclab.sh -p -c "import torch; print(torch.cuda.device_count())", and fails loudly with::error::if the runner regresses to a single GPU. Triggered automatically on PRs that touchsource/isaaclab_physx/isaaclab_physx/sim/views/fabric_frame_view.pyor its test file.To verify locally on a multi-GPU machine:
./isaaclab.sh -p -m pytest -m multi_gpu \ source/isaaclab_physx/test/sim/test_views_xform_prim_fabric.py -vTo verify the
cuda:0path is unchanged:./isaaclab.sh -p -m pytest -m "not multi_gpu" \ source/isaaclab_physx/test/sim/test_views_xform_prim_fabric.py -v