Fix OVRTX renderer device mismatch on multi-GPU by fatimaanes · Pull Request #5594 · isaac-sim/IsaacLab

fatimaanes · 2026-05-12T18:00:51Z

Description

Fixes OVRTXRenderer crash on multi-GPU systems when sim.device is not cuda:0.

Root cause: A hardcoded DEVICE = "cuda:0" constant in ovrtx_renderer_kernels.py was imported and used for all Warp kernel launches and buffer allocations. Additionally, AttributeBinding.map() calls used device_id=0, pinning attribute mapping to GPU 0 regardless of the simulation device.

Fix:

Remove the DEVICE constant and use self._device (set from CameraRenderSpec.device) for all Warp operations (11 locations)
Add _device_id property to extract the CUDA device index from the device string
Pass device_id=self._device_id to AttributeBinding.map() calls (2 locations: object binding and camera binding)
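
The `_device_id` step above can be illustrated with a minimal sketch. The class name `RendererSketch` is hypothetical, and the property body is not shown on this page, so the parsing below is an assumption: it takes the integer after the colon and treats a bare "cuda" string as index 0.

```python
class RendererSketch:
    """Hypothetical stand-in for the renderer; only the device handling is modeled."""

    def __init__(self, device: str = "cuda:0"):
        # In the PR, this value comes from CameraRenderSpec.device.
        self._device = device

    @property
    def _device_id(self) -> int:
        # Split "cuda:N" at the first colon; a bare "cuda" has no separator.
        _, sep, index = self._device.partition(":")
        # Assumption: a device string without an index means device 0.
        return int(index) if sep else 0


print(RendererSketch("cuda:1")._device_id)  # 1
print(RendererSketch("cuda")._device_id)    # 0
```

Deriving the index from the device string on each access keeps it in step with `self._device`, so no second field has to be updated when the device changes.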

Note on RenderVarOutput.map() calls: These remain unchanged (device=Device.CUDA only) because the OVRTX C API for render output mapping (ovrtx_map_output_description_t) does not accept a device_id parameter — the output is inherently mapped on whichever GPU OVRTX rendered on.

Total: 13 hardcoded GPU-0 references fixed (11 Warp + 2 AttributeBinding).

This is the same bug class fixed for NewtonRenderer in #5019 — OVRTX was not updated at that time.

Type of change

Bug fix (non-breaking change which fixes an issue)

Checklist

I have run the pre-commit checks with ./isaaclab.sh --format
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
I have updated the changelog and added my name to the CONTRIBUTORS.md or my organization to the CONTRIBUTORS.md list

Replace hardcoded DEVICE = "cuda:0" in ovrtx_renderer with the actual sim device from CameraRenderSpec.device. All Warp kernel launches and buffer allocations now target the correct GPU when sim.device is not cuda:0 (e.g. distributed training).

greptile-apps · 2026-05-12T18:04:40Z

Greptile Summary

This PR fixes OVRTXRenderer crashes on multi-GPU systems by replacing a hardcoded DEVICE = "cuda:0" constant with self._device (derived from CameraRenderSpec.device) across all Warp kernel launches and OVRTX binding.map() calls.

Removes the DEVICE constant from ovrtx_renderer_kernels.py and adds a _device_id property to OVRTXRenderer that parses the integer CUDA index from the device string; all 11 kernel launches and 6 binding.map() calls are updated to use the correct device.
create_render_data now sets self._device = spec.device before any initialization, ensuring _setup_object_bindings and the returned OVRTXRenderData both target the right GPU.
The test file retains a local DEVICE = "cuda:0" constant for unit-test purposes, keeping existing tests unchanged.
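
The ordering described for create_render_data can be sketched in plain Python. Everything here is a stand-in: `SpecSketch` and `RendererSketch` are hypothetical names, and the list append replaces the real Warp allocation so the example runs without a GPU. It only shows why the device has to be assigned before initialization runs.

```python
class SpecSketch:
    """Hypothetical stand-in for CameraRenderSpec; only the device field is modeled."""

    def __init__(self, device: str):
        self.device = device


class RendererSketch:
    """Hypothetical stand-in that records which device each allocation targets."""

    def __init__(self):
        self._device = "cuda:0"  # default until a spec is seen
        self._initialized_scene = False
        self.allocations = []

    def _setup_object_bindings(self):
        # Stand-in for an allocation such as wp.array(..., device=self._device).
        self.allocations.append(("object_transforms", self._device))

    def _initialize_from_spec(self, spec):
        self._setup_object_bindings()
        self._initialized_scene = True

    def create_render_data(self, spec):
        # Assign the device first so every later allocation sees it.
        self._device = spec.device
        if not self._initialized_scene:
            self._initialize_from_spec(spec)
        # Stand-in for the returned render data: the spec plus its device.
        return spec, self._device


renderer = RendererSketch()
_, device = renderer.create_render_data(SpecSketch("cuda:1"))
print(device, renderer.allocations)
```

If the assignment came after `_initialize_from_spec`, the first allocation would still be recorded against the default "cuda:0", which is the mismatch this PR removes.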

Confidence Score: 5/5

Safe to merge — all hardcoded GPU-0 references are systematically replaced and the two issues flagged in the previous review round are fully addressed.

The change is a focused, mechanical substitution: every DEVICE / device_id=0 occurrence in the renderer is now driven by self._device / self._device_id. The _device_id property correctly handles both "cuda:N" and bare "cuda" forms, self._device is set from spec.device before any initialization runs, and the test suite retains its own local DEVICE = "cuda:0" so existing tests are unaffected. No new logic paths or data mutations are introduced.

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| source/isaaclab_ov/isaaclab_ov/renderers/ovrtx_renderer.py | Core fix: adds `_device_id` property, sets `self._device` from spec in `create_render_data`, and replaces all 17 hardcoded GPU-0 references with `self._device`/`self._device_id`. |
| source/isaaclab_ov/isaaclab_ov/renderers/ovrtx_renderer_kernels.py | Removes the `DEVICE = "cuda:0"` module-level constant; all kernel functions are unchanged and device-agnostic by design. |
| source/isaaclab_ov/test/test_ovrtx_renderer_kernels.py | Stops importing `DEVICE` from kernels; defines its own local `DEVICE = "cuda:0"` so unit tests remain pinned to cuda:0 without affecting production code. |
| source/isaaclab_ov/changelog.d/fix-ovrtx-device-mismatch.rst | New changelog entry accurately describing the multi-GPU device mismatch fix. |
| CONTRIBUTORS.md | Adds Fatima Anes in alphabetical order. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[OVRTXRenderer.__init__] -->|self._device = 'cuda:0'| B[Default device set]
    C[create_render_data spec] -->|self._device = spec.device| D[Device updated from spec]
    D --> E{_initialized_scene?}
    E -->|No| F[_initialize_from_spec]
    F --> G[_setup_object_bindings\nwp.array device=self._device]
    E -->|Yes| H[OVRTXRenderData spec, self._device]
    G --> H

    I[_device_id property] -->|self._device.split ':'| J[Returns int CUDA index]

    K[update_transforms] -->|object_binding.map device_id=self._device_id| L[wp.from_dlpack]
    L -->|wp.launch device=self._device| M[sync_newton_transforms_kernel]

    N[update_camera] -->|wp.zeros device=self._device| O[camera_transforms]
    O -->|wp.launch device=self._device| P[create_camera_transforms_kernel]
    P -->|camera_binding.map device_id=self._device_id| Q[wp.copy to binding]

    R[_process_render_frame] -->|LdrColor.map device_id=self._device_id| S[extract_rgba_tiles\ndevice=self._device]
    R -->|DepthSD.map device_id=self._device_id| T[extract_depth_tiles\ndevice=self._device]
    R -->|DiffuseAlbedoSD.map device_id=self._device_id| U[extract_rgba_tiles albedo\ndevice=self._device]
    R -->|SemanticSegmentation.map device_id=self._device_id| V[extract_rgba_tiles semantic\ndevice=self._device]

Reviews (2): Last reviewed commit: "Update kernel tests to use local device ..."

Route all OVRTX binding.map() calls through _device_id property so DLPack tensors are mapped on the correct CUDA device. Without this, update_transforms, update_camera, and render output mapping pin buffers to cuda:0 regardless of sim.device.

The DEVICE constant was removed from ovrtx_renderer_kernels in the previous commit. Define it locally in the test module instead.

fatimaanes · 2026-05-13T00:38:32Z

@greptile retrigger

isaaclab-review-bot

🤖 Isaac Lab Review Bot - Incremental Review

Reviewed commits: 044d6b5 → 980cd22

Summary of Changes

This update simplifies the multi-GPU device handling fix in _process_render_frame():

Change Details

Removed explicit device_id from .map() calls:

frame.render_vars["LdrColor"].map() — removed device_id=self._device_id
frame.render_vars[depth_var].map() — removed device_id=self._device_id
frame.render_vars["DiffuseAlbedoSD"].map() — removed device_id=self._device_id
frame.render_vars["SemanticSegmentation"].map() — removed device_id=self._device_id

Analysis

This is a refinement of the original multi-GPU fix. The binding.map() calls inside _process_render_frame() previously used hardcoded device_id=0. The initial fix added device_id=self._device_id, but this update removes the parameter entirely.

This simplification is likely intentional because:

The frame render variables already exist on the correct device from the renderer initialization
Omitting device_id lets OVRTX use its default device inference
The explicit device_id is still retained for _camera_binding.map() and _object_binding.map() in update_transforms() and update_camera() where it matters for data flow

Code Quality Assessment

The simplification is appropriate:

Removes unnecessary parameter where device context is already established
Keeps explicit device control where it matters (camera/object bindings)
Cleaner code with no functional regression expected

Verdict

The changes look good. This is a sensible simplification of the multi-GPU fix. No blocking concerns.

Last reviewed SHA: 980cd22469e7043e7f261fa314774b78098983c5

kellyguo11 · 2026-05-15T21:35:16Z

@fatimaanes seeing some failures in CI tests - https://github.com/isaac-sim/IsaacLab/actions/runs/25935794523/job/76246703492?pr=5594

Traceback (most recent call last):
  File "/workspace/isaaclab/source/isaaclab_ov/isaaclab_ov/renderers/ovrtx_renderer.py", line 632, in render
    self._process_render_frame(
  File "/workspace/isaaclab/source/isaaclab_ov/isaaclab_ov/renderers/ovrtx_renderer.py", line 572, in _process_render_frame
    with frame.render_vars["LdrColor"].map(device=Device.CUDA, device_id=self._device_id) as mapping:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: RenderVarOutput.map() got an unexpected keyword argument 'device_id'

RenderVarOutput.map() only accepts (device, sync_stream) per the OVRTX API (ovrtx_map_output_description_t has device_type + sync_stream). AttributeBinding.map() accepts (device, device_id) per ovrtx_mapping_desc_t. The previous commit incorrectly added device_id to both call types. This reverts the 4 RenderVarOutput.map() calls to their original form while keeping device_id on the 2 AttributeBinding.map() calls.

Co-Authored-By: Fatima Anes <fanes@nvidia.com>
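
The signature difference behind the CI failure can be reproduced with two stand-in classes. These are not the OVRTX classes: the names carry a `Sketch` suffix, the parameter lists follow the commit message above, and the default values and return values are assumptions made only so the example runs.

```python
class RenderVarOutputSketch:
    """Stand-in mirroring the (device, sync_stream) parameters named above."""

    def map(self, device, sync_stream=None):
        return ("render_var", device)


class AttributeBindingSketch:
    """Stand-in mirroring the (device, device_id) parameters named above."""

    def map(self, device, device_id=0):
        return ("attribute", device, device_id)


def try_map(target, **kwargs):
    # Return the mapping result, or the error text if the call is rejected.
    try:
        return target.map(**kwargs)
    except TypeError as exc:
        return f"TypeError: {exc}"


# Accepted: the attribute binding takes an explicit device index.
print(try_map(AttributeBindingSketch(), device="CUDA", device_id=1))

# Rejected: the render output mapping has no device_id parameter, so Python
# raises the same kind of TypeError that appears in the traceback above.
print(try_map(RenderVarOutputSketch(), device="CUDA", device_id=1))
```

Because the failure is a keyword mismatch at call time, it only surfaces when the render path actually executes, which is consistent with it being caught by the CI test run rather than at import.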

github-actions Bot added the bug (Something isn't working) and isaac-lab (Related to Isaac Lab team) labels May 12, 2026

fatimaanes requested a review from kellyguo11 May 12, 2026 18:03

pbarejko approved these changes May 13, 2026

Update kernel tests to use local device constant

8097348

The DEVICE constant was removed from ovrtx_renderer_kernels in the previous commit. Define it locally in the test module instead.

kellyguo11 added 4 commits May 13, 2026 12:07

Merge branch 'develop' into fatima/fix-ovrtx-device-mismatch

256b518

Merge branch 'develop' into fatima/fix-ovrtx-device-mismatch

963d305

Merge branch 'develop' into fatima/fix-ovrtx-device-mismatch

c700bc5

Merge branch 'develop' into fatima/fix-ovrtx-device-mismatch

044d6b5

isaaclab-review-bot Bot reviewed May 15, 2026

kellyguo11 added this to Isaac Lab May 15, 2026

kellyguo11 moved this to Ready to merge in Isaac Lab May 15, 2026

kellyguo11 merged commit 4737154 into isaac-sim:develop May 15, 2026
33 of 34 checks passed

github-project-automation Bot moved this from Ready to merge to Done in Isaac Lab May 15, 2026

isaaclab-review-bot Bot mentioned this pull request May 16, 2026

Actuators integration #5455

Open

7 tasks


kellyguo11 merged 8 commits into isaac-sim:develop from fatimaanes:fatima/fix-ovrtx-device-mismatch


3 participants
