[BUG FIX] vis/rasterizer: defer OffscreenRenderer creation until first render #31

Open
gpinkert wants to merge 2 commits into amd-integration from lazy-offscreen-renderer

@gpinkert gpinkert commented Apr 25, 2026

Previously, Rasterizer.build() eagerly constructed a pyrender.OffscreenRenderer whenever no interactive viewer was active, which is the path taken by every scene.build() call with show_viewer=False. Constructing the OffscreenRenderer immediately calls EGLPlatform.init_context() (genesis/ext/pyrender/platforms/egl.py), which in turn calls eglInitialize on a Mesa/radeonsi display.

This is fine when a single test process drives the GPU sequentially, but it has two problems on the AMD/ROCm test setup:

  1. It creates an EGL/GL context for every scene.build(), even when the test never renders. The vast majority of the rigid-physics test suite never instantiates a camera or calls scene.render() / cam.render(), so the GL context, FBO, depth buffer, and shader compiles are pure per-test overhead.

  2. When the test runner uses pytest-xdist with multiple workers on a single AMD GPU, the concurrent eglInitialize / radeonsi context creations contend on the driver and reliably fail with:

    radeonsi: error: can't create eop_bug_scratch
    radeonsi: error: Failed to create a context.

    followed by a SIGSEGV inside eglInitialize (egl.py:223). The crash occurs with both -n N and -n N --forked, because pytest-forked runs each test inside a fresh fork that re-enters scene.build() and races against the other workers' GL initialisations.

The fix moves the OffscreenRenderer construction out of build() and into a new _ensure_renderer() helper that is invoked on the first render_camera() call. With this change:

  • Tests that never render (the dominant case) never touch EGL, so they cannot hit the radeonsi context-creation race. This unblocks pytest -n N parallel execution on a single AMD GPU.
  • Tests that do render are unaffected behaviourally: the same OffscreenRenderer is created the first time render_camera() runs, and reused on subsequent calls. make_current() / make_uncurrent(), resize handling, and destroy() all already null-check self._renderer, so no further refactor was required.

The visualizer/rasterizer wiring is otherwise unchanged: cameras are still added in add_camera(), pyrender.Renderer (per-camera target) is still allocated up-front (it only touches the JIT context, not GL), and destroy() continues to clean up whichever renderer was actually materialised.
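
The lazy-construction pattern described above can be sketched roughly as follows. This is a minimal illustration, not the Genesis code: `LazyRasterizer`, `FakeRenderer`, and the `renderer_factory` parameter are hypothetical names introduced here, and the fake renderer stands in for `pyrender.OffscreenRenderer` so the sketch runs without a GPU or EGL.

```python
class FakeRenderer:
    """Hypothetical stand-in for an offscreen renderer (no EGL involved)."""

    created = 0  # counts constructions, to make the laziness observable

    def __init__(self):
        FakeRenderer.created += 1

    def render(self, camera):
        return f"frame:{camera}"

    def delete(self):
        pass


class LazyRasterizer:
    """Hypothetical sketch of a rasterizer that defers renderer creation."""

    def __init__(self, renderer_factory):
        # Assumption: a zero-argument callable that builds the renderer.
        self._renderer_factory = renderer_factory
        self._renderer = None

    def build(self):
        # Deliberately does not construct the renderer, so a build that is
        # never followed by a render never creates a GL context.
        pass

    def _ensure_renderer(self):
        # Create the renderer on first use and reuse it afterwards.
        if self._renderer is None:
            self._renderer = self._renderer_factory()
        return self._renderer

    def render_camera(self, camera):
        return self._ensure_renderer().render(camera)

    def destroy(self):
        # Null-check: nothing to clean up if no render ever happened.
        if self._renderer is not None:
            self._renderer.delete()
            self._renderer = None
```

In this sketch, calling `build()` alone leaves `FakeRenderer.created` at 0; the first `render_camera()` call raises it to 1, and later calls reuse the same instance.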

Result: pytest tests/test_rigid_physics.py -n 8 -m required now runs to completion on a single-GPU AMD/ROCm host. End-to-end wall-time on the required set drops from ~30 minutes (sequential -n 0 --forked) to roughly 8 minutes, with the speedup limited by per-process kernel compilation rather than EGL or GPU contention.

The CI tester scripts that invoke the Genesis test suite live in another
repo and pass `-n 0` (sequential) explicitly. Now that EGL is initialised
lazily and concurrent `scene.build()` calls no longer race on `radeonsi`
context creation (see prior commit), it is safe — and substantially
faster — to run the suite in parallel on a single AMD GPU.

This change rewrites `-n 0` to `-n 8` from inside the existing
`pytest_cmdline_main` hook so the parallel default takes effect without
any modification to the external tester scripts. The override is gated
on:

  * `os.path.exists("/dev/kfd")` — the AMDGPU kernel driver device file,
    so non-AMD hosts (NVIDIA, Apple, CPU-only) keep the user-supplied
    value.
  * `not show_viewer` — the immediately-preceding block already pins
    `numprocesses` to 0 when the interactive viewer is requested, and
    that decision must win.
  * `config.option.numprocesses == 0` — explicit `-n N` for any non-zero
    `N` is preserved verbatim, so debugging with `-n 1` or experimenting
    with other worker counts still works.

`pytest-xdist` is already installed in `genesis:amd-integration` (via
the `[dev]` extras pulled in by `Dockerfile.rocm`), and the existing
`-n 0` invocations already depend on it being present, so no packaging
changes are needed for this hook to take effect.

Net effect: `pytest -n 0 --forked` issued from the upstream tester
scripts now runs with eight workers on AMD/ROCm, dropping the
required-test wall-time on `tests/test_rigid_physics.py` from roughly
30 minutes to under 10 minutes on a single-GPU host.
@yaoliu13
Collaborator

/run-ci

Collaborator

@yaoliu13 yaoliu13 left a comment


This PR is not based on the latest amd-integration: lazy-offscreen-renderer...ROCm:Genesis:amd-integration
