Rendering correctness test determinism#5353
Conversation
…nistic initial poses
Greptile SummaryThis PR improves rendering correctness test determinism by removing the per-fixture Confidence Score: 5/5Safe to merge — changes are test-only and tighten correctness thresholds under a more deterministic baseline. All findings are P2 style suggestions. The core logic change (removing seed/reset calls and centralising thresholds) is sound and well-motivated. Golden images have been regenerated. No production code is affected. No files require special attention; all changes are confined to the test suite and its golden images. Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Test fixture created\ne.g. cartpole_env] --> B[env constructed\nno seed / no reset]
B --> C[Camera.data.output accessed\nbuffers updated on access]
C --> D[_validate_camera_outputs called\nwith env_name threshold lookup]
D --> E{env_name in\n_MAX_DIFFERENT_PIXELS_PERCENTAGE_BY_ENV_NAME?}
E -- Yes --> F[Compare against golden image\nusing per-env threshold]
E -- No --> G[KeyError at runtime]
F --> H{diff_pct <= threshold?}
H -- Pass --> I[Test passes]
H -- Fail --> J[pytest.fail with diff details]
Reviews (2): Last reviewed commit: "smaller (stricter) threshold" | Re-trigger Greptile |
| camera_outputs_nested_dict = _collect_camera_outputs(env) | ||
| num_camera_outputs = len(camera_outputs_nested_dict) |
There was a problem hiding this comment.
Camera data may be empty without a prior reset
gym.make() constructs the environment but, per the Gymnasium contract, sensor/observation data is not guaranteed to be populated until the first reset() call. If Isaac Sim defers filling camera buffers to reset(), then _collect_camera_outputs will return an empty dict and the assertion on line 794 (assert num_camera_outputs == 1) will fail with a confusing message rather than indicating the root cause.
The same pattern was removed from the other three env fixtures (shadow_hand_env, cartpole_env, dexsuite_kuka_allegro_lift_env). If those fixtures also rely on camera data being populated at construction time, any future environment that defers rendering until reset() will silently break here.
There was a problem hiding this comment.
this is fine. we will check whether camera outputs exist and contain non-zero pixels inside _validate_camera_outputs.
There was a problem hiding this comment.
Camera output buffers will be updated on invocation of Camera.data():
@property
def data(self) -> CameraData:
# update sensors if needed
self._update_outdated_buffers()
# return the data
return self._dataThere was a problem hiding this comment.
Got it, thanks for clarifying! I see now that _validate_camera_outputs has the assertion assert len(camera_outputs) > 0 plus the per-tensor non-zero check, so empty or blank outputs will fail clearly rather than silently passing. That addresses my concern. 👍
There was a problem hiding this comment.
Thank you for the clarification! You're right — the Camera.data property's lazy update via _update_outdated_buffers() ensures buffers are populated on access, making the explicit reset() unnecessary for camera data retrieval in this context.
This is a nice pattern that decouples buffer population from the reset lifecycle. My concern about the Gymnasium contract was misplaced here since Isaac Lab's camera sensor implementation handles this internally. 👍
55d512c to
c308e1b
Compare
| # needs to be large enough to tolerate minor rendering noise while small enough to catch unexpected changes. | ||
| _MAX_DIFFERENT_PIXELS_PERCENTAGE_BY_ENV_NAME = { | ||
| "cartpole": 1.0, | ||
| "shadow_hand": 3.5, |
There was a problem hiding this comment.
it was 8.0 before
| # The value is set case by case based on the screen space taken up by the env in camera output images. It | ||
| # needs to be large enough to tolerate minor rendering noise while small enough to catch unexpected changes. | ||
| _MAX_DIFFERENT_PIXELS_PERCENTAGE_BY_ENV_NAME = { | ||
| "cartpole": 1.0, |
There was a problem hiding this comment.
it was 2.0 before
| _MAX_DIFFERENT_PIXELS_PERCENTAGE_BY_ENV_NAME = { | ||
| "cartpole": 1.0, | ||
| "shadow_hand": 3.5, | ||
| "dexsuite_kuka": 4.5, |
There was a problem hiding this comment.
it was 10.0 before
There was a problem hiding this comment.
🤖 Isaac Lab Review Bot
Summary
This PR removes env.reset() calls from rendering correctness tests to eliminate non-deterministic initial articulation poses. Instead, camera buffers are filled via the lazy _update_outdated_buffers() mechanism when accessing camera.data. This allows tighter pixel difference thresholds (cartpole: 2.0→1.0, shadow_hand: 8.0→3.0, dexsuite_kuka: 10.0→4.0). The implementation is correct and well-reasoned.
Architecture Impact
Changes are self-contained to the test file. No cross-module impact — the _MAX_DIFFERENT_PIXELS_PERCENTAGE_BY_ENV_NAME dictionary centralizes thresholds and is only consumed within this test module.
Implementation Verdict
Ship it — Clean implementation with good code organization. The approach of relying on default articulation poses instead of randomized reset states is sound for rendering correctness tests.
Test Coverage
This PR is the test. The changes improve test reliability by removing a source of non-determinism. The lowered thresholds make the tests more sensitive to actual rendering regressions while the removal of env.reset() eliminates false positives from initial pose variance. The 71 updated golden images reflect the new deterministic default poses.
CI Status
CI shows 3 test failures, but none are caused by this PR:
Isaac-Cartpole-RGB-TheiaTiny-v0failures intest_environments.pyandtest_environments_with_stage_in_memory.py— these are Theia vision model tests, unrelated to rendering correctness tests- Training benchmark failures (duration/reward thresholds) — flaky performance benchmarks, not related to this PR
The rendering correctness tests themselves (shard 3/3) passed ✅
Findings
🔵 Improvement: test_rendering_correctness.py:775-777 — Consider documenting the env_name mapping rationale
The _RENDER_CORRECTNESS_TASK_IDS list now includes env_name mappings (e.g., Isaac-Repose-Cube-Shadow-Vision-Direct-v0 maps to shadow_hand). A brief comment explaining why Shadow Hand is used for the Repose-Cube task would help future maintainers understand this is based on screen space coverage, not task similarity.
Overall this is a well-thought-out improvement that makes the tests more deterministic and the thresholds more meaningful. The code organization with the centralized threshold dictionary is cleaner than the previous inline values.
# Description
For the rendering correctness test, we don't really need the
`env.reset()` call to fill the camera output buffers. Instead, if we
remove `env.reset()`, the camera output buffers will be filled on the
invocation of camera.data:
```python
@Property
def data(self) -> CameraData:
# update sensors if needed
self._update_outdated_buffers()
# return the data
return self._data
```
This means we can remove `env.reset()` calls in the rendering
correctness test to avoid non-deterministic initial pose. Articulation
bodies will be at their default pose for rendering for all combos. With
this removal I can set the max pixel diff threshold to smaller
(stricter) values:
```python
"cartpole": 1.0, # decreased from 2.0
"shadow_hand": 3.0, # decreased from 8.0
"dexsuite_kuka": 4.0, # decreased from 10.0
```
The test will become more sensitive to capture rendering changes but
hopefully it can still tolerate minor pixel noise.
Fixes # (issue)
<!-- As a practice, it is recommended to open an issue to have
discussions on the proposed pull request.
This makes it easier for the community to keep track of what is being
developed or added, and if a given feature
is demanded by more than one party. -->
## Type of change
- Test-only change
## Checklist
- [x] I have read and understood the [contribution
guidelines](https://isaac-sim.github.io/IsaacLab/main/source/refs/contributing.html)
- [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with
`./isaaclab.sh --format`
- [ ] I have made corresponding changes to the documentation
- [x] My changes generate no new warnings
- [x] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have updated the changelog and the corresponding version in the
extension's `config/extension.toml` file
- [ ] I have added my name to the `CONTRIBUTORS.md` or my name already
exists there
<!--
As you go through the checklist above, you can mark something as done by
putting an x character in it
For example,
- [x] I have done this task
- [ ] I have not done this task
-->
Description
For the rendering correctness test, we don't really need the
env.reset()call to fill the camera output buffers. Instead, if we removeenv.reset(), the camera output buffers will be filled on the invocation of camera.data:This means we can remove
env.reset()calls in the rendering correctness test to avoid non-deterministic initial pose. Articulation bodies will be at their default pose for rendering for all combos. With this removal I can set the max pixel diff threshold to smaller (stricter) values:The test will become more sensitive to capture rendering changes but hopefully it can still tolerate minor pixel noise.
Fixes # (issue)
Type of change
Checklist
pre-commitchecks with./isaaclab.sh --formatconfig/extension.tomlfileCONTRIBUTORS.mdor my name already exists there