Fix silent ABI mismatch in cubric Python shim #5444
🤖 Isaac Lab Review Bot
Summary
This PR fixes a silent ABI mismatch in the cubric Python ctypes shim where v0.1 vtable offsets were being used against v0.2 Kit builds, causing compute() calls to land on unbind() instead. The fix updates the offsets, requests v0.2 from carb, and adds a runtime version verification step that refuses to proceed on mismatch. The approach is sound and addresses a real silent-failure bug.
Architecture Impact
Self-contained to _cubric.py. The module is an internal implementation detail of isaaclab_newton.physics — callers already handle the fallback path when CubricBindings.initialize() returns False. No public API changes. The new runtime check adds a safe-fail path that didn't exist before.
Implementation Verdict
Minor fixes needed — the core logic is correct, but there are a few edge cases and clarity improvements worth addressing.
Test Coverage
The PR author explicitly notes testing is N/A due to the difficulty of spoofing Kit versions. This is a reasonable judgment call for a ctypes shim that depends on native carb infrastructure. The runtime verification is effectively a "test" that runs in production. However, the lack of any unit test for the version parsing logic in _verify_iadapter_version is a minor gap — the string/offset parsing could be tested with mock memory layouts.
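A minimal sketch of what such a mock-layout test could look like. It assumes a hypothetical descriptor layout (a C-string pointer followed by two 32-bit version fields); the real carb struct layout and the signature of _verify_iadapter_version are not shown in this thread, so `MockInterfaceDesc` and `find_interface_version` are illustrative names only.

```python
import ctypes


# Hypothetical layout (an assumption, not the verified carb struct):
# a C-string pointer followed by major/minor version fields.
class MockInterfaceDesc(ctypes.Structure):
    _fields_ = [
        ("name", ctypes.c_char_p),
        ("major", ctypes.c_uint32),
        ("minor", ctypes.c_uint32),
    ]


MAX_INTERFACES = 64  # sanity cap, mirroring the review suggestion


def find_interface_version(array_addr, count, target):
    """Scan `count` descriptors at `array_addr`; return (major, minor) or None."""
    if count > MAX_INTERFACES:
        return None  # treat an implausible count as a corrupted descriptor
    stride = ctypes.sizeof(MockInterfaceDesc)
    for i in range(count):
        desc = MockInterfaceDesc.from_address(array_addr + i * stride)
        if desc.name == target:
            return (desc.major, desc.minor)
    return None


# Build a mock interface array in Python-owned memory.
descs = (MockInterfaceDesc * 2)(
    MockInterfaceDesc(b"omni::other::IThing", 1, 0),
    MockInterfaceDesc(b"omni::cubric::IAdapter", 0, 2),
)
base = ctypes.addressof(descs)

found = find_interface_version(base, 2, b"omni::cubric::IAdapter")
missing = find_interface_version(base, 2, b"omni::cubric::IMissing")
capped = find_interface_version(base, 10_000, b"omni::cubric::IAdapter")
```

Because the mock array lives in memory owned by the Python process, a test like this needs neither Kit nor libcarb. It only exercises the parsing logic, not the offsets of the real native struct.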
CI Status
No CI checks available yet — cannot assess pass/fail status.
Findings
🟡 Warning: _cubric.py:259 — interface_count is read as u64 but PluginDesc::interfaceCount is likely size_t or uint32_t
The code reads interface_count as a full 64-bit value at offset 48:
interface_count = _read_u64(plugin_desc_ptr + _PD_OFF_INTERFACE_COUNT)
If the native struct packs interfaceCount as a 32-bit value followed by another 32-bit field, you'll read garbage into the high bits. On little-endian x86_64 this may work by accident if the next 4 bytes are zero, but it's fragile. Consider reading as c_uint64 only if you've verified the struct layout, or use c_size_t which matches platform conventions.
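To illustrate the failure mode, here is a small sketch that assumes the hypothetical packing described above: a 32-bit count immediately followed by a non-zero 32-bit field. It uses the standard `struct` module with an explicit little-endian format rather than live ctypes reads, so the result does not depend on the host.

```python
import struct

# Hypothetical packing (assumption): a 32-bit interfaceCount immediately
# followed by another, non-zero 32-bit field.
raw = struct.pack("<II", 3, 0xDEADBEEF)

# Reading only the 32-bit field recovers the intended count.
count_as_u32 = struct.unpack_from("<I", raw, 0)[0]

# Reading 64 bits from the same offset pulls the neighbouring field
# into the high half of the value.
count_as_u64 = struct.unpack_from("<Q", raw, 0)[0]

print(count_as_u32)       # 3
print(hex(count_as_u64))  # 0xdeadbeef00000003
```

If the field is in fact a 64-bit size_t, the 64-bit read is the correct one; the sketch only shows why the width has to be confirmed against the native header.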
🟡 Warning: _cubric.py:263-273 — Loop doesn't handle potential out-of-bounds read if interface_count is corrupted
The loop trusts interface_count from memory. If the plugin descriptor is malformed or the offset constants are wrong, this could iterate far beyond valid memory. Consider adding a sanity cap:
for i in range(min(interface_count, 64)):  # reasonable upper bound
🔵 Improvement: _cubric.py:269 - ctypes.string_at() without a max length is risky
if ctypes.string_at(name_addr) != b"omni::cubric::IAdapter":
If name_addr points to invalid/non-null-terminated memory, this will read until it finds a null byte or crashes. Consider using ctypes.string_at(name_addr, 256) with a reasonable max length, then strip nulls.
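One way to bound the read is sketched below. It differs from the 256-byte suggestion: it reads exactly len(target) + 1 bytes and requires the trailing NUL, so a longer name that merely starts with the target does not match. `name_matches` is a hypothetical helper, and the sketch assumes that many bytes are readable at the address.

```python
import ctypes

TARGET = b"omni::cubric::IAdapter"


def name_matches(name_addr, target=TARGET):
    """Compare exactly len(target) + 1 bytes at name_addr against target + NUL."""
    if not name_addr:
        return False  # null pointer: nothing to compare
    # string_at with an explicit size copies that many bytes and does not
    # scan for a terminator, so the read length is bounded.
    return ctypes.string_at(name_addr, len(target) + 1) == target + b"\x00"


exact = ctypes.create_string_buffer(TARGET)           # NUL-terminated copy
longer = ctypes.create_string_buffer(TARGET + b"V2")  # shares the prefix

exact_ok = name_matches(ctypes.addressof(exact))
longer_ok = name_matches(ctypes.addressof(longer))
null_ok = name_matches(0)
```

Comparing only len(target) bytes would also bound the read, but it accepts any name that begins with the target; checking the terminator closes that gap at the cost of one extra byte read.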
🔵 Improvement: _cubric.py:241 — Consider logging the actual acquired version on success
The info log says "cubric IAdapter bindings ready" but doesn't confirm which version was validated. For debugging purposes:
logger.info("cubric IAdapter v%d.%d bindings ready", _IA_EXPECTED_MAJOR, _IA_EXPECTED_MINOR)
🔵 Improvement: _cubric.py:44-54 - The v0.2 layout comment is helpful but could note what changed from v0.1
The comment documents v0.2 layout but doesn't explicitly show what v0.1 was (the diff shows bindToStageWithListener was inserted at offset 48). A one-line note like # v0.2 added bindToStageWithListener at offset 48, shifting unbind and compute would help future maintainers understand the ABI break.
🔵 Improvement: _cubric.py:246-248 — _verify_iadapter_version could benefit from a docstring explaining the "why"
The method has a one-liner docstring but the critical context — that carb's 0.x negotiation is permissive and emits only stderr warnings — is in a comment at line 58. Moving that rationale into the docstring would help future readers understand why this verification exists.
Greptile Summary
This PR fixes a silent ABI mismatch in the
Confidence Score: 3/5
Safe to merge for the core bug fix, but the new verification path has a potential crash on struct layout mismatch that should be guarded. The vtable offset correction (56→64) and version bump are clearly correct and the fallback-to-CPU strategy is safe. However, source/isaaclab_newton/isaaclab_newton/physics/_cubric.py - specifically
Important Files Changed
Sequence Diagram
sequenceDiagram
participant C as CubricBindings.initialize()
participant FW as carb Framework (libcarb.so)
participant IA as omni::cubric::IAdapter
C->>FW: acquireFramework("isaaclab.cubric", v0.0)
FW-->>C: fw_ptr
C->>FW: tryAcquireInterfaceWithClient(offset 24)<br/>desc={name="omni::cubric::IAdapter", v0.2}
FW-->>C: ia_ptr (or null → CPU fallback)
C->>C: _verify_iadapter_version(fw_ptr, ia_ptr)
C->>FW: getInterfacePluginDesc(offset 96, ia_ptr)
FW-->>C: plugin_desc_ptr
C->>C: parse PluginDesc.interfaces[]<br/>find IAdapter entry, check version == (0,2)
alt version mismatch or null
C-->>C: log warning, return False → CPU fallback
else version OK
C->>IA: read vtable fn-ptrs at offsets 8,32,40,64
C-->>C: bind _create_fn, _release_fn, _bind_fn, _compute_fn
Note over C,IA: compute() now calls offset 64 (v0.2)<br/>previously called offset 56 (unbind in v0.2)
end
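The slot shift in the diagram can be reproduced with a mock vtable built in Python. This is a sketch under stated assumptions: the slot names follow the thread (create at 8, release at 32, bind at 40, bindToStageWithListener at 48, unbind at 56, compute at 64 in v0.2), the `int fn(void)` signature is hypothetical, and slots not mentioned in the thread are left null.

```python
import ctypes

FN = ctypes.CFUNCTYPE(ctypes.c_int)  # hypothetical signature: int fn(void)

calls = []


def make_slot(name):
    """Return a C-callable that records which slot was invoked."""
    def impl():
        calls.append(name)
        return 0
    return FN(impl)


# v0.2 slot names by byte offset, as described in the review thread.
V02_SLOTS = {8: "create", 32: "release", 40: "bind",
             48: "bindToStageWithListener", 56: "unbind", 64: "compute"}

# Keep the callback objects alive for as long as the vtable is used.
callbacks = {off: make_slot(name) for off, name in V02_SLOTS.items()}

# Nine 8-byte slots cover byte offsets 0 through 64.
vtable = (ctypes.c_uint64 * 9)()
for off, cb in callbacks.items():
    vtable[off // 8] = ctypes.cast(cb, ctypes.c_void_p).value


def call_at(vtable_addr, offset):
    """Read a function pointer at a byte offset and call it."""
    fn_addr = ctypes.c_uint64.from_address(vtable_addr + offset).value
    return FN(fn_addr)()


base = ctypes.addressof(vtable)
call_at(base, 56)  # stale v0.1 offset for compute
call_at(base, 64)  # v0.2 offset for compute
print(calls)       # ['unbind', 'compute']
```

Nothing in the call itself signals that offset 56 now holds a different function, which is why the mismatch was silent.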
Reviews (1): Last reviewed commit: "OMPE-91419: Fix silent ABI mismatch in c..."
🤖 Isaac Lab Review Bot
Summary
All previous findings have been addressed in this revision.
Previous Concerns Status
- ✅ interface_count sanity cap: Added check if interface_count > 64 with appropriate warning
- ✅ string_at length limit: Now uses ctypes.string_at(name_addr, len(target_name)) with explicit length
- ✅ Docstring improvement: _verify_iadapter_version now has a proper docstring explaining the carb 0.x permissive negotiation rationale
- ✅ v0.2 layout comment: Full vtable layout is now documented with clear offset annotations
- ✅ Version logging: The info message already includes context; the warning messages now explicitly log version numbers
Implementation Verdict
Ship it — the code is defensive, well-documented, and handles the edge cases appropriately.
aa30d6b to 4cfff73
🤖 Isaac Lab Review Bot
Summary
All previous findings were addressed in the prior commit. The new revision (4cfff73) contains no additional code changes beyond what was already reviewed — the diff is identical to the previously approved state.
Implementation Verdict
Ship it — the ABI mismatch fix is correct, defensive, and well-documented. The version verification logic properly guards against silent vtable miscalls.
CI Status
The "Check for Broken Links" failure appears unrelated to this PR's changes (no documentation links were modified in the cubric shim). All code-related checks (pre-commit, Build Wheel) pass.
🤖 Isaac Lab Review Bot
Summary
The new commit (71b6fbf) is identical to the previously reviewed state — no code changes since the last approval. All previous findings remain addressed.
Implementation Verdict
Ship it — the ABI mismatch fix is correct and the version verification guards against the silent vtable miscall failure mode.
CI Status
"Check for Broken Links" failure is unrelated to this PR (no docs modified). Code checks pass.
🤖 Isaac Lab Review Bot
Summary
The new commit (0c82fe3) is identical to the previously reviewed state — no code changes since the last review. The ABI mismatch fix remains correct with proper v0.2 vtable offsets and runtime version verification.
Implementation Verdict
Ship it.
CI Status
"Check for Broken Links" failure is unrelated to this PR (documentation link checker, no docs modified in this change).
Kit will be un-breaking the semver minor-version compatibility contract violation upstream, so the breaking ABI change will never actually make it into a Kit release. Reverted our shim back to requesting v0.1, relaxed the warning to use the logger instead, but kept the validation code as a safeguard.
🤖 Isaac Lab Review Bot
Summary
The new commit (b827264) reverts the fix back to v0.1 offsets while keeping the version verification infrastructure. This contradicts the PR description which claims to update offsets to v0.2.
Implementation Verdict
Needs rework — critical inconsistency between PR description and code.
Findings
🔴 Critical: _cubric.py:47-64 — PR description/code mismatch
The PR description states:
"Update offsets to the v0.2 layout, request v0.2 from the framework"
But the code pins to v0.1:
_IA_EXPECTED_MAJOR = 0
_IA_EXPECTED_MINOR = 1  # Should be 2 per PR description
And the vtable comment says "v0.1 layout" (line 47). If v0.2 has different offsets (which the PR title implies), these offsets are wrong and compute calls will still land on the wrong vtable slot.
Either:
- The PR description is wrong and this is just adding version verification for v0.1, or
- The code is wrong and should have _IA_EXPECTED_MINOR = 2 with updated offsets
Please clarify and fix the inconsistency.
@isaaclab-review-bot updated the PR description
The _cubric.py ctypes shim was pinned to IAdapter v0.1 vtable offsets, but newer Kit builds ship v0.2 — compute calls were silently landing on unbind, disabling cubric's GPU transform hierarchy propagation. carb accepts the version mismatch with only a stderr warning. Update offsets to the v0.2 layout, request v0.2 from the framework, and add a runtime InterfaceDesc check that refuses to acquire on any unexpected version.
- Capped interface_count at 64 as a sanity check.
- Bound string_at read length to expected value.
- Clarified some comments.
…n higher minor version mismatch, fall back to CPU (non-cubric) path on lower minor or major version mismatch.
…stead of standard IsaacLab logger
…to logger - kit is un-breaking the semver compatibility violation upstream.
da9a41b to 0e5ef21
🤖 Isaac Lab Review Bot
Summary
The latest commit (0e5ef21) adds a changelog fragment (jmart-cubric-abi.rst). The code in _cubric.py is unchanged since the v0.1 revert (6272ce2).
Previous Concern Resolved
The v0.1/v0.2 inconsistency raised in the prior review is now addressed by the updated PR description, which explains that the Kit team is fixing the ABI-breaking semver violation upstream, so Isaac Lab remains pinned to v0.1 while keeping the runtime version verification as defense-in-depth. This is the correct decision — the validation code will catch any future ABI shifts without unnecessarily targeting a version that won't ship.
Final Review
Changelog (new file): Well-written, concise, accurately describes the change as adding runtime verification with CPU path fallback.
Code (_cubric.py):
- ✅ Version constants pinned to v0.1 (correct per updated scope)
- ✅ _verify_iadapter_version properly validates via getInterfacePluginDesc
- ✅ Interface count sanity-capped at 64
- ✅ string_at bounded with explicit length
- ✅ Graceful fallback to update_world_xforms() on any mismatch
- ✅ Warning on newer minor version (semver-compatible, but flags potential vtable drift)
- ✅ All null-pointer paths handled with appropriate logging
Implementation Verdict
Ship it. The fix is defensive, well-documented, and correctly scoped. The validation infrastructure will serve as an early-warning system against future ABI breaks without introducing unnecessary risk.
Update (0b790cd): Merge commit from develop. The only change to _cubric.py is a minor cleanup: the interface acquisition now uses b"isaaclab.cubric" as the client name instead of trying multiple strategies with carb.scripting-python.plugin and None. This is a reasonable simplification — the previous code was defensive but verbose. All other files in this commit are unrelated changes from the develop branch merge. ✅ Ship it.
# Description

Background: The _cubric.py ctypes shim was pinned to IAdapter v0.1 vtable offsets, but newer Kit builds ship v0.2; compute calls were silently landing on unbind, disabling cubric's GPU transform hierarchy propagation. carb accepts the version mismatch with only a stderr warning.

Originally, this change updated offsets to the v0.2 layout, requested v0.2 from the framework, and added a runtime InterfaceDesc check that refused to acquire on any unexpected version.

The kit team is fixing the ABI-breaking semver contract violation upstream, so it won't actually make it into a release. So the pinned version in Isaac Lab remains on v0.1 but keeps the validation code as a safety net.

This problem will go away once we have official python bindings for cubric in a future kit release.

If usdrt eventually exposes the required `eRigidBody` options via the `IFabricHierarchy` API then that would massively simplify the implementation of newton manager. Will pursue a feature request.

## Type of change

- Bug fix (non-breaking change which fixes an issue)

## Checklist

- [x] I have read and understood the [contribution guidelines](https://isaac-sim.github.io/IsaacLab/main/source/refs/contributing.html)
- [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with `./isaaclab.sh --format`
- [ ] I have made corresponding changes to the documentation *(N/A)*
- [ ] My changes generate no new warnings *(New warnings on ABI mismatch are intentional)*
- [ ] I have added tests that prove my fix is effective or that my feature works *(N/A - spoofing kit versions for unit test would be non-trivial; verified manually)*
- [x] I have updated the changelog and the corresponding version in the extension's `config/extension.toml` file
- [x] I have added my name to the `CONTRIBUTORS.md` or my name already exists there

---------

Co-authored-by: Kelly Guo <kellyg@nvidia.com>