Improve contact sensor data buffer and support save data to lerobot dataset#197
Improve contact sensor data buffer and support save data to lerobot dataset#197
Conversation
There was a problem hiding this comment.
Pull request overview
This PR expands the simulation/gym API surface by introducing a richer contact sensor data layout (per-environment buffers + validity mask), improving SensorCfg.from_dict() for nested/list configclasses, and adding a new observation functor for retrieving object user IDs.
Changes:
- Updated contact sensor reporting to use per-env fixed-size buffers with an
is_validmask and added Warp kernel support for scattering contacts. - Enhanced
SensorCfg.from_dict()to better handle nested configclasses and lists of configclasses via type-hints. - Added
get_object_uidobservation functor plus docs/tests updates for the new functionality.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/sim/sensors/test_contact.py | Extends contact sensor tests to validate new per-env buffer shapes and is_valid, and adds a from_dict conversion test. |
| tests/gym/envs/managers/test_observation_functors.py | Adds unit tests for the new get_object_uid functor (plus mock env support). |
| examples/sim/sensors/create_contact_sensor.py | Updates example sim config (but currently introduces a headless flag bug). |
| embodichain/utils/warp/kernels.py | Adds scatter_contact_data Warp kernel to scatter filtered contacts into per-env buffers. |
| embodichain/lab/sim/sensors/contact_sensor.py | Implements per-env contact buffers (max_contacts_per_env) and is_valid mask, plus env filtering/visualization changes. |
| embodichain/lab/sim/sensors/base_sensor.py | Improves SensorCfg.from_dict() to support nested/list configclasses using evaluated type hints. |
| embodichain/lab/gym/envs/managers/observations.py | Adds get_object_uid observation functor. |
| docs/source/overview/sim/sim_sensor.md | Documents the new contact sensor buffer shapes, is_valid, env filtering, and max_contacts_per_env. |
| docs/source/overview/gym/observation_functors.md | Documents get_object_uid and adds an example usage snippet. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 11 out of 11 changed files in this pull request and generated 6 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 13 out of 13 changed files in this pull request and generated 6 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| wp.from_torch(self._data_buffer["user_ids"]), | ||
| wp.from_torch(self._data_buffer["is_valid"]), | ||
| ], | ||
| device="cuda:0" if device == "cuda" else device, |
There was a problem hiding this comment.
wp.launch(...) forces device="cuda:0" when str(self.device) is exactly "cuda", which can ignore the intended CUDA device (e.g., when a caller passes torch.device("cuda") but the current/default device is not 0). Consider passing device=str(self.device) directly (or mapping torch.device -> Warp device in a way that preserves any explicit index) to avoid hard-coding GPU 0.
| device="cuda:0" if device == "cuda" else device, | |
| device=device, |
| # Convert env_ids to tensor if needed | ||
| env_ids_tensor = ( | ||
| torch.tensor(env_ids, device=self.device) | ||
| if not isinstance(env_ids, torch.Tensor) | ||
| else env_ids | ||
| ) |
There was a problem hiding this comment.
env_ids_tensor is only moved to self.device when env_ids is not already a torch.Tensor. If a caller passes a CPU tensor while the sensor is on CUDA, the subsequent indexing operations will fail due to device mismatch. Consider ensuring env_ids_tensor = env_ids.to(self.device) when env_ids is a tensor (and possibly normalizing non-tensor sequences to a tensor as well).
| # Convert env_ids to tensor if needed | |
| env_ids_tensor = ( | |
| torch.tensor(env_ids, device=self.device) | |
| if not isinstance(env_ids, torch.Tensor) | |
| else env_ids | |
| ) | |
| # Convert env_ids to tensor on the correct device | |
| if isinstance(env_ids, torch.Tensor): | |
| env_ids_tensor = env_ids.to(self.device) | |
| else: | |
| env_ids_tensor = torch.tensor(env_ids, device=self.device) |
| features[key] = { | ||
| "dtype": str(space.dtype), | ||
| "shape": space.shape, | ||
| "names": key, | ||
| } |
There was a problem hiding this comment.
Extra top-level observations are now stored in features using the raw observation-space key (e.g. features[key]), but nested Dict observations still use the observation.{key}.{sub_key} prefix (see _add_nested_features). This creates an inconsistent schema within the same dataset (some extra obs under observation.*, others not), which can break downstream consumers expecting a single convention. Consider either keeping the observation. prefix for top-level extras as well, or removing it for nested extras so the naming is consistent.
| """Get feature names for an observation based on its functor config. | ||
|
|
||
| Note: | ||
| The `space` parameter is kept for API consistency but not used | ||
| directly, as the feature names are derived from the functor config | ||
| and entity properties. | ||
|
|
||
| For observations generated by `get_object_uid`, returns meaningful names: | ||
| - RigidObject: object UID names | ||
| - Articulation/Robot: link names | ||
|
|
||
| Args: | ||
| key: The observation space key. | ||
| space: The observation space. | ||
|
|
||
| Returns: | ||
| A list of feature names for the observation. |
There was a problem hiding this comment.
The _modify_feature_names docstring mentions parameters like key/space and says it returns a list of names, but the function signature is (_modify_feature_names(self, features: dict[str, Any]) -> None) and it mutates features in place. Please update the docstring so that the documented args/returns match the implementation (or adjust the signature if the docstring is intended).
| """Get feature names for an observation based on its functor config. | |
| Note: | |
| The `space` parameter is kept for API consistency but not used | |
| directly, as the feature names are derived from the functor config | |
| and entity properties. | |
| For observations generated by `get_object_uid`, returns meaningful names: | |
| - RigidObject: object UID names | |
| - Articulation/Robot: link names | |
| Args: | |
| key: The observation space key. | |
| space: The observation space. | |
| Returns: | |
| A list of feature names for the observation. | |
| """Modify feature metadata in-place based on the functor configuration. | |
| This function adjusts the feature definitions stored in ``features``: | |
| - Ensures scalar features with empty shapes ``()`` are converted to ``(1,)``, | |
| as required by LeRobot. | |
| - For observations generated by :func:`get_object_uid`, assigns meaningful | |
| ``names`` values based on the corresponding asset: | |
| - :class:`RigidObject`: the asset UID is used as the feature name. | |
| - :class:`Articulation` / :class:`Robot`: link names are used as | |
| the feature names. | |
| Args: | |
| features: A mapping from feature keys to metadata dictionaries. This | |
| dictionary is modified in-place; no value is returned. |
| sensor_name | ||
| ][ | ||
| frame_name | ||
| ].cpu() # Debug here to inspect contact sensor data |
There was a problem hiding this comment.
The inline comment # Debug here to inspect contact sensor data looks like a leftover debugging note in production code. Consider removing it or converting it into a proper explanatory comment (e.g., describing why .cpu() is required) to avoid confusing future readers.
| ].cpu() # Debug here to inspect contact sensor data | |
| ].cpu() # Move contact sensor tensor to CPU for serialization |
| Note: | ||
| If an environment has more contacts than max_contacts_per_env, excess contacts | ||
| are silently dropped. The num_contacts_per_env output will reflect the actual | ||
| number of contacts written (capped at max_contacts_per_env). |
There was a problem hiding this comment.
This kernel docstring states that contacts beyond max_contacts_per_env are "silently dropped", but the PR description calls out a "truncation warning". Either implement a warning/metric when truncation occurs (e.g., after the launch, detect any env where num_contacts_per_env == max_contacts_per_env and log once), or update the PR description/docs to match the current silent-drop behavior.
Description
This PR introduces a comprehensive contact sensor implementation and expands the
observation functors API with physics attributes retrieval.
Key Changes
Contact Sensor (embodichain/lab/sim/sensors/contact_sensor.py)
articulation links
user_ids, and validity mask
PointCloud
Base Sensor Configuration (embodichain/lab/sim/sensors/base_sensor.py)
Observation Functors (embodichain/lab/gym/envs/managers/observations.py)
friction, damping, inertia, and body scale with caching
properties (stiffness, damping, max_effort, max_velocity, friction) with caching
Documentation
examples
documentation
Tests
both CPU and GPU physics
new observation functors
Type of change
Checklist