From d1031bbc922ccd64b02a49fed1591632d6cf8284 Mon Sep 17 00:00:00 2001 From: yuecideng Date: Fri, 13 Mar 2026 10:08:39 +0800 Subject: [PATCH 1/2] wip --- docs/source/overview/gym/action_functors.md | 97 ++++++++++++ docs/source/overview/gym/dataset_functors.md | 123 +++++++++++++++ docs/source/overview/gym/env.md | 9 +- docs/source/overview/gym/reward_functors.md | 158 +++++++++++++++++++ 4 files changed, 384 insertions(+), 3 deletions(-) create mode 100644 docs/source/overview/gym/action_functors.md create mode 100644 docs/source/overview/gym/dataset_functors.md create mode 100644 docs/source/overview/gym/reward_functors.md diff --git a/docs/source/overview/gym/action_functors.md b/docs/source/overview/gym/action_functors.md new file mode 100644 index 00000000..9c3cbd06 --- /dev/null +++ b/docs/source/overview/gym/action_functors.md @@ -0,0 +1,97 @@ +# Action Functors + +```{currentmodule} embodichain.lab.gym.envs.managers +``` + +This page lists all available action terms that can be used with the Action Manager. Action terms are configured using {class}`~cfg.ActionTermCfg` and are responsible for processing raw actions from the policy and converting them to the format expected by the robot (e.g., qpos, qvel, qf). + +## Joint Position Control + +```{list-table} Joint Position Action Terms +:header-rows: 1 +:widths: 30 70 + +* - Action Term + - Description +* - ``DeltaQposTerm`` + - Delta joint position action: current_qpos + scale * action -> qpos. The policy outputs position deltas relative to the current joint positions. +* - ``QposTerm`` + - Absolute joint position action: scale * action -> qpos. The policy outputs direct target joint positions. +* - ``QposNormalizedTerm`` + - Normalized action in [-1, 1] -> denormalize to joint limits -> qpos. The policy outputs normalized values that are mapped to joint limits. With scale=1.0 (default), action in [-1, 1] maps to [low, high]. 
+``` + +## End-Effector Control + +```{list-table} End-Effector Action Terms +:header-rows: 1 +:widths: 30 70 + +* - Action Term + - Description +* - ``EefPoseTerm`` + - End-effector pose (6D or 7D) -> IK -> qpos. The policy outputs target end-effector poses which are converted to joint positions via inverse kinematics. Returns ``ik_success`` in the output so reward/observation can penalize or condition on IK failures. Supports both 6D (euler angles) and 7D (quaternion) pose representations. +``` + +## Velocity and Force Control + +```{list-table} Velocity and Force Action Terms +:header-rows: 1 +:widths: 30 70 + +* - Action Term + - Description +* - ``QvelTerm`` + - Joint velocity action: scale * action -> qvel. The policy outputs target joint velocities. +* - ``QfTerm`` + - Joint force/torque action: scale * action -> qf. The policy outputs target joint torques/forces. +``` + +## Usage Example + +```python +from embodichain.lab.gym.envs.managers.cfg import ActionTermCfg + +# Example: Delta joint position control +actions = { + "joint_position": ActionTermCfg( + func="embodichain.lab.gym.envs.managers.action_manager.DeltaQposTerm", + params={ + "scale": 0.1, # Scale factor for action deltas + }, + ), +} + +# Example: Normalized joint position control +actions = { + "normalized_joint_position": ActionTermCfg( + func="embodichain.lab.gym.envs.managers.action_manager.QposNormalizedTerm", + params={ + "scale": 1.0, # Full joint range utilization + }, + ), +} + +# Example: End-effector pose control +actions = { + "eef_pose": ActionTermCfg( + func="embodichain.lab.gym.envs.managers.action_manager.EefPoseTerm", + params={ + "scale": 0.1, + "pose_dim": 7, # 7D (position + quaternion) + }, + ), +} +``` + +## Action Term Properties + +All action terms provide the following properties: + +- ``action_dim``: The dimension of the action space (number of values the policy should output) +- ``process_action(action)``: Method to convert raw policy output to robot control format + 
+The Action Manager also provides: + +- ``total_action_dim``: Total dimension of all action terms combined +- ``action_type``: The active action type (term name) for backward compatibility diff --git a/docs/source/overview/gym/dataset_functors.md b/docs/source/overview/gym/dataset_functors.md new file mode 100644 index 00000000..73181d5f --- /dev/null +++ b/docs/source/overview/gym/dataset_functors.md @@ -0,0 +1,123 @@ +# Dataset Functors + +```{currentmodule} embodichain.lab.gym.envs.managers +``` + +This page lists all available dataset functors that can be used with the Dataset Manager. Dataset functors are configured using {class}`~cfg.DatasetFunctorCfg` and are responsible for collecting and saving episode data during environment interaction. + +## Recording Functors + +```{list-table} Dataset Recording Functors +:header-rows: 1 +:widths: 30 70 + +* - Functor Name + - Description +* - ``LeRobotRecorder`` + - Records episodes in LeRobot dataset format. Handles observation-action pair recording, format conversion, and episode saving. Requires LeRobot package to be installed. +``` + +## LeRobotRecorder + +The ``LeRobotRecorder`` functor enables recording robot learning episodes in the LeRobot dataset format, which can be used for training with LeRobot's imitation learning algorithms. + +### Features + +- Records observation-action pairs during episodes +- Converts data to LeRobot format automatically +- Saves episodes when they complete +- Supports vision sensors (camera images) +- Supports robot state (qpos, qvel, qf) +- Supports custom observation features +- Auto-incrementing dataset naming + +### Parameters + +```{list-table} LeRobotRecorder Parameters +:header-rows: 1 +:widths: 30 70 + +* - Parameter + - Description +* - ``save_path`` + - Root directory for saving datasets. Defaults to EmbodiChain's default dataset root. +* - ``robot_meta`` + - Robot metadata for dataset (robot_type, control_freq, etc.) 
+* - ``instruction`` + - Optional task instruction (e.g., {"lang": "pick the cube"}) +* - ``extra`` + - Optional extra metadata (scene_type, task_description, episode_info) +* - ``use_videos`` + - Whether to save videos (True) or images (False). Default: False. +* - ``image_writer_threads`` + - Number of threads for image writing +* - ``image_writer_processes`` + - Number of processes for image writing +``` + +### Recorded Data + +The LeRobotRecorder saves the following data for each frame: + +- ``observation.qpos``: Joint positions +- ``observation.qvel``: Joint velocities +- ``observation.qf``: Joint forces/torques +- ``action``: Applied action +- ``{sensor_name}.color``: Camera images (if sensors present) +- ``{sensor_name}.color_right``: Right camera images (for stereo cameras) + +## Usage Example + +```python +from embodyichain.lab.gym.envs.managers.cfg import DatasetFunctorCfg + +# Example: Record episodes in LeRobot format +dataset = { + "lerobot_recorder": DatasetFunctorCfg( + func="embodichain.lab.gym.envs.managers.datasets.LeRobotRecorder", + params={ + "save_path": "/path/to/dataset/root", + "robot_meta": { + "robot_type": "dexforce_w1", + "control_freq": 30, + }, + "instruction": { + "lang": "pick the cube and place it on the target", + }, + "extra": { + "scene_type": "table", + "task_description": "pick_and_place", + "episode_info": { + "rigid_object_physics_attributes": ["mass"], + }, + }, + "use_videos": False, + }, + ), +} +``` + +### Recording Workflow + +1. **Initialization**: The Dataset Manager initializes the functor with the configured parameters +2. **Data Collection**: During episode rollout, the functor receives observations and actions +3. **Save Trigger**: When an episode completes, call the functor with `mode="save"` +4. 
**Finalization**: After all episodes, call the Dataset Manager with `mode="finalize"` to save any remaining data + +```python +# Inside environment loop +if episode_done: +    dataset_manager.apply(mode="save", env_ids=completed_env_ids) + +# After training completes +dataset_manager.apply(mode="finalize") +``` + +## Dataset Manager Modes + +The Dataset Manager supports the following modes: + +- ``save``: Save completed episodes for specified environment IDs +- ``finalize``: Finalize the dataset and save any remaining data + +See {class}`~managers.dataset_manager.DatasetManager` for more details. diff --git a/docs/source/overview/gym/env.md b/docs/source/overview/gym/env.md index 42311a1d..853374c5 100644 --- a/docs/source/overview/gym/env.md +++ b/docs/source/overview/gym/env.md @@ -165,7 +165,7 @@ For a complete list of available observation functors, please refer to {doc}`obs ### Dataset Manager -For Imitation Learning (IL) tasks, the Dataset Manager automates data collection through dataset functors. It currently supports: +For Imitation Learning (IL) tasks, the Dataset Manager automates data collection through dataset functors. For a complete list of available dataset functors and their parameters, please refer to {doc}`dataset_functors`. It currently supports: * **LeRobot Format** (via {class}`~envs.managers.datasets.LeRobotRecorder`): Standard format for LeRobot training pipelines. Includes support for task instructions, robot metadata, success flags, and optional video recording. @@ -191,7 +191,7 @@ The dataset manager is called automatically during {meth}`~envs.Env.step()`, ens For RL tasks, EmbodiChain uses the **Action Manager** integrated into {class}`~envs.EmbodiedEnv`: -* **Action Preprocessing**: Configurable via ``actions`` in {class}`~envs.EmbodiedEnvCfg`. Supports DeltaQposTerm, QposTerm, QposNormalizedTerm, EefPoseTerm, QvelTerm, QfTerm. +* **Action Preprocessing**: Configurable via ``actions`` in {class}`~envs.EmbodiedEnvCfg`. 
Supports DeltaQposTerm, QposTerm, QposNormalizedTerm, EefPoseTerm, QvelTerm, QfTerm. For a complete list of available action terms, please refer to {doc}`action_functors`. * **Standardized Info Structure**: {class}`~envs.EmbodiedEnv` provides ``compute_task_state``, ``get_info``, and ``evaluate`` for task-specific success/failure and metrics. * **Episode Management**: Configurable episode length and truncation logic. @@ -256,7 +256,7 @@ class MyRLTaskEnv(EmbodiedEnv): return is_success, is_fail, metrics ``` -Configure rewards through the {class}`~envs.managers.RewardManager` in your environment config rather than overriding ``get_reward``. +Configure rewards through the {class}`~envs.managers.RewardManager` in your environment config rather than overriding ``get_reward``. For a complete list of available reward functors, please refer to {doc}`reward_functors`. ### For Imitation Learning Tasks @@ -301,4 +301,7 @@ For a complete example of a modular environment setup, please refer to the {ref} event_functors.md observation_functors.md +reward_functors.md +action_functors.md +dataset_functors.md ``` diff --git a/docs/source/overview/gym/reward_functors.md b/docs/source/overview/gym/reward_functors.md new file mode 100644 index 00000000..bce98e62 --- /dev/null +++ b/docs/source/overview/gym/reward_functors.md @@ -0,0 +1,158 @@ +# Reward Functors + +```{currentmodule} embodichain.lab.gym.envs.managers +``` + +This page lists all available reward functors that can be used with the Reward Manager. Reward functors are configured using {class}`~cfg.RewardCfg` and return scalar reward tensors that are weighted and summed to form the total environment reward. + +## Distance-Based Rewards + +```{list-table} Distance-Based Reward Functors +:header-rows: 1 +:widths: 30 70 + +* - Functor Name + - Description +* - ``distance_between_objects`` + - Reward based on distance between two rigid objects. Supports either linear negative distance or exponential Gaussian-shaped reward. 
Higher when objects are closer. +* - ``distance_to_target`` + - Reward based on absolute distance to a virtual target pose. Uses target pose stored in env by randomize_target_pose event. Can use exponential or linear reward, and supports XY-only distance. +* - ``incremental_distance_to_target`` + - Incremental reward for progress toward a virtual target pose. Rewards getting closer compared to previous timestep. Uses tanh shaping and supports asymmetric weighting for approach vs. retreat. +``` + +## Alignment Rewards + +```{list-table} Alignment Reward Functors +:header-rows: 1 +:widths: 30 70 + +* - Functor Name + - Description +* - ``orientation_alignment`` + - Reward rotational alignment between two rigid objects. Uses rotation matrix trace to measure alignment. Ranges from -1 to 1 (1.0 = perfect alignment). +``` + +## Task-Specific Rewards + +```{list-table} Task-Specific Reward Functors +:header-rows: 1 +:widths: 30 70 + +* - Functor Name + - Description +* - ``reaching_behind_object`` + - Reward for positioning end-effector behind object for pushing. Encourages reaching a position behind the object along the object-to-goal direction. +* - ``success_reward`` + - Sparse bonus reward when task succeeds. Reads success status from info['success'] which should be set by the environment. +``` + +## Penalty Rewards + +```{list-table} Penalty Reward Functors +:header-rows: 1 +:widths: 30 70 + +* - Functor Name + - Description +* - ``joint_velocity_penalty`` + - Penalize high joint velocities to encourage smooth motion. Computes L2 norm of joint velocities and returns negative value as penalty. +* - ``action_smoothness_penalty`` + - Penalize large action changes between consecutive timesteps. Encourages smooth control commands. Reads previous action from env.episode_action_buffer. +* - ``joint_limit_penalty`` + - Penalize robot joints that are close to their position limits. Prevents joints from reaching physical limits. 
Penalty increases as joints approach limits within a margin. +``` + +## Usage Example + +```python +from embodyichain.lab.gym.envs.managers.cfg import RewardCfg, SceneEntityCfg + +# Example: Distance-based reward with exponential shaping +rewards = { + "approach_object": RewardCfg( + func="distance_between_objects", + weight=0.5, + params={ + "source_entity_cfg": SceneEntityCfg(uid="cube"), + "target_entity_cfg": SceneEntityCfg(uid="target"), + "exponential": True, + "sigma": 0.2, + }, + ), +} + +# Example: Joint velocity penalty +rewards = { + "joint_velocity_penalty": RewardCfg( + func="joint_velocity_penalty", + weight=0.001, + params={ + "robot_uid": "robot", + "part_name": "arm", + }, + ), +} + +# Example: Action smoothness penalty +rewards = { + "action_smoothness": RewardCfg( + func="action_smoothness_penalty", + weight=0.01, + params={}, + ), +} + +# Example: Success reward +rewards = { + "success": RewardCfg( + func="success_reward", + weight=10.0, + params={}, + ), +} + +# Example: Incremental distance reward +rewards = { + "incremental_progress": RewardCfg( + func="incremental_distance_to_target", + weight=1.0, + params={ + "source_entity_cfg": SceneEntityCfg(uid="cube"), + "target_pose_key": "goal_pose", + "tanh_scale": 10.0, + "positive_weight": 2.0, + "negative_weight": 0.5, + "use_xy_only": True, + }, + ), +} +``` + +## Reward Function Signature + +All reward functors follow the same signature: + +```python +def reward_functor( + env: EmbodiedEnv, + obs: dict, + action: torch.Tensor | dict, + info: dict, + **params, +) -> torch.Tensor: + """Reward functor. + + Args: + env: The environment instance. + obs: Current observation dictionary. + action: Current action from policy. + info: Info dictionary from environment. + **params: Additional parameters from config. + + Returns: + Reward tensor of shape (num_envs,). + """ +``` + +The reward manager automatically weights and sums all configured rewards to produce the total reward at each timestep. 
From 065056602c5b7b2155c3c803227f04584d0cebb6 Mon Sep 17 00:00:00 2001 From: yuecideng Date: Fri, 13 Mar 2026 13:50:08 +0800 Subject: [PATCH 2/2] wip --- docs/source/overview/gym/dataset_functors.md | 2 +- docs/source/overview/gym/reward_functors.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/source/overview/gym/dataset_functors.md b/docs/source/overview/gym/dataset_functors.md index 73181d5f..232847d3 100644 --- a/docs/source/overview/gym/dataset_functors.md +++ b/docs/source/overview/gym/dataset_functors.md @@ -69,7 +69,7 @@ The LeRobotRecorder saves the following data for each frame: ## Usage Example ```python -from embodyichain.lab.gym.envs.managers.cfg import DatasetFunctorCfg +from embodichain.lab.gym.envs.managers.cfg import DatasetFunctorCfg # Example: Record episodes in LeRobot format dataset = { diff --git a/docs/source/overview/gym/reward_functors.md b/docs/source/overview/gym/reward_functors.md index bce98e62..ce03e892 100644 --- a/docs/source/overview/gym/reward_functors.md +++ b/docs/source/overview/gym/reward_functors.md @@ -66,7 +66,7 @@ This page lists all available reward functors that can be used with the Reward M ## Usage Example ```python -from embodyichain.lab.gym.envs.managers.cfg import RewardCfg, SceneEntityCfg +from embodichain.lab.gym.envs.managers.cfg import RewardCfg, SceneEntityCfg # Example: Distance-based reward with exponential shaping rewards = {