From d1031bbc922ccd64b02a49fed1591632d6cf8284 Mon Sep 17 00:00:00 2001 From: yuecideng Date: Fri, 13 Mar 2026 10:08:39 +0800 Subject: [PATCH 1/2] wip --- docs/source/overview/gym/action_functors.md | 97 ++++++++++++ docs/source/overview/gym/dataset_functors.md | 123 +++++++++++++++ docs/source/overview/gym/env.md | 9 +- docs/source/overview/gym/reward_functors.md | 158 +++++++++++++++++++ 4 files changed, 384 insertions(+), 3 deletions(-) create mode 100644 docs/source/overview/gym/action_functors.md create mode 100644 docs/source/overview/gym/dataset_functors.md create mode 100644 docs/source/overview/gym/reward_functors.md diff --git a/docs/source/overview/gym/action_functors.md b/docs/source/overview/gym/action_functors.md new file mode 100644 index 00000000..9c3cbd06 --- /dev/null +++ b/docs/source/overview/gym/action_functors.md @@ -0,0 +1,97 @@ +# Action Functors + +```{currentmodule} embodichain.lab.gym.envs.managers +``` + +This page lists all available action terms that can be used with the Action Manager. Action terms are configured using {class}`~cfg.ActionTermCfg` and are responsible for processing raw actions from the policy and converting them to the format expected by the robot (e.g., qpos, qvel, qf). + +## Joint Position Control + +```{list-table} Joint Position Action Terms +:header-rows: 1 +:widths: 30 70 + +* - Action Term + - Description +* - ``DeltaQposTerm`` + - Delta joint position action: current_qpos + scale * action -> qpos. The policy outputs position deltas relative to the current joint positions. +* - ``QposTerm`` + - Absolute joint position action: scale * action -> qpos. The policy outputs direct target joint positions. +* - ``QposNormalizedTerm`` + - Normalized action in [-1, 1] -> denormalize to joint limits -> qpos. The policy outputs normalized values that are mapped to joint limits. With scale=1.0 (default), action in [-1, 1] maps to [low, high]. 
+``` + +## End-Effector Control + +```{list-table} End-Effector Action Terms +:header-rows: 1 +:widths: 30 70 + +* - Action Term + - Description +* - ``EefPoseTerm`` + - End-effector pose (6D or 7D) -> IK -> qpos. The policy outputs target end-effector poses which are converted to joint positions via inverse kinematics. Returns ``ik_success`` in the output so reward/observation can penalize or condition on IK failures. Supports both 6D (euler angles) and 7D (quaternion) pose representations. +``` + +## Velocity and Force Control + +```{list-table} Velocity and Force Action Terms +:header-rows: 1 +:widths: 30 70 + +* - Action Term + - Description +* - ``QvelTerm`` + - Joint velocity action: scale * action -> qvel. The policy outputs target joint velocities. +* - ``QfTerm`` + - Joint force/torque action: scale * action -> qf. The policy outputs target joint torques/forces. +``` + +## Usage Example + +```python +from embodichain.lab.gym.envs.managers.cfg import ActionTermCfg + +# Example: Delta joint position control +actions = { + "joint_position": ActionTermCfg( + func="embodichain.lab.gym.envs.managers.action_manager.DeltaQposTerm", + params={ + "scale": 0.1, # Scale factor for action deltas + }, + ), +} + +# Example: Normalized joint position control +actions = { + "normalized_joint_position": ActionTermCfg( + func="embodichain.lab.gym.envs.managers.action_manager.QposNormalizedTerm", + params={ + "scale": 1.0, # Full joint range utilization + }, + ), +} + +# Example: End-effector pose control +actions = { + "eef_pose": ActionTermCfg( + func="embodichain.lab.gym.envs.managers.action_manager.EefPoseTerm", + params={ + "scale": 0.1, + "pose_dim": 7, # 7D (position + quaternion) + }, + ), +} +``` + +## Action Term Properties + +All action terms provide the following properties: + +- ``action_dim``: The dimension of the action space (number of values the policy should output) +- ``process_action(action)``: Method to convert raw policy output to robot control format + 
+The Action Manager also provides: + +- ``total_action_dim``: Total dimension of all action terms combined +- ``action_type``: The active action type (term name) for backward compatibility diff --git a/docs/source/overview/gym/dataset_functors.md b/docs/source/overview/gym/dataset_functors.md new file mode 100644 index 00000000..73181d5f --- /dev/null +++ b/docs/source/overview/gym/dataset_functors.md @@ -0,0 +1,123 @@ +# Dataset Functors + +```{currentmodule} embodichain.lab.gym.envs.managers +``` + +This page lists all available dataset functors that can be used with the Dataset Manager. Dataset functors are configured using {class}`~cfg.DatasetFunctorCfg` and are responsible for collecting and saving episode data during environment interaction. + +## Recording Functors + +```{list-table} Dataset Recording Functors +:header-rows: 1 +:widths: 30 70 + +* - Functor Name + - Description +* - ``LeRobotRecorder`` + - Records episodes in LeRobot dataset format. Handles observation-action pair recording, format conversion, and episode saving. Requires LeRobot package to be installed. +``` + +## LeRobotRecorder + +The ``LeRobotRecorder`` functor enables recording robot learning episodes in the LeRobot dataset format, which can be used for training with LeRobot's imitation learning algorithms. + +### Features + +- Records observation-action pairs during episodes +- Converts data to LeRobot format automatically +- Saves episodes when they complete +- Supports vision sensors (camera images) +- Supports robot state (qpos, qvel, qf) +- Supports custom observation features +- Auto-incrementing dataset naming + +### Parameters + +```{list-table} LeRobotRecorder Parameters +:header-rows: 1 +:widths: 30 70 + +* - Parameter + - Description +* - ``save_path`` + - Root directory for saving datasets. Defaults to EmbodiChain's default dataset root. +* - ``robot_meta`` + - Robot metadata for dataset (robot_type, control_freq, etc.) 
+* - ``instruction`` + - Optional task instruction (e.g., {"lang": "pick the cube"}) +* - ``extra`` + - Optional extra metadata (scene_type, task_description, episode_info) +* - ``use_videos`` + - Whether to save videos (True) or images (False). Default: False. +* - ``image_writer_threads`` + - Number of threads for image writing +* - ``image_writer_processes`` + - Number of processes for image writing +``` + +### Recorded Data + +The LeRobotRecorder saves the following data for each frame: + +- ``observation.qpos``: Joint positions +- ``observation.qvel``: Joint velocities +- ``observation.qf``: Joint forces/torques +- ``action``: Applied action +- ``{sensor_name}.color``: Camera images (if sensors present) +- ``{sensor_name}.color_right``: Right camera images (for stereo cameras) + +## Usage Example + +```python +from embodyichain.lab.gym.envs.managers.cfg import DatasetFunctorCfg + +# Example: Record episodes in LeRobot format +dataset = { + "lerobot_recorder": DatasetFunctorCfg( + func="embodichain.lab.gym.envs.managers.datasets.LeRobotRecorder", + params={ + "save_path": "/path/to/dataset/root", + "robot_meta": { + "robot_type": "dexforce_w1", + "control_freq": 30, + }, + "instruction": { + "lang": "pick the cube and place it on the target", + }, + "extra": { + "scene_type": "table", + "task_description": "pick_and_place", + "episode_info": { + "rigid_object_physics_attributes": ["mass"], + }, + }, + "use_videos": False, + }, + ), +} +``` + +### Recording Workflow + +1. **Initialization**: The Dataset Manager initializes the functor with the configured parameters +2. **Data Collection**: During episode rollout, the functor receives observations and actions +3. **Save Trigger**: When an episode completes, call the functor with `mode="save"` +4. 
**Finalization**: After all episodes, call the Dataset Manager with `mode="finalize"` to save any remaining data + +```python +# Inside environment loop +if episode_done: +    dataset_manager.apply(mode="save", env_ids=completed_env_ids) + +# After training completes +dataset_manager.apply(mode="finalize") +``` + +## Dataset Manager Modes + +The Dataset Manager supports the following modes: + +- ``save``: Save completed episodes for specified environment IDs +- ``finalize``: Finalize the dataset and save any remaining data + +See {class}`~managers.dataset_manager.DatasetManager` for more details. diff --git a/docs/source/overview/gym/env.md b/docs/source/overview/gym/env.md index 42311a1d..853374c5 100644 --- a/docs/source/overview/gym/env.md +++ b/docs/source/overview/gym/env.md @@ -165,7 +165,7 @@ For a complete list of available observation functors, please refer to {doc}`obs ### Dataset Manager -For Imitation Learning (IL) tasks, the Dataset Manager automates data collection through dataset functors. It currently supports: +For Imitation Learning (IL) tasks, the Dataset Manager automates data collection through dataset functors. For a complete list of available dataset functors and their parameters, please refer to {doc}`dataset_functors`. It currently supports: * **LeRobot Format** (via {class}`~envs.managers.datasets.LeRobotRecorder`): Standard format for LeRobot training pipelines. Includes support for task instructions, robot metadata, success flags, and optional video recording. @@ -191,7 +191,7 @@ The dataset manager is called automatically during {meth}`~envs.Env.step()`, ens For RL tasks, EmbodiChain uses the **Action Manager** integrated into {class}`~envs.EmbodiedEnv`: -* **Action Preprocessing**: Configurable via ``actions`` in {class}`~envs.EmbodiedEnvCfg`. Supports DeltaQposTerm, QposTerm, QposNormalizedTerm, EefPoseTerm, QvelTerm, QfTerm. +* **Action Preprocessing**: Configurable via ``actions`` in {class}`~envs.EmbodiedEnvCfg`. 
Supports DeltaQposTerm, QposTerm, QposNormalizedTerm, EefPoseTerm, QvelTerm, QfTerm. For a complete list of available action terms, please refer to {doc}`action_functors`. * **Standardized Info Structure**: {class}`~envs.EmbodiedEnv` provides ``compute_task_state``, ``get_info``, and ``evaluate`` for task-specific success/failure and metrics. * **Episode Management**: Configurable episode length and truncation logic. @@ -256,7 +256,7 @@ class MyRLTaskEnv(EmbodiedEnv): return is_success, is_fail, metrics ``` -Configure rewards through the {class}`~envs.managers.RewardManager` in your environment config rather than overriding ``get_reward``. +Configure rewards through the {class}`~envs.managers.RewardManager` in your environment config rather than overriding ``get_reward``. For a complete list of available reward functors, please refer to {doc}`reward_functors`. ### For Imitation Learning Tasks @@ -301,4 +301,7 @@ For a complete example of a modular environment setup, please refer to the {ref} event_functors.md observation_functors.md +reward_functors.md +action_functors.md +dataset_functors.md ``` diff --git a/docs/source/overview/gym/reward_functors.md b/docs/source/overview/gym/reward_functors.md new file mode 100644 index 00000000..bce98e62 --- /dev/null +++ b/docs/source/overview/gym/reward_functors.md @@ -0,0 +1,158 @@ +# Reward Functors + +```{currentmodule} embodichain.lab.gym.envs.managers +``` + +This page lists all available reward functors that can be used with the Reward Manager. Reward functors are configured using {class}`~cfg.RewardCfg` and return scalar reward tensors that are weighted and summed to form the total environment reward. + +## Distance-Based Rewards + +```{list-table} Distance-Based Reward Functors +:header-rows: 1 +:widths: 30 70 + +* - Functor Name + - Description +* - ``distance_between_objects`` + - Reward based on distance between two rigid objects. Supports either linear negative distance or exponential Gaussian-shaped reward. 
Higher when objects are closer. +* - ``distance_to_target`` + - Reward based on absolute distance to a virtual target pose. Uses target pose stored in env by randomize_target_pose event. Can use exponential or linear reward, and supports XY-only distance. +* - ``incremental_distance_to_target`` + - Incremental reward for progress toward a virtual target pose. Rewards getting closer compared to previous timestep. Uses tanh shaping and supports asymmetric weighting for approach vs. retreat. +``` + +## Alignment Rewards + +```{list-table} Alignment Reward Functors +:header-rows: 1 +:widths: 30 70 + +* - Functor Name + - Description +* - ``orientation_alignment`` + - Reward rotational alignment between two rigid objects. Uses rotation matrix trace to measure alignment. Ranges from -1 to 1 (1.0 = perfect alignment). +``` + +## Task-Specific Rewards + +```{list-table} Task-Specific Reward Functors +:header-rows: 1 +:widths: 30 70 + +* - Functor Name + - Description +* - ``reaching_behind_object`` + - Reward for positioning end-effector behind object for pushing. Encourages reaching a position behind the object along the object-to-goal direction. +* - ``success_reward`` + - Sparse bonus reward when task succeeds. Reads success status from info['success'] which should be set by the environment. +``` + +## Penalty Rewards + +```{list-table} Penalty Reward Functors +:header-rows: 1 +:widths: 30 70 + +* - Functor Name + - Description +* - ``joint_velocity_penalty`` + - Penalize high joint velocities to encourage smooth motion. Computes L2 norm of joint velocities and returns negative value as penalty. +* - ``action_smoothness_penalty`` + - Penalize large action changes between consecutive timesteps. Encourages smooth control commands. Reads previous action from env.episode_action_buffer. +* - ``joint_limit_penalty`` + - Penalize robot joints that are close to their position limits. Prevents joints from reaching physical limits. 
Penalty increases as joints approach limits within a margin. +``` + +## Usage Example + +```python +from embodyichain.lab.gym.envs.managers.cfg import RewardCfg, SceneEntityCfg + +# Example: Distance-based reward with exponential shaping +rewards = { + "approach_object": RewardCfg( + func="distance_between_objects", + weight=0.5, + params={ + "source_entity_cfg": SceneEntityCfg(uid="cube"), + "target_entity_cfg": SceneEntityCfg(uid="target"), + "exponential": True, + "sigma": 0.2, + }, + ), +} + +# Example: Joint velocity penalty +rewards = { + "joint_velocity_penalty": RewardCfg( + func="joint_velocity_penalty", + weight=0.001, + params={ + "robot_uid": "robot", + "part_name": "arm", + }, + ), +} + +# Example: Action smoothness penalty +rewards = { + "action_smoothness": RewardCfg( + func="action_smoothness_penalty", + weight=0.01, + params={}, + ), +} + +# Example: Success reward +rewards = { + "success": RewardCfg( + func="success_reward", + weight=10.0, + params={}, + ), +} + +# Example: Incremental distance reward +rewards = { + "incremental_progress": RewardCfg( + func="incremental_distance_to_target", + weight=1.0, + params={ + "source_entity_cfg": SceneEntityCfg(uid="cube"), + "target_pose_key": "goal_pose", + "tanh_scale": 10.0, + "positive_weight": 2.0, + "negative_weight": 0.5, + "use_xy_only": True, + }, + ), +} +``` + +## Reward Function Signature + +All reward functors follow the same signature: + +```python +def reward_functor( + env: EmbodiedEnv, + obs: dict, + action: torch.Tensor | dict, + info: dict, + **params, +) -> torch.Tensor: + """Reward functor. + + Args: + env: The environment instance. + obs: Current observation dictionary. + action: Current action from policy. + info: Info dictionary from environment. + **params: Additional parameters from config. + + Returns: + Reward tensor of shape (num_envs,). + """ +``` + +The reward manager automatically weights and sums all configured rewards to produce the total reward at each timestep. 
From 065056602c5b7b2155c3c803227f04584d0cebb6 Mon Sep 17 00:00:00 2001 From: yuecideng Date: Fri, 13 Mar 2026 13:50:08 +0800 Subject: [PATCH 2/2] wip --- docs/source/overview/gym/dataset_functors.md | 2 +- docs/source/overview/gym/reward_functors.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/source/overview/gym/dataset_functors.md b/docs/source/overview/gym/dataset_functors.md index 73181d5f..232847d3 100644 --- a/docs/source/overview/gym/dataset_functors.md +++ b/docs/source/overview/gym/dataset_functors.md @@ -69,7 +69,7 @@ The LeRobotRecorder saves the following data for each frame: ## Usage Example ```python -from embodyichain.lab.gym.envs.managers.cfg import DatasetFunctorCfg +from embodichain.lab.gym.envs.managers.cfg import DatasetFunctorCfg # Example: Record episodes in LeRobot format dataset = { diff --git a/docs/source/overview/gym/reward_functors.md b/docs/source/overview/gym/reward_functors.md index bce98e62..ce03e892 100644 --- a/docs/source/overview/gym/reward_functors.md +++ b/docs/source/overview/gym/reward_functors.md @@ -66,7 +66,7 @@ This page lists all available reward functors that can be used with the Reward M ## Usage Example ```python -from embodyichain.lab.gym.envs.managers.cfg import RewardCfg, SceneEntityCfg +from embodichain.lab.gym.envs.managers.cfg import RewardCfg, SceneEntityCfg # Example: Distance-based reward with exponential shaping rewards = {