Guidance Reward #121
Conversation
Pull Request Overview
This PR implements a guided autonomy reward system that incentivizes agents to follow reference trajectories by tracking waypoint progress and penalizing deviations in speed and heading.
Key Changes:
- Added four new configuration parameters: `reward_guided_autonomy` (master weight), `guidance_speed_weight`, `guidance_heading_weight`, and `waypoint_reach_threshold`
- Implemented route progress tracking with waypoint hit detection and exponential penalty functions for speed and heading deviations
- Integrated the guided autonomy reward computation into the main environment step function (a rough sketch of how these pieces could fit together follows this list)
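For illustration only, a minimal sketch of how these pieces could combine, assuming a simple weighted sum. Everything below except the four configuration parameter names is hypothetical and not taken from the PR:

```c
#include <math.h>

// Hypothetical sketch of a guided autonomy reward term. Only the config names
// (reward_guided_autonomy, guidance_speed_weight, guidance_heading_weight,
// waypoint_reach_threshold) come from this PR; the function itself is illustrative.
float guided_autonomy_reward(float dist_to_next_waypoint,
                             float speed_error, float heading_error,
                             float reward_guided_autonomy,
                             float guidance_speed_weight,
                             float guidance_heading_weight,
                             float waypoint_reach_threshold) {
    // Progress term: credit when the agent comes within the reach threshold of a waypoint.
    float progress = (dist_to_next_waypoint < waypoint_reach_threshold) ? 1.0f : 0.0f;

    // Exponential penalties on squared deviations (see the review notes further down).
    float speed_penalty = 1.0f - expf(-(speed_error * speed_error));
    float heading_penalty = 1.0f - expf(-(heading_error * heading_error));

    float shaped = progress
                 - guidance_speed_weight * speed_penalty
                 - guidance_heading_weight * heading_penalty;
    return reward_guided_autonomy * shaped;  // master weight scales the whole term
}
```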
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| pufferlib/ocean/env_config.h | Added struct fields and config parser handlers for the four new guided autonomy parameters (sketched below the table) |
| pufferlib/ocean/drive/drive.py | Propagated the new parameters through Python environment initialization and step methods |
| pufferlib/ocean/drive/drive.h | Implemented waypoint tracking fields in Entity struct, guidance reward computation functions, and integration into the main step loop |
| pufferlib/ocean/drive/binding.c | Added parameter assignments from config to environment struct |
| pufferlib/config/ocean/drive.ini | Added default configuration values for guided autonomy parameters |
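For orientation, a hypothetical sketch of what the new env_config.h fields might look like. The struct name, surrounding layout, and comments are assumptions; only the four parameter names come from this PR:

```c
// Hypothetical excerpt: the actual struct name and surrounding fields in
// pufferlib/ocean/env_config.h may differ; only the four parameter names
// are taken from this PR.
typedef struct {
    /* ... existing fields ... */
    float reward_guided_autonomy;    // master weight for the guidance term
    float guidance_speed_weight;     // weight on the speed-deviation penalty
    float guidance_heading_weight;   // weight on the heading-deviation penalty
    float waypoint_reach_threshold;  // distance at which a waypoint counts as hit
} EnvConfig;
```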
pufferlib/ocean/drive/drive.h (outdated)
float speed_error_sq = speed_error * speed_error;

// Exponential penalty: 1.0 - exp(-error²)
float penalty = 1.0f - expf(-speed_error_sq + 1e-8f);
Copilot AI, Nov 10, 2025
The exponential term should be expf(-speed_error_sq) without adding 1e-8f inside the exponent. Adding epsilon inside changes the mathematical behavior from a numerical stability constant to an incorrect offset. If numerical stability is needed, add epsilon to the squared error before negation: expf(-(speed_error_sq + 1e-8f)).
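For clarity, a small sketch of the corrected form Copilot is suggesting, usable for both the speed and heading penalties. The helper name is made up for illustration and is not part of the PR:

```c
#include <math.h>

// Illustrative helper: exponential penalty on a squared error, with the
// epsilon folded into the squared error before negation, as the review suggests.
static inline float exp_penalty(float error) {
    float error_sq = error * error;
    return 1.0f - expf(-(error_sq + 1e-8f));  // not expf(-error_sq + 1e-8f)
}
```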
pufferlib/ocean/drive/drive.h (outdated)
float heading_error_sq = heading_error * heading_error;

// Exponential penalty: 1.0 - exp(-error²)
float penalty = 1.0f - expf(-heading_error_sq + 1e-8f);
Copilot AI, Nov 10, 2025
The exponential term should be expf(-heading_error_sq) without adding 1e-8f inside the exponent. Adding epsilon inside changes the mathematical behavior from a numerical stability constant to an incorrect offset. If numerical stability is needed, add epsilon to the squared error before negation: expf(-(heading_error_sq + 1e-8f)).
Suggested change: replace
float penalty = 1.0f - expf(-heading_error_sq + 1e-8f);
with
float penalty = 1.0f - expf(-(heading_error_sq + 1e-8f));
Greptile Summary
Confidence Score: 4/5
Important Files Changed
Sequence Diagram

sequenceDiagram
participant User
participant Python as Drive.py
participant Config as env_config.h
participant Binding as binding.c
participant Core as drive.h
User->>Python: Initialize environment with guidance params
Python->>Config: Parse drive.ini config
Config-->>Python: Return config with guidance settings
Python->>Binding: Create environment via my_init()
Binding->>Config: Read guidance_speed_weight, guidance_heading_weight, waypoint_reach_threshold
Binding->>Core: Initialize Drive env with guided autonomy params
Core->>Core: Allocate waypoints_hit arrays for entities
Core-->>Binding: Environment initialized
Binding-->>Python: Environment ready
User->>Python: Call step(action)
Python->>Binding: Execute c_step()
Binding->>Core: move_dynamics() - Update agent position
Binding->>Core: compute_guided_autonomy_reward()
Core->>Core: compute_route_guidance_reward() - Check waypoint proximity
Core->>Core: compute_speed_guidance_reward() - Calculate speed deviation
Core->>Core: compute_heading_guidance_reward() - Calculate heading error
Core-->>Binding: Return total guidance reward
Binding->>Binding: Add guidance reward to env->rewards[i]
Binding-->>Python: Return obs, rewards, dones
Python-->>User: Step result with guidance rewards
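Based on the sequence diagram above, a rough sketch of how the guidance reward might be folded into the C step loop. Only the function names (move_dynamics, compute_guided_autonomy_reward) and the per-agent rewards array appear in the diagram; the Drive struct and loop shown here are illustrative stubs:

```c
// Hypothetical step-loop excerpt; the struct layout and loop are assumptions.
typedef struct {
    int num_agents;
    float *rewards;
} Drive;

void move_dynamics(Drive *env, int i);                    // assumed: updates agent i's pose
float compute_guided_autonomy_reward(Drive *env, int i);  // assumed: route + speed + heading terms

void step_guidance(Drive *env) {
    for (int i = 0; i < env->num_agents; i++) {
        move_dynamics(env, i);
        float guidance = compute_guided_autonomy_reward(env, i);
        env->rewards[i] += guidance;  // guidance term added to the per-agent step reward
    }
}
```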
5 files reviewed, 1 comment
pufferlib/ocean/drive/binding.c

env->goal_radius = conf.goal_radius;
env->scenario_length = conf.scenario_length;
env->collision_behavior = conf.collision_behavior;
env->offroad_behavior = conf.offroad_behavior;
env->max_controlled_agents = unpack(kwargs, "max_controlled_agents");
env->dt = conf.dt;
env->init_mode = (int)unpack(kwargs, "init_mode");
env->control_mode = (int)unpack(kwargs, "control_mode");
env->goal_behavior = (int)unpack(kwargs, "goal_behavior");
env->goal_radius = (float)unpack(kwargs, "goal_radius");
env->init_mode = conf.init_mode;
env->control_mode = conf.control_mode;
env->goal_behavior = conf.goal_behavior;
env->goal_radius = conf.goal_radius;
style: goal_radius assigned twice from conf.goal_radius (lines 190 and 199)
Prompt To Fix With AI
This is a comment left during a code review.
Path: pufferlib/ocean/drive/binding.c
Line: 190:199
Comment:
**style:** `goal_radius` assigned twice from `conf.goal_radius` (lines 190 and 199)
```suggestion
env->goal_radius = conf.goal_radius;
env->scenario_length = conf.scenario_length;
env->collision_behavior = conf.collision_behavior;
env->offroad_behavior = conf.offroad_behavior;
env->max_controlled_agents = unpack(kwargs, "max_controlled_agents");
env->dt = conf.dt;
env->init_mode = conf.init_mode;
env->control_mode = conf.control_mode;
env->goal_behavior = conf.goal_behavior;
```
How can I resolve this? If you propose a fix, please make it concise.