Conversation

@nadarenator (Collaborator)

No description provided.

Copilot AI left a comment

Pull Request Overview

This PR implements a guided autonomy reward system that incentivizes agents to follow reference trajectories by tracking waypoint progress and penalizing deviations in speed and heading.

Key Changes:

  • Added four new configuration parameters: reward_guided_autonomy (master weight), guidance_speed_weight, guidance_heading_weight, and waypoint_reach_threshold
  • Implemented route progress tracking with waypoint hit detection and exponential penalty functions for speed and heading deviations
  • Integrated the guided autonomy reward computation into the main environment step function (a sketch of how these pieces might fit together follows this list)
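
As referenced above, a minimal C sketch of how the four parameters might combine; the function name and the exact weighting scheme are assumptions, not the PR's actual code:

float guided_autonomy_reward(float route_progress,
                             float speed_penalty, float heading_penalty,
                             float reward_guided_autonomy,
                             float guidance_speed_weight,
                             float guidance_heading_weight) {
    // Master weight scales the shaped sum: route progress minus the
    // weighted speed and heading penalties (weighting scheme assumed).
    float shaped = route_progress
                 - guidance_speed_weight * speed_penalty
                 - guidance_heading_weight * heading_penalty;
    return reward_guided_autonomy * shaped;
}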

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

File Description
pufferlib/ocean/env_config.h Added struct fields and config parser handlers for the four new guided autonomy parameters (struct fields sketched after this table)
pufferlib/ocean/drive/drive.py Propagated the new parameters through Python environment initialization and step methods
pufferlib/ocean/drive/drive.h Implemented waypoint tracking fields in Entity struct, guidance reward computation functions, and integration into the main step loop
pufferlib/ocean/drive/binding.c Added parameter assignments from config to environment struct
pufferlib/config/ocean/drive.ini Added default configuration values for guided autonomy parameters
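
A hypothetical sketch of the struct additions in env_config.h referenced in the table above; the field names follow the four new parameters, but the surrounding layout is assumed:

typedef struct EnvConfig {
    /* ... existing fields ... */
    float reward_guided_autonomy;    // master weight for the guidance term
    float guidance_speed_weight;     // weight on the speed-deviation penalty
    float guidance_heading_weight;   // weight on the heading-deviation penalty
    float waypoint_reach_threshold;  // distance at which a waypoint counts as reached
} EnvConfig;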


float speed_error_sq = speed_error * speed_error;

// Exponential penalty: 1.0 - exp(-error²)
float penalty = 1.0f - expf(-speed_error_sq + 1e-8f);
Copilot AI commented Nov 10, 2025

The exponential term should be expf(-speed_error_sq) without adding 1e-8f inside the exponent. Adding epsilon inside changes the mathematical behavior from a numerical stability constant to an incorrect offset. If numerical stability is needed, add epsilon to the squared error before negation: expf(-(speed_error_sq + 1e-8f)).
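
For clarity, a minimal sketch contrasting the three forms the comment discusses, reusing the variable name from the snippet above:

float buggy  = 1.0f - expf(-speed_error_sq + 1e-8f);    // epsilon offsets the whole exponent
float stable = 1.0f - expf(-(speed_error_sq + 1e-8f));  // epsilon stays inside the negated term
float plain  = 1.0f - expf(-speed_error_sq);            // no epsilon; expf is well-behaved here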

float heading_error_sq = heading_error * heading_error;

// Exponential penalty: 1.0 - exp(-error²)
float penalty = 1.0f - expf(-heading_error_sq + 1e-8f);
Copilot AI commented Nov 10, 2025

The exponential term should be expf(-heading_error_sq) without adding 1e-8f inside the exponent. Adding epsilon inside changes the mathematical behavior from a numerical stability constant to an incorrect offset. If numerical stability is needed, add epsilon to the squared error before negation: expf(-(heading_error_sq + 1e-8f)).

Suggested change
float penalty = 1.0f - expf(-heading_error_sq + 1e-8f);
float penalty = 1.0f - expf(-(heading_error_sq + 1e-8f));

@daphne-cornelisse daphne-cornelisse changed the base branch from main to gsp_dev November 11, 2025 23:27
@nadarenator nadarenator marked this pull request as ready for review November 18, 2025 20:23
greptile-apps bot commented Nov 18, 2025

Greptile Summary

  • Added guided autonomy reward system that tracks waypoint progression, speed matching, and heading alignment against reference trajectories (waypoint check sketched after this list)
  • Modified sweep configuration to focus on tuning guidance reward weights instead of human demonstration parameters
  • Increased reward_offroad_collision penalty from -0.2 to -0.5
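
A hedged C sketch of the waypoint-hit check implied by the first bullet; the argument names and the squared-distance comparison are assumptions:

static int waypoint_reached(float agent_x, float agent_y,
                            float wp_x, float wp_y,
                            float waypoint_reach_threshold) {
    float dx = agent_x - wp_x;
    float dy = agent_y - wp_y;
    // Compare squared distances so no sqrtf call is needed.
    return dx * dx + dy * dy
         < waypoint_reach_threshold * waypoint_reach_threshold;
}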

Confidence Score: 4/5

  • This PR is safe to merge with minor style improvements recommended
  • Implementation is sound with proper memory management and parameter propagation, but contains one redundant assignment that should be cleaned up
  • Review pufferlib/ocean/drive/binding.c for the duplicate assignment

Important Files Changed

Filename Overview
pufferlib/ocean/drive/binding.c Initialized guided autonomy fields from config; contains duplicate goal_radius assignment on line 199
pufferlib/ocean/drive/drive.h Implemented guided autonomy reward computation with waypoint tracking, speed matching, and heading alignment (heading term sketched below)
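
As flagged in the table, drive.h computes a heading-alignment term. A hedged sketch of one plausible form, folding in the corrected exponential penalty from the review comments; wrapping the error to [-pi, pi] via atan2f is an assumption, not confirmed by the PR:

#include <math.h>

static float heading_guidance_penalty(float heading, float ref_heading) {
    // Wrap the angular error to [-pi, pi] before squaring (assumed behavior).
    float heading_error = atan2f(sinf(heading - ref_heading),
                                 cosf(heading - ref_heading));
    float heading_error_sq = heading_error * heading_error;
    return 1.0f - expf(-heading_error_sq);  // corrected form, no epsilon offset
}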

Sequence Diagram

sequenceDiagram
    participant User
    participant Python as Drive.py
    participant Config as env_config.h
    participant Binding as binding.c
    participant Core as drive.h
    
    User->>Python: Initialize environment with guidance params
    Python->>Config: Parse drive.ini config
    Config-->>Python: Return config with guidance settings
    Python->>Binding: Create environment via my_init()
    Binding->>Config: Read guidance_speed_weight, guidance_heading_weight, waypoint_reach_threshold
    Binding->>Core: Initialize Drive env with guided autonomy params
    Core->>Core: Allocate waypoints_hit arrays for entities
    Core-->>Binding: Environment initialized
    Binding-->>Python: Environment ready
    
    User->>Python: Call step(action)
    Python->>Binding: Execute c_step()
    Binding->>Core: move_dynamics() - Update agent position
    Binding->>Core: compute_guided_autonomy_reward()
    Core->>Core: compute_route_guidance_reward() - Check waypoint proximity
    Core->>Core: compute_speed_guidance_reward() - Calculate speed deviation
    Core->>Core: compute_heading_guidance_reward() - Calculate heading error
    Core-->>Binding: Return total guidance reward
    Binding->>Binding: Add guidance reward to env->rewards[i]
    Binding-->>Python: Return obs, rewards, dones
    Python-->>User: Step result with guidance rewards
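
A hedged sketch of the per-agent step flow the diagram describes; the function names follow the diagram, while the signatures, the num_agents field, and the loop structure are assumptions:

for (int i = 0; i < env->num_agents; i++) {
    move_dynamics(env, i);                             // update agent position
    float g = compute_guided_autonomy_reward(env, i);  // route + speed + heading terms
    env->rewards[i] += g;                              // add guidance reward, per the diagram
}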

greptile-apps bot left a comment

5 files reviewed, 1 comment


Comment on lines +190 to +199
env->goal_radius = conf.goal_radius;
env->scenario_length = conf.scenario_length;
env->collision_behavior = conf.collision_behavior;
env->offroad_behavior = conf.offroad_behavior;
env->max_controlled_agents = unpack(kwargs, "max_controlled_agents");
env->dt = conf.dt;
env->init_mode = (int)unpack(kwargs, "init_mode");
env->control_mode = (int)unpack(kwargs, "control_mode");
env->goal_behavior = (int)unpack(kwargs, "goal_behavior");
env->goal_radius = (float)unpack(kwargs, "goal_radius");
env->init_mode = conf.init_mode;
env->control_mode = conf.control_mode;
env->goal_behavior = conf.goal_behavior;
env->goal_radius = conf.goal_radius;

style: goal_radius assigned twice from conf.goal_radius (lines 190 and 199)

Suggested change (replaces the excerpt above, dropping the duplicate assignments):
env->goal_radius = conf.goal_radius;
env->scenario_length = conf.scenario_length;
env->collision_behavior = conf.collision_behavior;
env->offroad_behavior = conf.offroad_behavior;
env->max_controlled_agents = unpack(kwargs, "max_controlled_agents");
env->dt = conf.dt;
env->init_mode = conf.init_mode;
env->control_mode = conf.control_mode;
env->goal_behavior = conf.goal_behavior;

@nadarenator nadarenator merged commit 3a95801 into gsp_dev Nov 18, 2025
10 checks passed