Guidance Reward #121
Conversation
Pull Request Overview
This PR implements a guided autonomy reward system that incentivizes agents to follow reference trajectories by tracking waypoint progress and penalizing deviations in speed and heading.
Key Changes:
- Added four new configuration parameters: `reward_guided_autonomy` (master weight), `guidance_speed_weight`, `guidance_heading_weight`, and `waypoint_reach_threshold`
- Implemented route progress tracking with waypoint hit detection and exponential penalty functions for speed and heading deviations
- Integrated the guided autonomy reward computation into the main environment step function (a rough sketch of how these pieces could fit together follows this list)
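For illustration only, a minimal sketch of how these pieces could combine, assuming a simple weighted sum. Everything below except the four configuration parameter names is hypothetical and not taken from the PR:

```c
#include <math.h>

// Hypothetical sketch of a guided autonomy reward term. Only the config names
// (reward_guided_autonomy, guidance_speed_weight, guidance_heading_weight,
// waypoint_reach_threshold) come from this PR; the function itself is illustrative.
float guided_autonomy_reward(float dist_to_next_waypoint,
                             float speed_error, float heading_error,
                             float reward_guided_autonomy,
                             float guidance_speed_weight,
                             float guidance_heading_weight,
                             float waypoint_reach_threshold) {
    // Progress term: credit when the agent comes within the reach threshold of a waypoint.
    float progress = (dist_to_next_waypoint < waypoint_reach_threshold) ? 1.0f : 0.0f;

    // Exponential penalties on squared deviations (see the review notes further down).
    float speed_penalty = 1.0f - expf(-(speed_error * speed_error));
    float heading_penalty = 1.0f - expf(-(heading_error * heading_error));

    float shaped = progress
                 - guidance_speed_weight * speed_penalty
                 - guidance_heading_weight * heading_penalty;
    return reward_guided_autonomy * shaped;  // master weight scales the whole term
}
```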
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| pufferlib/ocean/env_config.h | Added struct fields and config parser handlers for the four new guided autonomy parameters (sketched below the table) |
| pufferlib/ocean/drive/drive.py | Propagated the new parameters through Python environment initialization and step methods |
| pufferlib/ocean/drive/drive.h | Implemented waypoint tracking fields in Entity struct, guidance reward computation functions, and integration into the main step loop |
| pufferlib/ocean/drive/binding.c | Added parameter assignments from config to environment struct |
| pufferlib/config/ocean/drive.ini | Added default configuration values for guided autonomy parameters |
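For orientation, a hypothetical sketch of what the new env_config.h fields might look like. The struct name, surrounding layout, and comments are assumptions; only the four parameter names come from this PR:

```c
// Hypothetical excerpt: the actual struct name and surrounding fields in
// pufferlib/ocean/env_config.h may differ; only the four parameter names
// are taken from this PR.
typedef struct {
    /* ... existing fields ... */
    float reward_guided_autonomy;    // master weight for the guidance term
    float guidance_speed_weight;     // weight on the speed-deviation penalty
    float guidance_heading_weight;   // weight on the heading-deviation penalty
    float waypoint_reach_threshold;  // distance at which a waypoint counts as hit
} EnvConfig;
```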
pufferlib/ocean/drive/drive.h (outdated)
float speed_error_sq = speed_error * speed_error;

// Exponential penalty: 1.0 - exp(-error²)
float penalty = 1.0f - expf(-speed_error_sq + 1e-8f);
Copilot AI, Nov 10, 2025
The exponential term should be expf(-speed_error_sq) without adding 1e-8f inside the exponent. Adding epsilon inside changes the mathematical behavior from a numerical stability constant to an incorrect offset. If numerical stability is needed, add epsilon to the squared error before negation: expf(-(speed_error_sq + 1e-8f)).
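For clarity, a small sketch of the corrected form Copilot is suggesting, usable for both the speed and heading penalties. The helper name is made up for illustration and is not part of the PR:

```c
#include <math.h>

// Illustrative helper: exponential penalty on a squared error, with the
// epsilon folded into the squared error before negation, as the review suggests.
static inline float exp_penalty(float error) {
    float error_sq = error * error;
    return 1.0f - expf(-(error_sq + 1e-8f));  // not expf(-error_sq + 1e-8f)
}
```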
pufferlib/ocean/drive/drive.h (outdated)
float heading_error_sq = heading_error * heading_error;

// Exponential penalty: 1.0 - exp(-error²)
float penalty = 1.0f - expf(-heading_error_sq + 1e-8f);
Copilot AI, Nov 10, 2025
The exponential term should be expf(-heading_error_sq) without adding 1e-8f inside the exponent. Adding epsilon inside changes the mathematical behavior from a numerical stability constant to an incorrect offset. If numerical stability is needed, add epsilon to the squared error before negation: expf(-(heading_error_sq + 1e-8f)).
Suggested change: replace
float penalty = 1.0f - expf(-heading_error_sq + 1e-8f);
with
float penalty = 1.0f - expf(-(heading_error_sq + 1e-8f));
Greptile Summary
Confidence Score: 4/5
Important Files Changed
Sequence Diagram

sequenceDiagram
participant User
participant Python as Drive.py
participant Config as env_config.h
participant Binding as binding.c
participant Core as drive.h
User->>Python: Initialize environment with guidance params
Python->>Config: Parse drive.ini config
Config-->>Python: Return config with guidance settings
Python->>Binding: Create environment via my_init()
Binding->>Config: Read guidance_speed_weight, guidance_heading_weight, waypoint_reach_threshold
Binding->>Core: Initialize Drive env with guided autonomy params
Core->>Core: Allocate waypoints_hit arrays for entities
Core-->>Binding: Environment initialized
Binding-->>Python: Environment ready
User->>Python: Call step(action)
Python->>Binding: Execute c_step()
Binding->>Core: move_dynamics() - Update agent position
Binding->>Core: compute_guided_autonomy_reward()
Core->>Core: compute_route_guidance_reward() - Check waypoint proximity
Core->>Core: compute_speed_guidance_reward() - Calculate speed deviation
Core->>Core: compute_heading_guidance_reward() - Calculate heading error
Core-->>Binding: Return total guidance reward
Binding->>Binding: Add guidance reward to env->rewards[i]
Binding-->>Python: Return obs, rewards, dones
Python-->>User: Step result with guidance rewards
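Based on the sequence diagram above, a rough sketch of how the guidance reward might be folded into the C step loop. Only the function names (move_dynamics, compute_guided_autonomy_reward) and the per-agent rewards array appear in the diagram; the Drive struct and loop shown here are illustrative stubs:

```c
// Hypothetical step-loop excerpt; the struct layout and loop are assumptions.
typedef struct {
    int num_agents;
    float *rewards;
} Drive;

void move_dynamics(Drive *env, int i);                    // assumed: updates agent i's pose
float compute_guided_autonomy_reward(Drive *env, int i);  // assumed: route + speed + heading terms

void step_guidance(Drive *env) {
    for (int i = 0; i < env->num_agents; i++) {
        move_dynamics(env, i);
        float guidance = compute_guided_autonomy_reward(env, i);
        env->rewards[i] += guidance;  // guidance term added to the per-agent step reward
    }
}
```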
5 files reviewed, 1 comment
pufferlib/ocean/drive/binding.c

env->goal_radius = conf.goal_radius;
env->scenario_length = conf.scenario_length;
env->collision_behavior = conf.collision_behavior;
env->offroad_behavior = conf.offroad_behavior;
env->max_controlled_agents = unpack(kwargs, "max_controlled_agents");
env->dt = conf.dt;
env->init_mode = (int)unpack(kwargs, "init_mode");
env->control_mode = (int)unpack(kwargs, "control_mode");
env->goal_behavior = (int)unpack(kwargs, "goal_behavior");
env->goal_radius = (float)unpack(kwargs, "goal_radius");
env->init_mode = conf.init_mode;
env->control_mode = conf.control_mode;
env->goal_behavior = conf.goal_behavior;
env->goal_radius = conf.goal_radius;
style: goal_radius assigned twice from conf.goal_radius (lines 190 and 199)
Prompt To Fix With AI
This is a comment left during a code review.
Path: pufferlib/ocean/drive/binding.c
Line: 190:199
Comment:
**style:** `goal_radius` assigned twice from `conf.goal_radius` (lines 190 and 199)
```suggestion
env->goal_radius = conf.goal_radius;
env->scenario_length = conf.scenario_length;
env->collision_behavior = conf.collision_behavior;
env->offroad_behavior = conf.offroad_behavior;
env->max_controlled_agents = unpack(kwargs, "max_controlled_agents");
env->dt = conf.dt;
env->init_mode = conf.init_mode;
env->control_mode = conf.control_mode;
env->goal_behavior = conf.goal_behavior;
```
How can I resolve this? If you propose a fix, please make it concise.