Improve SARSA in FREE #643

Closed

alexnikulkov wants to merge 1 commit into facebookresearch:main from alexnikulkov:export-D36360500

Conversation

@alexnikulkov
Contributor

Summary:

  1. Add new sections to the YAML config for the model and optimizer (see the config sketch after this list).
  2. Add support for weights in the Parametric DQN input (see the weighted-loss sketch below).
  3. Expose the FC hidden layer dims in the config.
  4. Sort the data in each batch by separable_id, timestamp, position (see the batch-preparation sketch below).
  5. Zero out the weight for observations whose next state is unknown (marked "terminal", though they are not actually terminal; we simply don't know their next state), whose time_diff is negative, or whose position feature is missing, since any of these prevents us from sorting properly.
  6. Read the time gap to the next state and pass it in the batch (see the discounting sketch below).
  7. Clip the reward (paced bid) (see the clipping sketch below).
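
A minimal sketch of what the new YAML sections from items 1 and 3 might look like; all key names here (`model`, `optimizer`, `hidden_dims`, `lr`) are illustrative assumptions, not copied from the PR.

```python
# Hypothetical layout of the new config sections; key names are assumptions.
import yaml  # PyYAML

config_text = """
model:
  hidden_dims: [256, 128]  # FC hidden layer dims, now exposed in the config
optimizer:
  name: Adam
  lr: 0.001
"""

config = yaml.safe_load(config_text)
assert config["model"]["hidden_dims"] == [256, 128]
```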
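A sketch of how the per-sample weights from item 2 could enter the training loss; the function and argument names are assumptions, not the PR's actual Parametric DQN code.

```python
# Weighted TD loss sketch: rows with weight 0 contribute nothing to the
# gradient, which is what makes the zeroing in item 5 effective.
import torch

def weighted_td_loss(q_values: torch.Tensor,
                     target_q_values: torch.Tensor,
                     weight: torch.Tensor) -> torch.Tensor:
    per_sample = (q_values - target_q_values.detach()) ** 2
    return (weight * per_sample).sum() / weight.sum().clamp(min=1.0)
```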
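A sketch of the batch preparation described in items 4 and 5, assuming a pandas DataFrame with illustrative column names (`separable_id`, `timestamp`, `position`, `terminal`, `time_diff`, `weight`).

```python
# Sort each batch, then zero the weight of rows we cannot use.
import pandas as pd

def prepare_batch(df: pd.DataFrame) -> pd.DataFrame:
    df = df.sort_values(["separable_id", "timestamp", "position"]).reset_index(drop=True)
    unusable = (
        df["terminal"]             # next state unknown (not truly terminal)
        | (df["time_diff"] < 0)    # negative time gap to the next state
        | df["position"].isna()    # missing position breaks the sort order
    )
    df.loc[unusable, "weight"] = 0.0
    return df
```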
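One plausible use of the per-row time gap from item 6 is time-aware discounting; this is an assumption about intent, since the PR only says the gap is read and passed in the batch.

```python
# TD target with the discount raised to the elapsed time between states.
import torch

def td_target(reward: torch.Tensor,
              next_q: torch.Tensor,
              time_gap: torch.Tensor,
              gamma: float = 0.99) -> torch.Tensor:
    return reward + (gamma ** time_gap) * next_q
```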
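Item 7's reward clipping, sketched with `torch.clamp`; the bound (standing in for the paced bid) is an assumption, since the PR does not state the actual limits.

```python
# Clip the reward into [-max_abs, max_abs]; max_abs stands in for the paced bid.
import torch

def clip_reward(reward: torch.Tensor, max_abs: float) -> torch.Tensor:
    return reward.clamp(min=-max_abs, max=max_abs)
```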

Differential Revision: D36360500

fbshipit-source-id: 9ed29c367627753e8f801ee90a6d0042bac006dd
@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D36360500

@codecov-commenter

Codecov Report

Merging #643 (dc806ee) into main (deb9c67) will decrease coverage by 0.01%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main     #643      +/-   ##
==========================================
- Coverage   87.02%   87.02%   -0.01%     
==========================================
  Files         354      354              
  Lines       22442    22443       +1     
  Branches       44       44              
==========================================
  Hits        19531    19531              
- Misses       2885     2886       +1     
  Partials       26       26              
| Impacted Files | Coverage Δ |
| --- | --- |
| reagent/core/types.py | 86.70% <100.00%> (+0.02%) ⬆️ |
| reagent/mab/ucb.py | 86.84% <0.00%> (-2.64%) ⬇️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update deb9c67...dc806ee.

xuruiyang pushed a commit that referenced this pull request Sep 20, 2025
Summary:
Pull Request resolved: #643

1. Add new sections to YAML for model and optimizer configs
2. Add support for weights in Parametric DQN input
3. Expose FC hidden layer dims in config
4. Sort data in the batch by separable_id, timestamp, position.
5. Zero out the weight for observations whose next state is unknown (marked "terminal", though they are not actually terminal; we simply don't know their next state), whose time_diff is negative, or whose position feature is missing, since any of these prevents us from sorting properly.
6. Read the time gap to the next state and pass it in the batch.
7. Clip reward (paced bid)

To launch MC LTV training:
- local run: `starlight app run -j 1 free.reagent.train_ltv:train`
- submit to MAST: `starlight app submit reagent/submit_config.py:get_config_ltv`

To launch SARSA LTV training:
- local run: `starlight app run -j 1 free.reagent.train_ltv:train_sarsa`
- submit to MAST: `starlight app submit reagent/submit_config.py:get_config_ltv -- --model_type SARSA`

Reviewed By: czxttkl

Differential Revision: D36360500

fbshipit-source-id: c07f0b2ea297844970389b2059a7c42d63d16a8d