[RLlib] Cleanup examples folder #14: Add example script for policy (RLModule) inference on new API stack.#45831
Conversation
…nup_examples_folder_14_policy_inference_examples
| policy_id="default_policy", # <- default value | ||
| ) | ||
| # Compute an action using a B=1 observation "batch". | ||
| input_dict = {Columns.OBS: torch.from_numpy(obs).unsqueeze(0)} |
There was a problem hiding this comment.
Should we use here the input specs of the module to infer the keys? Here it’s simple but in other scenarios it could help users to figure out what to feed in.
There was a problem hiding this comment.
I'm not 100% sure. The input specs don't tell us much about what exact data is required for the keys in the specs. I'm honestly thinking about removing the specs altogether at some point (long-term).
For example: If my RLModule - right now - says: I need obs and prev_rewards, then I still don't know for example, how many of the previous rewards are required. This detailed information - crucial for building the batch - is not something my model would tell me, I will have to provide a proper ConnectorV2 logic along with it.
| # Send the computed action `a` to the env. | ||
| obs, reward, done, truncated, _ = env.step(a) | ||
| episode_reward += reward | ||
| obs, reward, terminated, truncated, _ = env.step(action) |
There was a problem hiding this comment.
This is a beautiful example to show how simple this runs now. Let us think about some ways to simplify it also for modules using connectors.
There was a problem hiding this comment.
Awesome! Yeah, I agree. Doing these examples feel very fast and easy. I don't have to do much debugging at all to make these run right from the get-go. There is another PR that does something very similar, but with a connector (that handles the LSTM states).
…r policy (RLModule) inference on new API stack. (ray-project#45831) Signed-off-by: Richard Liu <ricliu@google.com>
Add example script for policy (RLModule) inference on new API stack.
Why are these changes needed?
Related issue number
Checks
git commit -s) in this PR.scripts/format.shto lint the changes in this PR.method in Tune, I've added it in
doc/source/tune/api/under thecorresponding
.rstfile.