[RLlib] RecSim Interest evolution environment should use custom video sampler: IEvVideoSampler due to only one cluster being used. #22211
Waiting for LINT to pass.
  return iev.IEvUserModel(
      env_ctx["slate_size"],
-     choice_model_ctor=choice_model.MultinomialProportionalChoiceModel,
+     choice_model_ctor=choice_model.MultinomialLogitChoiceModel,
What's the difference, and why did we need to change this? The original RecSim repo uses
choice_model.MultinomialProportionalChoiceModel.
The difference is in how negative logits are handled.
MultinomialLogitChoiceModel uses p(x) = exp(x) / sum_{y in scores} exp(y), while
MultinomialProportionalChoiceModel uses p(x) = (x - min_normalizer) / sum_{y in scores} (y - min_normalizer). For the latter, you need to know the lower bound of your output logits before you can shift everything to be non-negative.
Intuitively, these two should behave similarly: the more negative the model output, the less likely the item is to be clicked.
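To make the comparison concrete, here is a minimal NumPy sketch of the two formulas above. It is not RecSim's implementation: the helper names `logit_probs` and `proportional_probs` are hypothetical, and RecSim's no-click mass is left out for brevity.

```python
import numpy as np


def logit_probs(scores):
    """Softmax: p(x) = exp(x) / sum_y exp(y). Accepts any real-valued scores."""
    scores = np.asarray(scores, dtype=float)
    # Subtracting the max leaves the result unchanged but avoids overflow.
    exp_scores = np.exp(scores - scores.max())
    return exp_scores / exp_scores.sum()


def proportional_probs(scores, min_normalizer):
    """p(x) = (x - min_normalizer) / sum_y (y - min_normalizer).

    Requires min_normalizer to be a lower bound on every score.
    """
    scores = np.asarray(scores, dtype=float)
    shifted = scores - min_normalizer
    if np.any(shifted < 0):
        raise ValueError("min_normalizer must be <= every score")
    total = shifted.sum()
    if total == 0:
        raise ValueError("all shifted scores are zero; cannot normalize")
    return shifted / total


scores = [-1.0, 0.0, 2.0]  # illustrative values, not taken from the PR
p_logit = logit_probs(scores)
p_prop = proportional_probs(scores, min_normalizer=-1.0)
print(p_logit, p_prop)
```

Both functions preserve the ranking of the scores, but only the proportional version needs a known lower bound, and it assigns zero probability to an item whose score equals that bound.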
I actually noticed another issue with RecSim: IEvUserModel is hardcoded to use UtilityModelUserSampler, so I don't know whether switching to IEvVideoSampler actually works or not ...
… sampler: `IEvVideoSampler` due to only one cluster being used. (ray-project#22211)
Why are these changes needed?
The interest evolution env should probably use the IEv video sampler instead of the utility model video sampler.
Docs returned from this sampler contain actual features instead of utility indices.
This may help your SlateQ runs.
Related issue number
Checks
I've run scripts/format.sh to lint the changes in this PR.