Why subtract entropy from Q-values? ("min_q_version == 3") #25

Hey,

Unlike in the paper, the implementation has a part that subtracts the actions' log-probabilities from the Q-values: https://github.com/aviralkumar2907/CQL/blob/d67dbe9cf5d2b96e3b462b6146f249b3d6569796/d4rl/rlkit/torch/sac/cql.py#L253
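One possible reading, offered as a sketch rather than a statement of what the CQL code intends: if the term is meant to estimate log ∫ exp Q(s, a) da from sampled actions, then each sample's Q-value has to be corrected by the log-density of the distribution it was drawn from (importance sampling). The toy below assumes a made-up 1-D Q-function and Gaussian sampling distributions (`q_value`, `gaussian_log_density` and the "broad"/"policy-like" proposals are all hypothetical, not taken from the repo) and compares corrected and uncorrected estimates against a closed-form reference.

```python
import numpy as np


def q_value(a):
    # Hypothetical toy Q-function over a 1-D action; peaks at a = 0.3.
    return -4.0 * (a - 0.3) ** 2


def gaussian_log_density(a, mean, std):
    # Log-density of N(mean, std^2), standing in for log mu(a) of the sampler.
    return -0.5 * ((a - mean) / std) ** 2 - np.log(std) - 0.5 * np.log(2.0 * np.pi)


def logsumexp(x):
    # Numerically stable log(sum(exp(x))).
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))


def log_integral_estimate(q, log_density=None):
    # log((1/N) * sum_i exp(Q(a_i) - log mu(a_i))) with a_i ~ mu.
    # With log_density=None the correction is skipped, so the result is
    # log E_mu[exp Q] instead, which depends on the sampling distribution.
    x = q if log_density is None else q - log_density
    return logsumexp(x) - np.log(len(q))


rng = np.random.default_rng(0)
n = 100_000

# Closed form for this toy Q: log of the integral of exp(-4 (a - 0.3)^2) over the real line.
reference = 0.5 * np.log(np.pi / 4.0)

# Two sampling distributions, loosely analogous to "random" and "policy" actions.
proposals = {"broad": (0.0, 1.0), "policy-like": (0.3, 0.5)}

results = {}
for name, (mean, std) in proposals.items():
    a = rng.normal(mean, std, size=n)
    q = q_value(a)
    corrected = log_integral_estimate(q, gaussian_log_density(a, mean, std))
    uncorrected = log_integral_estimate(q)
    results[name] = (corrected, uncorrected)
    print(f"{name:12s} corrected={corrected:.3f} uncorrected={uncorrected:.3f}")

print(f"reference    {reference:.3f}")
```

Both corrected estimates should land near the reference (about -0.12), while the uncorrected ones miss it and also disagree with each other, because without the correction the result depends on where the actions were sampled from.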

My guess is that the effect is to keep the loss from focusing on a single high-Q action, should the policy concentrate on one. But then we already have the temperature parameter for that. I'm not sure the author will answer, so if anybody knows, I'd appreciate your insights :)
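On the temperature point, a hedged derivation, assuming the term is meant to estimate a temperature-scaled log-integral of exp Q over actions (the symbols $T$ for temperature, $\mu$ for the sampling distribution and $N$ for the number of samples are introduced here for illustration, not taken from the code): the temperature rescales every Q-value uniformly, while the log-density term reweights each sample by how likely its sampler was to produce it, so one does not substitute for the other.

```latex
\begin{aligned}
T \log \int_{\mathcal{A}} \exp\!\left(\frac{Q(s,a)}{T}\right) da
  &= T \log \mathbb{E}_{a \sim \mu(\cdot \mid s)}\!\left[\exp\!\left(\frac{Q(s,a)}{T} - \log \mu(a \mid s)\right)\right] \\
  &\approx T \log \frac{1}{N} \sum_{i=1}^{N} \exp\!\left(\frac{Q(s,a_i)}{T} - \log \mu(a_i \mid s)\right),
  \qquad a_i \sim \mu(\cdot \mid s).
\end{aligned}
```

When $\mu$ is uniform, $\log \mu$ is a constant and only shifts the estimate; the correction changes the result only for samples drawn from a non-uniform distribution such as the current policy.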
