Why subtract entropy from Q-values? ("min_q_version == 3") #25

@ikamensh

Hey,

Unlike in the paper, the implementation has this branch, which subtracts the action probabilities from the Q-values:

if self.min_q_version == 3:

My guess is that the effect is to make the loss focus less on a single high-Q action, should the policy concentrate on one. But then we already have the temperature parameter for that. I'm not sure the author will answer, so if anybody knows, I'd appreciate your insights :)
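One way to probe this numerically: if the subtracted term is the log-density of the distribution the actions were sampled from (an assumption here, not something confirmed by the author), then the log-sum-exp over sampled actions becomes an importance-sampled estimate of log ∫ exp(Q(s, a)) da, which no longer depends on which sampler produced the actions. The sketch below is not the repository's code; the toy `q`, the two proposal distributions, and all constants are made up for illustration.

```python
import numpy as np

def logsumexp(x):
    # Numerically stable log(sum(exp(x))).
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))

def q(a):
    # Hypothetical 1-D Q-function, chosen so the integral of exp(q) is known.
    return -0.5 * a ** 2

rng = np.random.default_rng(0)
n = 200_000

# Proposal 1: uniform actions on [-5, 5], density 1/10.
a_unif = rng.uniform(-5.0, 5.0, size=n)
log_p_unif = np.full(n, np.log(1.0 / 10.0))

# Proposal 2: a Gaussian "policy" (hypothetical mean and std).
mu, sigma = 0.5, 1.5
a_pi = rng.normal(mu, sigma, size=n)
log_p_pi = -0.5 * ((a_pi - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))

# Subtracting the sampling log-density gives importance-sampled estimates.
est_unif = logsumexp(q(a_unif) - log_p_unif) - np.log(n)
est_pi = logsumexp(q(a_pi) - log_p_pi) - np.log(n)

# Without the correction, the result depends on the sampler.
naive_pi = logsumexp(q(a_pi)) - np.log(n)

# Closed form: log of the integral of exp(-a^2 / 2) over the real line.
exact = 0.5 * np.log(2 * np.pi)

print("exact:", exact)
print("uniform, corrected:", est_unif)
print("policy, corrected:", est_pi)
print("policy, uncorrected:", naive_pi)
```

Under this reading, the subtraction is a sampling correction and not a second temperature: the two corrected estimates agree with each other and with the closed form, while the uncorrected one is pulled toward wherever the policy puts its mass.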
