Wrong generation length for HFPolicy with stop strings #499

@KiddoZhu

Description

When custom stop strings are used in HFPolicy, the generation length computed in fsdp1_policy_worker.py#L716-L721 always treats the first padded EOS token as a generated EOS token. Consider a batch of two samples:

[token1, stop token, padded EOS, padded EOS]
[token2, token3, token4, token5]

The current implementation computes generation lengths of 3 and 4, respectively. The correct lengths are 2 and 4: the first sample ended at the stop token, and the EOS tokens after it are padding rather than generated output.
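To make the miscount concrete, here is a minimal sketch in plain Python. It is not the code from fsdp1_policy_worker.py; the function `length_counting_first_eos` and the token ids are hypothetical, and it only assumes what is described above: padding reuses the EOS id, and the length is taken up to and including the first EOS.

```python
# Hypothetical token ids, for illustration only.
EOS = 0    # also used as the padding id, as described in this issue
STOP = 7   # last token of a custom stop string


def length_counting_first_eos(seq, eos_id):
    """Return the length up to and including the first eos_id.

    If eos_id never appears, the full sequence length is returned.
    This mimics the behavior described above, not the actual worker code.
    """
    for i, tok in enumerate(seq):
        if tok == eos_id:
            return i + 1
    return len(seq)


batch = [
    [11, STOP, EOS, EOS],  # stopped on a stop string, then padded with EOS
    [12, 13, 14, 15],      # used the full generation budget
]

lengths = [length_counting_first_eos(seq, EOS) for seq in batch]
print(lengths)  # [3, 4], while the intended lengths are [2, 4]
```

The first sample never generated an EOS, so the first EOS found is padding, and counting it inflates the length by one.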

This issue mainly affects run_multi_turn_rollout with HFPolicy and batch size > 1. Some potential fixes I can think of:

  • Using different tokens for padding and EOS in HFPolicy.
  • Computing the generation lengths based on non-zero logprobs (rejected by @SahilJain314).
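The first option could look roughly like the sketch below. It assumes a dedicated padding id that the model never generates. `PAD`, `generation_length`, and the token ids are hypothetical names for illustration, not part of HFPolicy.

```python
# Hypothetical token ids, for illustration only.
EOS = 0
PAD = -1   # assumed to be distinct from EOS and never generated
STOP = 7   # last token of a custom stop string


def generation_length(seq, eos_id, pad_id):
    """Count generated tokens, stopping before padding.

    A generated EOS is included in the length; padding tokens are not.
    """
    n = 0
    for tok in seq:
        if tok == pad_id:
            break
        n += 1
        if tok == eos_id:
            break
    return n


batch = [
    [11, STOP, PAD, PAD],  # ended on a stop string, then padded
    [12, 13, 14, 15],      # used the full generation budget
    [21, EOS, PAD, PAD],   # ended on a generated EOS, then padded
]

fixed_lengths = [generation_length(seq, EOS, PAD) for seq in batch]
print(fixed_lengths)  # [2, 4, 2]
```

With separate ids, a sample that ends on a stop string and one that ends on a real EOS are both measured correctly, because padding can no longer be mistaken for a generated token.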

Labels: bug (Something isn't working)
