Conversation
Eugene-hu commented on Aug 25, 2023
- Adds additional checks for empty strings and very short completions
- Adds a penalty for repeated tokens, akin to huggingface's repetition penalty during generation (https://github.com/huggingface/transformers/blob/v4.32.0/src/transformers/generation/logits_process.py#L279)
- The penalty deters repeated tokens, applying a progressively heavier penalty each time a token recurs.
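A minimal sketch of how such a repetition penalty works, mirroring the logic of huggingface's `RepetitionPenaltyLogitsProcessor` (the function name and the plain-list representation are illustrative here, not the PR's actual code):

```python
def apply_repetition_penalty(logits, seen_token_ids, penalty=1.2):
    """Down-weight tokens that have already been generated.

    Positive logits are divided by the penalty and negative logits are
    multiplied by it, so a repeated token always becomes less likely.
    """
    out = list(logits)
    for tok in set(seen_token_ids):
        if out[tok] < 0:
            out[tok] *= penalty
        else:
            out[tok] /= penalty
    return out

# Tokens 0 and 1 were already generated, so their logits are penalized.
apply_repetition_penalty([2.4, -1.0, 0.5], [0, 1])  # ≈ [2.0, -1.2, 0.5]
```

Because this runs at every generation step, a token that keeps reappearing is penalized again each step, which produces the progressively heavier effect described above.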
openvalidators/reward/dpo.py (Outdated)

```python
def __init__(self, device: str):
    super().__init__()
    self.device = device
    self.penalty = 1.2
```
Do we have any data for why we picked 1.2 for the penalty?
It is the same default parameter used by huggingface, and it was taken from this paper (https://arxiv.org/pdf/2305.14314.pdf).
I think that adding this reference to the code as a one-line comment could help clear up future doubts.
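One way that could look (a sketch; the class name is an illustrative stand-in, not the PR's actual class):

```python
class DPORewardModel:
    """Illustrative stand-in for the reward model in dpo.py."""

    def __init__(self, device: str):
        self.device = device
        # Same default as huggingface's repetition penalty; the value
        # comes from the QLoRA paper (https://arxiv.org/pdf/2305.14314.pdf).
        self.penalty = 1.2
```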
```python
# Check if completion is empty or too short
if completion.strip() == '' or len(completion) <= 5:
    return -11  # exp(-11)=1.67e-5 < 2e-5=1/50257 (typical vocab size)
```
I'm not sure I understand why it's -11. Could you please elaborate so I can better understand it?
exp(-11) is the base value assigned to empty or short responses; -11 is the nearest integer such that exp(-11) falls below the uniform probability across all logits (1/50257).
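The arithmetic behind the choice of -11 can be checked directly (50257 is the GPT-2 vocabulary size referenced in the diff comment):

```python
import math

vocab_size = 50257                 # typical (GPT-2) vocabulary size
uniform_prob = 1 / vocab_size      # ≈ 1.99e-5, equal mass per token
floor_value = math.exp(-11)        # ≈ 1.67e-5

# exp(-11) sits just below the uniform probability, and -11 is the
# nearest integer below log(1/50257) ≈ -10.82.
print(floor_value < uniform_prob)          # True
print(math.floor(math.log(uniform_prob)))  # -11
```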
Would it be feasible to calculate this at runtime with something like 1 / model.vocab_size? That way the code would be independent of the model used, since the value would be calculated dynamically.
Yes, this can be done in a future update; it will be necessary if the DPO model's tokenizer is changed to something non-standard.
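A sketch of what that dynamic computation could look like (`vocab_size` would come from something like `tokenizer.vocab_size` or `model.config.vocab_size`; both of those names are assumptions here, not the PR's code):

```python
import math

def completion_floor(vocab_size: int) -> int:
    """Log-score floor for empty/short completions, derived from the
    tokenizer's vocabulary size instead of the hard-coded -11."""
    return math.floor(math.log(1 / vocab_size))

completion_floor(50257)   # -11 for the standard GPT-2 vocab
completion_floor(100000)  # -12 for a larger vocabulary
```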
Got it, I will create an issue for that so we don't lose track of it.