[trainer] fix the GA model_accepts_loss_kwargs#34915
Conversation
muellerzr
left a comment
TIL! However, as you can see by the failing test, this doesn't always work 😅 (If we can get it to that's great, I think that's originally why I went with explicit rather than implicit)
I think this PR introduced a new bug: if a user uses a user-defined loss function and
I understand now, I misinterpreted the if condition earlier. However, in this PR, when model_accepts_loss_kwargs is True, it won't pass the num_items_in_batch parameter, which makes the GA loss modification ineffective. Why is that?
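To make the concern above concrete, here is a minimal sketch (names are illustrative, not the actual Trainer code) of why dropping num_items_in_batch breaks the gradient-accumulation loss fix: without the total token count across accumulation steps, the loss can only be normalized per micro-batch.

```python
def compute_loss(micro_batch_losses, num_items_in_batch=None):
    # Sum the summed losses of each micro-batch, then normalize.
    total = sum(micro_batch_losses)
    if num_items_in_batch is not None:
        # GA-aware path: divide by the item count of the whole
        # effective batch, so accumulation does not change the result.
        return total / num_items_in_batch
    # Fallback: per-micro-batch mean, which is what the GA fix
    # is meant to avoid.
    return total / len(micro_batch_losses)


compute_loss([2.0, 4.0], num_items_in_batch=4)  # GA-aware: 1.5
compute_loss([2.0, 4.0])                        # fallback mean: 3.0
```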
In the newest code, running will cause an error:
Ah shit the if condition is reversed |
Opened a PR for a fix, thanks! |
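For readers following along, a hedged sketch of the reversed-condition bug discussed above (function and variable names are hypothetical, not the actual Trainer internals): the inverted branch forwarded num_items_in_batch only to models that could not use it.

```python
def build_loss_kwargs(model_accepts_loss_kwargs, num_items_in_batch):
    loss_kwargs = {}
    # Buggy version read `if not model_accepts_loss_kwargs:` here,
    # dropping the count exactly for the models that accept it.
    if model_accepts_loss_kwargs:
        loss_kwargs["num_items_in_batch"] = num_items_in_batch
    return loss_kwargs


build_loss_kwargs(True, 128)   # {'num_items_in_batch': 128}
build_loss_kwargs(False, 128)  # {}
```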
What does this PR do?
Fixes #34577
model_accepts_loss_kwargs was wrongly looking at kwarg names, while you should only need to check that the signature accepts kwargs at all (since the name can vary: FlashAttentionKwargs, LossKwargs, etc.)
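The check described above can be sketched with the standard inspect module: a forward that takes **kwargs has a VAR_KEYWORD parameter, whatever that parameter happens to be called. This is an illustration of the idea, not the exact transformers implementation.

```python
import inspect


def accepts_loss_kwargs(forward_fn) -> bool:
    # True if the signature has a **kwargs-style parameter, regardless
    # of its name (FlashAttentionKwargs, LossKwargs, plain kwargs, ...).
    return any(
        p.kind == inspect.Parameter.VAR_KEYWORD
        for p in inspect.signature(forward_fn).parameters.values()
    )


def forward_with_kwargs(input_ids, **loss_kwargs):
    pass


def forward_without_kwargs(input_ids, labels=None):
    pass


accepts_loss_kwargs(forward_with_kwargs)     # True
accepts_loss_kwargs(forward_without_kwargs)  # False
```

Checking the parameter kind rather than its name is what makes the detection robust to differently named typed-kwargs annotations.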