
Fix(43240): pass kwargs to nn.functional.cross_entropy #43251

Open

jasiecky wants to merge 22 commits into huggingface:main from jasiecky:fix/43242

Conversation


@jasiecky jasiecky commented Jan 13, 2026

What does this PR do?

The problem to be solved is the issue #43240. This PR implements passing weight and label_smoothing parameters of nn.functional.cross_entropy in fixed_cross_entropy function.

Fixes #43240
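For context, here is a minimal sketch of the intended change. It is illustrative only and not the exact diff; the signature is modeled on the existing fixed_cross_entropy wrapper in src/transformers/loss/loss_utils.py.

```python
import torch
import torch.nn as nn


def fixed_cross_entropy(
    source: torch.Tensor,
    target: torch.Tensor,
    num_items_in_batch: torch.Tensor | None = None,
    ignore_index: int = -100,
    weight: torch.Tensor | None = None,  # new: forwarded to nn.functional.cross_entropy
    label_smoothing: float = 0.0,        # new: forwarded to nn.functional.cross_entropy
) -> torch.Tensor:
    # The reduction handling below already exists pre-PR: sum-reduce when the caller
    # supplies the token count, then rescale; otherwise use the default mean reduction.
    reduction = "sum" if num_items_in_batch is not None else "mean"
    loss = nn.functional.cross_entropy(
        source,
        target,
        weight=weight,
        ignore_index=ignore_index,
        reduction=reduction,
        label_smoothing=label_smoothing,
    )
    if reduction == "sum":
        loss = loss / num_items_in_batch
    return loss
```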

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@iamsernine @ArthurZucker @stas00 @cyyever

@jasiecky jasiecky changed the title Fix(43242): pass kwargs to nn.functional.cross_entropy Fix(43240): pass kwargs to nn.functional.cross_entropy Jan 13, 2026
Contributor

@stas00 stas00 left a comment


I have a hard time matching the description of this PR to the proposed change. It looks like you want to pass additional kwargs, which at the moment are dropped by this wrapper.

Looking at https://docs.pytorch.org/docs/stable/generated/torch.nn.functional.cross_entropy.html, those would be weight and label_smoothing; is that what you're trying to pass?

And of course you need a test to support your PR, which would also self-document what you're trying to accomplish.

> ensures consistent loss scaling by controlling the reduction mode when num_items_in_batch is provided

This is already done. Look at the first line of the function.

Comment thread on src/transformers/loss/loss_utils.py (outdated)
Contributor

stas00 commented Jan 14, 2026

Thank you for adding the tests. I still don't understand what you're trying to solve with this PR.

Your PR description:

> This PR adds validation for keyword arguments passed to cross_entropy and ensures consistent loss scaling by controlling the reduction mode when num_items_in_batch is provided.

The second part is invalid; the pre-PR code already does that.

For the first part, what kwargs do you need to pass for your workload? Your tests exercise the two keys that have been ignored, but do you actually need them? It's been a long time since this function was added; perhaps fixed_ implies that it does a limited scope of things. I'm not sure. The Trainer, for example, handles label_smoothing here:

    if self.args.label_smoothing_factor != 0:
        self.label_smoother = LabelSmoother(epsilon=self.args.label_smoothing_factor)
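For reference, the smoother created above is then applied to the model output in compute_loss, roughly like this (paraphrased from the Trainer, not an exact excerpt):

```python
labels = inputs.pop("labels")
outputs = model(**inputs)
# shift_labels=True for causal LM heads, so logits at position i predict labels at i+1
loss = self.label_smoother(outputs, labels, shift_labels=True)
```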

Contributor

stas00 commented Jan 14, 2026

Thinking more about it and stepping away from this particular PR, one of the remaining issues in the HF Transformers API is that some keys in kwargs are silently dropped by some of the APIs. For example, if you call from_config and set config.attn_implementation, it will be silently ignored; it should assert and tell the user to pass attn_implementation as its own kwarg to from_config rather than as part of the config object.

So what I suggest is: if you don't have a particular problem to solve in this PR, it can be made useful by asserting when unexpected kwargs are passed; those must not be silently dropped. The user needs to know when they are not using the API correctly.
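A minimal sketch of the kind of check being suggested, purely illustrative; the helper name and the set of accepted keys are assumptions, not part of the PR:

```python
_ALLOWED_LOSS_KWARGS = {"weight", "label_smoothing"}  # assumed allow-list for illustration


def _validate_loss_kwargs(kwargs: dict) -> None:
    # Fail loudly instead of silently dropping unknown keys.
    unexpected = set(kwargs) - _ALLOWED_LOSS_KWARGS
    if unexpected:
        raise TypeError(
            f"fixed_cross_entropy got unexpected kwargs: {sorted(unexpected)}. "
            f"Supported extra kwargs: {sorted(_ALLOWED_LOSS_KWARGS)}."
        )
```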

The decision of whether fixed_cross_entropy should support weight and label_smoothing kwargs I will leave to the current maintainers.

@jasiecky
Author

> Thinking more about it and stepping away from this particular PR, one of the remaining issues in the HF Transformers API is that some keys in kwargs are silently dropped by some of the APIs. For example, if you call from_config and set config.attn_implementation, it will be silently ignored; it should assert and tell the user to pass attn_implementation as its own kwarg to from_config rather than as part of the config object.
>
> So what I suggest is: if you don't have a particular problem to solve in this PR, it can be made useful by asserting when unexpected kwargs are passed; those must not be silently dropped. The user needs to know when they are not using the API correctly.
>
> The decision of whether fixed_cross_entropy should support weight and label_smoothing kwargs I will leave to the current maintainers.

The problem to be solved is issue #43240. The thing is that currently we are not able to pass kwargs through to nn.functional.cross_entropy, so using weight and label_smoothing is impossible. If you think it would be a better solution, I can change kwargs to these two explicit parameters and pass them into the mentioned function.
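To illustrate what callers would gain once these are forwarded, both parameters are plain nn.functional.cross_entropy options (hypothetical usage, values made up):

```python
import torch
from torch.nn import functional as F

logits = torch.randn(8, 5)                # (batch, num_classes)
labels = torch.randint(0, 5, (8,))
class_weights = torch.tensor([1.0, 2.0, 1.0, 0.5, 1.0])  # per-class rescaling

# A class-weighted, label-smoothed loss: exactly what the wrapper currently cannot express.
loss = F.cross_entropy(logits, labels, weight=class_weights, label_smoothing=0.1)
```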

Contributor

stas00 commented Jan 15, 2026

That's helpful. Then that should be the description of the PR.

And you have a competitor here: #43254

@jasiecky
Author

> That's helpful. Then that should be the description of the PR.
>
> And you have a competitor here: #43254

I updated the code and the description, ready for review.

Contributor

@stas00 stas00 left a comment


LGTM

    weight: torch.Tensor | None = None,
    **kwargs,
    label_smoothing: float = 0.0,
    **_kwargs,
Contributor

@stas00 stas00 Jan 16, 2026


huh? _?

I'd say remove it altogether, since it's being silently ignored and that's bad for the caller.

Author


Do you mean to remove kwargs? You accepted the code containing them ;) If we don't accept them, the function isn't compatible with some parts of the repo, so I renamed the parameter to _kwargs to show explicitly that the kwargs are ignored.

Contributor


I have no idea how renaming to _kwargs implies that it is ignored. When something is ignored, it shouldn't be there.

As I shared earlier, my opinion is that if **kwargs is in the API, it should be introspected and any unexpected keys should be asserted on. **kwargs is useful when a function is an intermediary that passes it on. In this case the kwargs aren't passed on and thus shouldn't be there.

> You accepted the code containing them ;)

I'm not a current maintainer, so my vote isn't binding. You'll want to engage the current maintainers instead.

Author


It's a naming convention; it doesn't imply anything by itself, indeed. Let's wait for the maintainers ;)

@iamsernine @ArthurZucker @cyyever

@github-actions
Contributor

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=43251&sha=6e287b

Contributor

@iamsernine iamsernine left a comment


didn't know why I'm being mentioned here, but LGTM


Development

Successfully merging this pull request may close these issues.

kwargs are not passed to loss calculation function.

3 participants