
[Auto] Super merge super-merge-example: combined 6 of 10 worktrees #40

Open
evalstate wants to merge 71 commits into main from
super-merge-example-20260427123152

Conversation

@evalstate
Owner

Super-merge report: super-merge-example

  • Base ref: origin/main (7e435bef05ce0f699f56ce58d54fd3320c220a43)
  • Requested branch: super-merge-example-20260427122842
  • Worktree: /home/ssmith/source/mergeability-test/.mergeability/super-worktrees/super-merge-example
  • Note: the checkout initially contained super-merge-example-20260427122839; super-merge-example-20260427122842 did not exist locally, so it was created at the same base HEAD before merging.
  • Final HEAD: c19b57702c
  • Final status: clean tracked working tree; merge-report.md is a local mergeability artifact and was not staged.

Source worktrees considered

| Worktree | Branch | Decision | Notes |
| --- | --- | --- | --- |
| cluster-41211-3 | merge-cluster-cluster-41211-3-20260427115403 | Included | Adds DEIMv2 via huggingface#44339. Previous report skipped the older duplicate huggingface#41356. Large model/docs/tests/rules change, but the local branch was coherent and merged cleanly. |
| cluster-43240-3 | merge-cluster-cluster-43240-3-20260427115403 | Included | Minimal fixed_cross_entropy kwargs fix from huggingface#43254. Previous report skipped the duplicate huggingface#43251. |
| cluster-43656-4 | merge-cluster-cluster-43656-4-20260424123400 | Skipped | Empty branch at origin/main; no merge report and no diff. |
| cluster-43698-3 | merge-cluster-cluster-43698-3-20260427115403 | Skipped | Empty branch. Previous report found the SwanLab issue already fixed on main by huggingface#43719; the open PRs were superseded/obsolete. |
| cluster-43824-3 | merge-cluster-cluster-43824-3-20260427115403 | Skipped | Empty branch. Previous report found the old TypeAdapter/serve changes stale after the serving refactor; merge attempts had conflicted and were aborted there. |
| cluster-43979-11 | merge-cluster-cluster-43979-11-20260427115403 | Skipped | Empty branch. Previous report tried multiple broad output-tracing PRs and aborted on conflicts; needs maintainer-selected model-by-model ports. |
| cluster-44018-2 | merge-cluster-cluster-44018-2-20260427115403 | Included | GPT-Neo output-tracing refactor from huggingface#44018. Previous report skipped the duplicate huggingface#44068. |
| cluster-45081-3 | merge-cluster-cluster-45081-3-20260427115403 | Included | Mistral tokenizer regression test from huggingface#45317/huggingface#45086; the source fix was already present on base. |
| cluster-45520-3 | merge-cluster-cluster-45520-3-20260427115403 | Included | Flash-attention distribution-map KeyError fix from huggingface#45524. Previous report skipped the duplicate, riskier huggingface#45650. |
| cluster-45561-3 | merge-cluster-cluster-45561-3-20260427115403 | Included | xdist-safe captured_info handling from huggingface#45645. Previous report skipped the narrower duplicate huggingface#45639. |

Merge order and results

  1. Merged cluster-43240-3 cleanly.
  2. Merged cluster-45520-3 cleanly.
  3. Merged cluster-45081-3 cleanly.
  4. Merged cluster-44018-2 cleanly.
  5. Merged cluster-41211-3 cleanly; src/transformers/loss/loss_utils.py auto-merged, preserving both the new DEIMv2 loss registration and the fixed_cross_entropy kwargs fix (see the sketch below).
  6. Merged cluster-45561-3 cleanly after DEIMv2; overlapping upstream-main changes were handled by git without manual resolution.

No super-merge merge/cherry-pick attempts failed, and no manual conflict resolution was required in this worktree.
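
For reference, a minimal sketch of how those two changes can coexist in src/transformers/loss/loss_utils.py after the auto-merge. The signature and the registry entry shown here are assumptions inferred from this report, not the merged code:

```python
import torch.nn.functional as F

def fixed_cross_entropy(source, target, num_items_in_batch=None,
                        ignore_index=-100, weight=None, label_smoothing=0.0):
    # The kwargs fix from cluster-43240-3: weight and label_smoothing are
    # forwarded to F.cross_entropy instead of being silently dropped.
    reduction = "sum" if num_items_in_batch is not None else "mean"
    loss = F.cross_entropy(
        source, target, ignore_index=ignore_index, reduction=reduction,
        weight=weight, label_smoothing=label_smoothing,
    )
    if reduction == "sum":
        loss = loss / num_items_in_batch
    return loss

# The DEIMv2 registration from cluster-41211-3 lives alongside the fix in the
# same file; the key and callable here are placeholders, not the real DEIMv2 loss.
LOSS_MAPPING = {"deimv2": fixed_cross_entropy}
```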

Current combined diff

  • git diff --stat origin/main...HEAD: 81 files changed, 8010 insertions, 681 deletions.
  • Major included areas:
    • DEIMv2 model/config/loss/conversion/docs/tests and model registry/rules updates.
    • GPT-Neo output-tracing decorator refactor and test update.
    • fixed_cross_entropy forwarding for weight and label_smoothing.
    • Flash-attention availability helpers using safe distribution-map lookups.
    • Mistral tokenizer regression test for fix_mistral_regex=True.
    • xdist-safe patched testing debug artifacts plus related CI/notification updates.

Validation

  • python -m compileall -q $(git diff --name-only origin/main...HEAD | grep "\\.py$" ...) — passed for all changed Python files.
  • ruff check <changed Python files> — failed with 31 UP038 diagnostics in changed files (tuple-style isinstance checks). These come from the merged branches/current upstream-style churn and were not auto-fixed in this mergeability pass.
  • python -m pytest --version — failed: local environment has no pytest installed. No targeted pytest suites were run.

Recommended next steps

  • Fix the 31 UP038 ruff diagnostics flagged in the changed files before merging.
  • Run the targeted pytest suites for the merged areas in an environment where pytest is installed; only compileall and ruff ran here.

Rocketknight1 and others added 30 commits January 29, 2026 19:00
…tors

Migrates GPT-Neo to the standardized output collection interface as part of huggingface#43979.
Removes manual output_attentions/output_hidden_states/return_dict handling
in favor of hook-based output capturing via decorators.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…moval

- Added @merge_with_config_defaults to resolve use_cache from config
- Removed output_attentions=True from test_local_attn_probs (attention
  layer always returns weights, no longer accepts this param)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
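
A hypothetical sketch of what a decorator like @merge_with_config_defaults does, per the description above; the real helper in transformers resolves more arguments than shown, and this signature is an assumption:

```python
import functools

def merge_with_config_defaults(fn):
    """Fill in call arguments from the model config when the caller omits them."""
    @functools.wraps(fn)
    def wrapper(self, *args, use_cache=None, **kwargs):
        # Resolve use_cache from config, matching the behaviour described above.
        if use_cache is None:
            use_cache = getattr(self.config, "use_cache", False)
        return fn(self, *args, use_cache=use_cache, **kwargs)
    return wrapper
```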
The existing test only checks that passing fix_mistral_regex=True doesn't
error, but the hub model's config version causes early return so the
patching logic is never exercised. This new test creates a local config
with an old transformers_version to force the patching code path, verifying
that the pre_tokenizer is correctly patched to a Sequence without
AttributeError.
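
A toy illustration of the test strategy described above: fake an old transformers_version so the early-return guard is bypassed and the patching branch actually runs. The function name, version cutoff, and config keys are invented for illustration; the real test operates on a tokenizer config and checks the pre_tokenizer type.

```python
from packaging import version

def maybe_patch(config: dict) -> bool:
    # Early return mirrors the hub-config behaviour that skipped the patch path.
    saved = config.get("transformers_version", "0.0.0")
    if version.parse(saved) >= version.parse("5.0.0"):  # assumed cutoff
        return False
    config["pre_tokenizer"] = "Sequence"  # stand-in for the real patch
    return True

# Forcing the old-version path exercises the patching logic:
assert maybe_patch({"transformers_version": "4.40.0"}) is True
assert maybe_patch({"transformers_version": "9.9.9"}) is False
```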
…not in the distribution map

is_flash_attn_2_available / _3 / _4 / _greater_or_equal do two checks:

  is_available, _ = _is_package_available("flash_attn", return_version=True)
  is_available = is_available and "flash-attn" in [
      pkg.replace("_", "-") for pkg in PACKAGE_DISTRIBUTION_MAPPING["flash_attn"]
  ]

Step 1 uses importlib.util.find_spec, which returns a spec if any
"flash_attn" import is findable (an editable install, a namespace
package, a bundled shim, or a stub module under another project).
Step 2 then assumes that every findable import name also has an entry
in importlib.metadata.packages_distributions().

That assumption does not hold. On Python 3.13 with ComfyUI setups
(huggingface#45520), and in any environment where the import is resolvable via a
non-pip source, packages_distributions() has no "flash_attn" key.
Because the list comprehension is evaluated before the `in` operator,
short-circuit evaluation of the outer `and` does not protect us - the
KeyError fires during `transformers` import and takes down the whole
process before any model is loaded.

Swap the four raising subscripts for `.get(name, [])`. If the name is
missing from the distribution map we simply conclude that the requested
flash-attention flavour is not properly installed - which is the same
answer is_flash_attn_*_available() would have returned anyway - instead
of raising. The inner helper `_is_package_available` already wraps the
same subscript in a try/except, so we are only making the outer call
sites match that contract.

Fixes huggingface#45520
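
The shape of the fix, per the commit text: replace the raising subscript with .get(name, []) so a findable import with no distribution entry reads as "not installed" instead of crashing. Sketch only; the helper name is invented and the surrounding transformers code differs:

```python
from importlib.metadata import packages_distributions

PACKAGE_DISTRIBUTION_MAPPING = packages_distributions()

def _flash_attn_properly_installed(import_name: str = "flash_attn") -> bool:
    # .get(name, []) instead of PACKAGE_DISTRIBUTION_MAPPING[import_name]:
    # a missing key now means "no pip-installed flash-attn", not a KeyError
    # raised during `import transformers`.
    return "flash-attn" in [
        pkg.replace("_", "-")
        for pkg in PACKAGE_DISTRIBUTION_MAPPING.get(import_name, [])
    ]
```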
tarekziade and others added 30 commits April 23, 2026 14:45
* qa: bumped mlinter and allow local override

* bump version

* Update utils/check_modeling_rules_doc.py

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* license header

* license header

---------

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
…#45610)

* Fix missing conversion of experts

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* Fix eager config attribute reading

Co-authored-by: Copilot <copilot@github.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* Add proper error when kernels isn't installed

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* remove unnecessary mapping

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* review comments

Co-authored-by: Copilot <copilot@github.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* remove double newline

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

---------

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Copilot <copilot@github.com>
…uggingface#45601)

* fix: compute auxiliary losses when denoising is disabled in D-FINE

* style: fix formatting

* test: add regression test for auxiliary losses when denoising is disabled

* test: fix num_labels config in auxiliary loss regression test

---------

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
* remove warnings

* fix

* revert

* revert useless

* move function outside
…ing path (huggingface#45582)

* generate: drop stale num_return_sequences warning on continuous batching path

The continuous-batching branch warned that num_return_sequences was
unsupported alongside num_beams, but generate_batch() already honors
generation_config.num_return_sequences when expanding requests.  The
warning fires for any run that explicitly sets num_return_sequences
even though the feature works, cluttering logs and misleading users.

Drop the num_return_sequences half of the warning; keep the num_beams
guard since beam search is still unsupported on the CB path.

Fixes huggingface#45563

* Apply repo consistency fixes

---------

Co-authored-by: Joaquin Hui Gomez <joaquinhuigomez@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Rémi Ouazan <83456801+remi-or@users.noreply.github.com>
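
A toy sketch of why the warning described above was stale: the continuous-batching path already expands each request num_return_sequences times, so the parameter is honored. The request shape here is invented for illustration:

```python
def expand_requests(requests, num_return_sequences):
    # One scheduled generation per requested sequence, tagged by sequence index.
    return [(req, i) for req in requests for i in range(num_return_sequences)]

assert len(expand_requests(["prompt_a", "prompt_b"], 3)) == 6
```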
* chore(qa): split pipeline and add type checking

* added serving to quality

* fmt
* allow

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* circleci with torch 2.11

* circleci with torch 2.11

* circleci with torch 2.11

* circleci with torch 2.11

* circleci with torch 2.11

* circleci with torch 2.11

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
…th `num_labels=1` (huggingface#45611)

* Raise clear error for problem_type="single_label_classification" with num_labels=1

This combination is mathematically degenerate: applying cross-entropy loss to a
single logit always yields zero loss, so training silently accomplishes nothing.
Validate the combination in PreTrainedConfig.__post_init__ so users get a clear
error at config construction with a pointer to the correct setup (num_labels=2
for binary classification, or problem_type="regression" for a single-output
regression head).

Closes huggingface#45479

* Update src/transformers/configuration_utils.py

* Update tests/utils/test_configuration_utils.py

* Update src/transformers/configuration_utils.py

---------

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
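
A quick demonstration of the degeneracy described above: with a single logit, softmax is identically 1, so cross-entropy is always zero. This is standard PyTorch behaviour, nothing PR-specific:

```python
import torch
import torch.nn.functional as F

logits = torch.randn(4, 1)                  # batch of 4, num_labels=1
labels = torch.zeros(4, dtype=torch.long)   # only class 0 can ever be the label
loss = F.cross_entropy(logits, labels)
print(loss)  # tensor(0.) for any logits: log_softmax of a single logit is 0
```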
…uggingface#45625)

Add supports_gradient_checkpointing to NemotronHPreTrainedModel
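
The change presumably amounts to the standard transformers opt-in flag on the base class; a sketch with the rest of the class body elided:

```python
from transformers import PreTrainedModel

class NemotronHPreTrainedModel(PreTrainedModel):
    # Opting in lets the gradient-checkpointing machinery activate
    # torch.utils.checkpoint for this model family.
    supports_gradient_checkpointing = True
```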
* Add output language to chunks

* Add output language to chunks

* Fix formating

* Return full language instead of iso code

* revert changes (excep test)

* correct fix

* fix

* values for runner

---------

Co-authored-by: Eustache Le Bihan <eulebihan@gmail.com>
Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
# Conflicts:
#	src/transformers/models/gpt_neo/modeling_gpt_neo.py