
[Auto] Super merge super-merge-example: combined 6 of 10 worktrees #40

Open
evalstate wants to merge 71 commits into main from
super-merge-example-20260427123152

Conversation

@evalstate
Owner

Super-merge report: super-merge-example

  • Base ref: origin/main (7e435bef05ce0f699f56ce58d54fd3320c220a43)
  • Requested branch: super-merge-example-20260427122842
  • Worktree: /home/ssmith/source/mergeability-test/.mergeability/super-worktrees/super-merge-example
  • Note: the checkout initially contained super-merge-example-20260427122839; super-merge-example-20260427122842 did not exist locally, so it was created at the same base HEAD before merging.
  • Final HEAD: c19b57702c
  • Final status: clean tracked working tree; merge-report.md is a local mergeability artifact and was not staged.

Source worktrees considered

| Worktree | Branch | Decision | Notes |
| --- | --- | --- | --- |
| cluster-41211-3 | merge-cluster-cluster-41211-3-20260427115403 | Included | Adds DEIMv2 via huggingface#44339. Previous report skipped the older duplicate huggingface#41356. Large model/docs/tests/rules change, but the local branch was coherent and merged cleanly. |
| cluster-43240-3 | merge-cluster-cluster-43240-3-20260427115403 | Included | Minimal fixed_cross_entropy kwargs fix from huggingface#43254. Previous report skipped the duplicate huggingface#43251. |
| cluster-43656-4 | merge-cluster-cluster-43656-4-20260424123400 | Skipped | Empty branch at origin/main; no merge report and no diff. |
| cluster-43698-3 | merge-cluster-cluster-43698-3-20260427115403 | Skipped | Empty branch. Previous report found the SwanLab issue already fixed on main by huggingface#43719; the open PRs were superseded/obsolete. |
| cluster-43824-3 | merge-cluster-cluster-43824-3-20260427115403 | Skipped | Empty branch. Previous report found the old TypeAdapter/serve changes stale after the serving refactor; merge attempts had conflicted and were aborted there. |
| cluster-43979-11 | merge-cluster-cluster-43979-11-20260427115403 | Skipped | Empty branch. Previous report tried multiple broad output-tracing PRs and aborted on conflicts; needs maintainer-selected model-by-model ports. |
| cluster-44018-2 | merge-cluster-cluster-44018-2-20260427115403 | Included | GPT-Neo output-tracing refactor from huggingface#44018. Previous report skipped the duplicate huggingface#44068. |
| cluster-45081-3 | merge-cluster-cluster-45081-3-20260427115403 | Included | Mistral tokenizer regression test from huggingface#45317/huggingface#45086; the source fix was already present on base. |
| cluster-45520-3 | merge-cluster-cluster-45520-3-20260427115403 | Included | Flash-attention distribution-map KeyError fix from huggingface#45524. Previous report skipped the duplicate, riskier huggingface#45650. |
| cluster-45561-3 | merge-cluster-cluster-45561-3-20260427115403 | Included | xdist-safe captured_info handling from huggingface#45645. Previous report skipped the narrower duplicate huggingface#45639. |

Merge order and results

  1. Merged cluster-43240-3 cleanly.
  2. Merged cluster-45520-3 cleanly.
  3. Merged cluster-45081-3 cleanly.
  4. Merged cluster-44018-2 cleanly.
  5. Merged cluster-41211-3 cleanly; src/transformers/loss/loss_utils.py auto-merged, preserving both the new DEIMv2 loss registration and the fixed_cross_entropy kwargs fix (see the sketch below).
  6. Merged cluster-45561-3 cleanly after DEIMv2; overlapping upstream-main changes were handled by git without manual resolution.

No super-merge merge/cherry-pick attempts failed, and no manual conflict resolution was required in this worktree.
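
For reference, a minimal sketch of how those two changes can coexist in src/transformers/loss/loss_utils.py after the auto-merge. The signature and the registry entry shown here are assumptions inferred from this report, not the merged code:

```python
import torch.nn.functional as F

def fixed_cross_entropy(source, target, num_items_in_batch=None,
                        ignore_index=-100, weight=None, label_smoothing=0.0):
    # The kwargs fix from cluster-43240-3: weight and label_smoothing are
    # forwarded to F.cross_entropy instead of being silently dropped.
    reduction = "sum" if num_items_in_batch is not None else "mean"
    loss = F.cross_entropy(
        source, target, ignore_index=ignore_index, reduction=reduction,
        weight=weight, label_smoothing=label_smoothing,
    )
    if reduction == "sum":
        loss = loss / num_items_in_batch
    return loss

# The DEIMv2 registration from cluster-41211-3 lives alongside the fix in the
# same file; the key and callable here are placeholders, not the real DEIMv2 loss.
LOSS_MAPPING = {"deimv2": fixed_cross_entropy}
```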

Current combined diff

  • git diff --stat origin/main...HEAD: 81 files changed, 8010 insertions, 681 deletions.
  • Major included areas:
    • DEIMv2 model/config/loss/conversion/docs/tests and model registry/rules updates.
    • GPT-Neo output-tracing decorator refactor and test update.
    • fixed_cross_entropy forwarding for weight and label_smoothing.
    • Flash-attention availability helpers using safe distribution-map lookups.
    • Mistral tokenizer regression test for fix_mistral_regex=True.
    • xdist-safe patched testing debug artifacts plus related CI/notification updates.

Validation

  • python -m compileall -q $(git diff --name-only origin/main...HEAD | grep "\\.py$" ...) — passed for all changed Python files.
  • ruff check <changed Python files> — failed with 31 UP038 diagnostics in changed files (tuple-style isinstance checks). These come from the merged branches/current upstream-style churn and were not auto-fixed in this mergeability pass.
  • python -m pytest --version — failed: local environment has no pytest installed. No targeted pytest suites were run.

Recommended next steps

  • Fix the 31 UP038 ruff diagnostics flagged in the changed files before merging.
  • Run the targeted pytest suites for the merged areas in an environment where pytest is installed; only compileall and ruff ran here.

Rocketknight1 and others added 30 commits January 29, 2026 19:00
…tors

Migrates GPT-Neo to the standardized output collection interface as part of huggingface#43979.
Removes manual output_attentions/output_hidden_states/return_dict handling
in favor of hook-based output capturing via decorators.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…moval

- Added @merge_with_config_defaults to resolve use_cache from config
- Removed output_attentions=True from test_local_attn_probs (attention
  layer always returns weights, no longer accepts this param)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
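
A hypothetical sketch of what a decorator like @merge_with_config_defaults does, per the description above; the real helper in transformers resolves more arguments than shown, and this signature is an assumption:

```python
import functools

def merge_with_config_defaults(fn):
    """Fill in call arguments from the model config when the caller omits them."""
    @functools.wraps(fn)
    def wrapper(self, *args, use_cache=None, **kwargs):
        # Resolve use_cache from config, matching the behaviour described above.
        if use_cache is None:
            use_cache = getattr(self.config, "use_cache", False)
        return fn(self, *args, use_cache=use_cache, **kwargs)
    return wrapper
```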
The existing test only checks that passing fix_mistral_regex=True doesn't
error, but the hub model's config version causes early return so the
patching logic is never exercised. This new test creates a local config
with an old transformers_version to force the patching code path, verifying
that the pre_tokenizer is correctly patched to a Sequence without
AttributeError.
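
A toy illustration of the test strategy described above: fake an old transformers_version so the early-return guard is bypassed and the patching branch actually runs. The function name, version cutoff, and config keys are invented for illustration; the real test operates on a tokenizer config and checks the pre_tokenizer type.

```python
from packaging import version

def maybe_patch(config: dict) -> bool:
    # Early return mirrors the hub-config behaviour that skipped the patch path.
    saved = config.get("transformers_version", "0.0.0")
    if version.parse(saved) >= version.parse("5.0.0"):  # assumed cutoff
        return False
    config["pre_tokenizer"] = "Sequence"  # stand-in for the real patch
    return True

# Forcing the old-version path exercises the patching logic:
assert maybe_patch({"transformers_version": "4.40.0"}) is True
assert maybe_patch({"transformers_version": "9.9.9"}) is False
```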
…not in the distribution map

is_flash_attn_2_available / _3 / _4 / _greater_or_equal do two checks:

  is_available, _ = _is_package_available("flash_attn", return_version=True)
  is_available = is_available and "flash-attn" in [
      pkg.replace("_", "-") for pkg in PACKAGE_DISTRIBUTION_MAPPING["flash_attn"]
  ]

Step 1 uses importlib.util.find_spec, which returns a spec if any
"flash_attn" import is findable (an editable install, a namespace
package, a bundled shim, or a stub module under another project).
Step 2 then assumes that every findable import name also has an entry
in importlib.metadata.packages_distributions().

That assumption does not hold. On Python 3.13 with ComfyUI setups
(huggingface#45520), and in any environment where the import is resolvable via a
non-pip source, packages_distributions() has no "flash_attn" key.
Because the list comprehension is evaluated before the `in` operator,
short-circuit evaluation of the outer `and` does not protect us - the
KeyError fires during `transformers` import and takes down the whole
process before any model is loaded.

Swap the four raising subscripts for `.get(name, [])`. If the name is
missing from the distribution map we simply conclude that the requested
flash-attention flavour is not properly installed - which is the same
answer is_flash_attn_*_available() would have returned anyway - instead
of raising. The inner helper `_is_package_available` already wraps the
same subscript in a try/except, so we are only making the outer call
sites match that contract.

Fixes huggingface#45520
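
The shape of the fix, per the commit text: replace the raising subscript with .get(name, []) so a findable import with no distribution entry reads as "not installed" instead of crashing. Sketch only; the helper name is invented and the surrounding transformers code differs:

```python
from importlib.metadata import packages_distributions

PACKAGE_DISTRIBUTION_MAPPING = packages_distributions()

def _flash_attn_properly_installed(import_name: str = "flash_attn") -> bool:
    # .get(name, []) instead of PACKAGE_DISTRIBUTION_MAPPING[import_name]:
    # a missing key now means "no pip-installed flash-attn", not a KeyError
    # raised during `import transformers`.
    return "flash-attn" in [
        pkg.replace("_", "-")
        for pkg in PACKAGE_DISTRIBUTION_MAPPING.get(import_name, [])
    ]
```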
tarekziade and others added 30 commits April 23, 2026 14:45
* qa: bumped mlinter and allow local override

* bump version

* Update utils/check_modeling_rules_doc.py

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* license header

* license header

---------

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
…#45610)

* Fix missing conversion of experts

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* Fix eager config attribute reading

Co-authored-by: Copilot <copilot@github.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* Add proper error when kernels isn't installed

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* remove unnecessary mapping

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* review comments

Co-authored-by: Copilot <copilot@github.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* remove double newline

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

---------

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Copilot <copilot@github.com>
…uggingface#45601)

* fix: compute auxiliary losses when denoising is disabled in D-FINE

* style: fix formatting

* test: add regression test for auxiliary losses when denoising is disabled

* test: fix num_labels config in auxiliary loss regression test

---------

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
* remove warnings

* fix

* revert

* revert useless

* move function outside
…ing path (huggingface#45582)

* generate: drop stale num_return_sequences warning on continuous batching path

The continuous-batching branch warned that num_return_sequences was
unsupported alongside num_beams, but generate_batch() already honors
generation_config.num_return_sequences when expanding requests.  The
warning fires for any run that explicitly sets num_return_sequences
even though the feature works, cluttering logs and misleading users.

Drop the num_return_sequences half of the warning; keep the num_beams
guard since beam search is still unsupported on the CB path.

Fixes huggingface#45563

* Apply repo consistency fixes

---------

Co-authored-by: Joaquin Hui Gomez <joaquinhuigomez@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Rémi Ouazan <83456801+remi-or@users.noreply.github.com>
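
A toy sketch of why the warning described above was stale: the continuous-batching path already expands each request num_return_sequences times, so the parameter is honored. The request shape here is invented for illustration:

```python
def expand_requests(requests, num_return_sequences):
    # One scheduled generation per requested sequence, tagged by sequence index.
    return [(req, i) for req in requests for i in range(num_return_sequences)]

assert len(expand_requests(["prompt_a", "prompt_b"], 3)) == 6
```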
* chore(qa): split pipeline and add type checking

* added serving to quality

* fmt
* allow

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* circleci with torch 2.11

* circleci with torch 2.11

* circleci with torch 2.11

* circleci with torch 2.11

* circleci with torch 2.11

* circleci with torch 2.11

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
…th `num_labels=1` (huggingface#45611)

* Raise clear error for problem_type="single_label_classification" with num_labels=1

This combination is mathematically degenerate: applying cross-entropy loss to a
single logit always yields zero loss, so training silently accomplishes nothing.
Validate the combination in PreTrainedConfig.__post_init__ so users get a clear
error at config construction with a pointer to the correct setup (num_labels=2
for binary classification, or problem_type="regression" for a single-output
regression head).

Closes huggingface#45479

* Update src/transformers/configuration_utils.py

* Update tests/utils/test_configuration_utils.py

* Update src/transformers/configuration_utils.py

---------

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
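
A quick demonstration of the degeneracy described above: with a single logit, softmax is identically 1, so cross-entropy is always zero. This is standard PyTorch behaviour, nothing PR-specific:

```python
import torch
import torch.nn.functional as F

logits = torch.randn(4, 1)                  # batch of 4, num_labels=1
labels = torch.zeros(4, dtype=torch.long)   # only class 0 can ever be the label
loss = F.cross_entropy(logits, labels)
print(loss)  # tensor(0.) for any logits: log_softmax of a single logit is 0
```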
…uggingface#45625)

Add supports_gradient_checkpointing to NemotronHPreTrainedModel
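
The change presumably amounts to the standard transformers opt-in flag on the base class; a sketch with the rest of the class body elided:

```python
from transformers import PreTrainedModel

class NemotronHPreTrainedModel(PreTrainedModel):
    # Opting in lets the gradient-checkpointing machinery activate
    # torch.utils.checkpoint for this model family.
    supports_gradient_checkpointing = True
```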
* Add output language to chunks

* Add output language to chunks

* Fix formating

* Return full language instead of iso code

* revert changes (excep test)

* correct fix

* fix

* values for runner

---------

Co-authored-by: Eustache Le Bihan <eulebihan@gmail.com>
Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
# Conflicts:
#	src/transformers/models/gpt_neo/modeling_gpt_neo.py