[Auto] Super merge super-merge-example: combined 6 of 10 worktrees#40
…tors Migrates GPT-Neo to the standardized output collection interface as part of huggingface#43979. Removes manual output_attentions/output_hidden_states/return_dict handling in favor of hook-based output capturing via decorators. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…moval

- Added @merge_with_config_defaults to resolve use_cache from config
- Removed output_attentions=True from test_local_attn_probs (attention layer always returns weights, no longer accepts this param)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
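A minimal sketch of the decorator-plus-collector pattern described above. The names and shapes here are illustrative, not the actual transformers interface from huggingface#43979: instead of threading output_attentions/output_hidden_states flags through every forward signature, each layer is wrapped so its outputs are reported to a collector as they are produced.

```python
# Illustrative sketch of hook-based output collection (the real
# transformers decorator and hook machinery differ): a decorator
# registers each wrapped layer's output with a collector instead of
# the layer threading capture flags through its signature.
from functools import wraps


class OutputCollector:
    def __init__(self):
        self.hidden_states = []

    def hook(self, value):
        self.hidden_states.append(value)


def collect_outputs(collector):
    """Decorator: report the wrapped layer's output to the collector."""
    def decorate(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            out = fn(*args, **kwargs)
            collector.hook(out)  # capture happens here, not in the layer body
            return out
        return wrapper
    return decorate


collector = OutputCollector()


@collect_outputs(collector)
def layer(x):
    return x * 2


layer(3)
layer(5)
print(collector.hidden_states)  # [6, 10]
```

The layer itself stays oblivious to output collection; only the decorator and the collector know about it, which is what lets the manual flag plumbing be deleted.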
The existing test only checks that passing fix_mistral_regex=True doesn't error, but the hub model's config version causes an early return, so the patching logic is never exercised. This new test creates a local config with an old transformers_version to force the patching code path, verifying that the pre_tokenizer is correctly patched to a Sequence without an AttributeError.
…not in the distribution map
is_flash_attn_2_available / _3 / _4 / _greater_or_equal do two checks:

```python
is_available, _ = _is_package_available("flash_attn", return_version=True)
is_available = is_available and "flash-attn" in [
    pkg.replace("_", "-") for pkg in PACKAGE_DISTRIBUTION_MAPPING["flash_attn"]
]
```
Step 1 uses importlib.util.find_spec, which returns a spec if any
"flash_attn" import is findable (an editable install, a namespace
package, a bundled shim, or a stub module under another project).
Step 2 then assumes that every findable import name also has an entry
in importlib.metadata.packages_distributions().
That assumption does not hold. On Python 3.13 with ComfyUI setups
(huggingface#45520), and in any environment where the import is resolvable via a
non-pip source, packages_distributions() has no "flash_attn" key.
Short-circuit evaluation of the outer `and` does not protect us here: in the failing case the left operand is True (the spec is findable), so the right operand is evaluated, and the subscript inside the list comprehension raises before the `in` check ever runs. The KeyError fires during `transformers` import and takes down the whole process before any model is loaded.
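A minimal repro of the failure mode. `PACKAGE_DISTRIBUTION_MAPPING` here is a stand-in empty dict and `flash_attn_listed` is a hypothetical helper mirroring the raising call sites, modelling an environment where "flash_attn" is importable but has no entry in the distribution map:

```python
# Stand-in for the real module-level map; the "flash_attn" key is
# absent, as in the environments from the bug report.
PACKAGE_DISTRIBUTION_MAPPING = {}


def flash_attn_listed(spec_found):
    # spec_found=True models importlib.util.find_spec() succeeding.
    # Bare subscript, exactly like the raising call sites.
    return spec_found and "flash-attn" in [
        pkg.replace("_", "-") for pkg in PACKAGE_DISTRIBUTION_MAPPING["flash_attn"]
    ]


print(flash_attn_listed(False))  # False: the `and` short-circuits
try:
    flash_attn_listed(True)      # subscript raises before `in` runs
except KeyError as exc:
    print("KeyError:", exc)
```

Short-circuiting only saves the case where the import is not findable at all; the moment find_spec succeeds without a matching distribution entry, the subscript blows up.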
Swap the four raising subscripts for `.get(name, [])`. If the name is
missing from the distribution map we simply conclude that the requested
flash-attention flavour is not properly installed - which is the same
answer is_flash_attn_*_available() would have returned anyway - instead
of raising. The inner helper `_is_package_available` already wraps the
same subscript in a try/except, so we are only making the outer call
sites match that contract.
Fixes huggingface#45520
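A sketch of the fixed lookup under the same stand-in names (the exact call sites in import_utils.py differ): `.get(name, [])` turns a missing key into an empty candidate list, so the check degrades to "not installed" instead of a KeyError at import time.

```python
# Stand-in for the real module-level map, again without "flash_attn".
PACKAGE_DISTRIBUTION_MAPPING = {}


def flash_attn_listed(spec_found, name="flash_attn"):
    # .get(name, []) never raises: a missing key yields an empty list,
    # so the membership test is simply False.
    return spec_found and "flash-attn" in [
        pkg.replace("_", "-") for pkg in PACKAGE_DISTRIBUTION_MAPPING.get(name, [])
    ]


print(flash_attn_listed(True))   # False: same verdict, no crash

# With a proper pip install the mapping has the entry and the check passes.
PACKAGE_DISTRIBUTION_MAPPING["flash_attn"] = ["flash_attn"]
print(flash_attn_listed(True))   # True
```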
* qa: bumped mlinter and allow local override
* bump version
* Update utils/check_modeling_rules_doc.py
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
* license header
* license header
---------
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
…#45610)
* Fix missing conversion of experts
* Fix eager config attribute reading (Co-authored-by: Copilot <copilot@github.com>)
* Add proper error when kernels isn't installed
* remove unnecessary mapping
* review comments (Co-authored-by: Copilot <copilot@github.com>)
* remove double newline
---------
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Copilot <copilot@github.com>
…uggingface#45601)
* fix: compute auxiliary losses when denoising is disabled in D-FINE
* style: fix formatting
* test: add regression test for auxiliary losses when denoising is disabled
* test: fix num_labels config in auxiliary loss regression test
---------
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
* remove warnings
* fix
* revert
* revert useless
* move function outside
…ing path (huggingface#45582)
* generate: drop stale num_return_sequences warning on continuous batching path

The continuous-batching branch warned that num_return_sequences was unsupported alongside num_beams, but generate_batch() already honors generation_config.num_return_sequences when expanding requests. The warning fires for any run that explicitly sets num_return_sequences even though the feature works, cluttering logs and misleading users. Drop the num_return_sequences half of the warning; keep the num_beams guard since beam search is still unsupported on the CB path.

Fixes huggingface#45563

* Apply repo consistency fixes
---------
Co-authored-by: Joaquin Hui Gomez <joaquinhuigomez@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Rémi Ouazan <83456801+remi-or@users.noreply.github.com>
* skip
* skip
* chore(qa): split pipeline and add type checking
* added serving to quality
* fmt
allow
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* circleci with torch 2.11
* circleci with torch 2.11
* circleci with torch 2.11
* circleci with torch 2.11
* circleci with torch 2.11
* circleci with torch 2.11
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
…th `num_labels=1` (huggingface#45611)
* Raise clear error for problem_type="single_label_classification" with num_labels=1

This combination is mathematically degenerate: applying cross-entropy loss to a single logit always yields zero loss, so training silently accomplishes nothing. Validate the combination in PreTrainedConfig.__post_init__ so users get a clear error at config construction with a pointer to the correct setup (num_labels=2 for binary classification, or problem_type="regression" for a single-output regression head).

Closes huggingface#45479

* Update src/transformers/configuration_utils.py
* Update tests/utils/test_configuration_utils.py
* Update src/transformers/configuration_utils.py
---------
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
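The degeneracy described above can be checked by hand: softmax over a single logit always yields probability 1, so cross-entropy is -log(1) = 0 regardless of the logit's value. A minimal pure-Python sketch (no transformers involved; the function name is illustrative):

```python
import math


def cross_entropy_single_logit(logit, target=0):
    # Softmax over a one-element logit vector is always [1.0] ...
    probs = [math.exp(logit) / math.exp(logit)]
    # ... so the loss is -log(1) = 0 for every input and every label.
    return 0.0 - math.log(probs[target])


print(cross_entropy_single_logit(-3.7))  # 0.0
print(cross_entropy_single_logit(7.25))  # 0.0
```

Zero loss means zero gradient, which is why the config-time error is more useful than letting training run to no effect.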
…uggingface#45625) Add supports_gradient_checkpointing to NemotronHPreTrainedModel
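The change amounts to a class-level opt-in flag. A minimal sketch with a stub base class (the real PreTrainedModel machinery and method names are assumed, not reproduced):

```python
# Sketch of the pattern: PreTrainedModel subclasses declare support for
# gradient checkpointing via a class attribute that the shared enable
# path checks before activating it. Names here are illustrative stubs.
class PreTrainedModelStub:
    supports_gradient_checkpointing = False
    gradient_checkpointing = False

    def gradient_checkpointing_enable(self):
        if not self.supports_gradient_checkpointing:
            raise ValueError(
                f"{type(self).__name__} does not support gradient checkpointing."
            )
        self.gradient_checkpointing = True


class NemotronHPreTrainedModelSketch(PreTrainedModelStub):
    supports_gradient_checkpointing = True  # the attribute this PR adds


model = NemotronHPreTrainedModelSketch()
model.gradient_checkpointing_enable()
print(model.gradient_checkpointing)  # True
```

Without the attribute, the enable path refuses to activate checkpointing for the model, which is the behavior the PR fixes.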
* Add output language to chunks
* Add output language to chunks
* Fix formatting
* Return full language instead of iso code
* revert changes (except test)
* correct fix
* fix
* values for runner
---------
Co-authored-by: Eustache Le Bihan <eulebihan@gmail.com>
Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
…43240-3-20260427115403
…/transformers into merge-cluster-cluster-45081-3-20260427115403
…/transformers into merge-cluster-cluster-45520-3-20260427115403
# Conflicts:
#	src/transformers/models/gpt_neo/modeling_gpt_neo.py
…r-merge-example-20260427122842
Super-merge report: super-merge-example
- Base: origin/main (7e435bef05ce0f699f56ce58d54fd3320c220a43)
- Super branch: super-merge-example-20260427122842
- Worktree: /home/ssmith/source/mergeability-test/.mergeability/super-worktrees/super-merge-example
- super-merge-example-20260427122839;super-merge-example-20260427122842 did not exist locally, so it was created at the same base HEAD before merging.
- Result HEAD: c19b57702c
- merge-report.md is a local mergeability artifact and was not staged.

Source worktrees considered
- cluster-41211-3 (merge-cluster-cluster-41211-3-20260427115403)
- cluster-43240-3 (merge-cluster-cluster-43240-3-20260427115403): fixed_cross_entropy kwargs fix from huggingface#43254. Previous report skipped duplicate huggingface#43251.
- cluster-43656-4 (merge-cluster-cluster-43656-4-20260424123400): origin/main; no merge report and no diff.
- cluster-43698-3 (merge-cluster-cluster-43698-3-20260427115403)
- cluster-43824-3 (merge-cluster-cluster-43824-3-20260427115403): TypeAdapter/serve changes stale after serving refactor; merge attempts had conflicted and were aborted there.
- cluster-43979-11 (merge-cluster-cluster-43979-11-20260427115403)
- cluster-44018-2 (merge-cluster-cluster-44018-2-20260427115403)
- cluster-45081-3 (merge-cluster-cluster-45081-3-20260427115403)
- cluster-45520-3 (merge-cluster-cluster-45520-3-20260427115403): KeyError fix from huggingface#45524. Previous report skipped duplicate/riskier huggingface#45650.
- cluster-45561-3 (merge-cluster-cluster-45561-3-20260427115403): captured_info handling from huggingface#45645. Previous report skipped narrower duplicate huggingface#45639.

Merge order and results
- cluster-43240-3: merged cleanly.
- cluster-45520-3: merged cleanly.
- cluster-45081-3: merged cleanly.
- cluster-44018-2: merged cleanly.
- cluster-41211-3: merged cleanly; src/transformers/loss/loss_utils.py auto-merged, preserving both the new DEIMv2 loss registration and the fixed_cross_entropy kwargs fix.
- cluster-45561-3: merged cleanly after DEIMv2; overlapping upstream-main changes were handled by git without manual resolution.

No super-merge merge/cherry-pick attempts failed, and no manual conflict resolution was required in this worktree.
Current combined diff
git diff --stat origin/main...HEAD: 81 files changed, 8010 insertions, 681 deletions.
- fixed_cross_entropy forwarding for weight and label_smoothing.
- fix_mistral_regex=True.

Validation
- python -m compileall -q $(git diff --name-only origin/main...HEAD | grep "\.py$" ...): passed for all changed Python files.
- ruff check <changed Python files>: failed with 31 UP038 diagnostics in changed files (tuple-style isinstance checks). These come from the merged branches/current upstream-style churn and were not auto-fixed in this mergeability pass.
- python -m pytest --version: failed; the local environment has no pytest installed. No targeted pytest suites were run.

Recommended next steps
- Run make style or make fix-repo, then targeted tests for DEIMv2, GPT-Neo, tokenizer auto, testing utils, and import-utils behavior.
- Run make fix-repo before publishing.