
Fix flaky SAM-HQ integration tests by adding set_seed#43133

Open
raimbekovm wants to merge 2 commits intohuggingface:mainfrom
raimbekovm:fix-sam-hq-flaky-tests

Conversation

@raimbekovm
Contributor

raimbekovm commented Jan 6, 2026

What does this PR do?

Fixes flaky SamHQModelIntegrationTest by adding set_seed(0) in setUp().

The root cause: HF SAM-HQ creates two independent SamHQPositionalEmbedding instances (shared_image_embedding and prompt_encoder.shared_embedding), but the checkpoint only loads weights into shared_image_embedding. The other stays randomly initialized, causing non-deterministic test results.

Note: This is a workaround. The proper fix would be to share the embedding between both (like the original SAM-HQ) or copy weights to both during loading. Happy to implement that instead if preferred.
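
For reference, a minimal sketch of the workaround (the class name matches the test mentioned above; exact placement and base classes may differ from the actual diff):

```python
import unittest

from transformers import set_seed


class SamHQModelIntegrationTest(unittest.TestCase):
    def setUp(self):
        super().setUp()
        # Seed all RNGs so the randomly initialized
        # prompt_encoder.shared_embedding is identical on every run.
        set_seed(0)
```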

Fixes #42890

Before submitting

Who can review?

@NielsRogge @molbap @yonigozlan

@Rocketknight1
Member

_keys_to_ignore_on_load_missing will only suppress the warning if the key is missing from the checkpoint; it shouldn't change the loading behaviour. Is the checkpoint missing that tensor for some reason?

@raimbekovm
Contributor Author

You're right. I investigated the architecture:

The original SAM-HQ has one shared pe_layer.positional_encoding_gaussian_matrix. But the HF model creates two independent SamHQPositionalEmbedding instances:

  • shared_image_embedding (created in SamHQModel.__init__)
  • prompt_encoder.shared_embedding (created in SamHQPromptEncoder.__init__)

During conversion, the tensor is mapped only to shared_image_embedding.positional_embedding. The prompt_encoder.shared_embedding.positional_embedding remains randomly initialized.

The proper fix would be to either share the embedding between both (like the original) or copy the tensor to both locations during checkpoint loading. The set_seed approach is a workaround that masks this architectural mismatch.
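
A quick way to observe the mismatch (the repo id below is illustrative, not taken from this PR, and assumes a transformers version that ships SamHQModel):

```python
import torch
from transformers import SamHQModel

model = SamHQModel.from_pretrained("syscv-community/sam-hq-vit-base")  # hypothetical repo id
a = model.shared_image_embedding.positional_embedding
b = model.prompt_encoder.shared_embedding.positional_embedding
print(torch.equal(a, b))  # False before the fix: only `a` receives the converted checkpoint tensor
```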

Should I update the PR to fix the root cause instead?

@Rocketknight1
Member

Yes, I think so! At the very least, a PR that did that would give us a good starting point to check what it fixes and whether there are any regressions.

Share positional_embedding between shared_image_embedding and
prompt_encoder.shared_embedding using _tied_weights_keys mechanism,
matching the original SAM-HQ architecture where pe_layer is shared.
@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: sam_hq

@raimbekovm
Contributor Author

Implemented the proper fix mentioned in the PR description.

What changed:

  • Share positional_embedding between shared_image_embedding and prompt_encoder.shared_embedding (matching original SAM-HQ architecture)
  • Use _tied_weights_keys to handle weight loading correctly
  • Changed the buffer to an nn.Parameter(requires_grad=False), because mark_tied_weights_as_initialized() only works with parameters

Why this approach:
Tried several alternatives first:

  • _register_load_state_dict_pre_hook / register_load_state_dict_post_hook — don't work with safetensors/accelerate loading
  • _load_from_state_dict override — not supported by modular converter

The _tied_weights_keys mechanism works, but it requires a get_expanded_tied_weights_keys override, since the default implementation checks tie_word_embeddings (it is designed for language models).
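
To illustrate the sharing idea (and the buffer-to-parameter change) outside of the actual transformers classes, here is a standalone sketch; PositionalEmbeddingSketch and PromptEncoderSketch are hypothetical names, not the real SAM-HQ modules:

```python
import torch
import torch.nn as nn


class PositionalEmbeddingSketch(nn.Module):
    def __init__(self, num_pos_feats: int = 64, scale: float = 1.0):
        super().__init__()
        # Non-trainable Parameter (rather than a buffer), so the tied-weights
        # machinery can mark it as initialized.
        self.positional_embedding = nn.Parameter(
            scale * torch.randn(2, num_pos_feats), requires_grad=False
        )


class PromptEncoderSketch(nn.Module):
    def __init__(self, shared_embedding: nn.Module):
        super().__init__()
        # Reuse the same module object instead of constructing a second copy,
        # so there is a single tensor to load from the checkpoint.
        self.shared_embedding = shared_embedding


shared = PositionalEmbeddingSketch()
prompt_encoder = PromptEncoderSketch(shared)
assert prompt_encoder.shared_embedding.positional_embedding is shared.positional_embedding
```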

Tests:

  • Embeddings are now properly shared (same object)
  • Weights load correctly from checkpoint
  • Deterministic across multiple loads
  • Forward pass works correctly

The set_seed workaround from the first commit is no longer needed, but I left it in as a safety net.
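
For the determinism point in the test list above, a sketch of the check (repo id is illustrative):

```python
import torch
from transformers import SamHQModel


def load_prompt_pe(repo_id):
    # Return the prompt-encoder positional embedding from a fresh load.
    model = SamHQModel.from_pretrained(repo_id)
    return model.prompt_encoder.shared_embedding.positional_embedding.detach().clone()


repo = "syscv-community/sam-hq-vit-base"  # hypothetical repo id
print(torch.equal(load_prompt_pe(repo), load_prompt_pe(repo)))  # True once the weight is shared and loaded
```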

@Rocketknight1
Member

Rocketknight1 commented Jan 21, 2026

cc @NielsRogge @molbap @yonigozlan for review, since I'm not the expert on SAM models!

@yonigozlan
Member

Hello @raimbekovm! Thanks for raising this issue. The issue appears to be broader than the scope of this PR, so I opened another one to fix everything that's wrong here: #43420

@raimbekovm
Contributor Author

Glad I could help surface the issue — feel free to ping me if you need any help!

