
Fix flaky SAM-HQ integration tests by adding set_seed#43133

Open
raimbekovm wants to merge 2 commits intohuggingface:mainfrom
raimbekovm:fix-sam-hq-flaky-tests

Conversation

@raimbekovm
Contributor

raimbekovm commented Jan 6, 2026

What does this PR do?

Fixes flaky SamHQModelIntegrationTest by adding set_seed(0) in setUp().

The root cause: HF SAM-HQ creates two independent SamHQPositionalEmbedding instances (shared_image_embedding and prompt_encoder.shared_embedding), but the checkpoint only loads weights into shared_image_embedding. The other stays randomly initialized, causing non-deterministic test results.

Note: This is a workaround. The proper fix would be to share the embedding between both (like the original SAM-HQ) or copy weights to both during loading. Happy to implement that instead if preferred.
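
For reference, a minimal sketch of the workaround (the class name matches the test mentioned above; exact placement and base classes may differ from the actual diff):

```python
import unittest

from transformers import set_seed


class SamHQModelIntegrationTest(unittest.TestCase):
    def setUp(self):
        super().setUp()
        # Seed all RNGs so the randomly initialized
        # prompt_encoder.shared_embedding is identical on every run.
        set_seed(0)
```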

Fixes #42890

Before submitting

Who can review?

@NielsRogge @molbap @yonigozlan

@Rocketknight1
Member

_keys_to_ignore_on_load_missing will only suppress the warning if the key is missing from the checkpoint; it shouldn't change the loading behaviour. Is the checkpoint missing that tensor for some reason?

@raimbekovm
Contributor Author

You're right. I investigated the architecture:

The original SAM-HQ has one shared pe_layer.positional_encoding_gaussian_matrix. But the HF model creates two independent SamHQPositionalEmbedding instances:

  • shared_image_embedding (created in SamHQModel.__init__)
  • prompt_encoder.shared_embedding (created in SamHQPromptEncoder.__init__)

During conversion, the tensor is mapped only to shared_image_embedding.positional_embedding. The prompt_encoder.shared_embedding.positional_embedding remains randomly initialized.

The proper fix would be to either share the embedding between both (like the original) or copy the tensor to both locations during checkpoint loading. The set_seed approach is a workaround that masks this architectural mismatch.
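
A quick way to observe the mismatch (the repo id below is illustrative, not taken from this PR, and assumes a transformers version that ships SamHQModel):

```python
import torch
from transformers import SamHQModel

model = SamHQModel.from_pretrained("syscv-community/sam-hq-vit-base")  # hypothetical repo id
a = model.shared_image_embedding.positional_embedding
b = model.prompt_encoder.shared_embedding.positional_embedding
print(torch.equal(a, b))  # False before the fix: only `a` receives the converted checkpoint tensor
```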

Should I update the PR to fix the root cause instead?

@Rocketknight1
Member

Yes, I think so! At the very least, a PR that did that would give us a good starting point to check what it fixes and whether there are any regressions.

Share positional_embedding between shared_image_embedding and
prompt_encoder.shared_embedding using _tied_weights_keys mechanism,
matching the original SAM-HQ architecture where pe_layer is shared.
@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: sam_hq

@raimbekovm
Contributor Author

Implemented the proper fix mentioned in the PR description.

What changed:

  • Share positional_embedding between shared_image_embedding and prompt_encoder.shared_embedding (matching original SAM-HQ architecture)
  • Use _tied_weights_keys to handle weight loading correctly
  • Changed the buffer to an nn.Parameter(requires_grad=False), because mark_tied_weights_as_initialized() only works with parameters

Why this approach:
Tried several alternatives first:

  • _register_load_state_dict_pre_hook / register_load_state_dict_post_hook — don't work with safetensors/accelerate loading
  • _load_from_state_dict override — not supported by modular converter

The _tied_weights_keys mechanism works, but it requires a get_expanded_tied_weights_keys override, since the default implementation checks tie_word_embeddings (it is designed for language models).
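
To illustrate the sharing idea (and the buffer-to-parameter change) outside of the actual transformers classes, here is a standalone sketch; PositionalEmbeddingSketch and PromptEncoderSketch are hypothetical names, not the real SAM-HQ modules:

```python
import torch
import torch.nn as nn


class PositionalEmbeddingSketch(nn.Module):
    def __init__(self, num_pos_feats: int = 64, scale: float = 1.0):
        super().__init__()
        # Non-trainable Parameter (rather than a buffer), so the tied-weights
        # machinery can mark it as initialized.
        self.positional_embedding = nn.Parameter(
            scale * torch.randn(2, num_pos_feats), requires_grad=False
        )


class PromptEncoderSketch(nn.Module):
    def __init__(self, shared_embedding: nn.Module):
        super().__init__()
        # Reuse the same module object instead of constructing a second copy,
        # so there is a single tensor to load from the checkpoint.
        self.shared_embedding = shared_embedding


shared = PositionalEmbeddingSketch()
prompt_encoder = PromptEncoderSketch(shared)
assert prompt_encoder.shared_embedding.positional_embedding is shared.positional_embedding
```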

Tests:

  • Embeddings are now properly shared (same object)
  • Weights load correctly from checkpoint
  • Deterministic across multiple loads
  • Forward pass works correctly

The set_seed workaround from the first commit is no longer needed, but I left it in as a safety net.
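
For the determinism point in the test list above, a sketch of the check (repo id is illustrative):

```python
import torch
from transformers import SamHQModel


def load_prompt_pe(repo_id):
    # Return the prompt-encoder positional embedding from a fresh load.
    model = SamHQModel.from_pretrained(repo_id)
    return model.prompt_encoder.shared_embedding.positional_embedding.detach().clone()


repo = "syscv-community/sam-hq-vit-base"  # hypothetical repo id
print(torch.equal(load_prompt_pe(repo), load_prompt_pe(repo)))  # True once the weight is shared and loaded
```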

@Rocketknight1
Member

Rocketknight1 commented Jan 21, 2026

cc @NielsRogge @molbap @yonigozlan for review, since I'm not the expert on SAM models!

@yonigozlan
Member

Hello @raimbekovm! Thanks for raising this issue. The issue appears to be broader than the scope of this PR, so I opened another one to fix everything that's wrong here: #43420

@raimbekovm
Contributor Author

Glad I could help surface the issue — feel free to ping me if you need any help!

