
[EfficientLoFTR] LRU Cached embedding computation at inference #40329

Merged
qubvel merged 6 commits into huggingface:main from sbucaille:eloftr-dynamic-rope on Aug 27, 2025

Conversation

@sbucaille
Contributor

What does this PR do?

Reverted EfficientLoFTR's RoPE computation back to inference time, to allow using different image sizes
Decorated it with an LRU cache, similarly to the DINOv3 implementation
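A minimal sketch of the caching pattern described above (the function body follows the snippets quoted later in this review; the `inv_freq` construction and cache size are illustrative assumptions, not the exact transformers implementation):

```python
from functools import lru_cache

import torch


@lru_cache(maxsize=32)
def compute_embeddings(embed_height: int, embed_width: int, hidden_size: int) -> torch.Tensor:
    """Compute 2D rotary position embeddings once per embedding grid size.

    Because the arguments are plain ints, lru_cache reuses the cached tensor
    on every later inference call with the same image/grid size, while still
    supporting arbitrary sizes (unlike precomputing at init time).
    """
    # Illustrative inverse-frequency vector (assumption, not the exact model code).
    inv_freq = 1.0 / (10000 ** (torch.arange(0, hidden_size // 4).float() / (hidden_size // 4)))
    # Row/column position indices, shape (H, W, 1).
    i_indices = torch.ones(embed_height, embed_width).cumsum(0).float().unsqueeze(-1)
    j_indices = torch.ones(embed_height, embed_width).cumsum(1).float().unsqueeze(-1)
    # Interleave row and column frequencies into the embedding.
    emb = torch.zeros(1, embed_height, embed_width, hidden_size // 2)
    emb[..., 0::2] = i_indices * inv_freq
    emb[..., 1::2] = j_indices * inv_freq
    return emb
```

Since `lru_cache` returns the identical tensor object on a cache hit, repeated calls with the same grid size cost nothing beyond a dictionary lookup.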

Who can review?

@qubvel

Contributor

@qubvel qubvel left a comment


Thanks, just a question, otherwise looks good 👍

Comment on lines +74 to +75
i_indices = torch.ones(embed_height, embed_width).cumsum(0).float().unsqueeze(-1)
j_indices = torch.ones(embed_height, embed_width).cumsum(1).float().unsqueeze(-1)
Contributor


device / dtype is also needed, no?

Contributor Author


compute_embeddings is only used inside the torch.autocast context manager, so I assume we don't need it?

i_indices = torch.ones(embed_height, embed_width).cumsum(0).float().unsqueeze(-1)
j_indices = torch.ones(embed_height, embed_width).cumsum(1).float().unsqueeze(-1)

emb = torch.zeros(1, embed_height, embed_width, hidden_size // 2)
Contributor


same here for zeros

@sbucaille
Contributor Author

@qubvel gentle bump! I answered your comment

@qubvel
Contributor

qubvel commented Aug 26, 2025

Thanks for the ping! Missed it.

I still think we should pass inv_freq.device and inv_freq.dtype when creating the tensors, to make this function safe to use even outside of the autocast block.
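A sketch of what this suggestion amounts to (a hedged illustration assuming `inv_freq` is a tensor available to the function; the signature and `inv_freq` handling are illustrative, not the exact code merged in the PR):

```python
import torch


def compute_embeddings(
    inv_freq: torch.Tensor, embed_height: int, embed_width: int, hidden_size: int
) -> torch.Tensor:
    """Build the embedding tensors directly on inv_freq's device and dtype.

    This keeps the function correct even when it is called outside of a
    torch.autocast block, e.g. on a CUDA model in full precision.
    """
    device, dtype = inv_freq.device, inv_freq.dtype
    # Factory kwargs replace the later .float() cast from the original snippet.
    i_indices = torch.ones(embed_height, embed_width, device=device, dtype=dtype).cumsum(0).unsqueeze(-1)
    j_indices = torch.ones(embed_height, embed_width, device=device, dtype=dtype).cumsum(1).unsqueeze(-1)
    emb = torch.zeros(1, embed_height, embed_width, hidden_size // 2, device=device, dtype=dtype)
    emb[..., 0::2] = i_indices * inv_freq
    emb[..., 1::2] = j_indices * inv_freq
    return emb
```

Without the factory kwargs, `torch.ones` and `torch.zeros` default to CPU float32, which would silently break (device mismatch) or downcast when the rest of the model lives elsewhere.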

@sbucaille
Contributor Author

Addressed in 3fa6b94

@qubvel
Contributor

qubvel commented Aug 26, 2025

run-slow: efficientloftr

@github-actions
Contributor

This comment contains run-slow, running the specified jobs:

models: ['models/efficientloftr']
quantizations: [] ...

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@sbucaille
Contributor Author

You can rerun the slow tests; it should be fixed now. I forgot that the features were aggregated before the attention, so the embeddings need to be of the correct size.

@qubvel
Contributor

qubvel commented Aug 27, 2025

run-slow: efficientloftr

@github-actions
Contributor

This comment contains run-slow, running the specified jobs:

models: ['models/efficientloftr']
quantizations: [] ...

@qubvel qubvel self-requested a review August 27, 2025 13:41
@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: efficientloftr

  if self.hidden_size != self.out_features[-1]:
      raise ValueError(
-         f"hidden_size should be equal to the last value in out_features. hidden_size = {self.hidden_size}, out_features = {self.stage_out_channels}"
+         f"hidden_size should be equal to the last value in out_features. hidden_size = {self.hidden_size}, out_features = {self.out_features[-1]}"
Contributor


fix attribute name

Comment on lines +53 to +61
out_features: list[int] = [32, 32, 128],
stage_stride: list[int] = [2, 1, 2],
q_aggregation_kernel_size: int = 1,
kv_aggregation_kernel_size: int = 1,
q_aggregation_stride: int = 1,
kv_aggregation_stride: int = 1,
num_attention_layers: int = 2,
num_attention_heads: int = 8,
- hidden_size: int = 64,
+ hidden_size: int = 128,
Contributor


that's for FA2 tests to pass

Contributor Author


How did I not catch this one before? 🤔 But thanks for taking care of it!

Contributor


That's skipped locally in case you don't have FA2 installed 😄

@qubvel
Contributor

qubvel commented Aug 27, 2025

run-slow: efficientloftr

@github-actions
Contributor

This comment contains run-slow, running the specified jobs:

models: ['models/efficientloftr']
quantizations: [] ...

@qubvel qubvel merged commit 52aaa3f into huggingface:main Aug 27, 2025
20 checks passed
@sbucaille sbucaille deleted the eloftr-dynamic-rope branch September 2, 2025 03:14

Labels: None yet
Projects: None yet
3 participants