Clarification on Some Parameters and Locality-Constrained Sparse Attention Implementation

Hi,

I have been exploring your implementation and came across the parameters **topk_ratio** and **local_range**. Could you please clarify the following points?

topk_ratio:

What does the topk_ratio parameter control in your model? How does it relate to the resolution of the input data? 

local_range:

What is the role of local_range in the attention process? How does it constrain the attention span or locality?

Additionally, I am interested in understanding the Locality-Constrained Sparse Attention mechanism. Specifically:

Where is the implementation of Locality-Constrained Sparse Attention in the codebase? 
