Refactor bert-based models to use global attention function #37495

@Marcel256

Description

Feature request

Refactor the attention modules in BERT-based models to use the global attention function.

Motivation

This would enable easier support of SDPA and FlashAttention while minimizing code duplication across the many BERT copies.
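To illustrate the idea, here is a minimal, self-contained sketch of the dispatch pattern the refactor is aiming for: instead of each model hard-coding its attention math, attention backends register a function with a shared signature in a registry, and the module looks one up by name. All names here (`ATTENTION_FUNCTIONS`, `register`, `Attention`) are hypothetical stand-ins, not the actual transformers API, and plain Python lists stand in for tensors.

```python
import math

# Hypothetical registry: each backend (eager, sdpa, flash, ...) registers
# one function with a shared signature, and attention modules dispatch by
# name instead of duplicating the attention math per model.
ATTENTION_FUNCTIONS = {}

def register(name):
    def decorator(fn):
        ATTENTION_FUNCTIONS[name] = fn
        return fn
    return decorator

def _softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

@register("eager")
def eager_attention(query, key, value, scale):
    # Plain scaled dot-product attention over Python lists:
    # scores[i][j] = (q_i . k_j) * scale, softmax per row, then weight values.
    out = []
    for q in query:
        scores = [sum(qd * kd for qd, kd in zip(q, k)) * scale for k in key]
        weights = _softmax(scores)
        dim = len(value[0])
        out.append([sum(w * v[d] for w, v in zip(weights, value)) for d in range(dim)])
    return out

class Attention:
    """A BERT-style attention layer reduced to the dispatch step."""

    def __init__(self, attn_implementation="eager"):
        self.attn_implementation = attn_implementation

    def forward(self, query, key, value):
        scale = 1.0 / math.sqrt(len(query[0]))
        # The only backend-specific part is this lookup; swapping in an
        # SDPA or flash-attention backend means registering another function.
        attn_fn = ATTENTION_FUNCTIONS[self.attn_implementation]
        return attn_fn(query, key, value, scale)

layer = Attention("eager")
result = layer.forward([[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
print(result)
```

With this shape, the many BERT-derived models could share one copy of the attention math, and adding a new backend touches only the registry, not every model file.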

Your contribution

I have already created draft PR #37494 to outline the required changes. I would love to get feedback and will continue working on the PR if needed.
