Feature request
Refactor the attention modules in BERT-based models to use a global attention function.
Motivation
This would enable easier support for SDPA and FlashAttention while minimizing code duplication across BERT copies.
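For illustration, here is a minimal sketch (not the actual PR) of how a BERT-style self-attention forward pass could dispatch through the shared attention interface, similar to what newer models such as Llama already do. `ALL_ATTENTION_FUNCTIONS` and `config._attn_implementation` are the existing transformers dispatch mechanism; the class name `BertSelfAttentionSketch`, the inline `eager_attention_forward` fallback, and the simplified mask handling are assumptions made here to keep the example self-contained:

```python
from typing import Callable, Optional

import torch
from torch import nn

from transformers.modeling_utils import ALL_ATTENTION_FUNCTIONS


def eager_attention_forward(module, query, key, value, attention_mask, scaling, dropout=0.0, **kwargs):
    # Plain softmax attention, kept as the "eager" fallback path.
    attn_weights = torch.matmul(query, key.transpose(2, 3)) * scaling
    if attention_mask is not None:
        attn_weights = attn_weights + attention_mask  # additive mask, simplified
    attn_weights = nn.functional.softmax(attn_weights, dim=-1)
    attn_weights = nn.functional.dropout(attn_weights, p=dropout, training=module.training)
    attn_output = torch.matmul(attn_weights, value).transpose(1, 2).contiguous()
    return attn_output, attn_weights


class BertSelfAttentionSketch(nn.Module):
    # Hypothetical refactored module, not the code in PR #37494.
    def __init__(self, config):
        super().__init__()
        self.config = config
        self.num_heads = config.num_attention_heads
        self.head_dim = config.hidden_size // config.num_attention_heads
        self.scaling = self.head_dim**-0.5
        self.query = nn.Linear(config.hidden_size, config.hidden_size)
        self.key = nn.Linear(config.hidden_size, config.hidden_size)
        self.value = nn.Linear(config.hidden_size, config.hidden_size)
        self.dropout_prob = config.attention_probs_dropout_prob

    def forward(self, hidden_states: torch.Tensor, attention_mask: Optional[torch.Tensor] = None):
        bsz, seq_len, _ = hidden_states.shape
        # Project and reshape to (batch, heads, seq, head_dim).
        shape = (bsz, seq_len, self.num_heads, self.head_dim)
        query = self.query(hidden_states).view(shape).transpose(1, 2)
        key = self.key(hidden_states).view(shape).transpose(1, 2)
        value = self.value(hidden_states).view(shape).transpose(1, 2)

        # Single dispatch point: eager, SDPA, and FlashAttention-2 share one
        # code path instead of per-implementation attention classes.
        attention_interface: Callable = eager_attention_forward
        if self.config._attn_implementation != "eager":
            attention_interface = ALL_ATTENTION_FUNCTIONS[self.config._attn_implementation]

        attn_output, attn_weights = attention_interface(
            self,
            query,
            key,
            value,
            attention_mask,
            dropout=0.0 if not self.training else self.dropout_prob,
            scaling=self.scaling,
            is_causal=False,  # BERT attention is bidirectional
        )
        return attn_output.reshape(bsz, seq_len, -1), attn_weights
```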
Your contribution
I have already created draft PR #37494 to outline the required changes. I would love to get feedback and will continue working on the PR if needed.