
Add multi-modality transformers  #2775

@ahatamiz

Description

Is your feature request related to a problem? Please describe.
Add transformers that can process multi-modal data (i.e., vision and language). The same transformer block can also be used to build cross-attention modules.
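
The cross-attention use case can be sketched as follows. This is a minimal single-head example in NumPy, assuming text tokens act as queries and image patch embeddings act as keys and values. The `CrossAttention` class, its parameters and its randomly initialised (untrained) weights are hypothetical illustrations, not an API from Hugging Face or any other library.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the max along the axis for numerical stability.
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


class CrossAttention:
    """Hypothetical single-head cross-attention block.

    Queries come from one modality (e.g. text), keys and values from
    another (e.g. image patches). Weights are random and untrained.
    """

    def __init__(self, q_dim: int, kv_dim: int, hidden_dim: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.w_q = rng.normal(0.0, 0.02, (q_dim, hidden_dim))
        self.w_k = rng.normal(0.0, 0.02, (kv_dim, hidden_dim))
        self.w_v = rng.normal(0.0, 0.02, (kv_dim, hidden_dim))
        self.w_o = rng.normal(0.0, 0.02, (hidden_dim, q_dim))
        self.hidden_dim = hidden_dim

    def __call__(self, queries: np.ndarray, context: np.ndarray):
        # queries: (n_q, q_dim); context: (n_kv, kv_dim)
        q = queries @ self.w_q
        k = context @ self.w_k
        v = context @ self.w_v
        # Scaled dot-product scores, shape (n_q, n_kv).
        scores = q @ k.T / np.sqrt(self.hidden_dim)
        weights = softmax(scores, axis=-1)
        # Each query becomes a weighted mix of the context values.
        attended = weights @ v
        # Project back to the query width and add a residual connection.
        out = queries + attended @ self.w_o
        return out, weights


# Usage: 5 text tokens (dim 32) attend to 7x7 = 49 image patches (dim 64).
text = np.random.default_rng(1).normal(size=(5, 32))
image = np.random.default_rng(2).normal(size=(49, 64))
block = CrossAttention(q_dim=32, kv_dim=64, hidden_dim=16)
out, attn = block(text, image)
print(out.shape, attn.shape)  # (5, 32) (5, 49)
```

Swapping the inputs (image queries attending to text) gives the other direction of cross-attention, and passing the same tensor as both `queries` and `context` (with `q_dim` equal to `kv_dim`) reduces the block to ordinary self-attention, which is why one block design can serve both purposes.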

Describe the solution you'd like
The architecture can be imported from Hugging Face.
