This repository was archived by the owner on Feb 7, 2025. It is now read-only.

Missing scale in CrossAttention class #146

@Warvito

Description

The attention mechanism in the CrossAttention class is missing the multiplication by the scale factor when computing the attention scores:

attention_scores = torch.matmul(query, key.transpose(-1, -2))
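For reference, standard scaled dot-product attention multiplies the query–key product by 1/sqrt(head_dim). A minimal sketch of what the scaled version could look like (the function and variable names here are illustrative, not taken from the repository; the class may instead store the factor as an attribute such as a precomputed scale):

```python
import torch

def scaled_attention_scores(query: torch.Tensor, key: torch.Tensor) -> torch.Tensor:
    # Hypothetical illustration of the fix: include the 1/sqrt(head_dim) factor.
    scale = query.shape[-1] ** -0.5  # 1 / sqrt(head_dim)
    return torch.matmul(query, key.transpose(-1, -2)) * scale
```

Without this factor, the softmax over the attention scores can saturate for larger head dimensions, which is why the scaling is part of the standard formulation.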

Labels: bug (Something isn't working)
