Conversation
expected_slice = torch.tensor(
    [[0.9148, -1.4148, 3.8040], [3.3443, 1.9478, 0.2080], [1.6604, 2.8184, -0.3618]]
)
This value was obtained from the buggy modeling code. Now that we fix the code, we need to update the value.
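A minimal sketch of how such hardcoded output slices are typically pinned down in integration tests. The `allclose` helper and tolerance below are illustrative stand-ins for `torch.allclose`, not the actual test code; the values are the ones from this diff:

```python
# Expected logits slice from the PR diff (values must be regenerated
# whenever a modeling bug is fixed, as discussed above).
expected_slice = [
    [0.9148, -1.4148, 3.8040],
    [3.3443, 1.9478, 0.2080],
    [1.6604, 2.8184, -0.3618],
]

def allclose(a, b, atol=1e-4):
    """Element-wise comparison of two nested lists within an absolute
    tolerance, mimicking what torch.allclose does for the test."""
    return all(
        abs(x - y) <= atol
        for row_a, row_b in zip(a, b)
        for x, y in zip(row_a, row_b)
    )
```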
zucchini-nlp
left a comment
Let's do `is_causal = True if is_decoder else False`. I think that makes sense and removes redundant args being passed around.
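A toy sketch of the reviewer's suggestion: derive `is_causal` once in the attention module's constructor from `is_decoder`, instead of threading it through as an extra argument. The class below is a hypothetical stand-in, not the actual `KosmosTextAttention` code:

```python
class TextAttention:
    """Illustrative attention module skeleton (not the real Kosmos2 class)."""

    def __init__(self, is_decoder: bool):
        self.is_decoder = is_decoder
        # Decoder self-attention is causal; a non-decoder use such as the
        # image-to-text projection attends to all positions, so non-causal.
        self.is_causal = True if is_decoder else False
```

With this, downstream attention calls can read `self.is_causal` rather than receiving it as a parameter at every call site.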
I love the suggestion; unfortunately there is a super-nit issue. Back then, I added this block so a decoder could do self-attention as well as cross-attention to an encoder's output, just like the original implementation. However, this path is never used for Kosmos2 (Microsoft uses fairseq, which contains a lot of code in the library; not all paths are used for a particular model). So if a user relies on that path it could break, but I would say no hub repository of kosmos2 uses it.
Ah I see, so we have an unused code path. I think we can remove it with a minor deprecation cycle, which is in line with the whole "unbloating" effort we have currently. And then we can safely assume that a non-decoder uses a non-causal mask.
OK, will do that. Thanks!
What does this PR do?
"[VLMs] support attention backends" (#37576) actually breaks kosmos2, as `KosmosTextAttention` is used in the decoder (`Kosmos2TextBlock`) as well as in `Kosmos2ImageToTextProjection` (which should attend to all image places). But without `is_causal`, `sdpa_attention_forward` will treat it as causal due to …

Maybe there is a better way to handle this, but I would not spend too much time on it and would just add the `is_causal` argument and pass it.

All tests pass on A10 now.
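To illustrate why the missing flag matters, here is a toy, framework-free sketch (not the actual SDPA kernel) of softmax attention weights with and without causal masking. With `causal=True`, position 0 gets zero weight on later positions, which would be wrong for the image-to-text projection that must attend to every image patch:

```python
import math

def attention_weights(scores, causal):
    """Softmax over a square score matrix, row by row.
    causal=True masks out future positions (j > i), which is what a
    causal-by-default backend would silently do; causal=False attends
    everywhere, as the image-to-text projection requires."""
    n = len(scores)
    out = []
    for i in range(n):
        row = [s if (not causal or j <= i) else float("-inf")
               for j, s in enumerate(scores[i])]
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        z = sum(exps)
        out.append([e / z for e in exps])
    return out
```

For uniform scores, the non-causal weights are uniform across all positions, while the causal weights for row 0 collapse onto position 0 alone.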