Description
It seems that CLIP-SF and BLIP-SF have not been trained with the fusion weights w1, w2, w3, and w4.
In UniIR/src/models/uniir_clip/clip_scorefusion/clip_sf.py, the encoding code is as follows:
    def encode_text(self, text_tensor):
        return self.clip_model.encode_text(text_tensor)

    def encode_image(self, image_tensor):
        return self.clip_model.encode_image(image_tensor)

    def fuse_embeddings(self, img_emb, txt_emb):
        fused_emb = img_emb + txt_emb
        return fused_emb

    def encode_multimodal_input(self, txt_tensor, img_tensor, txt_mask, img_mask):
        """
        :param txt_tensor:
        :param img_tensor:
        :param txt_mask: expected shape: [batch_size, 1]
        :param img_mask: expected shape: [batch_size, 1]
        :return:
        """
        txt_emb = self.encode_text(txt_tensor) * txt_mask.unsqueeze(-1)
        img_emb = self.encode_image(img_tensor) * img_mask.unsqueeze(-1)
        return self.fuse_embeddings(txt_emb, img_emb)  # shape: [batch_size, embed_dim]
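As shown above, fuse_embeddings is a plain sum of the two embeddings, with no learnable weights involved. For reference, below is a minimal sketch of what score-level fusion with trainable w1, w2, w3, w4 could look like, assuming the weights scale the four text/image similarity terms between a query and a candidate; the class name WeightedScoreFusion and its interface are hypothetical and not code from the UniIR repo:

```python
import torch
import torch.nn as nn


class WeightedScoreFusion(nn.Module):
    """Hypothetical sketch: score-level fusion with learnable weights w1..w4.

    Not code from clip_sf.py; it only illustrates where trainable fusion
    weights would enter the query-candidate score.
    """

    def __init__(self):
        super().__init__()
        # Learnable fusion weights. With all four fixed at 1, the score below
        # equals the dot product of the summed embeddings, i.e. the behaviour
        # of fuse_embeddings as quoted above.
        self.weights = nn.Parameter(torch.ones(4))

    def forward(self, q_txt, q_img, c_txt, c_img):
        # All embeddings: [batch_size, embed_dim]; each score term: [batch_size].
        s_tt = (q_txt * c_txt).sum(dim=-1)
        s_ti = (q_txt * c_img).sum(dim=-1)
        s_it = (q_img * c_txt).sum(dim=-1)
        s_ii = (q_img * c_img).sum(dim=-1)
        w1, w2, w3, w4 = self.weights
        return w1 * s_tt + w2 * s_ti + w3 * s_it + w4 * s_ii
```

Note that with w1 = w2 = w3 = w4 = 1, this score reduces to (q_txt + q_img) · (c_txt + c_img), which is exactly the dot product of the embeddings produced by fuse_embeddings above, so if the weights were never trained the two formulations would be indistinguishable.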