
CLIP-SF and BLIP-SF weights: w1, w2, w3, and w4 #30

@pandaupc

Description


It seems that CLIP-SF and BLIP-SF do not actually train the fusion weights w1, w2, w3, and w4. The encoding code in UniIR/src/models/uniir_clip/clip_scorefusion/clip_sf.py is as follows:
```python
def encode_text(self, text_tensor):
    return self.clip_model.encode_text(text_tensor)

def encode_image(self, image_tensor):
    return self.clip_model.encode_image(image_tensor)

def fuse_embeddings(self, img_emb, txt_emb):
    # Fusion is a plain element-wise sum -- no learnable weights involved.
    fused_emb = img_emb + txt_emb
    return fused_emb

def encode_multimodal_input(self, txt_tensor, img_tensor, txt_mask, img_mask):
    """
    :param txt_tensor: tokenized text
    :param img_tensor: preprocessed image batch
    :param txt_mask: expected shape: [batch_size, 1]
    :param img_mask: expected shape: [batch_size, 1]
    :return: fused embedding
    """
    txt_emb = self.encode_text(txt_tensor) * txt_mask.unsqueeze(-1)
    img_emb = self.encode_image(img_tensor) * img_mask.unsqueeze(-1)
    return self.fuse_embeddings(txt_emb, img_emb)  # shape: [batch_size, embed_dim]
```
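
For comparison, here is a minimal sketch of what trainable fusion weights might look like at the embedding level. The module name `LearnableScoreFusion` and the split of w1/w2 onto the query side and w3/w4 onto the candidate side are my own assumptions for illustration, not something taken from the UniIR codebase:

```python
import torch
import torch.nn as nn

class LearnableScoreFusion(nn.Module):
    """Hypothetical sketch, not UniIR code: fuse text/image embeddings
    with trainable scalar weights instead of a plain sum."""

    def __init__(self):
        super().__init__()
        # [w1, w2, w3, w4], initialized to 1.0 so the module starts out
        # equivalent to the unweighted img_emb + txt_emb above.
        self.weights = nn.Parameter(torch.ones(4))

    def fuse_query(self, txt_emb, img_emb):
        # Assumed convention: w1/w2 weight the query-side embeddings.
        return self.weights[0] * txt_emb + self.weights[1] * img_emb

    def fuse_candidate(self, txt_emb, img_emb):
        # Assumed convention: w3/w4 weight the candidate-side embeddings.
        return self.weights[2] * txt_emb + self.weights[3] * img_emb
```

If weights like these existed, they would only be trained when registered on the module the optimizer sees; nothing in the encode path quoted above creates or applies such parameters, which is what prompts the question.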
