Hi, I'm following and have some questions:
- `modified_clip/model.py` and `modified_clip/open_model.py` perform different operations: `img_features[kth] = ln_x` in model.py versus `img_features[kth] = ln_x - img_features[kth]` in open_model.py. Why is there such a difference? I expected the two files to differ only in which ViT-based CLIP models they load, through the CLIP and OpenCLIP packages.
- In `forward` of model.py, are operations such as concatenating `fg_text_features.mean(0, True)` into `text_features`, and `seg_last[seg_last < seg_last.amax(0, keepdim=True) * 0.2] = 0`, used to improve performance? How was the threshold of 0.2 determined?
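
To check my reading of the thresholding line, here is a minimal sketch on a toy tensor. The helper name `suppress_low_responses`, the `ratio` parameter, and the toy shape and values are my own assumptions for illustration and are not taken from the repo. As far as I can tell, every entry smaller than 20% of the maximum along dim 0 is set to zero:

```python
import torch


def suppress_low_responses(seg: torch.Tensor, ratio: float = 0.2) -> torch.Tensor:
    """Zero out entries below `ratio` times the max along dim 0 (hypothetical helper)."""
    seg = seg.clone()                      # avoid modifying the caller's tensor
    peak = seg.amax(0, keepdim=True)       # shape (1, N): max over dim 0
    seg[seg < peak * ratio] = 0            # broadcasted boolean mask, then assignment
    return seg


# Toy input (assumed shape and values, not from the repo).
seg_last = torch.tensor([[1.0, 0.05],
                         [0.1, 0.5],
                         [0.3, 0.2]])
out = suppress_low_responses(seg_last)
print(out)
```

In this toy case the first column has a peak of 1.0, so 0.1 is zeroed while 0.3 survives; the second column has a peak of 0.5, so only 0.05 is zeroed. If that matches the intended behavior, my remaining question is only how 0.2 was chosen.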
BTW, this code is simple yet elegant. Thanks for your impressive work.