Conversation
All green!!
```python
model = CLIPModel.from_pretrained(checkpoint)
processor = CLIPProcessor.from_pretrained(checkpoint)
inputs = processor(text=..., images=..., return_tensors="pt")
outputs = model(**inputs)
```

Ready for second review @LysandreJik @sgugger @patrickvonplaten
sgugger
left a comment
Great work! A few last loose ends to tie up (in particular, don't forget to replace all checkpoint names in the docstrings with ones in the openai namespace!) and it should be good to merge.
```python
logger = logging.get_logger(__name__)

CLIP_PRETRAINED_CONFIG_ARCHIVE_MAP = {
    "valhalla/clip-vit-base-patch32": "https://huggingface.co/valhalla/clip-vit-base-patch32/resolve/main/config.json",
}
```
LysandreJik
left a comment
Great, LGTM! The only thing left is to update the checkpoints from the valhalla namespace to the openai namespace.
* begin second draft
* fix import, style
* add loss
* fix embeds, logit_scale, and projection
* fix imports
* add conversion script
* add feature_extractor and processor
* style
* add tests for tokenizer, extractor and processor
* add vision model tests
* add weight init
* add more tests
* fix save_load test
* model output, docstrings, causal mask
* config doc
* add clip model tests
* return dict
* begin integration test
* add integration tests
* fix-copies
* fix init
* Clip => CLIP
* fix module name
* docs
* fix doc
* output_dim => projection_dim
* fix checkpoint names
* remove fast tokenizer file
* fix conversion script
* fix tests, quality
* put causal mask on device
* Apply suggestions from code review

  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* fix attribute test
* style
* address Sylvain's comments
* style
* fix docstrings
* add quick_gelu in activations, docstrings
* clean up attention test
* fix activation function
* fix config
* fix torchscript tests
* even batch_size
* remove comment
* fix output to_tuple
* fix save load tests
* fix add tokens test
* add fast tokenizer
* update copyright
* new processor API
* fix docs
* docstrings
* docs
* fix doc
* fix doc
* fix tokenizer
* fix import in doc example
* Apply suggestions from code review

  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* check types of config
* valhalla => openai
* load image using url
* fix test
* typo

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
How do I use the processor in `__getitem__()`? I got an error: "RuntimeError: stack expects each tensor to be equal size, but got [1, 11] at entry 0 and [1, 13] at entry 1".
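This error usually means each `__getitem__` call tokenizes one text at a time, so the per-item tensors have different lengths and the default collate cannot `stack` them. The usual fix is to tokenize the whole batch with `padding=True`, or to pad in a custom `collate_fn`. A minimal sketch of the padding idea in plain Python (the token ids below are made up for illustration):

```python
# Two "tokenized" texts of different lengths -- stacking them as-is fails.
ids_a = [49406, 320, 1125, 49407]            # 4 tokens
ids_b = [49406, 320, 1125, 539, 320, 49407]  # 6 tokens

# Pad every sequence in the batch to the batch's max length
# (CLIP pads with id 0); equal-length rows can then be stacked.
max_len = max(len(ids_a), len(ids_b))
padded = [ids + [0] * (max_len - len(ids)) for ids in (ids_a, ids_b)]
print([len(p) for p in padded])  # [6, 6]
```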
Hi @lycfight, could you please open an issue with a minimal code snippet so we can take a look? Thanks :)

Of course.
What does this PR do?
This PR adds the CLIP model.
CLIP is a multi-modal vision-and-language model that uses transformer encoders for both the images and the text.
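As a rough sketch of the dual-encoder idea (toy numbers and numpy only; this is not the actual model code): each encoder produces an embedding, both are L2-normalized, and text-image similarity is a scaled dot product.

```python
import numpy as np

def normalize(v):
    # L2-normalize each row so the dot product is a cosine similarity.
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

# Toy "embeddings" standing in for the text/vision encoder outputs.
text_embeds = normalize(np.array([[1.0, 0.0], [0.0, 1.0]]))
image_embeds = normalize(np.array([[0.9, 0.1], [0.1, 0.9]]))

logit_scale = 14.3  # roughly 1/0.07, the initial temperature used by CLIP
logits_per_text = logit_scale * text_embeds @ image_embeds.T
print(logits_per_text.shape)  # (2, 2)
```

Matching text-image pairs (the diagonal) score higher than mismatched ones, which is what the contrastive loss pushes for during training.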
* `CLIPTextModel` and `CLIPVisionModel` can be loaded independently and composed together to get the `CLIPModel`.
* `CLIPTextModel` and `CLIPVisionModel` use the shared encoder class `CLIPEncoder`.
* The model is configured by two classes, `CLIPTextConfig` and `CLIPVisionConfig`. This could be kept in one config class, but then we would have to add two arguments for each config value, i.e. `text_hidden_size` for the text model, `vision_hidden_size` for the vision model, etc.

One issue here is that when we load an individual model, like `CLIPTextModel`, using the weights of the whole `CLIPModel`, the config ends up containing both the text and vision config dicts. This does not cause any issue but could be confusing to look at.
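The nested-config design described above can be sketched in plain Python (class and field names here are illustrative, not the actual `transformers` implementation):

```python
class TextConfig:
    def __init__(self, hidden_size=512, num_layers=12):
        self.hidden_size = hidden_size
        self.num_layers = num_layers

class VisionConfig:
    def __init__(self, hidden_size=768, num_layers=12):
        self.hidden_size = hidden_size
        self.num_layers = num_layers

class CompositeConfig:
    # One config object nesting both sub-configs, instead of a flat
    # namespace of text_hidden_size / vision_hidden_size attributes.
    def __init__(self, text_config=None, vision_config=None):
        self.text_config = text_config or TextConfig()
        self.vision_config = vision_config or VisionConfig()

cfg = CompositeConfig()
print(cfg.text_config.hidden_size, cfg.vision_config.hidden_size)  # 512 768
```

Each modality keeps its own namespace, so a value like `hidden_size` never needs per-modality prefixes, and a sub-model can be built from its sub-config alone.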
One important thing to note here is that CLIP's tokenizer does not have a pad token defined for it. The original implementation uses 0 as the `pad_token_id` to pad the text, but the token associated with id 0 is not actually a pad token. So here, to be able to do padding, I've added `pad_token_id` as a `property` which returns 0. I would be happy to hear if there is some other way to achieve this.

Also, I've added a processor class here, but I'm not sure if we really need it for this model. We could easily use the feature extractor for the vision model and the tokenizer for the text model.
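The `pad_token_id`-as-property workaround can be sketched like this (a toy class for illustration, not the real `CLIPTokenizer` code):

```python
class ToyTokenizer:
    """Toy stand-in for a tokenizer whose vocab has no dedicated pad token."""

    # The id used for padding is exposed as a read-only property rather
    # than a stored pad_token attribute, since id 0 maps to a real token.
    @property
    def pad_token_id(self):
        return 0

tok = ToyTokenizer()
print(tok.pad_token_id)  # 0
```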
Would love your review about the design @LysandreJik , @patrickvonplaten , @sgugger.