Adds uniform processing kwargs to paligemma. #32377
MnCSSJ4x wants to merge 8 commits into huggingface:main
Conversation
zucchini-nlp
left a comment
Hey @MnCSSJ4x! Thanks for working on this! Left a few comments setting default values.
Currently the tests are failing, so we have to fix them and make sure the paligemma processing tests contain the ProcessorTesterMixin. That is the way to ensure we're actually testing the new kwargs format.
Also, I want to ask for some paligemma-specific processor tests, as it seems that Paligemma has several model-specific kwargs.
cc @molbap if you have time to take a look
Oh, can you remove the "Fixes #" from the body? Otherwise, merging this PR will close the linked issue :)
molbap
left a comment
Thanks for working on this! Left a couple of comments.
@zucchini-nlp I changed the code to incorporate the new parameters. Let me know if I am going in the right direction. I haven't started writing the tests, as I was occupied these past few days; will add them soon. Let me know if there is any reference that can help me understand the tests or give me ideas.
zucchini-nlp
left a comment
Great, thanks for iterating on this! For the tests you can take a look at this one: https://github.com/huggingface/transformers/blob/main/tests/models/align/test_processor_align.py. If paligemma doesn't have any processing tests, you have to create a new file; otherwise, adding ProcessorTesterMixin will enable the new tests for Paligemma. The general info about tests is here :)
Additionally, can you add paligemma-specific tests that exercise model-specific kwargs like suffix? When the CI turns green, feel free to tag me and @molbap for review.
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
Hey @yonigozlan, I wrote some tests but got failures in tokenization and others. I checked the Circle CI log and noticed a gated-repo issue. Is there a way to fix it or mock it?
yonigozlan
left a comment
Thanks for working on the tests! Left a few comments on the processor refactor also.
For the tests, as I indicated below, the base tests for ProcessorTesterMixin were recently changed. You can do this to have the newest changes on your branch:

```shell
git fetch upstream
git rebase upstream/main
```

You'll see that you may not have to override some of the tests at all :).
```python
text: Union[TextInput, PreTokenizedInput, List[TextInput], List[PreTokenizedInput]] = None,
images: ImageInput = None,
```
These two inputs should be reversed and support for backward compatibility should be added. This should be similar to what is needed for Fuyu:
https://github.com/huggingface/transformers/blob/aa3bc0b4d5c98a40fc211994697d1893976f8bf9/src/transformers/models/fuyu/processing_fuyu.py#L522-L532
```python
do_align_long_axis: bool = None,
do_rescale: bool = None,
suffix: Optional[Union[TextInput, PreTokenizedInput, List[TextInput], List[PreTokenizedInput]]] = None,
video=None,
```
audio=None is still needed here for API consistency, even if this model doesn't support the audio modality.
```diff
- video=None,
+ audio=None,
+ video=None,
```
```python
tokenizer_init_kwargs=self.tokenizer.init_kwargs,
**kwargs,
)
suffix = output_kwargs["text_kwargs"]["suffix"]
```
If suffix is not specified as a kwarg, this will raise a KeyError. Better to use:

```python
suffix = output_kwargs["text_kwargs"].pop("suffix", None)
```
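For illustration, here is the difference between indexing and `pop` with a default on a plain dict (the kwargs below are hypothetical, not the real processor output):

```python
# Hypothetical text_kwargs where the caller never passed "suffix".
text_kwargs = {"padding": True}

# text_kwargs["suffix"] would raise KeyError here.
# pop with a default degrades gracefully, and it also removes the key
# so an unexpected kwarg is not forwarded to the tokenizer later.
suffix = text_kwargs.pop("suffix", None)
```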
```python
image_kwargs: PaliGemmaImagesKwargs
_defaults = {
    "text_kwargs": {
        "tokenize_newline_separately": True,
```
Looks like tokenize_newline_separately is not used anywhere, and it is not a default text_kwargs, so it might be best to remove it entirely?
Yes, it's not used anymore and is not needed; iiuc, do_thumbnail, do_align_long_axis and do_rescale aren't used either (FYI, they are not used here).
+1 for removing it
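For context, the uniform-kwargs pattern under discussion merges per-modality defaults with caller overrides. A rough pure-Python stand-in (the `merge_kwargs` helper and the default values here are assumptions for illustration, not the transformers implementation) looks like:

```python
# Toy stand-in for the processing-kwargs defaults merge.  The group
# names mirror the real ones ("text_kwargs", "images_kwargs"), but the
# merge logic is a simplified sketch.
DEFAULTS = {
    "text_kwargs": {"padding": "longest"},
    "images_kwargs": {},
}

def merge_kwargs(defaults, **caller_kwargs):
    # Start from a copy of the per-modality defaults, then let
    # caller-supplied values override them group by group.
    merged = {group: dict(values) for group, values in defaults.items()}
    for group, values in caller_kwargs.items():
        merged.setdefault(group, {}).update(values)
    return merged
```

Under this sketch, `merge_kwargs(DEFAULTS, text_kwargs={"suffix": "a cat"})` keeps the padding default while adding the model-specific suffix, which is why stale keys like tokenize_newline_separately in `_defaults` would otherwise leak into every call.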
```python
**output_kwargs["text_kwargs"],
return_token_type_ids=return_token_type_ids,
```

```diff
- **output_kwargs["text_kwargs"],
- return_token_type_ids=return_token_type_ids,
+ return_token_type_ids=return_token_type_ids,
+ **output_kwargs["text_kwargs"],
```
```python
def setUp(self):
    self.tmpdirname = tempfile.mkdtemp()
    processor = PaliGemmaProcessor.from_pretrained("google/paligemma-3b-pt-224")
```
This will indeed cause a gated-repo issue. You could rebuild a processor without using this repo, something like:

```diff
- processor = PaliGemmaProcessor.from_pretrained("google/paligemma-3b-pt-224")
+ image_processor = SiglipImageProcessor.from_pretrained("google/siglip-so400m-patch14-384")
+ tokenizer = GemmaTokenizer(SAMPLE_VOCAB, keep_accents=True)
+ processor = PaliGemmaProcessor(image_processor=image_processor, tokenizer=tokenizer)
```

where

```python
SAMPLE_VOCAB = get_tests_dir("fixtures/test_sentencepiece.model")
```

as it is done for test_tokenization_gemma.py.
Not sure if that's the nicest way to fix this though, any idea @zucchini-nlp @molbap ?
The CI token can be updated so that it can read this repo. In test_modeling.py there is

```python
self.processor = PaliGemmaProcessor.from_pretrained("google/paligemma-3b-pt-224")
```

so that should not be an issue already - any idea @ydshieh here?
Sorry, what is the issue here? This repo seems to be public, no?
Wait, are we talking about google/paligemma-3b-pt-224 or google/siglip-so400m-patch14-384? Both are accessible even when I am using a Firefox private window.
```python
self.skipTest(f"image_processor attribute not present in {self.processor_class}")
image_processor = self.get_component(
    "image_processor",
    crop_size={"shortest_edge": 234, "longest_edge": 234},
```
This may not be needed anymore as the base tests were changed recently, same for other tests. Please fetch and rebase on upstream main :)
molbap
left a comment
Nice work! Let's see with the upstream rebase; I think we can reduce the loc count by a fair chunk 🤗
```python
do_align_long_axis: bool = None,
do_rescale: bool = None,
suffix: Optional[Union[TextInput, PreTokenizedInput, List[TextInput], List[PreTokenizedInput]]] = None,
video=None,
```
We can also advertise a None audio kwarg here!
```python
def prepare_image_inputs(self):
    """This function prepares a list of PIL images, or a list of numpy arrays if one specifies numpify=True,
    or a list of PyTorch tensors if one specifies torchify=True.
    """
    image_inputs = [np.random.randint(255, size=(3, 30, 400), dtype=np.uint8)]
    image_inputs = [Image.fromarray(np.moveaxis(x, 0, -1)) for x in image_inputs]
    return image_inputs
```
I'm noticing more of this function across the repo; it's identical in 17 places. I think we can move it to processing_utils.py at some point and save some loc. Same remark for the helper functions above!
Maybe this one works? I usually use it for image processor tests: `from tests.test_image_processing_common import prepare_image_inputs`
Yep! Let's move it. I also have my own personal agenda to remove the "numpify" and "torchify" arguments, which are confusing, clash, and are inconsistent, so this would be a good opportunity for that.

What does this PR do?
Adds uniform processing kwargs for the paligemma model.
Partially fixes issue #31911.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
Can Review - @zucchini-nlp