Adds uniform processing kwargs to paligemma. #32377
MnCSSJ4x wants to merge 8 commits into huggingface:main
Conversation
zucchini-nlp
left a comment
Hey @MnCSSJ4x! Thanks for working on this! Left a few comments setting default values.
Currently the tests are failing, so we have to fix them and make sure the paligemma processing tests contain the ProcessorTesterMixin. That is the way to ensure we're actually testing the new kwargs format.
Also, I want to ask for some paligemma-specific processor tests, as it seems that Paligemma has several model-specific kwargs.
cc @molbap if you have time to take a look
Oh, can you remove the "Fixes #" from the body? Otherwise, merging this PR will close the linked issue :)
molbap
left a comment
Thanks for working on this! Left a couple of comments.
@zucchini-nlp I changed the code to incorporate the new parameters. Let me know if I am going in the right direction. I haven't started writing the tests, as I was occupied these past few days; will add them soon. Let me know if there is any reference that can help me understand the tests or give me ideas.
zucchini-nlp
left a comment
Great, thanks for iterating on this! For the tests you can take a look at this one: https://github.com/huggingface/transformers/blob/main/tests/models/align/test_processor_align.py. If paligemma doesn't have any processing tests, you have to create a new file; otherwise, adding ProcessorTesterMixin will enable the new tests for Paligemma. The general info about tests is here :)
Additionally, can you add paligemma-specific tests that exercise model-specific kwargs like suffix? When the CI turns green, feel free to tag me and @molbap for review.
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
Hey @yonigozlan, I wrote some tests but got failures in tokenization and others. I checked the Circle CI log and noticed a gated-repo issue. Is there a way to fix it or mock it?
yonigozlan
left a comment
Thanks for working on the tests! Left a few comments on the processor refactor also.
For the tests, as I indicated below, the base tests for ProcessorTesterMixin were recently changed. You can do this to have the newest changes on your branch:

```shell
git fetch upstream
git rebase upstream/main
```

You'll see that you may not have to override some of the tests at all :).
```python
text: Union[TextInput, PreTokenizedInput, List[TextInput], List[PreTokenizedInput]] = None,
images: ImageInput = None,
```
These two inputs should be reversed and support for backward compatibility should be added. This should be similar to what is needed for Fuyu:
https://github.com/huggingface/transformers/blob/aa3bc0b4d5c98a40fc211994697d1893976f8bf9/src/transformers/models/fuyu/processing_fuyu.py#L522-L532
```python
do_align_long_axis: bool = None,
do_rescale: bool = None,
suffix: Optional[Union[TextInput, PreTokenizedInput, List[TextInput], List[PreTokenizedInput]]] = None,
video=None,
```
audio=None is still needed here for API consistency, even if this model doesn't support the audio modality.
```diff
- video=None,
+ audio=None,
+ video=None,
```
```python
tokenizer_init_kwargs=self.tokenizer.init_kwargs,
**kwargs,
)
suffix = output_kwargs["text_kwargs"]["suffix"]
```
If suffix is not specified as a kwarg, this will raise a KeyError. Better to use:

```python
suffix = output_kwargs["text_kwargs"].pop("suffix", None)
```
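For illustration, here is the difference between indexing and `pop` with a default on a plain dict (the kwargs below are hypothetical, not the real processor output):

```python
# Hypothetical text_kwargs where the caller never passed "suffix".
text_kwargs = {"padding": True}

# text_kwargs["suffix"] would raise KeyError here.
# pop with a default degrades gracefully, and it also removes the key
# so an unexpected kwarg is not forwarded to the tokenizer later.
suffix = text_kwargs.pop("suffix", None)
```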
```python
image_kwargs: PaliGemmaImagesKwargs
_defaults = {
    "text_kwargs": {
        "tokenize_newline_separately": True,
```
Looks like tokenize_newline_separately is not used anywhere, and it is not a default text_kwargs, so it might be best to remove it entirely?
Yes, it's not used anymore and is not needed; iiuc, do_thumbnail, do_align_long_axis and do_rescale aren't used either (FYI, they are not used here).
+1 for removing it
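For context, the uniform-kwargs pattern under discussion merges per-modality defaults with caller overrides. A rough pure-Python stand-in (the `merge_kwargs` helper and the default values here are assumptions for illustration, not the transformers implementation) looks like:

```python
# Toy stand-in for the processing-kwargs defaults merge.  The group
# names mirror the real ones ("text_kwargs", "images_kwargs"), but the
# merge logic is a simplified sketch.
DEFAULTS = {
    "text_kwargs": {"padding": "longest"},
    "images_kwargs": {},
}

def merge_kwargs(defaults, **caller_kwargs):
    # Start from a copy of the per-modality defaults, then let
    # caller-supplied values override them group by group.
    merged = {group: dict(values) for group, values in defaults.items()}
    for group, values in caller_kwargs.items():
        merged.setdefault(group, {}).update(values)
    return merged
```

Under this sketch, `merge_kwargs(DEFAULTS, text_kwargs={"suffix": "a cat"})` keeps the padding default while adding the model-specific suffix, which is why stale keys like tokenize_newline_separately in `_defaults` would otherwise leak into every call.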
```python
**output_kwargs["text_kwargs"],
return_token_type_ids=return_token_type_ids,
```

```diff
- **output_kwargs["text_kwargs"],
- return_token_type_ids=return_token_type_ids,
+ return_token_type_ids=return_token_type_ids,
+ **output_kwargs["text_kwargs"],
```
```python
def setUp(self):
    self.tmpdirname = tempfile.mkdtemp()
    processor = PaliGemmaProcessor.from_pretrained("google/paligemma-3b-pt-224")
```
This will indeed cause a gated-repo issue. You could rebuild a processor without using this repo, something like:

```diff
- processor = PaliGemmaProcessor.from_pretrained("google/paligemma-3b-pt-224")
+ image_processor = SiglipImageProcessor.from_pretrained("google/siglip-so400m-patch14-384")
+ tokenizer = GemmaTokenizer(SAMPLE_VOCAB, keep_accents=True)
+ processor = PaliGemmaProcessor(image_processor=image_processor, tokenizer=tokenizer)
```

where

```python
SAMPLE_VOCAB = get_tests_dir("fixtures/test_sentencepiece.model")
```

as it is done for test_tokenization_gemma.py.
Not sure if that's the nicest way to fix this though, any idea @zucchini-nlp @molbap ?
The CI token can be updated so that it can read this repo. In test_modeling.py there is

```python
self.processor = PaliGemmaProcessor.from_pretrained("google/paligemma-3b-pt-224")
```

so that should not be an issue already - any idea @ydshieh here?
Sorry, what is the issue here? This repo seems to be public, no?
Wait, are we talking about google/paligemma-3b-pt-224 or google/siglip-so400m-patch14-384? Both are accessible even when I am using a Firefox private window.
```python
self.skipTest(f"image_processor attribute not present in {self.processor_class}")
image_processor = self.get_component(
    "image_processor",
    crop_size={"shortest_edge": 234, "longest_edge": 234},
```
This may not be needed anymore as the base tests were changed recently, same for other tests. Please fetch and rebase on upstream main :)
molbap
left a comment
Nice work! Let's see with the upstream rebase; I think we can reduce the loc count by a fair chunk 🤗
```python
do_align_long_axis: bool = None,
do_rescale: bool = None,
suffix: Optional[Union[TextInput, PreTokenizedInput, List[TextInput], List[PreTokenizedInput]]] = None,
video=None,
```
We can also advertise a None audio kwarg here!
```python
def prepare_image_inputs(self):
    """This function prepares a list of PIL images, or a list of numpy arrays if one specifies numpify=True,
    or a list of PyTorch tensors if one specifies torchify=True.
    """
    image_inputs = [np.random.randint(255, size=(3, 30, 400), dtype=np.uint8)]
    image_inputs = [Image.fromarray(np.moveaxis(x, 0, -1)) for x in image_inputs]
    return image_inputs
```
I'm noticing more of this function across the repo; it's identical in 17 places. I think we can move it to processing_utils.py at some point and save some loc. Same remark for the helper functions above!
Maybe this one works? I usually use it for image processor tests: `from tests.test_image_processing_common import prepare_image_inputs`
Yep! Let's move it. I also have my own personal agenda to remove the "numpify" and "torchify" arguments, which are confusing, clash, and are inconsistent, so this would be a good opportunity for that.

What does this PR do?
Adds uniform processing kwargs for the paligemma model.
Partially fixes issue #31911.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
Can Review - @zucchini-nlp