
Adding superglue fast image processing #41394

Merged
yonigozlan merged 24 commits into huggingface:main from AlphaOrOmega:fast_image_processing_superglue_implementation
Oct 16, 2025

@AlphaOrOmega
Contributor

@AlphaOrOmega AlphaOrOmega commented Oct 6, 2025

What does this PR do?

TL;DR:

  • Implement a fast image processor for SuperGlue
  • About 3 times faster than the slow processor on my hardware

This PR ports the features of the SuperGlueImageProcessor class to its fast equivalent, SuperGlueImageProcessorFast.
The implementation closely follows the standard one, but it reduces memory consumption and runs about 3 times faster on my hardware.
It mostly refactors the image formatting in the preprocessing step, notably by using torch tensors instead of PIL or NumPy.
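As a rough illustration of what a torch-first preprocessing step can look like, here is a minimal sketch. It is not the code from this PR: `preprocess_pair` is a hypothetical helper, and the target size (480x640), the 1/255 rescale factor, and the grayscale conversion replicated over three channels are assumptions about typical SuperGlue settings.

```python
import torch
import torch.nn.functional as F

# ITU-R BT.601 luma weights, shaped to broadcast over (N, 3, H, W).
LUMA_WEIGHTS = torch.tensor([0.299, 0.587, 0.114]).view(1, 3, 1, 1)


def preprocess_pair(image_a, image_b, size=(480, 640), rescale_factor=1 / 255):
    """Hypothetical helper: resize, rescale and grayscale two (3, H, W) uint8 tensors.

    The default size and rescale factor are assumptions, not values taken from the PR.
    """
    processed = []
    for image in (image_a, image_b):
        # Add a batch dimension and switch to float so interpolation is defined.
        x = image.unsqueeze(0).float()
        # Resize each image on its own, since the two inputs may differ in resolution.
        x = F.interpolate(x, size=size, mode="bilinear", align_corners=False)
        # Map pixel values from [0, 255] to [0, 1].
        x = x * rescale_factor
        # Weighted channel sum gives one luma channel; replicate it to keep 3 channels.
        gray = (x * LUMA_WEIGHTS).sum(dim=1, keepdim=True)
        processed.append(gray.expand(-1, 3, -1, -1).squeeze(0))
    # Stack the pair into a single (2, 3, H, W) tensor.
    return torch.stack(processed)


pair = preprocess_pair(
    torch.randint(0, 256, (3, 100, 120), dtype=torch.uint8),
    torch.randint(0, 256, (3, 90, 150), dtype=torch.uint8),
)
print(pair.shape)
```

Every step stays in tensor space, so there is no round trip through PIL or NumPy between operations.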

Test Performed

RUN_SLOW=1 python -m pytest tests/models/superglue/test_image_processing_superglue.py

I also ran an additional test based on the default processor tester (this test is not meant to be included in the repo):

@require_vision
@require_torch
def test_fast_is_faster_than_slow(self):
    if not self.test_slow_image_processor or not self.test_fast_image_processor:
        self.skipTest(reason="Skipping speed test")

    if self.image_processing_class is None or self.fast_image_processing_class is None:
        self.skipTest(reason="Skipping speed test as one of the image processors is not defined")

    def measure_time(image_processor, image):
        # Warmup
        for _ in range(5):
            _ = image_processor(image, return_tensors="pt")
        all_times = []
        for _ in range(10):
            start = time.time()
            _ = image_processor(image, return_tensors="pt")
            all_times.append(time.time() - start)
        # Take the average of the fastest 3 runs (sort first, then keep the 3 smallest)
        avg_time = sum(sorted(all_times)[:3]) / 3.0
        return avg_time

    dummy_images = self.image_processor_tester.prepare_image_inputs(equal_resolution=False, torchify=True)
    image_processor_slow = self.image_processing_class(**self.image_processor_dict)
    image_processor_fast = self.fast_image_processing_class(**self.image_processor_dict)

    fast_time = measure_time(image_processor_fast, dummy_images)
    slow_time = measure_time(image_processor_slow, dummy_images)

    self.assertLessEqual(fast_time, slow_time)
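The timing logic in this test can also be pulled out into a standalone helper. The sketch below is an assumption-laden illustration, not part of the PR: `best_of` and its parameters are hypothetical names, and it uses `time.perf_counter`, which is better suited to measuring short intervals than `time.time`. It sorts the timings before slicing, so the average really covers the fastest runs.

```python
import time


def best_of(fn, *, warmup=5, repeats=10, keep=3):
    """Hypothetical helper: average of the `keep` fastest of `repeats` timed calls."""
    # Warm-up calls are executed but not timed.
    for _ in range(warmup):
        fn()
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    # Sort first, then slice, so that only the smallest timings are averaged.
    fastest = sorted(timings)[:keep]
    return sum(fastest) / len(fastest)


# Usage with two stand-in workloads (placeholders for the slow and fast processors).
slow_time = best_of(lambda: sum(i * i for i in range(20000)))
fast_time = best_of(lambda: sum(range(20000)))
print(f"slow: {slow_time:.6f}s, fast: {fast_time:.6f}s")
```

Averaging only the fastest runs reduces the influence of one-off slowdowns such as garbage collection or scheduler noise.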

Reviewing the flame graph, I noticed an improvement in every __call__ made to the fast version.

Callers of the old processor, and the full execution time of the method:
image

The equivalent but with the fast processor:
image

Some calls made during the test go directly to the preprocess function without passing through __call__, so I am including them as well:
Slow
image
Fast
image

Who can review?

Thank you for reviewing my PR @yonigozlan (or anyone else :) )

@Rocketknight1
Member

cc @yonigozlan

Member

@yonigozlan yonigozlan left a comment

Hi @AlphaOrOmega, thanks for working on this!
Please have a thorough look at the guide in this PR on how to implement a fast image processor correctly: #36978

You can also have a look at image_processing_efficientloftr_fast.py, as it should be almost identical to this image processor!

@AlphaOrOmega
Contributor Author

Hi @yonigozlan,

Thank you for the feedback. The referenced processor was indeed very close to the one I implemented, so I reused the relevant code and checked that the logic was preserved.

Could you please review the recent changes?

Thank you

Member

@yonigozlan yonigozlan left a comment

Thanks a lot @AlphaOrOmega for working on this! Just added a commit to use modular for efficientloftr using this new image processor. Let's wait for the CI to pass then we'll merge!

@yonigozlan yonigozlan enabled auto-merge (squash) October 16, 2025 18:08
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, efficientloftr, lightglue, superglue

@yonigozlan yonigozlan merged commit 354567d into huggingface:main Oct 16, 2025
22 checks passed
ngazagna-qc pushed a commit to ngazagna-qc/transformers that referenced this pull request Oct 23, 2025
* Default implementation - no time improvement

* Improved implementation - apparently 2 times faster with only simple function refactor

* elementary torch first approach, still need further implementation of torch first method

* torch-first approach finished

* refactor processor

* refactor test

* partial doc update

* EfficientLoFTRImageProcessorFast based implementation

* EfficientLoFTRImageProcessorFast based implementation

* Logic checked - Test Passed - Validated execution speed

* use modular for efficientloftr

* fix import

---------

Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
SangbumChoi pushed a commit to SangbumChoi/transformers that referenced this pull request Jan 23, 2026