Support Gemma3 with Clip fused attention #24280
Merged
titaiwangms merged 15 commits into main on Apr 4, 2025
Conversation
Contributor
Pull Request Overview
This PR refactors and extends model tracing and fusion tests for the Gemma3 vision model while adding support for fp16 and fp32 tracing patterns and generalizing input indices for op.Add and op.MatMul. Key changes include:
- Introducing a new traced pattern for CLIP attention without an attention mask.
- Generalizing the input indices for op.Add and op.MatMul and differentiating tracing for fp16 and fp32.
- Refactoring test files to support dynamo export and adding new tests for Gemma3 vision attention (SigLip).
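The generalized input indices mentioned above can be illustrated with a minimal, self-contained sketch. This is not onnxruntime's actual `match_parent_path` implementation; the toy graph format and the function below are my own, showing only how `None` can act as a wildcard input index during parent-path matching:

```python
# Sketch only: a dict-based toy graph, not onnxruntime's Graph/NodeProto API.
# graph maps node name -> (op_type, [parent node names, in input order]).

def match_parent_path(graph, node, op_types, input_indices):
    """Walk parents of `node` matching op_types; None index = wildcard slot."""
    path = []
    current = node
    for op_type, idx in zip(op_types, input_indices):
        _, parents = graph[current]
        if idx is not None:  # fixed input slot: only that parent qualifies
            candidates = [parents[idx]] if idx < len(parents) else []
        else:                # wildcard: accept any input producing op_type
            candidates = parents
        match = next((p for p in candidates if graph[p][0] == op_type), None)
        if match is None:
            return None
        path.append(match)
        current = match
    return path

# Toy attention tail: Softmax -> MatMul -> Add, with the MatMul on input 1 of Add.
graph = {
    "out":  ("Add",     ["bias", "mm"]),
    "bias": ("Const",   []),
    "mm":   ("MatMul",  ["sm", "v"]),
    "v":    ("Const",   []),
    "sm":   ("Softmax", []),
}
print(match_parent_path(graph, "out", ["MatMul", "Softmax"], [None, 0]))  # ['mm', 'sm']
```

In the real fusion code, index arrays such as `[1, None, 0, 0, 0]` play the role of `input_indices` here: a concrete integer pins the pattern to one input slot, while `None` lets the matcher scan all inputs.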
Reviewed Changes
Copilot reviewed 6 out of 8 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| onnxruntime/test/python/transformers/test_gemma3_vision.py | Adds tests and model definitions for Gemma3 vision attention and layer normalization. |
| onnxruntime/test/python/transformers/test_gelu_fusions.py | Refactors Gelu fusion tests by parameterizing the tests for dynamo export. |
| onnxruntime/python/tools/transformers/fusion_fastgelu.py | Improves robustness in FastGelu fusion by handling cases where the root input comes directly from the graph input. |
| onnxruntime/python/tools/transformers/fusion_attention_clip.py | Generalizes pattern matching for attention fusion to support different index configurations and tensor formats. |
Files not reviewed (2)
- tools/ci_build/github/linux/python/requirements.txt: Language not supported
- tools/ci_build/github/windows/python/requirements.txt: Language not supported
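For background on the fusion_fastgelu.py row above: FastGelu fuses the tanh-based approximation of GELU into a single node. As a reference for the math being fused, here is a plain-Python sketch (mine, not onnxruntime code):

```python
import math

def fast_gelu(x: float) -> float:
    # tanh approximation of GELU:
    # 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))

print(fast_gelu(1.0))  # ~0.8412, close to exact GELU(1)
```

The fusion pass recognizes this arithmetic as a subgraph of Mul/Add/Tanh/Pow nodes and collapses it into one FastGelu op; the robustness fix in this PR handles the case where the subgraph's root input comes directly from a graph input rather than from another node.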
Comments suppressed due to low confidence (2)
onnxruntime/python/tools/transformers/fusion_attention_clip.py:152
- [nitpick] Consider using a named constant or adding a comment to clarify the purpose of 'None' as a wildcard in the index array for pattern matching.
[1, None, 0, 0, 0],
onnxruntime/python/tools/transformers/fusion_attention_clip.py:232
- [nitpick] Consistently document or use a named constant for wildcard indices (such as 'None') to improve code clarity in pattern matching.
q_nodes = self.model.match_parent_path(
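One way to act on these nitpicks, sketched with illustrative names (`ANY_INPUT` and `describe` are mine, not from the PR):

```python
# Illustrative only: naming the wildcard makes index arrays self-documenting.
ANY_INPUT = None  # wildcard input index for parent-path pattern matching

qk_path_indices = [1, ANY_INPUT, 0, 0, 0]

def describe(indices):
    """Render an index array for logging/debugging pattern matches."""
    return ["any" if i is ANY_INPUT else str(i) for i in indices]

print(describe(qk_path_indices))  # ['1', 'any', '0', '0', '0']
```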
tianleiwu previously approved these changes (Apr 2, 2025)
kunal-vaishnavi previously approved these changes (Apr 2, 2025)
e7d6160
tianleiwu previously approved these changes (Apr 3, 2025)
kunal-vaishnavi previously approved these changes (Apr 3, 2025)
ee96271
snnn approved these changes (Apr 4, 2025)
Contributor
snnn left a comment
The changes under tools/ci_build are fine.
kunal-vaishnavi
approved these changes
Apr 4, 2025
zhaoxul-qti
pushed a commit
to CodeLinaro/onnxruntime
that referenced
this pull request
Apr 17, 2025
### Description
Essentially, the vision model is traced differently (this time without an attention mask), and the input indices of op.Add and op.MatMul can differ. Also, fp16 and fp32 need different tracing patterns (op.Cast).
1. Add another traced pattern to CLIP attention to cover the no-attention_mask case
2. Accept different input indices on op.Add and op.MatMul (be more general)
3. fp16 and fp32 show different patterns (op.Cast after op.Softmax)
4. Refactor test_fastgelu.py to cover torch.onnx.export(..., dynamo=True)
5. Add a Gemma3 vision attention (SigLip) test to cover both fp16 and fp32
### Motivation and Context
To optimize the Gemma3 multi-modal model, these changes are needed. https://huggingface.co/google/gemma-3-4b-it
NOTE: some related follow-ups (upstream optimizations to the onnxscript optimizer):
microsoft/onnxscript#2158
microsoft/onnxscript#2156
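The fp16 difference described here (op.Cast after op.Softmax) amounts to matching two pattern variants. A hedged sketch of the idea, with illustrative names rather than the PR's actual code:

```python
# Sketch only: fp16 traces often insert a Cast right after Softmax, so a
# fusion pass must accept both op sequences walking back from the score MatMul.
FP32_PATH = ["MatMul", "Softmax"]           # ... -> Softmax -> MatMul
FP16_PATH = ["MatMul", "Cast", "Softmax"]   # extra Cast between them

def find_softmax_variant(ops_seen):
    """Return which traced pattern the op sequence matches, if any."""
    for name, path in (("fp32", FP32_PATH), ("fp16", FP16_PATH)):
        if ops_seen[:len(path)] == path:
            return name
    return None

print(find_softmax_variant(["MatMul", "Cast", "Softmax"]))  # fp16
print(find_softmax_variant(["MatMul", "Softmax"]))          # fp32
```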
Description
Essentially, the vision model is traced differently (this time without an attention mask), and the input indices of op.Add and op.MatMul can differ. Also, fp16 and fp32 need different tracing patterns (op.Cast).
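Why can the input indices of op.Add differ between traces? Add is commutative, so different tracings may place the MatMul output on either input slot of the Add node. An illustrative sketch (function and names are mine, not the PR's code):

```python
# Sketch: a matcher that hard-codes input 0 of Add misses traces where the
# exporter put the bias first; scanning both inputs handles both orderings.

def find_matmul_input(add_inputs):
    """add_inputs: list of (producer_op, name) pairs. Return the MatMul side."""
    for idx, (op, name) in enumerate(add_inputs):
        if op == "MatMul":
            return idx, name
    return None

# Same semantics, two possible tracings:
print(find_matmul_input([("MatMul", "mm_out"), ("Initializer", "bias")]))  # (0, 'mm_out')
print(find_matmul_input([("Initializer", "bias"), ("MatMul", "mm_out")]))  # (1, 'mm_out')
```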
Motivation and Context
To optimize the Gemma3 multi-modal model, these changes are needed. https://huggingface.co/google/gemma-3-4b-it
NOTE: some related follow-ups (upstream optimizations to the onnxscript optimizer):
microsoft/onnxscript#2158
microsoft/onnxscript#2156