
feat: qkv fusing #75

Merged
llcnt merged 17 commits into main from
feat/qkv_fusing_and_fp8_quantization
May 16, 2025

Conversation

@llcnt
Collaborator

@llcnt llcnt commented Apr 25, 2025

Description

This PR introduces a new family of algorithms: FUSER FACTORIZER!
The general idea behind matrix fusing is to concatenate $n$ matrices (e.g. $W_{large} = [W_{small}, \dots, W_{small}]$) and perform a single $W_{large} \cdot x$ product instead of $n$ smaller $W_{small} \cdot x$ products.
There are no memory gains and, in general, no latency gains.
However, when combined with quantization, matrix fusing can enable speedups (see huggingface/diffusers#9185).
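The fusing idea can be sketched in a few lines of plain Python: stacking the weight matrices row-wise yields one large matrix whose product with x equals the concatenation of the separate products. This is an illustrative toy, not the pruna implementation.

```python
# Toy demonstration of matrix fusing: [W_q; W_k; W_v] @ x equals the
# concatenation of the three separate products W_q @ x, W_k @ x, W_v @ x.

def matvec(W, x):
    """Plain matrix-vector product on nested lists."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]

W_q = [[1.0, 2.0], [3.0, 4.0]]
W_k = [[5.0, 6.0], [7.0, 8.0]]
W_v = [[9.0, 0.0], [1.0, 2.0]]
x = [1.0, -1.0]

# Three separate projections (q, k, v), then concatenated.
separate = matvec(W_q, x) + matvec(W_k, x) + matvec(W_v, x)

# One fused projection: concatenate the weights along the output dimension.
W_fused = W_q + W_k + W_v
fused = matvec(W_fused, x)

assert fused == separate  # one large matmul replaces three small ones
```

The fused version performs the same arithmetic in a single kernel launch, which is where the quantization-related speedups come from.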

The main changes in the code are:

  • Add a new FUSER FACTORIZER in the new factorizing folder;
  • Add a new fuser factorizer for the QKV matrices of diffusers models;
  • Add a new check function that looks at the models in the diffusers package that are compatible with fusing (i.e. that have FusedAttention processors);
  • Add qkv_fusing for all compatible algorithms;
  • Delete the original model (fixture) loaded during the unit tests to avoid OOM errors.
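The compatibility check mentioned above might look roughly like the following. This is a hypothetical sketch with stand-in classes, assuming (as in diffusers) that fusable models expose a `fuse_qkv_projections()` method and an `attn_processors` mapping; the names `supports_qkv_fusing` and `DummyUNet` are illustrative, not pruna's API.

```python
# Hypothetical compatibility check for QKV fusing. Real diffusers models
# such as UNet2DConditionModel expose fuse_qkv_projections() and swap in
# fused attention processors; the classes below are minimal stand-ins.

class AttnProcessor2_0:        # stand-in for a regular attention processor
    pass

class FusedAttnProcessor2_0:   # stand-in for a fused attention processor
    pass

class DummyUNet:
    """Minimal stand-in for a diffusers model that supports fusing."""
    def __init__(self):
        self.attn_processors = {
            "down_blocks.0.attn.processor": AttnProcessor2_0(),
        }

    def fuse_qkv_projections(self):
        # After fusing, every processor is a fused variant.
        self.attn_processors = {
            name: FusedAttnProcessor2_0() for name in self.attn_processors
        }

def supports_qkv_fusing(model) -> bool:
    # A model is considered fusable if it exposes fuse_qkv_projections().
    return callable(getattr(model, "fuse_qkv_projections", None))

model = DummyUNet()
assert supports_qkv_fusing(model)
model.fuse_qkv_projections()
assert all(type(p).__name__.startswith("Fused")
           for p in model.attn_processors.values())
```

In practice the check would run against the real model classes in the diffusers package rather than stand-ins.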

Related Issue

Fixes #(issue number)

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

I have added a unit test for the new QKV fuser on sdv1.4 (Stable Diffusion v1.4).

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Additional Notes

Comment thread src/pruna/engine/utils.py Outdated
@nifleisch nifleisch force-pushed the feat/qkv_fusing_and_fp8_quantization branch from 2850640 to aa361d1 on May 9, 2025 13:38
@nifleisch nifleisch marked this pull request as ready for review May 9, 2025 13:41
@nifleisch nifleisch requested review from begumcig and sharpenb May 9, 2025 13:41
Collaborator

@nifleisch nifleisch left a comment


LGTM! 🙌

Comment thread src/pruna/algorithms/factorizing/qkv_factorizing.py Outdated
Comment thread src/pruna/engine/model_checks.py Outdated
Comment thread src/pruna/engine/model_checks.py Outdated
Comment thread tests/common.py Outdated
Member

@sharpenb sharpenb left a comment


Thanks for the PR! I left some comments :)

@llcnt llcnt requested review from nifleisch and sharpenb May 13, 2025 13:28
Collaborator

@nifleisch nifleisch left a comment


Still awesome PR! I only have one small remark about the ModelContext.

Comment thread src/pruna/engine/utils.py Outdated
Comment thread src/pruna/algorithms/quantization/huggingface_diffusers_int8.py Outdated
Comment thread src/pruna/engine/utils.py Outdated
Member

@sharpenb sharpenb left a comment


Looks very good. It can be merged after renaming and fixing the safe_memory_cleanup :)

Comment thread tests/common.py Outdated
Comment thread tests/common.py Outdated
@llcnt llcnt requested a review from sharpenb May 14, 2025 17:47
@davidberenstein1957
Member

@sharpenb @llcnt wrote a fix here. It seemed to be something broader that also impacted general loading.

#119

Member

@sharpenb sharpenb left a comment


I only had a last comment :)

@llcnt llcnt force-pushed the feat/qkv_fusing_and_fp8_quantization branch from cbffcc4 to 0c7d7a7 on May 16, 2025 15:09
@llcnt llcnt force-pushed the feat/qkv_fusing_and_fp8_quantization branch from 0c7d7a7 to 6f5a331 on May 16, 2025 15:19
@llcnt llcnt requested a review from sharpenb May 16, 2025 15:21
Member

@sharpenb sharpenb left a comment


Super cool, let's merge it :)

@llcnt llcnt merged commit 367ee9b into main May 16, 2025
6 checks passed
@johannaSommer johannaSommer deleted the feat/qkv_fusing_and_fp8_quantization branch May 16, 2025 17:56

4 participants