
feat: qkv fusing #75

Merged
llcnt merged 17 commits into main from
feat/qkv_fusing_and_fp8_quantization
May 16, 2025

Conversation

@llcnt
Collaborator

@llcnt llcnt commented Apr 25, 2025

Description

This PR introduces a new family of algorithms: FUSER FACTORIZER!
The general idea behind matrix fusing is to concatenate $n$ matrices (e.g. $W_{large} = [W_{small}, \dots, W_{small}]$) and perform a single $W_{large} \cdot x$ product instead of $n$ smaller $W_{small} \cdot x$ products.
There are no memory gains and, in general, no latency gains.
However, when combined with quantization, matrix fusing can enable speedups (see huggingface/diffusers#9185).
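The fusing idea can be sketched in a few lines of plain Python: stacking the weight matrices row-wise yields one large matrix whose product with x equals the concatenation of the separate products. This is an illustrative toy, not the pruna implementation.

```python
# Toy demonstration of matrix fusing: [W_q; W_k; W_v] @ x equals the
# concatenation of the three separate products W_q @ x, W_k @ x, W_v @ x.

def matvec(W, x):
    """Plain matrix-vector product on nested lists."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]

W_q = [[1.0, 2.0], [3.0, 4.0]]
W_k = [[5.0, 6.0], [7.0, 8.0]]
W_v = [[9.0, 0.0], [1.0, 2.0]]
x = [1.0, -1.0]

# Three separate projections (q, k, v), then concatenated.
separate = matvec(W_q, x) + matvec(W_k, x) + matvec(W_v, x)

# One fused projection: concatenate the weights along the output dimension.
W_fused = W_q + W_k + W_v
fused = matvec(W_fused, x)

assert fused == separate  # one large matmul replaces three small ones
```

The fused version performs the same arithmetic in a single kernel launch, which is where the quantization-related speedups come from.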

The main changes in the code are:

  • Add a new FUSER FACTORIZER in the new factorizing folder;
  • Add a new fuser factorizer for the QKV matrices of diffusers models;
  • Add a new check function that looks at the models in the diffusers package that are compatible with fusing (i.e. that have FusedAttention processors);
  • Add qkv_fusing for all compatible algorithms;
  • Delete the original model (fixture) loaded during the unit tests to avoid OOM errors.
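The compatibility check mentioned above might look roughly like the following. This is a hypothetical sketch with stand-in classes, assuming (as in diffusers) that fusable models expose a `fuse_qkv_projections()` method and an `attn_processors` mapping; the names `supports_qkv_fusing` and `DummyUNet` are illustrative, not pruna's API.

```python
# Hypothetical compatibility check for QKV fusing. Real diffusers models
# such as UNet2DConditionModel expose fuse_qkv_projections() and swap in
# fused attention processors; the classes below are minimal stand-ins.

class AttnProcessor2_0:        # stand-in for a regular attention processor
    pass

class FusedAttnProcessor2_0:   # stand-in for a fused attention processor
    pass

class DummyUNet:
    """Minimal stand-in for a diffusers model that supports fusing."""
    def __init__(self):
        self.attn_processors = {
            "down_blocks.0.attn.processor": AttnProcessor2_0(),
        }

    def fuse_qkv_projections(self):
        # After fusing, every processor is a fused variant.
        self.attn_processors = {
            name: FusedAttnProcessor2_0() for name in self.attn_processors
        }

def supports_qkv_fusing(model) -> bool:
    # A model is considered fusable if it exposes fuse_qkv_projections().
    return callable(getattr(model, "fuse_qkv_projections", None))

model = DummyUNet()
assert supports_qkv_fusing(model)
model.fuse_qkv_projections()
assert all(type(p).__name__.startswith("Fused")
           for p in model.attn_processors.values())
```

In practice the check would run against the real model classes in the diffusers package rather than stand-ins.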

Related Issue

Fixes #(issue number)

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

I have added a unit test for the new QKV fuser on sdv1.4 (Stable Diffusion v1.4).

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Additional Notes

Comment thread src/pruna/engine/utils.py Outdated
@nifleisch nifleisch force-pushed the feat/qkv_fusing_and_fp8_quantization branch from 2850640 to aa361d1 on May 9, 2025 13:38
@nifleisch nifleisch marked this pull request as ready for review May 9, 2025 13:41
@nifleisch nifleisch requested review from begumcig and sharpenb May 9, 2025 13:41
Collaborator

@nifleisch nifleisch left a comment


LGTM! 🙌

Comment thread src/pruna/algorithms/factorizing/qkv_factorizing.py Outdated
Comment thread src/pruna/engine/model_checks.py Outdated
Comment thread src/pruna/engine/model_checks.py Outdated
Comment thread tests/common.py Outdated
Member

@sharpenb sharpenb left a comment


Thanks for the PR! I left some comments :)

@llcnt llcnt requested review from nifleisch and sharpenb May 13, 2025 13:28
Collaborator

@nifleisch nifleisch left a comment


Still awesome PR! I only have one small remark about the ModelContext.

Comment thread src/pruna/engine/utils.py Outdated
Comment thread src/pruna/algorithms/quantization/huggingface_diffusers_int8.py Outdated
Comment thread src/pruna/engine/utils.py Outdated
Member

@sharpenb sharpenb left a comment


Looks very good. It can be merged after renaming and fixing the safe_memory_cleanup :)

Comment thread tests/common.py Outdated
Comment thread tests/common.py Outdated
@llcnt llcnt requested a review from sharpenb May 14, 2025 17:47
@davidberenstein1957
Member

@sharpenb @llcnt wrote a fix here. It seemed to be something broader that also impacted general loading.

#119

Member

@sharpenb sharpenb left a comment


I only had a last comment :)

@llcnt llcnt force-pushed the feat/qkv_fusing_and_fp8_quantization branch from cbffcc4 to 0c7d7a7 on May 16, 2025 15:09
@llcnt llcnt force-pushed the feat/qkv_fusing_and_fp8_quantization branch from 0c7d7a7 to 6f5a331 on May 16, 2025 15:19
@llcnt llcnt requested a review from sharpenb May 16, 2025 15:21
Member

@sharpenb sharpenb left a comment


Super cool, let's merge it :)

@llcnt llcnt merged commit 367ee9b into main May 16, 2025
6 checks passed
@johannaSommer johannaSommer deleted the feat/qkv_fusing_and_fp8_quantization branch May 16, 2025 17:56

4 participants