
feat: Add Bitsandbytes quantization for transformer backend enhancement #1775 and fix: Transformer backend error on CUDA #1774 (#1823)

Merged
mudler merged 5 commits into mudler:master from fakezeta:master
Mar 14, 2024

Conversation

@fakezeta (Collaborator) commented Mar 12, 2024

Add BitsAndBytes quantization and fix embeddings on CUDA devices

This PR fixes #1775 and #1774

Notes for Reviewers
LowVRAM toggles INT4 quantization.
F16Memory switches the compute type to bfloat16 instead of float32.
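
Roughly, the mapping looks like this. This is a minimal illustrative sketch, not the exact PR diff; the helper name and model id are made up, while `BitsAndBytesConfig` and its parameters come from the transformers library:

```python
# Sketch only: how the two toggles could map onto BitsAndBytesConfig.
import torch
from transformers import AutoModel, BitsAndBytesConfig

def build_model_kwargs(low_vram: bool, f16_memory: bool) -> dict:
    # F16Memory selects bfloat16 as the compute dtype, else float32.
    compute_dtype = torch.bfloat16 if f16_memory else torch.float32
    kwargs = {"torch_dtype": compute_dtype}
    if low_vram:
        # LowVRAM enables INT4 quantization via bitsandbytes.
        kwargs["quantization_config"] = BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_compute_dtype=compute_dtype,
        )
    return kwargs

model = AutoModel.from_pretrained("some/model", **build_model_kwargs(True, True))
```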

This is my first PR so be kind :)

Signed commits

  • Yes, I signed my commits.

Add BitsAndBytes Quantization and fixes embedding on CUDA devices
netlify bot commented Mar 12, 2024

Deploy Preview for localai canceled.

| Name | Link |
|---|---|
| 🔨 Latest commit | d304e33 |
| 🔍 Latest deploy log | https://app.netlify.com/sites/localai/deploys/65f33abcb3c03d0008073e92 |


device_map="cpu"

quantization = BitsAndBytesConfig(
@mudler (Owner) commented Mar 12, 2024

Since we already have a quantization field in the gRPC requests, and BitsAndBytes is one of the available options (see https://huggingface.co/docs/transformers/main_classes/quantization#quantization), we might want to select BitsAndBytesConfig when quantization=BitsAndBytesConfig is set instead, so we also keep backward compatibility (this is more of a question).

Otherwise the changes look good to me, thanks @fakezeta!

@fakezeta (Collaborator, Author) replied:

Thank you @mudler, I think it's a good point.

What do you think of using something like:

  • quantization=bnb_4bit
  • quantization=bnb_8bit

to select between the two options?
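
For illustration, the proposed selection could look like this hypothetical helper (a sketch only, not the merged code; it assumes the quantization value arrives as a plain string):

```python
# Sketch: map the proposed quantization string values to configs.
import torch
from transformers import BitsAndBytesConfig

def pick_quantization_config(quantization: str):
    if quantization == "bnb_4bit":
        return BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_compute_dtype=torch.bfloat16,
        )
    if quantization == "bnb_8bit":
        return BitsAndBytesConfig(load_in_8bit=True)
    return None  # no bitsandbytes quantization requested
```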

@mudler (Owner) replied:

@fakezeta that looks reasonable to me 👍

Manage different BitsAndBytes options with the quantization: parameter in yaml
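
This commit wires the choice through the model's YAML config. A hypothetical example (name, model, and key layout are placeholders and may differ from the merged change):

```yaml
# Hypothetical LocalAI model config using the new quantization option.
name: my-transformers-model
backend: transformers
parameters:
  model: some/hf-model
quantization: bnb_4bit  # or bnb_8bit
```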
mudler previously approved these changes Mar 14, 2024

@mudler (Owner) commented Mar 14, 2024

@fakezeta (Collaborator, Author) commented Mar 14, 2024

I was looking at the logs; the problem is related to the non-CUDA build.
Now test.sh also works on non-CUDA builds and without bitsandbytes installed.
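
A common pattern for this kind of fix is to degrade gracefully when the optional dependency is absent; a sketch of the idea (illustrative only, not necessarily the exact change):

```python
# Sketch: guard the optional bitsandbytes/CUDA path so non-CUDA
# builds without the library still load models unquantized.
import torch

try:
    import bitsandbytes  # noqa: F401 -- only needed when quantizing
    HAS_BITSANDBYTES = True
except ImportError:
    HAS_BITSANDBYTES = False

def quantization_supported() -> bool:
    # bitsandbytes 4/8-bit loading requires a CUDA device.
    return HAS_BITSANDBYTES and torch.cuda.is_available()
```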

mudler enabled auto-merge (squash) March 14, 2024 18:33

@mudler (Owner) left a review:

Looking good, thanks @fakezeta!

mudler added the enhancement label Mar 14, 2024
mudler disabled auto-merge March 14, 2024 22:06
mudler merged commit 3882130 into mudler:master Mar 14, 2024
truecharts-admin referenced this pull request in trueforge-org/truecharts Mar 17, 2024
…0.0@5cd0285 by renovate (#19391)

This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [docker.io/localai/localai](https://togithub.com/mudler/LocalAI) | minor | `v2.9.0` -> `v2.10.0` |

---

> [!WARNING]
> Some dependencies could not be looked up. Check the Dependency
Dashboard for more information.

---

### Release Notes

<details>
<summary>mudler/LocalAI (docker.io/localai/localai)</summary>

###
[`v2.10.0`](https://togithub.com/mudler/LocalAI/releases/tag/v2.10.0)

[Compare
Source](https://togithub.com/mudler/LocalAI/compare/v2.9.0...v2.10.0)

##### LocalAI v2.10.0 Release Notes

Excited to announce the release of LocalAI v2.10.0! This version
introduces significant changes, including breaking changes, numerous bug
fixes, exciting new features, dependency updates, and more. Here's a
summary of what's new:

##### Breaking Changes 🛠

- The `trust_remote_code` setting in the model's YAML config file is now
consumed for enhanced security measures, also for the AutoGPTQ and
transformers backends, thanks to
[@&#8203;dave-gray101](https://togithub.com/dave-gray101)'s contribution
([#&#8203;1799](https://togithub.com/mudler/LocalAI/pull/1799)). If your
model relied on the old behavior and you are sure of what you are doing,
set `trust_remote_code: true` in the YAML config file, as in the example below.
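
  A minimal illustrative config (name and model are placeholders):

  ```yaml
  # Hypothetical model config opting back into remote code execution.
  name: my-model
  backend: transformers
  parameters:
    model: some/hf-model-with-custom-code
  trust_remote_code: true
  ```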

##### Bug Fixes 🐛

- Various fixes have been implemented to enhance the stability and
performance of LocalAI:
- SSE no longer omits empty `finish_reason` fields for better
compatibility with the OpenAI API, fixed by
[@&#8203;mudler](https://togithub.com/mudler)
([#&#8203;1745](https://togithub.com/mudler/LocalAI/pull/1745)).
- Functions now correctly handle scenarios with no results, also
addressed by [@&#8203;mudler](https://togithub.com/mudler)
([#&#8203;1758](https://togithub.com/mudler/LocalAI/pull/1758)).
- A Command Injection Vulnerability has been fixed by
[@&#8203;ouxs-19](https://togithub.com/ouxs-19)
([#&#8203;1778](https://togithub.com/mudler/LocalAI/pull/1778)).
- OpenCL-based builds for llama.cpp have been restored, thanks to
[@&#8203;cryptk](https://togithub.com/cryptk)'s efforts
([#&#8203;1828](https://togithub.com/mudler/LocalAI/pull/1828),
[#&#8203;1830](https://togithub.com/mudler/LocalAI/pull/1830)).
- An issue with OSX build `default.metallib` has been resolved, which
should now allow running the llama-cpp backend on Apple arm64, fixed by
[@&#8203;dave-gray101](https://togithub.com/dave-gray101)
([#&#8203;1837](https://togithub.com/mudler/LocalAI/pull/1837)).

##### Exciting New Features 🎉

-   LocalAI continues to evolve with several new features:
- Ongoing implementation of the assistants API, making great progress
thanks to community contributions, including an initial implementation
by [@&#8203;christ66](https://togithub.com/christ66)
([#&#8203;1761](https://togithub.com/mudler/LocalAI/pull/1761)).
- Addition of diffusers/transformers support for Intel GPU - now you can
generate images and use the `transformer` backend also on Intel GPUs,
implemented by [@&#8203;mudler](https://togithub.com/mudler)
([#&#8203;1746](https://togithub.com/mudler/LocalAI/pull/1746)).
- Introduction of Bitsandbytes quantization for transformer backend
enhancement and a fix for transformer backend error on CUDA by
[@&#8203;fakezeta](https://togithub.com/fakezeta)
([#&#8203;1823](https://togithub.com/mudler/LocalAI/pull/1823)).
- Compatibility layers for Elevenlabs and OpenAI TTS, enhancing
text-to-speech capabilities: Now LocalAI is compatible with Elevenlabs
and OpenAI TTS, thanks to [@&#8203;mudler](https://togithub.com/mudler)
([#&#8203;1834](https://togithub.com/mudler/LocalAI/pull/1834)).
- vLLM now supports `stream: true`! This feature was introduced by
[@&#8203;golgeek](https://togithub.com/golgeek)
([#&#8203;1749](https://togithub.com/mudler/LocalAI/pull/1749)).
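
As an aside, streaming is consumed through LocalAI's OpenAI-compatible endpoint as usual; a hypothetical client sketch (host, port, and model name are placeholders):

```python
# Sketch: read a streamed chat completion over SSE from LocalAI.
import json
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "my-vllm-model",
        "messages": [{"role": "user", "content": "Hello!"}],
        "stream": True,
    },
    stream=True,
)
for line in resp.iter_lines():
    # Each SSE event is a "data: {...}" line; the stream ends with [DONE].
    if line.startswith(b"data: ") and line != b"data: [DONE]":
        chunk = json.loads(line[len(b"data: "):])
        print(chunk["choices"][0]["delta"].get("content", ""), end="")
```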

##### Dependency Updates 👒

- Our continuous effort to keep dependencies up-to-date includes
multiple updates to `ggerganov/llama.cpp`, `donomii/go-rwkv.cpp`,
`mudler/go-stable-diffusion`, and others, ensuring that LocalAI is built
on the latest and most secure libraries.

##### Other Changes

- Several internal changes have been made to improve the development
process and documentation, including updates to integration guides,
stress reduction on self-hosted runners, and more.

#### Details of What's Changed

##### Breaking Changes 🛠

- feat(autogpt/transformers): consume `trust_remote_code` by
[@&#8203;dave-gray101](https://togithub.com/dave-gray101) in
[https://github.com/mudler/LocalAI/pull/1799](https://togithub.com/mudler/LocalAI/pull/1799)

##### Bug fixes 🐛

- fix(sse): do not omit empty finish_reason by
[@&#8203;mudler](https://togithub.com/mudler) in
[https://github.com/mudler/LocalAI/pull/1745](https://togithub.com/mudler/LocalAI/pull/1745)
- fix(functions): handle correctly when there are no results by
[@&#8203;mudler](https://togithub.com/mudler) in
[https://github.com/mudler/LocalAI/pull/1758](https://togithub.com/mudler/LocalAI/pull/1758)
- fix(tests): re-enable tests after code move by
[@&#8203;mudler](https://togithub.com/mudler) in
[https://github.com/mudler/LocalAI/pull/1764](https://togithub.com/mudler/LocalAI/pull/1764)
- Fix Command Injection Vulnerability by
[@&#8203;ouxs-19](https://togithub.com/ouxs-19) in
[https://github.com/mudler/LocalAI/pull/1778](https://togithub.com/mudler/LocalAI/pull/1778)
- fix: the correct BUILD_TYPE for OpenCL is clblas (with no t) by
[@&#8203;cryptk](https://togithub.com/cryptk) in
[https://github.com/mudler/LocalAI/pull/1828](https://togithub.com/mudler/LocalAI/pull/1828)
- fix: missing OpenCL libraries from docker containers during clblas
docker build by [@&#8203;cryptk](https://togithub.com/cryptk) in
[https://github.com/mudler/LocalAI/pull/1830](https://togithub.com/mudler/LocalAI/pull/1830)
- fix: osx build default.metallib by
[@&#8203;dave-gray101](https://togithub.com/dave-gray101) in
[https://github.com/mudler/LocalAI/pull/1837](https://togithub.com/mudler/LocalAI/pull/1837)

##### Exciting New Features 🎉

- fix: vllm - use AsyncLLMEngine to allow true streaming mode by
[@&#8203;golgeek](https://togithub.com/golgeek) in
[https://github.com/mudler/LocalAI/pull/1749](https://togithub.com/mudler/LocalAI/pull/1749)
- refactor: move remaining api packages to core by
[@&#8203;dave-gray101](https://togithub.com/dave-gray101) in
[https://github.com/mudler/LocalAI/pull/1731](https://togithub.com/mudler/LocalAI/pull/1731)
- Bump vLLM version + more options when loading models in vLLM by
[@&#8203;golgeek](https://togithub.com/golgeek) in
[https://github.com/mudler/LocalAI/pull/1782](https://togithub.com/mudler/LocalAI/pull/1782)
- feat(assistant): Initial implementation of assistants api by
[@&#8203;christ66](https://togithub.com/christ66) in
[https://github.com/mudler/LocalAI/pull/1761](https://togithub.com/mudler/LocalAI/pull/1761)
- feat(intel): add diffusers/transformers support by
[@&#8203;mudler](https://togithub.com/mudler) in
[https://github.com/mudler/LocalAI/pull/1746](https://togithub.com/mudler/LocalAI/pull/1746)
- fix(config): set better defaults for inferencing by
[@&#8203;mudler](https://togithub.com/mudler) in
[https://github.com/mudler/LocalAI/pull/1822](https://togithub.com/mudler/LocalAI/pull/1822)
- fix(docker-compose): update docker compose file by
[@&#8203;mudler](https://togithub.com/mudler) in
[https://github.com/mudler/LocalAI/pull/1824](https://togithub.com/mudler/LocalAI/pull/1824)
- feat(model-help): display help text in markdown by
[@&#8203;mudler](https://togithub.com/mudler) in
[https://github.com/mudler/LocalAI/pull/1825](https://togithub.com/mudler/LocalAI/pull/1825)
- feat: Add Bitsandbytes quantization for transformer backend
enhancement
[#&#8203;1775](https://togithub.com/mudler/LocalAI/issues/1775) and fix:
Transformer backend error on CUDA
[#&#8203;1774](https://togithub.com/mudler/LocalAI/issues/1774) by
[@&#8203;fakezeta](https://togithub.com/fakezeta) in
[https://github.com/mudler/LocalAI/pull/1823](https://togithub.com/mudler/LocalAI/pull/1823)
- feat(tts): add Elevenlabs and OpenAI TTS compatibility layer by
[@&#8203;mudler](https://togithub.com/mudler) in
[https://github.com/mudler/LocalAI/pull/1834](https://togithub.com/mudler/LocalAI/pull/1834)
- feat(embeddings): do not require to be configured by
[@&#8203;mudler](https://togithub.com/mudler) in
[https://github.com/mudler/LocalAI/pull/1842](https://togithub.com/mudler/LocalAI/pull/1842)

##### 👒 Dependencies

- ⬆️ Update docs version mudler/LocalAI by
[@&#8203;localai-bot](https://togithub.com/localai-bot) in
[https://github.com/mudler/LocalAI/pull/1752](https://togithub.com/mudler/LocalAI/pull/1752)
- ⬆️ Update ggerganov/llama.cpp by
[@&#8203;localai-bot](https://togithub.com/localai-bot) in
[https://github.com/mudler/LocalAI/pull/1753](https://togithub.com/mudler/LocalAI/pull/1753)
- deps(llama.cpp): update by
[@&#8203;mudler](https://togithub.com/mudler) in
[https://github.com/mudler/LocalAI/pull/1759](https://togithub.com/mudler/LocalAI/pull/1759)
- ⬆️ Update ggerganov/llama.cpp by
[@&#8203;localai-bot](https://togithub.com/localai-bot) in
[https://github.com/mudler/LocalAI/pull/1756](https://togithub.com/mudler/LocalAI/pull/1756)
- ⬆️ Update ggerganov/llama.cpp by
[@&#8203;localai-bot](https://togithub.com/localai-bot) in
[https://github.com/mudler/LocalAI/pull/1767](https://togithub.com/mudler/LocalAI/pull/1767)
- ⬆️ Update ggerganov/llama.cpp by
[@&#8203;localai-bot](https://togithub.com/localai-bot) in
[https://github.com/mudler/LocalAI/pull/1772](https://togithub.com/mudler/LocalAI/pull/1772)
- ⬆️ Update donomii/go-rwkv.cpp by
[@&#8203;localai-bot](https://togithub.com/localai-bot) in
[https://github.com/mudler/LocalAI/pull/1771](https://togithub.com/mudler/LocalAI/pull/1771)
- ⬆️ Update ggerganov/llama.cpp by
[@&#8203;localai-bot](https://togithub.com/localai-bot) in
[https://github.com/mudler/LocalAI/pull/1779](https://togithub.com/mudler/LocalAI/pull/1779)
- ⬆️ Update ggerganov/llama.cpp by
[@&#8203;localai-bot](https://togithub.com/localai-bot) in
[https://github.com/mudler/LocalAI/pull/1789](https://togithub.com/mudler/LocalAI/pull/1789)
- ⬆️ Update ggerganov/llama.cpp by
[@&#8203;localai-bot](https://togithub.com/localai-bot) in
[https://github.com/mudler/LocalAI/pull/1791](https://togithub.com/mudler/LocalAI/pull/1791)
- ⬆️ Update ggerganov/llama.cpp by
[@&#8203;localai-bot](https://togithub.com/localai-bot) in
[https://github.com/mudler/LocalAI/pull/1794](https://togithub.com/mudler/LocalAI/pull/1794)
- depedencies(sentencentranformers): update dependencies by
[@&#8203;TwinFinz](https://togithub.com/TwinFinz) in
[https://github.com/mudler/LocalAI/pull/1797](https://togithub.com/mudler/LocalAI/pull/1797)
- ⬆️ Update ggerganov/llama.cpp by
[@&#8203;localai-bot](https://togithub.com/localai-bot) in
[https://github.com/mudler/LocalAI/pull/1801](https://togithub.com/mudler/LocalAI/pull/1801)
- ⬆️ Update mudler/go-stable-diffusion by
[@&#8203;localai-bot](https://togithub.com/localai-bot) in
[https://github.com/mudler/LocalAI/pull/1802](https://togithub.com/mudler/LocalAI/pull/1802)
- ⬆️ Update ggerganov/llama.cpp by
[@&#8203;localai-bot](https://togithub.com/localai-bot) in
[https://github.com/mudler/LocalAI/pull/1805](https://togithub.com/mudler/LocalAI/pull/1805)
- ⬆️ Update ggerganov/llama.cpp by
[@&#8203;localai-bot](https://togithub.com/localai-bot) in
[https://github.com/mudler/LocalAI/pull/1811](https://togithub.com/mudler/LocalAI/pull/1811)
- ⬆️ Update ggerganov/llama.cpp by
[@&#8203;localai-bot](https://togithub.com/localai-bot) in
[https://github.com/mudler/LocalAI/pull/1827](https://togithub.com/mudler/LocalAI/pull/1827)

##### Other Changes

- ci: add stablediffusion to release by
[@&#8203;sozercan](https://togithub.com/sozercan) in
[https://github.com/mudler/LocalAI/pull/1757](https://togithub.com/mudler/LocalAI/pull/1757)
- Update integrations.md by
[@&#8203;Joshhua5](https://togithub.com/Joshhua5) in
[https://github.com/mudler/LocalAI/pull/1765](https://togithub.com/mudler/LocalAI/pull/1765)
- ci: reduce stress on self-hosted runners by
[@&#8203;mudler](https://togithub.com/mudler) in
[https://github.com/mudler/LocalAI/pull/1776](https://togithub.com/mudler/LocalAI/pull/1776)
- ⬆️ Update ggerganov/llama.cpp by
[@&#8203;localai-bot](https://togithub.com/localai-bot) in
[https://github.com/mudler/LocalAI/pull/1785](https://togithub.com/mudler/LocalAI/pull/1785)
- Revert "feat(assistant): Initial implementation of assistants api" by
[@&#8203;mudler](https://togithub.com/mudler) in
[https://github.com/mudler/LocalAI/pull/1790](https://togithub.com/mudler/LocalAI/pull/1790)
- Edit links in readme and integrations page by
[@&#8203;lunamidori5](https://togithub.com/lunamidori5) in
[https://github.com/mudler/LocalAI/pull/1796](https://togithub.com/mudler/LocalAI/pull/1796)
- ⬆️ Update ggerganov/llama.cpp by
[@&#8203;localai-bot](https://togithub.com/localai-bot) in
[https://github.com/mudler/LocalAI/pull/1813](https://togithub.com/mudler/LocalAI/pull/1813)
- ⬆️ Update ggerganov/llama.cpp by
[@&#8203;localai-bot](https://togithub.com/localai-bot) in
[https://github.com/mudler/LocalAI/pull/1816](https://togithub.com/mudler/LocalAI/pull/1816)
- ⬆️ Update ggerganov/llama.cpp by
[@&#8203;localai-bot](https://togithub.com/localai-bot) in
[https://github.com/mudler/LocalAI/pull/1818](https://togithub.com/mudler/LocalAI/pull/1818)
- fix(doc/examples): set defaults to mirostat by
[@&#8203;mudler](https://togithub.com/mudler) in
[https://github.com/mudler/LocalAI/pull/1820](https://togithub.com/mudler/LocalAI/pull/1820)
- ⬆️ Update ggerganov/llama.cpp by
[@&#8203;localai-bot](https://togithub.com/localai-bot) in
[https://github.com/mudler/LocalAI/pull/1821](https://togithub.com/mudler/LocalAI/pull/1821)
- fix: OSX Build Files for llama.cpp by
[@&#8203;dave-gray101](https://togithub.com/dave-gray101) in
[https://github.com/mudler/LocalAI/pull/1836](https://togithub.com/mudler/LocalAI/pull/1836)
- ⬆️ Update go-skynet/go-llama.cpp by
[@&#8203;localai-bot](https://togithub.com/localai-bot) in
[https://github.com/mudler/LocalAI/pull/1835](https://togithub.com/mudler/LocalAI/pull/1835)
- docs(transformers): add docs section about transformers by
[@&#8203;mudler](https://togithub.com/mudler) in
[https://github.com/mudler/LocalAI/pull/1841](https://togithub.com/mudler/LocalAI/pull/1841)
- ⬆️ Update mudler/go-piper by
[@&#8203;localai-bot](https://togithub.com/localai-bot) in
[https://github.com/mudler/LocalAI/pull/1844](https://togithub.com/mudler/LocalAI/pull/1844)
- ⬆️ Update ggerganov/llama.cpp by
[@&#8203;localai-bot](https://togithub.com/localai-bot) in
[https://github.com/mudler/LocalAI/pull/1840](https://togithub.com/mudler/LocalAI/pull/1840)

#### New Contributors

- [@&#8203;golgeek](https://togithub.com/golgeek) made their first
contribution in
[https://github.com/mudler/LocalAI/pull/1749](https://togithub.com/mudler/LocalAI/pull/1749)
- [@&#8203;Joshhua5](https://togithub.com/Joshhua5) made their first
contribution in
[https://github.com/mudler/LocalAI/pull/1765](https://togithub.com/mudler/LocalAI/pull/1765)
- [@&#8203;ouxs-19](https://togithub.com/ouxs-19) made their first
contribution in
[https://github.com/mudler/LocalAI/pull/1778](https://togithub.com/mudler/LocalAI/pull/1778)
- [@&#8203;TwinFinz](https://togithub.com/TwinFinz) made their first
contribution in
[https://github.com/mudler/LocalAI/pull/1797](https://togithub.com/mudler/LocalAI/pull/1797)
- [@&#8203;cryptk](https://togithub.com/cryptk) made their first
contribution in
[https://github.com/mudler/LocalAI/pull/1828](https://togithub.com/mudler/LocalAI/pull/1828)
- [@&#8203;fakezeta](https://togithub.com/fakezeta) made their first
contribution in
[https://github.com/mudler/LocalAI/pull/1823](https://togithub.com/mudler/LocalAI/pull/1823)

Thank you to all contributors and users for your continued support and
feedback, making LocalAI better with each release!

**Full Changelog**:
mudler/LocalAI@v2.9.0...v2.10.0

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined),
Automerge - At any time (no schedule defined).

🚦 **Automerge**: Enabled.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the
rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update
again.

---

- [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check
this box

---

This PR has been generated by [Renovate
Bot](https://togithub.com/renovatebot/renovate).


Labels: enhancement (New feature or request)
Projects: none yet

Successfully merging this pull request may close these issues:
feat: Add Bitsandbytes quantization for transformer backend

2 participants