
feat: Add Bitsandbytes quantization for transformer backend enhancement #1775 and fix: Transformer backend error on CUDA #1774 (#1823)

Merged
mudler merged 5 commits into mudler:master from fakezeta:master
Mar 14, 2024

Conversation

@fakezeta (Collaborator) commented Mar 12, 2024

Add BitsAndBytes quantization and fix embeddings on CUDA devices

This PR fixes #1775 and #1774

Notes for Reviewers
LowVRAM toggles INT4 quantization.
F16Memory switches the compute type to bfloat16 instead of float32.
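
Roughly, the mapping looks like this. This is a minimal illustrative sketch, not the exact PR diff; the helper name and model id are made up, while `BitsAndBytesConfig` and its parameters come from the transformers library:

```python
# Sketch only: how the two toggles could map onto BitsAndBytesConfig.
import torch
from transformers import AutoModel, BitsAndBytesConfig

def build_model_kwargs(low_vram: bool, f16_memory: bool) -> dict:
    # F16Memory selects bfloat16 as the compute dtype, else float32.
    compute_dtype = torch.bfloat16 if f16_memory else torch.float32
    kwargs = {"torch_dtype": compute_dtype}
    if low_vram:
        # LowVRAM enables INT4 quantization via bitsandbytes.
        kwargs["quantization_config"] = BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_compute_dtype=compute_dtype,
        )
    return kwargs

model = AutoModel.from_pretrained("some/model", **build_model_kwargs(True, True))
```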

This is my first PR so be kind :)

Signed commits

  • Yes, I signed my commits.

Add BitsAndBytes Quantization and fixes embedding on CUDA devices
netlify bot commented Mar 12, 2024

Deploy Preview for localai canceled.

| Name | Link |
|---|---|
| 🔨 Latest commit | d304e33 |
| 🔍 Latest deploy log | https://app.netlify.com/sites/localai/deploys/65f33abcb3c03d0008073e92 |


device_map="cpu"

quantization = BitsAndBytesConfig(
@mudler (Owner) commented Mar 12, 2024

Since we already have a quantization field in the gRPC requests, and BitsAndBytes is one of the available options (see https://huggingface.co/docs/transformers/main_classes/quantization#quantization), we might want to select BitsAndBytesConfig when quantization=BitsAndBytesConfig is set instead, so we also keep backward compatibility (this is more of a question).

Otherwise the changes look good to me, thanks @fakezeta!

@fakezeta (Collaborator, Author) replied:

Thank you @mudler, I think it's a good point.

What do you think of using something like:

  • quantization=bnb_4bit
  • quantization=bnb_8bit

to select between the two options?
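
For illustration, the proposed selection could look like this hypothetical helper (a sketch only, not the merged code; it assumes the quantization value arrives as a plain string):

```python
# Sketch: map the proposed quantization string values to configs.
import torch
from transformers import BitsAndBytesConfig

def pick_quantization_config(quantization: str):
    if quantization == "bnb_4bit":
        return BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_compute_dtype=torch.bfloat16,
        )
    if quantization == "bnb_8bit":
        return BitsAndBytesConfig(load_in_8bit=True)
    return None  # no bitsandbytes quantization requested
```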

@mudler (Owner) replied:

@fakezeta that looks reasonable to me 👍

Manage different BitsAndBytes options with the quantization: parameter in yaml
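
This commit wires the choice through the model's YAML config. A hypothetical example (name, model, and key layout are placeholders and may differ from the merged change):

```yaml
# Hypothetical LocalAI model config using the new quantization option.
name: my-transformers-model
backend: transformers
parameters:
  model: some/hf-model
quantization: bnb_4bit  # or bnb_8bit
```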
mudler previously approved these changes Mar 14, 2024

@mudler (Owner) commented Mar 14, 2024

@fakezeta (Collaborator, Author) commented Mar 14, 2024

I was looking at the logs; the problem is related to the non-CUDA build.
Now test.sh also works on non-CUDA builds and without bitsandbytes installed.
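
A common pattern for this kind of fix is to degrade gracefully when the optional dependency is absent; a sketch of the idea (illustrative only, not necessarily the exact change):

```python
# Sketch: guard the optional bitsandbytes/CUDA path so non-CUDA
# builds without the library still load models unquantized.
import torch

try:
    import bitsandbytes  # noqa: F401 -- only needed when quantizing
    HAS_BITSANDBYTES = True
except ImportError:
    HAS_BITSANDBYTES = False

def quantization_supported() -> bool:
    # bitsandbytes 4/8-bit loading requires a CUDA device.
    return HAS_BITSANDBYTES and torch.cuda.is_available()
```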

mudler enabled auto-merge (squash) March 14, 2024 18:33

@mudler (Owner) left a review:

Looking good, thanks @fakezeta!

mudler added the enhancement label Mar 14, 2024
mudler disabled auto-merge March 14, 2024 22:06
mudler merged commit 3882130 into mudler:master Mar 14, 2024
truecharts-admin referenced this pull request in trueforge-org/truecharts Mar 17, 2024
…0.0@5cd0285 by renovate (#19391)

This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [docker.io/localai/localai](https://togithub.com/mudler/LocalAI) | minor | `v2.9.0` -> `v2.10.0` |

---

> [!WARNING]
> Some dependencies could not be looked up. Check the Dependency
Dashboard for more information.

---

### Release Notes

<details>
<summary>mudler/LocalAI (docker.io/localai/localai)</summary>

###
[`v2.10.0`](https://togithub.com/mudler/LocalAI/releases/tag/v2.10.0)

[Compare
Source](https://togithub.com/mudler/LocalAI/compare/v2.9.0...v2.10.0)

##### LocalAI v2.10.0 Release Notes

Excited to announce the release of LocalAI v2.10.0! This version
introduces significant changes, including breaking changes, numerous bug
fixes, exciting new features, dependency updates, and more. Here's a
summary of what's new:

##### Breaking Changes 🛠

- The `trust_remote_code` setting in the model's YAML config file is now
consumed for enhanced security measures, also for the AutoGPTQ and
transformers backends, thanks to
[@&#8203;dave-gray101](https://togithub.com/dave-gray101)'s contribution
([#&#8203;1799](https://togithub.com/mudler/LocalAI/pull/1799)). If your
model relied on the old behavior and you are sure of what you are doing,
set `trust_remote_code: true` in the YAML config file, as in the example below.
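
  A minimal illustrative config (name and model are placeholders):

  ```yaml
  # Hypothetical model config opting back into remote code execution.
  name: my-model
  backend: transformers
  parameters:
    model: some/hf-model-with-custom-code
  trust_remote_code: true
  ```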

##### Bug Fixes 🐛

- Various fixes have been implemented to enhance the stability and
performance of LocalAI:
- SSE no longer omits empty `finish_reason` fields for better
compatibility with the OpenAI API, fixed by
[@&#8203;mudler](https://togithub.com/mudler)
([#&#8203;1745](https://togithub.com/mudler/LocalAI/pull/1745)).
- Functions now correctly handle scenarios with no results, also
addressed by [@&#8203;mudler](https://togithub.com/mudler)
([#&#8203;1758](https://togithub.com/mudler/LocalAI/pull/1758)).
- A Command Injection Vulnerability has been fixed by
[@&#8203;ouxs-19](https://togithub.com/ouxs-19)
([#&#8203;1778](https://togithub.com/mudler/LocalAI/pull/1778)).
- OpenCL-based builds for llama.cpp have been restored, thanks to
[@&#8203;cryptk](https://togithub.com/cryptk)'s efforts
([#&#8203;1828](https://togithub.com/mudler/LocalAI/pull/1828),
[#&#8203;1830](https://togithub.com/mudler/LocalAI/pull/1830)).
- An issue with OSX build `default.metallib` has been resolved, which
should now allow running the llama-cpp backend on Apple arm64, fixed by
[@&#8203;dave-gray101](https://togithub.com/dave-gray101)
([#&#8203;1837](https://togithub.com/mudler/LocalAI/pull/1837)).

##### Exciting New Features 🎉

-   LocalAI continues to evolve with several new features:
- Ongoing implementation of the assistants API, making great progress
thanks to community contributions, including an initial implementation
by [@&#8203;christ66](https://togithub.com/christ66)
([#&#8203;1761](https://togithub.com/mudler/LocalAI/pull/1761)).
- Addition of diffusers/transformers support for Intel GPU - now you can
generate images and use the `transformer` backend also on Intel GPUs,
implemented by [@&#8203;mudler](https://togithub.com/mudler)
([#&#8203;1746](https://togithub.com/mudler/LocalAI/pull/1746)).
- Introduction of Bitsandbytes quantization for transformer backend
enhancement and a fix for transformer backend error on CUDA by
[@&#8203;fakezeta](https://togithub.com/fakezeta)
([#&#8203;1823](https://togithub.com/mudler/LocalAI/pull/1823)).
- Compatibility layers for Elevenlabs and OpenAI TTS, enhancing
text-to-speech capabilities: Now LocalAI is compatible with Elevenlabs
and OpenAI TTS, thanks to [@&#8203;mudler](https://togithub.com/mudler)
([#&#8203;1834](https://togithub.com/mudler/LocalAI/pull/1834)).
- vLLM now supports `stream: true`! This feature was introduced by
[@&#8203;golgeek](https://togithub.com/golgeek)
([#&#8203;1749](https://togithub.com/mudler/LocalAI/pull/1749)).
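
As an aside, streaming is consumed through LocalAI's OpenAI-compatible endpoint as usual; a hypothetical client sketch (host, port, and model name are placeholders):

```python
# Sketch: read a streamed chat completion over SSE from LocalAI.
import json
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "my-vllm-model",
        "messages": [{"role": "user", "content": "Hello!"}],
        "stream": True,
    },
    stream=True,
)
for line in resp.iter_lines():
    # Each SSE event is a "data: {...}" line; the stream ends with [DONE].
    if line.startswith(b"data: ") and line != b"data: [DONE]":
        chunk = json.loads(line[len(b"data: "):])
        print(chunk["choices"][0]["delta"].get("content", ""), end="")
```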

##### Dependency Updates 👒

- Our continuous effort to keep dependencies up-to-date includes
multiple updates to `ggerganov/llama.cpp`, `donomii/go-rwkv.cpp`,
`mudler/go-stable-diffusion`, and others, ensuring that LocalAI is built
on the latest and most secure libraries.

##### Other Changes

- Several internal changes have been made to improve the development
process and documentation, including updates to integration guides,
stress reduction on self-hosted runners, and more.

#### Details of What's Changed

##### Breaking Changes 🛠

- feat(autogpt/transformers): consume `trust_remote_code` by
[@&#8203;dave-gray101](https://togithub.com/dave-gray101) in
[https://github.com/mudler/LocalAI/pull/1799](https://togithub.com/mudler/LocalAI/pull/1799)

##### Bug fixes 🐛

- fix(sse): do not omit empty finish_reason by
[@&#8203;mudler](https://togithub.com/mudler) in
[https://github.com/mudler/LocalAI/pull/1745](https://togithub.com/mudler/LocalAI/pull/1745)
- fix(functions): handle correctly when there are no results by
[@&#8203;mudler](https://togithub.com/mudler) in
[https://github.com/mudler/LocalAI/pull/1758](https://togithub.com/mudler/LocalAI/pull/1758)
- fix(tests): re-enable tests after code move by
[@&#8203;mudler](https://togithub.com/mudler) in
[https://github.com/mudler/LocalAI/pull/1764](https://togithub.com/mudler/LocalAI/pull/1764)
- Fix Command Injection Vulnerability by
[@&#8203;ouxs-19](https://togithub.com/ouxs-19) in
[https://github.com/mudler/LocalAI/pull/1778](https://togithub.com/mudler/LocalAI/pull/1778)
- fix: the correct BUILD_TYPE for OpenCL is clblas (with no t) by
[@&#8203;cryptk](https://togithub.com/cryptk) in
[https://github.com/mudler/LocalAI/pull/1828](https://togithub.com/mudler/LocalAI/pull/1828)
- fix: missing OpenCL libraries from docker containers during clblas
docker build by [@&#8203;cryptk](https://togithub.com/cryptk) in
[https://github.com/mudler/LocalAI/pull/1830](https://togithub.com/mudler/LocalAI/pull/1830)
- fix: osx build default.metallib by
[@&#8203;dave-gray101](https://togithub.com/dave-gray101) in
[https://github.com/mudler/LocalAI/pull/1837](https://togithub.com/mudler/LocalAI/pull/1837)

##### Exciting New Features 🎉

- fix: vllm - use AsyncLLMEngine to allow true streaming mode by
[@&#8203;golgeek](https://togithub.com/golgeek) in
[https://github.com/mudler/LocalAI/pull/1749](https://togithub.com/mudler/LocalAI/pull/1749)
- refactor: move remaining api packages to core by
[@&#8203;dave-gray101](https://togithub.com/dave-gray101) in
[https://github.com/mudler/LocalAI/pull/1731](https://togithub.com/mudler/LocalAI/pull/1731)
- Bump vLLM version + more options when loading models in vLLM by
[@&#8203;golgeek](https://togithub.com/golgeek) in
[https://github.com/mudler/LocalAI/pull/1782](https://togithub.com/mudler/LocalAI/pull/1782)
- feat(assistant): Initial implementation of assistants api by
[@&#8203;christ66](https://togithub.com/christ66) in
[https://github.com/mudler/LocalAI/pull/1761](https://togithub.com/mudler/LocalAI/pull/1761)
- feat(intel): add diffusers/transformers support by
[@&#8203;mudler](https://togithub.com/mudler) in
[https://github.com/mudler/LocalAI/pull/1746](https://togithub.com/mudler/LocalAI/pull/1746)
- fix(config): set better defaults for inferencing by
[@&#8203;mudler](https://togithub.com/mudler) in
[https://github.com/mudler/LocalAI/pull/1822](https://togithub.com/mudler/LocalAI/pull/1822)
- fix(docker-compose): update docker compose file by
[@&#8203;mudler](https://togithub.com/mudler) in
[https://github.com/mudler/LocalAI/pull/1824](https://togithub.com/mudler/LocalAI/pull/1824)
- feat(model-help): display help text in markdown by
[@&#8203;mudler](https://togithub.com/mudler) in
[https://github.com/mudler/LocalAI/pull/1825](https://togithub.com/mudler/LocalAI/pull/1825)
- feat: Add Bitsandbytes quantization for transformer backend
enhancement
[#&#8203;1775](https://togithub.com/mudler/LocalAI/issues/1775) and fix:
Transformer backend error on CUDA
[#&#8203;1774](https://togithub.com/mudler/LocalAI/issues/1774) by
[@&#8203;fakezeta](https://togithub.com/fakezeta) in
[https://github.com/mudler/LocalAI/pull/1823](https://togithub.com/mudler/LocalAI/pull/1823)
- feat(tts): add Elevenlabs and OpenAI TTS compatibility layer by
[@&#8203;mudler](https://togithub.com/mudler) in
[https://github.com/mudler/LocalAI/pull/1834](https://togithub.com/mudler/LocalAI/pull/1834)
- feat(embeddings): do not require to be configured by
[@&#8203;mudler](https://togithub.com/mudler) in
[https://github.com/mudler/LocalAI/pull/1842](https://togithub.com/mudler/LocalAI/pull/1842)

##### 👒 Dependencies

- ⬆️ Update docs version mudler/LocalAI by
[@&#8203;localai-bot](https://togithub.com/localai-bot) in
[https://github.com/mudler/LocalAI/pull/1752](https://togithub.com/mudler/LocalAI/pull/1752)
- ⬆️ Update ggerganov/llama.cpp by
[@&#8203;localai-bot](https://togithub.com/localai-bot) in
[https://github.com/mudler/LocalAI/pull/1753](https://togithub.com/mudler/LocalAI/pull/1753)
- deps(llama.cpp): update by
[@&#8203;mudler](https://togithub.com/mudler) in
[https://github.com/mudler/LocalAI/pull/1759](https://togithub.com/mudler/LocalAI/pull/1759)
- ⬆️ Update ggerganov/llama.cpp by
[@&#8203;localai-bot](https://togithub.com/localai-bot) in
[https://github.com/mudler/LocalAI/pull/1756](https://togithub.com/mudler/LocalAI/pull/1756)
- ⬆️ Update ggerganov/llama.cpp by
[@&#8203;localai-bot](https://togithub.com/localai-bot) in
[https://github.com/mudler/LocalAI/pull/1767](https://togithub.com/mudler/LocalAI/pull/1767)
- ⬆️ Update ggerganov/llama.cpp by
[@&#8203;localai-bot](https://togithub.com/localai-bot) in
[https://github.com/mudler/LocalAI/pull/1772](https://togithub.com/mudler/LocalAI/pull/1772)
- ⬆️ Update donomii/go-rwkv.cpp by
[@&#8203;localai-bot](https://togithub.com/localai-bot) in
[https://github.com/mudler/LocalAI/pull/1771](https://togithub.com/mudler/LocalAI/pull/1771)
- ⬆️ Update ggerganov/llama.cpp by
[@&#8203;localai-bot](https://togithub.com/localai-bot) in
[https://github.com/mudler/LocalAI/pull/1779](https://togithub.com/mudler/LocalAI/pull/1779)
- ⬆️ Update ggerganov/llama.cpp by
[@&#8203;localai-bot](https://togithub.com/localai-bot) in
[https://github.com/mudler/LocalAI/pull/1789](https://togithub.com/mudler/LocalAI/pull/1789)
- ⬆️ Update ggerganov/llama.cpp by
[@&#8203;localai-bot](https://togithub.com/localai-bot) in
[https://github.com/mudler/LocalAI/pull/1791](https://togithub.com/mudler/LocalAI/pull/1791)
- ⬆️ Update ggerganov/llama.cpp by
[@&#8203;localai-bot](https://togithub.com/localai-bot) in
[https://github.com/mudler/LocalAI/pull/1794](https://togithub.com/mudler/LocalAI/pull/1794)
- depedencies(sentencentranformers): update dependencies by
[@&#8203;TwinFinz](https://togithub.com/TwinFinz) in
[https://github.com/mudler/LocalAI/pull/1797](https://togithub.com/mudler/LocalAI/pull/1797)
- ⬆️ Update ggerganov/llama.cpp by
[@&#8203;localai-bot](https://togithub.com/localai-bot) in
[https://github.com/mudler/LocalAI/pull/1801](https://togithub.com/mudler/LocalAI/pull/1801)
- ⬆️ Update mudler/go-stable-diffusion by
[@&#8203;localai-bot](https://togithub.com/localai-bot) in
[https://github.com/mudler/LocalAI/pull/1802](https://togithub.com/mudler/LocalAI/pull/1802)
- ⬆️ Update ggerganov/llama.cpp by
[@&#8203;localai-bot](https://togithub.com/localai-bot) in
[https://github.com/mudler/LocalAI/pull/1805](https://togithub.com/mudler/LocalAI/pull/1805)
- ⬆️ Update ggerganov/llama.cpp by
[@&#8203;localai-bot](https://togithub.com/localai-bot) in
[https://github.com/mudler/LocalAI/pull/1811](https://togithub.com/mudler/LocalAI/pull/1811)
- ⬆️ Update ggerganov/llama.cpp by
[@&#8203;localai-bot](https://togithub.com/localai-bot) in
[https://github.com/mudler/LocalAI/pull/1827](https://togithub.com/mudler/LocalAI/pull/1827)

##### Other Changes

- ci: add stablediffusion to release by
[@&#8203;sozercan](https://togithub.com/sozercan) in
[https://github.com/mudler/LocalAI/pull/1757](https://togithub.com/mudler/LocalAI/pull/1757)
- Update integrations.md by
[@&#8203;Joshhua5](https://togithub.com/Joshhua5) in
[https://github.com/mudler/LocalAI/pull/1765](https://togithub.com/mudler/LocalAI/pull/1765)
- ci: reduce stress on self-hosted runners by
[@&#8203;mudler](https://togithub.com/mudler) in
[https://github.com/mudler/LocalAI/pull/1776](https://togithub.com/mudler/LocalAI/pull/1776)
- ⬆️ Update ggerganov/llama.cpp by
[@&#8203;localai-bot](https://togithub.com/localai-bot) in
[https://github.com/mudler/LocalAI/pull/1785](https://togithub.com/mudler/LocalAI/pull/1785)
- Revert "feat(assistant): Initial implementation of assistants api" by
[@&#8203;mudler](https://togithub.com/mudler) in
[https://github.com/mudler/LocalAI/pull/1790](https://togithub.com/mudler/LocalAI/pull/1790)
- Edit links in readme and integrations page by
[@&#8203;lunamidori5](https://togithub.com/lunamidori5) in
[https://github.com/mudler/LocalAI/pull/1796](https://togithub.com/mudler/LocalAI/pull/1796)
- ⬆️ Update ggerganov/llama.cpp by
[@&#8203;localai-bot](https://togithub.com/localai-bot) in
[https://github.com/mudler/LocalAI/pull/1813](https://togithub.com/mudler/LocalAI/pull/1813)
- ⬆️ Update ggerganov/llama.cpp by
[@&#8203;localai-bot](https://togithub.com/localai-bot) in
[https://github.com/mudler/LocalAI/pull/1816](https://togithub.com/mudler/LocalAI/pull/1816)
- ⬆️ Update ggerganov/llama.cpp by
[@&#8203;localai-bot](https://togithub.com/localai-bot) in
[https://github.com/mudler/LocalAI/pull/1818](https://togithub.com/mudler/LocalAI/pull/1818)
- fix(doc/examples): set defaults to mirostat by
[@&#8203;mudler](https://togithub.com/mudler) in
[https://github.com/mudler/LocalAI/pull/1820](https://togithub.com/mudler/LocalAI/pull/1820)
- ⬆️ Update ggerganov/llama.cpp by
[@&#8203;localai-bot](https://togithub.com/localai-bot) in
[https://github.com/mudler/LocalAI/pull/1821](https://togithub.com/mudler/LocalAI/pull/1821)
- fix: OSX Build Files for llama.cpp by
[@&#8203;dave-gray101](https://togithub.com/dave-gray101) in
[https://github.com/mudler/LocalAI/pull/1836](https://togithub.com/mudler/LocalAI/pull/1836)
- ⬆️ Update go-skynet/go-llama.cpp by
[@&#8203;localai-bot](https://togithub.com/localai-bot) in
[https://github.com/mudler/LocalAI/pull/1835](https://togithub.com/mudler/LocalAI/pull/1835)
- docs(transformers): add docs section about transformers by
[@&#8203;mudler](https://togithub.com/mudler) in
[https://github.com/mudler/LocalAI/pull/1841](https://togithub.com/mudler/LocalAI/pull/1841)
- ⬆️ Update mudler/go-piper by
[@&#8203;localai-bot](https://togithub.com/localai-bot) in
[https://github.com/mudler/LocalAI/pull/1844](https://togithub.com/mudler/LocalAI/pull/1844)
- ⬆️ Update ggerganov/llama.cpp by
[@&#8203;localai-bot](https://togithub.com/localai-bot) in
[https://github.com/mudler/LocalAI/pull/1840](https://togithub.com/mudler/LocalAI/pull/1840)

#### New Contributors

- [@&#8203;golgeek](https://togithub.com/golgeek) made their first
contribution in
[https://github.com/mudler/LocalAI/pull/1749](https://togithub.com/mudler/LocalAI/pull/1749)
- [@&#8203;Joshhua5](https://togithub.com/Joshhua5) made their first
contribution in
[https://github.com/mudler/LocalAI/pull/1765](https://togithub.com/mudler/LocalAI/pull/1765)
- [@&#8203;ouxs-19](https://togithub.com/ouxs-19) made their first
contribution in
[https://github.com/mudler/LocalAI/pull/1778](https://togithub.com/mudler/LocalAI/pull/1778)
- [@&#8203;TwinFinz](https://togithub.com/TwinFinz) made their first
contribution in
[https://github.com/mudler/LocalAI/pull/1797](https://togithub.com/mudler/LocalAI/pull/1797)
- [@&#8203;cryptk](https://togithub.com/cryptk) made their first
contribution in
[https://github.com/mudler/LocalAI/pull/1828](https://togithub.com/mudler/LocalAI/pull/1828)
- [@&#8203;fakezeta](https://togithub.com/fakezeta) made their first
contribution in
[https://github.com/mudler/LocalAI/pull/1823](https://togithub.com/mudler/LocalAI/pull/1823)

Thank you to all contributors and users for your continued support and
feedback, making LocalAI better with each release!

**Full Changelog**:
mudler/LocalAI@v2.9.0...v2.10.0

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined),
Automerge - At any time (no schedule defined).

🚦 **Automerge**: Enabled.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the
rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update
again.

---

- [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check
this box

---

This PR has been generated by [Renovate
Bot](https://togithub.com/renovatebot/renovate).


Labels: enhancement (New feature or request)
Projects: none yet

Successfully merging this pull request may close these issues:
feat: Add Bitsandbytes quantization for transformer backend

2 participants