
Conversation

@gluttony-10 (Contributor) commented May 29, 2025

1. Add quantization code (the quantization happens in code, so the official model works as-is and no extra download is needed); a sketch of this kind of loading follows the list.
2. Add a Chinese translation (optional, enabled with the --zh launch flag).
3. Fix text.
4. Add a progress bar, with reference to #49.
5. Add precision conversion.
6. Add the necessary dependencies.
7. Update the installation instructions in the README (for a smoother install).
8. Add launch instructions to the README (two quantized launch modes, with reference VRAM usage).
9. I originally wanted to add DF11 as well, but its dependency installation order is rather involved and it requires downloading an extra model, so I dropped it.
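
For context, here is a minimal sketch of what the on-the-fly quantization in item 1 typically looks like when wired through bitsandbytes and Hugging Face transformers; the AutoModel loader and checkpoint path below are placeholders, and the PR's actual loading code may differ.

import torch
from transformers import AutoModel, BitsAndBytesConfig

# INT8 launch mode: weights are quantized at load time, so the official
# checkpoint is enough and no extra download is needed.
int8_cfg = BitsAndBytesConfig(load_in_8bit=True)

# NF4 launch mode: 4-bit weights with bf16 compute, for even lower VRAM.
nf4_cfg = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModel.from_pretrained(
    "path/to/official-checkpoint",  # placeholder; use the repo's weights
    quantization_config=int8_cfg,   # or nf4_cfg
    device_map="auto",
)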

gluttony-10 and others added 2 commits May 29, 2025 17:45
@Andy1621 (Collaborator)

Cool! I will check it tomorrow~

@Andy1621 (Collaborator)

@gluttony-10 What's your version of bitsandbytes? I encountered an error with INT8 when running it in the default environment:

    511     if mode == 'und':
--> 512         packed_query_states = self.q_proj(packed_query_sequence).view(-1, self.num_heads, self.head_dim)
    513         packed_key_states = self.k_proj(packed_query_sequence).view(-1, self.num_key_value_heads, self.head_dim)
    514         packed_value_states = self.v_proj(packed_query_sequence).view(-1, self.num_key_value_heads, self.head_dim)

File /usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py:1736, in Module._wrapped_call_impl(self, *args, **kwargs)
   1734     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1735 else:
-> 1736     return self._call_impl(*args, **kwargs)

File /usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py:1747, in Module._call_impl(self, *args, **kwargs)
   1742 # If we don't have any hooks, we want to skip the rest of the logic in
   1743 # this function, and just call forward.
   1744 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1745         or _global_backward_pre_hooks or _global_backward_hooks
   1746         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1747     return forward_call(*args, **kwargs)
   1749 result = None
   1750 called_always_called_hooks = set()

File /usr/local/lib/python3.11/dist-packages/accelerate/hooks.py:170, in add_hook_to_module.<locals>.new_forward(module, *args, **kwargs)
    168         output = module._old_forward(*args, **kwargs)
    169 else:
--> 170     output = module._old_forward(*args, **kwargs)
    171 return module._hf_hook.post_forward(module, output)

File /usr/local/lib/python3.11/dist-packages/bitsandbytes/nn/modules.py:797, in Linear8bitLt.forward(self, x)
    794 if self.bias is not None and self.bias.dtype != x.dtype:
    795     self.bias.data = self.bias.data.to(x.dtype)
--> 797 out = bnb.matmul(x, self.weight, bias=self.bias, state=self.state)
    799 if not self.state.has_fp16_weights:
    800     if self.state.CB is not None and self.state.CxB is not None:
    801         # we converted 8-bit row major to turing/ampere format in the first inference pass
    802         # we no longer need the row-major weight

File /usr/local/lib/python3.11/dist-packages/bitsandbytes/autograd/_functions.py:556, in matmul(A, B, out, state, threshold, bias)
    554 if threshold > 0.0:
    555     state.threshold = threshold
--> 556 return MatMul8bitLt.apply(A, B, out, bias, state)

File /usr/local/lib/python3.11/dist-packages/torch/autograd/function.py:575, in Function.apply(cls, *args, **kwargs)
    572 if not torch._C._are_functorch_transforms_active():
    573     # See NOTE: [functorch vjp and autograd interaction]
    574     args = _functorch.utils.unwrap_dead_wrappers(args)
--> 575     return super().apply(*args, **kwargs)  # type: ignore[misc]
    577 if not is_setup_ctx_defined:
    578     raise RuntimeError(
    579         "In order to use an autograd.Function with functorch transforms "
    580         "(vmap, grad, jvp, jacrev, ...), it must override the setup_context "
    581         "staticmethod. For more details, please see "
    582         "https://pytorch.org/docs/main/notes/extending.func.html"
    583     )

File /usr/local/lib/python3.11/dist-packages/bitsandbytes/autograd/_functions.py:395, in MatMul8bitLt.forward(ctx, A, B, out, bias, state)
    393 if using_igemmlt:
    394     C32A, SA = F.transform(CA, "col32")
--> 395     out32, Sout32 = F.igemmlt(C32A, state.CxB, SA, state.SB)
    396     if bias is None or bias.dtype == torch.float16:
    397         # we apply the fused bias here
    398         output = F.mm_dequant(out32, Sout32, SCA, state.SCB, bias=bias)

File /usr/local/lib/python3.11/dist-packages/bitsandbytes/functional.py:2337, in igemmlt(A, B, SA, SB, out, Sout, dtype)
   2335 if has_error:
   2336     print(f"A: {shapeA}, B: {shapeB}, C: {Sout[0]}; (lda, ldb, ldc): {(lda, ldb, ldc)}; (m, n, k): {(m, n, k)}")
-> 2337     raise Exception("cublasLt ran into an error!")
   2339 torch.cuda.set_device(prev_device)
   2341 return out, Sout

Exception: cublasLt ran into an error!
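
For what it's worth, the frames above go through F.transform(CA, "col32") and igemmlt with state.CxB, i.e. the older bitsandbytes LLM.int8() kernel path that assumed Turing/Ampere weight layouts. A quick triage sketch (assuming a CUDA build of torch) that captures the details deciding which path runs:

import torch
import bitsandbytes

print("bitsandbytes:", bitsandbytes.__version__)
print("torch:", torch.__version__, "| CUDA:", torch.version.cuda)
# (8, 0) = A100 (Ampere), (9, 0) = H100/H200 (Hopper)
print("compute capability:", torch.cuda.get_device_capability(0))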

@gluttony-10 (Contributor, Author)

Package Version
accelerate 1.7.0
aiofiles 24.1.0
annotated-types 0.7.0
anyio 4.9.0
bitsandbytes 0.46.0
certifi 2024.8.30
charset-normalizer 3.4.0
click 8.2.1
colorama 0.4.6
contourpy 1.3.2
cycler 0.12.1
decord 0.6.0
docker-pycreds 0.4.0
einops 0.8.1
exceptiongroup 1.3.0
fastapi 0.115.12
ffmpy 0.5.0
filelock 3.18.0
fonttools 4.58.1
fsspec 2025.3.2
gitdb 4.0.12
GitPython 3.1.44
gradio 5.31.0
gradio_client 1.10.1
groovy 0.1.2
h11 0.16.0
httpcore 1.0.9
httpx 0.28.1
huggingface-hub 0.29.1
idna 3.10
Jinja2 3.1.6
joblib 1.4.2
kiwisolver 1.4.8
llvmlite 0.43.0
markdown-it-py 3.0.0
MarkupSafe 3.0.2
matplotlib 3.7.0
mdurl 0.1.2
more-itertools 10.5.0
mpmath 1.3.0
networkx 3.4.2
ninja 1.11.1.4
numba 0.60.0
numpy 1.24.4
openai-whisper 20240930
opencv-python 4.7.0.72
orjson 3.10.18
packaging 25.0
pandas 2.2.3
pillow 11.2.1
pip 25.1.1
platformdirs 4.3.8
protobuf 6.31.1
psutil 7.0.0
pyarrow 11.0.0
pydantic 2.11.5
pydantic_core 2.33.2
pydub 0.25.1
Pygments 2.19.1
pyparsing 3.2.3
python-dateutil 2.9.0.post0
python-multipart 0.0.20
pytz 2025.2
PyYAML 6.0.2
regex 2024.9.11
requests 2.32.3
rich 14.0.0
ruff 0.11.11
safehttpx 0.1.6
safetensors 0.4.5
scikit-learn 1.5.2
scipy 1.10.1
semantic-version 2.10.0
sentencepiece 0.1.99
sentry-sdk 2.29.1
setproctitle 1.3.6
setuptools 80.7.1
shellingham 1.5.4
six 1.17.0
smmap 5.0.2
sniffio 1.3.1
starlette 0.46.2
sympy 1.13.1
threadpoolctl 3.5.0
tiktoken 0.8.0
tokenizers 0.21.1
tomlkit 0.13.2
torch 2.5.1+cu124
torchvision 0.20.1+cu124
tqdm 4.66.5
transformers 4.49.0
triton-windows 3.3.0.post19
typer 0.16.0
typing_extensions 4.13.2
typing-inspection 0.4.1
tzdata 2025.2
urllib3 2.2.3
uvicorn 0.34.2
wandb 0.19.11
websockets 15.0.1
wheel 0.45.1
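
A hedged way to compare a local environment against the pins above; picking these four packages as the relevant ones is an assumption.

from importlib.metadata import version

# Versions copied from the working environment reported above.
expected = {
    "torch": "2.5.1+cu124",
    "transformers": "4.49.0",
    "accelerate": "1.7.0",
    "bitsandbytes": "0.46.0",
}
for pkg, want in expected.items():
    print(f"{pkg}: installed {version(pkg)}, reported working {want}")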

@Andy1621 (Collaborator)

Okay~ It works on A100, but not on H-series GPUs. I will merge it and update some of the descriptions.
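
The thread does not pin down why Hopper fails; one speculative workaround, assuming quantization is configured via transformers' BitsAndBytesConfig, would be to fall back from INT8 to NF4 on compute capability 9.0:

import torch
from transformers import BitsAndBytesConfig

major, _ = torch.cuda.get_device_capability(0)
if major >= 9:
    # Hopper (H100/H200): avoid the INT8 path that raised the cublasLt error above.
    quant_cfg = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
else:
    quant_cfg = BitsAndBytesConfig(load_in_8bit=True)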

@Andy1621 merged commit f517072 into ByteDance-Seed:main on May 30, 2025
@gluttony-10 (Contributor, Author)

Thank you for merging.
I'm sorry, but I also have no idea why it fails on H-series GPUs.
Best wishes!
