[Enhancement]create huggingface_gptneox_convert.py by zhang-ge-hao · Pull Request #569 · NVIDIA/FasterTransformer

zhang-ge-hao · 2023-04-20T06:39:44Z

This PR adds a script that converts GPT-NeoX Hugging Face (HF) model files into the FT model format.

Mainly based on tabby's script, thanks to @wsxiaoys.

However, the original script fails when tensor parallelism is used. (link)

I fixed the problem and hope to open this script for everyone to use.

Signed-off-by: AkiyamaYummy <842720660@qq.com>

zhang-ge-hao · 2023-04-20T06:51:37Z

Examples:

Get HF model files:

git lfs clone https://huggingface.co/TabbyML/NeoX-70M
git lfs clone https://huggingface.co/TabbyML/NeoX-1.3B

Convert to 1-GPU model files:

python ../examples/pytorch/gptneox/utils/huggingface_gptneox_convert.py -i ../models/gptneox/model/NeoX-70M -o ../models/gptneox/c-model/NeoX-70M -i_g 1 -m_n gptneox
python ../examples/pytorch/gptneox/utils/huggingface_gptneox_convert.py -i ../models/gptneox/model/NeoX-1.3B -o ../models/gptneox/c-model/NeoX-1.3B -i_g 1 -m_n gptneox

Convert to 2-GPU model files (for tensor parallelism):

python ../examples/pytorch/gptneox/utils/huggingface_gptneox_convert.py -i ../models/gptneox/model/NeoX-70M -o ../models/gptneox/c-model/NeoX-70M -i_g 2 -m_n gptneox
python ../examples/pytorch/gptneox/utils/huggingface_gptneox_convert.py -i ../models/gptneox/model/NeoX-1.3B -o ../models/gptneox/c-model/NeoX-1.3B -i_g 2 -m_n gptneox
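
The four invocations above differ only in the model name and the `-i_g` value, so they can be generated rather than typed. The following is a minimal sketch, not part of the PR: the helper names `build_convert_command` and `expected_ckpt_dir` are hypothetical, the flags are the ones used in the commands above, and the `<output>/<N>-gpu` layout is an assumption inferred from the `--ckpt_path` values used in the validation commands.

```python
from pathlib import Path


def build_convert_command(script, model_dir, out_dir, n_gpu, model_name="gptneox"):
    """Return the converter invocation as an argv list (flags as used in this thread)."""
    return [
        "python", str(script),
        "-i", str(model_dir),   # HF model directory
        "-o", str(out_dir),     # output root for the FT model files
        "-i_g", str(n_gpu),     # number of GPUs (tensor parallel size) at inference
        "-m_n", model_name,     # model name
    ]


def expected_ckpt_dir(out_dir, n_gpu):
    """Assumed layout: converted files end up in '<out_dir>/<n_gpu>-gpu'."""
    return Path(out_dir) / f"{n_gpu}-gpu"


if __name__ == "__main__":
    script = "../examples/pytorch/gptneox/utils/huggingface_gptneox_convert.py"
    for model in ("NeoX-70M", "NeoX-1.3B"):
        for n_gpu in (1, 2):
            cmd = build_convert_command(
                script,
                f"../models/gptneox/model/{model}",
                f"../models/gptneox/c-model/{model}",
                n_gpu,
            )
            print(" ".join(cmd))
```

Each argv list can be passed to `subprocess.run(cmd, check=True)` to run the conversion.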

Run and validate 1-GPU model files:

python ../examples/pytorch/gptneox/gptneox_example.py --ckpt_path ../models/gptneox/c-model/NeoX-70M/1-gpu --tokenizer_path ../models/gptneox/model/NeoX-70M --sample_input_file gptneox_input
python ../examples/pytorch/gptneox/gptneox_example.py --ckpt_path ../models/gptneox/c-model/NeoX-1.3B/1-gpu --tokenizer_path ../models/gptneox/model/NeoX-1.3B --sample_input_file gptneox_input

Run and validate tensor parallel model files:

mpirun -n 2 --allow-run-as-root python ../examples/pytorch/gptneox/gptneox_example.py --ckpt_path ../models/gptneox/c-model/NeoX-70M/2-gpu --tokenizer_path ../models/gptneox/model/NeoX-70M --sample_input_file gptneox_input --tensor_para_size 2
mpirun -n 2 --allow-run-as-root python ../examples/pytorch/gptneox/gptneox_example.py --ckpt_path ../models/gptneox/c-model/NeoX-1.3B/2-gpu --tokenizer_path ../models/gptneox/model/NeoX-70M --sample_input_file gptneox_input --tensor_para_size 2

Results:

root@f0305219ab2b:/workspace/FasterTransformer/build# python ../examples/pytorch/gptneox/gptneox_example.py --ckpt_path ../models/gptneox/c-model/NeoX-70M/1-gpu --tokenizer_path ../models/gptneox/model/NeoX-70M --sample_input_file gptneox_input

=============== Arguments ===============
output_len: 32
beam_width: 1
top_k: 1
top_p: 0.0
temperature: 1.0
len_penalty: 0.0
beam_search_diversity_rate: 0.0
tensor_para_size: 1
pipeline_para_size: 1
ckpt_path: ../models/gptneox/c-model/NeoX-70M/1-gpu
tokenizer_path: ../models/gptneox/model/NeoX-70M
lib_path: ./lib/libth_transformer.so
sample_input_file: gptneox_input
max_batch_size: 8
repetition_penalty: 1.0
max_seq_len: 1024
inference_data_type: fp16
time: False
enable_random_seed: False
=========================================

[INFO] batch size: 2
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
[WARNING] gemm_config.in is not found; using default GEMM algo
[FT][WARNING] Skip NCCL initialization since requested tensor/pipeline parallel sizes are equals to 1.
[INFO] batch 0, beam 0:
[Context]
Hello,

[Output]


I'm a bit confused about the name of the game. I'm a bit confused about the name of the game. I'm a bit confused about<|endoftext|><|endoftext|>

[INFO] batch 1, beam 0:
[Context]
Gama start,

[Output]
 and then the next time you see the same thing, you'll see the same thing again.

I'm not sure if I'm going to be able

root@f0305219ab2b:/workspace/FasterTransformer/build# python ../examples/pytorch/gptneox/gptneox_example.py --ckpt_path ../models/gptneox/c-model/NeoX-1.3B/1-gpu --tokenizer_path ../models/gptneox/model/NeoX-1.3B --sample_input_file gptneox_input

=============== Arguments ===============
output_len: 32
beam_width: 1
top_k: 1
top_p: 0.0
temperature: 1.0
len_penalty: 0.0
beam_search_diversity_rate: 0.0
tensor_para_size: 1
pipeline_para_size: 1
ckpt_path: ../models/gptneox/c-model/NeoX-1.3B/1-gpu
tokenizer_path: ../models/gptneox/model/NeoX-1.3B
lib_path: ./lib/libth_transformer.so
sample_input_file: gptneox_input
max_batch_size: 8
repetition_penalty: 1.0
max_seq_len: 1024
inference_data_type: fp16
time: False
enable_random_seed: False
=========================================

[INFO] batch size: 2
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
[WARNING] gemm_config.in is not found; using default GEMM algo
[FT][WARNING] Skip NCCL initialization since requested tensor/pipeline parallel sizes are equals to 1.
[INFO] batch 0, beam 0:
[Context]
Hello,

[Output]
 I'm a newbie. I'm trying to install Ubuntu on my laptop. I have a Dell Inspiron 1525 with Windows 7. I have a USB<|endoftext|><|endoftext|>

[INFO] batch 1, beam 0:
[Context]
Gama start,

[Output]
 and the first of the two-day event.

The first day of the Gama start was a bit of a letdown. The first few laps

root@f0305219ab2b:/workspace/FasterTransformer/build# mpirun -n 2 --allow-run-as-root python ../examples/pytorch/gptneox/gptneox_example.py --ckpt_path ../models/gptneox/c-model/NeoX-70M/2-gpu --tokenizer_path ../models/gptneox/model/NeoX-70M --sample_input_file gptneox_input --tensor_para_size 2

=============== Arguments ===============
output_len: 32
beam_width: 1
top_k: 1
top_p: 0.0
temperature: 1.0
len_penalty: 0.0
beam_search_diversity_rate: 0.0
tensor_para_size: 2
pipeline_para_size: 1
ckpt_path: ../models/gptneox/c-model/NeoX-70M/2-gpu
tokenizer_path: ../models/gptneox/model/NeoX-70M
lib_path: ./lib/libth_transformer.so
sample_input_file: gptneox_input

=============== Arguments ===============
output_len: 32
beam_width: 1
top_k: 1
top_p: 0.0
temperature: 1.0
len_penalty: 0.0
beam_search_diversity_rate: 0.0
max_batch_size: 8
repetition_penalty: 1.0
max_seq_len: 1024
inference_data_type: fp16
time: False
enable_random_seed: False
=========================================

tensor_para_size: 2
pipeline_para_size: 1
ckpt_path: ../models/gptneox/c-model/NeoX-70M/2-gpu
tokenizer_path: ../models/gptneox/model/NeoX-70M
lib_path: ./lib/libth_transformer.so
sample_input_file: gptneox_input
max_batch_size: 8
repetition_penalty: 1.0
max_seq_len: 1024
inference_data_type: fp16
time: False
enable_random_seed: False
=========================================

[INFO] batch size: 2
[INFO] batch size: 2
[INFO] WARNING: Have initialized the process group
[INFO] WARNING: Have initialized the process group
[WARNING] gemm_config.in is not found; using default GEMM algo
[WARNING] gemm_config.in is not found; using default GEMM algo
[FT][INFO] NCCL initialized rank=1 world_size=2 tensor_para=NcclParam[rank=1, world_size=2, nccl_comm=0x55c57ea19a60] pipeline_para=NcclParam[rank=0, world_size=1, nccl_comm=0x55c57d9102b0]
[FT][INFO] NCCL initialized rank=0 world_size=2 tensor_para=NcclParam[rank=0, world_size=2, nccl_comm=0x555d83d01d00] pipeline_para=NcclParam[rank=0, world_size=1, nccl_comm=0x555d83145c90]
[INFO] batch 0, beam 0:
[Context]
Hello,

[Output]


I'm a bit confused about the name of the game. I'm a bit confused about the name of the game. I'm a bit confused about<|endoftext|><|endoftext|>

[INFO] batch 1, beam 0:
[Context]
Gama start,

[Output]
 and then the next time you see the same thing, you'll see the same thing again.

I'm not sure if I'm going to be able

root@f0305219ab2b:/workspace/FasterTransformer/build# mpirun -n 2 --allow-run-as-root python ../examples/pytorch/gptneox/gptneox_example.py --ckpt_path ../models/gptneox/c-model/NeoX-1.3B/2-gpu --tokenizer_path ../models/gptneox/model/NeoX-70M --sample_input_file gptneox_input --tensor_para_size 2

=============== Arguments ===============
output_len: 32
beam_width: 1
top_k: 1
top_p: 0.0
temperature: 1.0
len_penalty: 0.0
beam_search_diversity_rate: 0.0
tensor_para_size: 2
pipeline_para_size: 1
ckpt_path: ../models/gptneox/c-model/NeoX-1.3B/2-gpu
tokenizer_path: ../models/gptneox/model/NeoX-70M
lib_path: ./lib/libth_transformer.so
sample_input_file: gptneox_input
max_batch_size: 8
repetition_penalty: 1.0
max_seq_len: 1024
inference_data_type: fp16
time: False
enable_random_seed: False
=========================================


=============== Arguments ===============
output_len: 32
beam_width: 1
top_k: 1
top_p: 0.0
temperature: 1.0
len_penalty: 0.0
beam_search_diversity_rate: 0.0
tensor_para_size: 2
pipeline_para_size: 1
ckpt_path: ../models/gptneox/c-model/NeoX-1.3B/2-gpu
tokenizer_path: ../models/gptneox/model/NeoX-70M
lib_path: ./lib/libth_transformer.so
sample_input_file: gptneox_input
max_batch_size: 8
repetition_penalty: 1.0
max_seq_len: 1024
inference_data_type: fp16
time: False
enable_random_seed: False
=========================================

[INFO] batch size: 2
[INFO] batch size: 2
[INFO] WARNING: Have initialized the process group
[INFO] WARNING: Have initialized the process group
[WARNING] gemm_config.in is not found; using default GEMM algo
[WARNING] gemm_config.in is not found; using default GEMM algo
[FT][INFO] NCCL initialized rank=0 world_size=2 tensor_para=NcclParam[rank=0, world_size=2, nccl_comm=0x5616bfcc8ac0] pipeline_para=NcclParam[rank=0, world_size=1, nccl_comm=0x5616c1590d90]
[FT][INFO] NCCL initialized rank=1 world_size=2 tensor_para=NcclParam[rank=1, world_size=2, nccl_comm=0x5611d61198b0] pipeline_para=NcclParam[rank=0, world_size=1, nccl_comm=0x5611d5963340]
[INFO] batch 0, beam 0:
[Context]
Hello,

[Output]
 I'm a newbie. I'm trying to install Ubuntu on my laptop. I have a Dell Inspiron 1525 with Windows 7. I have a USB<|endoftext|><|endoftext|>

[INFO] batch 1, beam 0:
[Context]
Gama start,

[Output]
 and the first of the two-day event.

The first day of the Gama start was a bit of a letdown. The first few laps
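
In the logs above, the 1-GPU and 2-GPU runs produce the same generations for each model. A minimal sketch for checking this automatically, assuming the output of each run has been saved to its own text file and follows the `[INFO] batch N, beam M:` / `[Context]` / `[Output]` layout shown above (the function names are hypothetical and not part of the example script):

```python
import re


def parse_outputs(log_text):
    """Map (batch, beam) -> generated text from gptneox_example.py-style output."""
    pattern = re.compile(
        r"\[INFO\] batch (\d+), beam (\d+):\n"   # result header
        r"\[Context\]\n.*?\n"                     # prompt text (skipped)
        r"\[Output\]\n(.*?)"                      # generated text
        r"(?=\n\[INFO\] batch |\Z)",              # stop at the next result or end of log
        re.DOTALL,
    )
    # Surrounding whitespace is stripped, so leading spaces of a generation are ignored.
    return {
        (int(batch), int(beam)): text.strip()
        for batch, beam, text in pattern.findall(log_text)
    }


def same_generations(log_a, log_b):
    """True if both logs contain the same non-empty set of generations."""
    a, b = parse_outputs(log_a), parse_outputs(log_b)
    return bool(a) and a == b
```

Lines before the first `[INFO] batch` header (arguments, warnings, NCCL messages) are ignored, so the 1-GPU and tensor-parallel logs can be compared directly.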

zhang-ge-hao · 2023-04-20T07:05:21Z

@byshiue

Hi, looking forward to you taking a look.

I added several examples to make this PR easy to validate. 🤗🤗🤗

Signed-off-by: AkiyamaYummy <842720660@qq.com>

wsxiaoys · 2023-04-20T08:51:38Z

Great work! If you're interested, you're welcome to send a PR to tabby as well.

byshiue · 2023-04-20T09:01:41Z

Thank you for the great work. Could you help add some examples to gptneox_guide.md?

Signed-off-by: AkiyamaYummy <842720660@qq.com>

zhang-ge-hao · 2023-04-20T09:37:39Z

@byshiue Updated.

cksac · 2023-04-21T03:32:13Z

Will this work for https://huggingface.co/stabilityai/stablelm-tuned-alpha-7b, which seems to use the same format as gptneox?

zhang-ge-hao · 2023-04-21T03:36:18Z

If the model can be loaded by GPTNeoXForCausalLM, the script should in theory be able to convert it.

If not, I will consider adding support for it when I have time.
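
A quick way to check this before downloading the weights is to look at the model's `config.json`. The sketch below is a heuristic under stated assumptions, not something the converter does: it assumes the checkpoint directory contains a standard Hugging Face `config.json`, and the function name `looks_like_gptneox` is hypothetical. Hugging Face GPT-NeoX configs use the model type `gpt_neox` and list `GPTNeoXForCausalLM` under `architectures`.

```python
import json
from pathlib import Path


def looks_like_gptneox(model_dir):
    """Heuristic: does config.json declare the GPT-NeoX architecture?"""
    config_path = Path(model_dir) / "config.json"
    if not config_path.is_file():
        return False
    config = json.loads(config_path.read_text(encoding="utf-8"))
    return (
        config.get("model_type") == "gpt_neox"
        or "GPTNeoXForCausalLM" in (config.get("architectures") or [])
    )
```

A `True` result only says the checkpoint declares itself as GPT-NeoX; actually loading it with GPTNeoXForCausalLM remains the real test.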

zhang-ge-hao · 2023-04-21T03:57:05Z

@byshiue

Hi, looking forward to you taking a look, again. 🤗🤗🤗

zhang-ge-hao · 2023-04-24T02:23:32Z

@byshiue

If there is anything else I need to change, I will update it as soon as possible.

byshiue · 2023-04-24T02:52:07Z

Thank you for the work. We are waiting for the internal unit tests, and it looks good now.

hmzo · 2023-05-24T08:01:51Z

@AkiyamaYummy

FasterTransformer/examples/pytorch/gptneox/utils/huggingface_gptneox_convert.py

Line 40 in c6e8f60

val = (val / factor) if factor > 1 else val

Hello, thank you for the great work! I have a question about this convert script: why do we need to scale the bias here when tp > 1?

zhang-ge-hao · 2023-05-25T04:33:14Z

When tp > 1, FT adds the bias on each GPU separately and all-reduces the results afterward.

If the biases are not scaled, this is equivalent to adding them multiple times (once per GPU).
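
A small numeric sketch of this effect, in plain Python. It assumes a linear layer whose input dimension is split across the tp ranks, with the all-reduce modeled as a plain sum; the function names are hypothetical and the code only illustrates the arithmetic, not FT's actual kernels.

```python
def matvec(x, w):
    """Compute x @ w for a vector x (length k) and a matrix w (k rows, n columns)."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def row_parallel_linear(x, w, b, tp, scale_bias):
    """Simulate y = x @ w + b with the input dimension split over tp ranks.

    Every rank adds its own copy of the bias to its partial result; the
    partial results are then summed, which stands in for the all-reduce.
    """
    assert len(x) % tp == 0
    shard = len(x) // tp
    local_b = [v / tp for v in b] if scale_bias else b
    total = [0.0] * len(b)
    for rank in range(tp):
        lo, hi = rank * shard, (rank + 1) * shard
        partial = matvec(x[lo:hi], w[lo:hi])                    # this rank's slice
        partial = [p + lb for p, lb in zip(partial, local_b)]   # per-rank bias add
        total = [t + p for t, p in zip(total, partial)]         # all-reduce (sum)
    return total


x = [1.0, 2.0, 3.0, 4.0]
w = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
b = [0.5, -1.0]

reference = [y + bias for y, bias in zip(matvec(x, w), b)]    # single GPU: [12.5, 4.0]
scaled = row_parallel_linear(x, w, b, tp=2, scale_bias=True)     # [12.5, 4.0]
unscaled = row_parallel_linear(x, w, b, tp=2, scale_bias=False)  # [13.0, 3.0]
print(reference, scaled, unscaled)
```

With the bias divided by tp, the summed result matches the single-GPU output; without scaling, the bias is counted tp times.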

create huggingface_gptneox_convert.py

e789816

Signed-off-by: AkiyamaYummy <842720660@qq.com>

adjust HF's multi bin files

d4b8f7b

Signed-off-by: AkiyamaYummy <842720660@qq.com>

update gptneox_guide.md

a2e50d2

Signed-off-by: AkiyamaYummy <842720660@qq.com>

byshiue merged commit 3460e20 into NVIDIA:main Apr 24, 2023

arnocandel mentioned this pull request Apr 27, 2023

NVIDIA Triton inference support h2oai/h2ogpt#87

byshiue merged 3 commits into NVIDIA:main from zhang-ge-hao:huggingface_gptneox_convert
