How does the benchmark compute multi-process performance?

1. First I test Qwen3-4B using only NUMA node 0: `numactl --all -C 0-31 -m 0 python /home/tzk/AI_Test/xFasterTransformer/benchmark/benchmark.py --model_name qwen3-4B --token_path /data/Model_File/Qwen3-4B --model_path /data/Model_File/Qwen3-4B-xft --prompt_path /home/tzk/AI_Test/xFasterTransformer/benchmark/prompt.json --batch_size 2 --iteration 1 --dtype bf16 --token_in 32 --token_out 32 --sonnet_prefix_len 200 --sonnet_count 20`
![Image](https://github.com/user-attachments/assets/cd88f01f-828a-4ea9-a4ec-f8f6cc0de45e)

2. Then I test on all nodes (4 nodes): `./run_benchmark.sh -m qwen3-4b -mp /data/Model_File/Qwen3-4B-xft -tp /data/Model_File/Qwen3-4B -d bf16 -s 2 -i 1 -in 32 -out 32 -bs 1 -splen 200 -sc 2`

![Image](https://github.com/user-attachments/assets/bf689292-593f-43c3-9568-9ac68ee51334)

When I read the benchmark file, I can't find the code that combines the performance of my four processes. Are the reported metrics for the entire machine? To me, benchmark.py looks like it only reports the performance of a single `python benchmark.py` process, but the actual test results suggest otherwise. I want to know what the specific implementation logic is. I can't figure it out.

![Image](https://github.com/user-attachments/assets/651f406e-e5d9-4768-8438-8dbcfb59a0f4)
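To make the question concrete, the two interpretations I can think of are sketched below. This is only an illustration of the arithmetic, not code taken from benchmark.py; the function names and the latency values are hypothetical and not measured results.

```python
def throughput_cooperating(batch_size, tokens_out, latency_s):
    """Case A: all processes work together on the same batch.

    One measured latency already describes the whole machine,
    so nothing needs to be multiplied or summed across processes.
    """
    return batch_size * tokens_out / latency_s


def throughput_independent(batch_size, tokens_out, latencies_s):
    """Case B: every process runs its own independent copy of the batch.

    Machine throughput is then the sum of the per-process throughputs.
    """
    return sum(batch_size * tokens_out / t for t in latencies_s)


if __name__ == "__main__":
    # Hypothetical numbers chosen only to show the arithmetic.
    print(throughput_cooperating(1, 32, 0.5))                    # 64.0 tokens/s
    print(throughput_independent(1, 32, [2.0, 2.0, 2.0, 2.0]))   # 64.0 tokens/s
```

If the benchmark follows case A, a single process printing its own latency would be enough to describe the four-node run, which would explain why I cannot find any aggregation code. If it follows case B, I would expect to see a sum or reduction somewhere.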

How does the benchmark compute multi-process performance? #510