- First, I tested Qwen3-4B on a single NUMA node (node 0): "numactl --all -C 0-31 -m 0 python /home/tzk/AI_Test/xFasterTransformer/benchmark/benchmark.py --model_name qwen3-4B --token_path /data/Model_File/Qwen3-4B --model_path /data/Model_File/Qwen3-4B-xft --prompt_path /home/tzk/AI_Test/xFasterTransformer/benchmark/prompt.json --batch_size 2 --iteration 1 --dtype bf16 --token_in 32 --token_out 32 --sonnet_prefix_len 200 --sonnet_count 20"

- Then I tested on all four nodes: "./run_benchmark.sh -m qwen3-4b -mp /data/Model_File/Qwen3-4B-xft -tp /data/Model_File/Qwen3-4B -d bf16 -s 2 -i 1 -in 32 -out 32 -bs 1 -splen 200 -sc 2"

When I read the benchmark code, I can't find where the performance of my four processes is combined. Are the reported metrics for the whole machine? From the code, the numbers look like they come from a single `python benchmark.py` process, but the actual test results suggest otherwise. I'd like to know the specific implementation logic, because I can't figure it out.

