
Conversation

@bukejiyu (Collaborator) commented Sep 3, 2025

This PR adds a weight-cache mechanism. When enabled, the first load spends extra time saving the cache; subsequent loads then use the cache by default.
Extra footprint: the post-quantization weights.
Tested on 4 GPUs with wint4 quantization:

| Model Name | use cache | loading time | cache saving time | peak memory per process |
| --- | --- | --- | --- | --- |
| ERNIE-4.5-300B-A47B-Base-PT | False | 184.522s | 0s | 9.2G |
| ERNIE-4.5-300B-A47B-Base-PT | True | 184.522s | 134.974s | 9.2G |
| ERNIE-4.5-300B-A47B-Base-PT (second load) | True | 29.305s | 0s | 40G |

Enable the cache with either:

```python
os.environ['FD_ENABLE_MODEL_LOAD_CACHE'] = '1'
```

or

```bash
export FD_ENABLE_MODEL_LOAD_CACHE=1
```

```python
import os

# Enable the weight cache before constructing the LLM.
os.environ['FD_ENABLE_MODEL_LOAD_CACHE'] = '1'

from fastdeploy import LLM, SamplingParams

# Example sampling settings; model_name_or_path is a placeholder for the checkpoint path.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
llm = LLM(
    model=model_name_or_path,
    num_gpu_blocks_override=1024,
    tensor_parallel_size=1,
    load_choices="default_v1",
    use_cudagraph=False,
    quantization="wint4",
)
output = llm.generate(
    prompts="who are you",
    use_tqdm=True,
    sampling_params=sampling_params,
)
```
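For reference, here is a minimal sketch of the flow described above, assuming the cache is a single `cache.pdparams` file per rank; `load_with_weight_cache` and its arguments are hypothetical names for illustration, not the actual FastDeploy API:

```python
import os

import paddle


def load_with_weight_cache(model, weight_cache_dir, load_from_source):
    """Load quantized weights from the cache if present, else build and save it."""
    cache_file = os.path.join(weight_cache_dir, "cache.pdparams")
    if os.path.exists(cache_file):
        # Fast path: reuse the quantized weights saved by a previous run.
        model.set_state_dict(paddle.load(cache_file))
        return model
    # Slow path: load the original checkpoint (and quantize it online).
    load_from_source(model)
    if os.getenv("FD_ENABLE_MODEL_LOAD_CACHE") == "1":
        # First run with the cache enabled: pay the one-time saving cost.
        os.makedirs(weight_cache_dir, exist_ok=True)
        paddle.save(model.state_dict(), cache_file)
    return model
```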

@paddle-bot commented Sep 3, 2025

Thanks for your contribution!

@bukejiyu force-pushed the save_model branch 3 times, most recently from 201ed8a to 818ec42 on September 4, 2025 at 07:53
```python
self.down_proj_scale_shape = [layer.num_local_experts, layer.hidden_size]

if layer.fd_config.load_config.load_choices == "default_v1":
    if self.quant_config.is_checkpoint_bf16:
```
Collaborator

What is the reason for this change?

Collaborator Author

The weights to be cached are already quantized, so the offline-quantized weight path has to be used.
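Illustratively, the condition keeps the online-quantization path for bf16 checkpoints only; weights restored from the cache are already quantized and must take the offline path. A hedged sketch, where both helper names are assumptions rather than the actual code:

```python
if layer.fd_config.load_config.load_choices == "default_v1":
    if self.quant_config.is_checkpoint_bf16:
        # bf16 checkpoint: load high-precision weights, then quantize online.
        load_bf16_and_quantize(layer)  # hypothetical helper
    else:
        # Weights are already quantized (e.g. restored from the weight cache):
        # follow the offline-quantized loading path, no re-quantization.
        load_prequantized(layer)  # hypothetical helper
```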

```python
time_after_load = time.time()
logger.info(f"Model loading took {time_after_load - time_before_load} seconds")
return result

def paddle_weight_iterator(paddle_file_list: list[str]):
```
Collaborator

paddle_weight_iterator -> pdparams_weight_iterator; Paddle weights might switch to another format someday.

Collaborator Author

done
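For context, a minimal sketch of what the renamed iterator could look like, assuming each `.pdparams` shard is a flat state dict readable with `paddle.load`; the real implementation may differ:

```python
from typing import Iterator, Tuple

import paddle


def pdparams_weight_iterator(paddle_file_list: list[str]) -> Iterator[Tuple[str, "paddle.Tensor"]]:
    """Yield (name, tensor) pairs from a list of .pdparams files, one file at a time."""
    for file_path in paddle_file_list:
        state_dict = paddle.load(file_path)
        for name, tensor in state_dict.items():
            yield name, tensor
        # Drop the shard before loading the next one to bound peak memory.
        del state_dict
```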

```python
cache_dir = None
enable_cache = False
if envs.FD_ENABLE_MODEL_CACHE:
    model_cache_path = os.path.join(fd_config.model_config.model, cache_path)
```
Collaborator

Rename every cache_xxx to weight_cache_xxx. Names should be specific; "cache" alone is too generic.

Collaborator

Does this also support saving the cache in multi-node runs?

Collaborator Author
@bukejiyu bukejiyu Sep 4, 2025

It should be supported; in a multi-node setup, every machine currently has its own copy of the weights, right?
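Under that assumption, each tensor-parallel rank on each machine can write its own shard into a rank-specific directory. A sketch of such a layout; `get_weight_cache_dir` is a hypothetical helper, and only `tp_cache_dir` / `cache.pdparams` appear in the actual diff:

```python
import os


def get_weight_cache_dir(model_cache_path: str, tp_rank: int) -> str:
    """Return a cache directory unique to this tensor-parallel rank."""
    tp_cache_dir = os.path.join(model_cache_path, f"tp_rank_{tp_rank}")
    os.makedirs(tp_cache_dir, exist_ok=True)
    return tp_cache_dir
```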

```python
    _save_model(model.state_dict(), os.path.join(tp_cache_dir, "cache.pdparams"))
    logger.info(f"Saving model to {cache_dir}")
else:
    logger.warning("skip saving")
```
Collaborator

What is skipping the saving? Log messages should be more complete; as written, probably nobody but you can understand this one.

Collaborator Author

done
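For example, the warning could spell out what is skipped and why; the wording below is illustrative, and the real skip condition lives in the surrounding code:

```python
logger.warning(
    f"Weight cache already exists under {cache_dir}; skipping the save of quantized weights."
)
```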

@bukejiyu bukejiyu merged commit e52ce1c into PaddlePaddle:develop Sep 7, 2025
65 of 72 checks passed