
Conversation

@bukejiyu (Collaborator) commented Sep 3, 2025

This PR adds a weight-cache mechanism. When enabled, the first load spends extra time saving the cache; subsequent loads then use the cache by default.
Extra footprint: the post-quantization weights.
Tested on 4 GPUs with wint4 quantization:

| Model Name | use cache | loading time | cache saving time | peak memory per process |
| --- | --- | --- | --- | --- |
| ERNIE-4.5-300B-A47B-Base-PT | False | 184.522s | 0s | 9.2G |
| ERNIE-4.5-300B-A47B-Base-PT | True | 184.522s | 134.974s | 9.2G |
| ERNIE-4.5-300B-A47B-Base-PT (second load) | True | 29.305s | 0s | 40G |

Enable the cache with either:

```python
os.environ['FD_ENABLE_MODEL_LOAD_CACHE'] = '1'
```

or

```bash
export FD_ENABLE_MODEL_LOAD_CACHE=1
```

```python
import os

# Enable the weight cache before constructing the LLM.
os.environ['FD_ENABLE_MODEL_LOAD_CACHE'] = '1'

from fastdeploy import LLM, SamplingParams

# Example sampling settings; model_name_or_path is a placeholder for the checkpoint path.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
llm = LLM(
    model=model_name_or_path,
    num_gpu_blocks_override=1024,
    tensor_parallel_size=1,
    load_choices="default_v1",
    use_cudagraph=False,
    quantization="wint4",
)
output = llm.generate(
    prompts="who are you",
    use_tqdm=True,
    sampling_params=sampling_params,
)
```
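For reference, here is a minimal sketch of the flow described above, assuming the cache is a single `cache.pdparams` file per rank; `load_with_weight_cache` and its arguments are hypothetical names for illustration, not the actual FastDeploy API:

```python
import os

import paddle


def load_with_weight_cache(model, weight_cache_dir, load_from_source):
    """Load quantized weights from the cache if present, else build and save it."""
    cache_file = os.path.join(weight_cache_dir, "cache.pdparams")
    if os.path.exists(cache_file):
        # Fast path: reuse the quantized weights saved by a previous run.
        model.set_state_dict(paddle.load(cache_file))
        return model
    # Slow path: load the original checkpoint (and quantize it online).
    load_from_source(model)
    if os.getenv("FD_ENABLE_MODEL_LOAD_CACHE") == "1":
        # First run with the cache enabled: pay the one-time saving cost.
        os.makedirs(weight_cache_dir, exist_ok=True)
        paddle.save(model.state_dict(), cache_file)
    return model
```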

@paddle-bot commented Sep 3, 2025

Thanks for your contribution!

@bukejiyu force-pushed the save_model branch 3 times, most recently from 201ed8a to 818ec42 on September 4, 2025 at 07:53
```python
self.down_proj_scale_shape = [layer.num_local_experts, layer.hidden_size]

if layer.fd_config.load_config.load_choices == "default_v1":
    if self.quant_config.is_checkpoint_bf16:
```
Collaborator

What is the reason for this change?

Collaborator Author

The weights to be cached are already quantized, so the offline-quantized weight path has to be used.
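Illustratively, the condition keeps the online-quantization path for bf16 checkpoints only; weights restored from the cache are already quantized and must take the offline path. A hedged sketch, where both helper names are assumptions rather than the actual code:

```python
if layer.fd_config.load_config.load_choices == "default_v1":
    if self.quant_config.is_checkpoint_bf16:
        # bf16 checkpoint: load high-precision weights, then quantize online.
        load_bf16_and_quantize(layer)  # hypothetical helper
    else:
        # Weights are already quantized (e.g. restored from the weight cache):
        # follow the offline-quantized loading path, no re-quantization.
        load_prequantized(layer)  # hypothetical helper
```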

```python
time_after_load = time.time()
logger.info(f"Model loading took {time_after_load - time_before_load} seconds")
return result

def paddle_weight_iterator(paddle_file_list: list[str]):
```
Collaborator

paddle_weight_iterator -> pdparams_weight_iterator; Paddle weights might switch to another format someday.

Collaborator Author

done
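For context, a minimal sketch of what the renamed iterator could look like, assuming each `.pdparams` shard is a flat state dict readable with `paddle.load`; the real implementation may differ:

```python
from typing import Iterator, Tuple

import paddle


def pdparams_weight_iterator(paddle_file_list: list[str]) -> Iterator[Tuple[str, "paddle.Tensor"]]:
    """Yield (name, tensor) pairs from a list of .pdparams files, one file at a time."""
    for file_path in paddle_file_list:
        state_dict = paddle.load(file_path)
        for name, tensor in state_dict.items():
            yield name, tensor
        # Drop the shard before loading the next one to bound peak memory.
        del state_dict
```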

```python
cache_dir = None
enable_cache = False
if envs.FD_ENABLE_MODEL_CACHE:
    model_cache_path = os.path.join(fd_config.model_config.model, cache_path)
```
Collaborator

Rename every cache_xxx to weight_cache_xxx. Names should be specific; "cache" alone is too generic.

Collaborator

Does this also support saving the cache in multi-node runs?

Collaborator Author
@bukejiyu bukejiyu Sep 4, 2025

It should be supported; in a multi-node setup, every machine currently has its own copy of the weights, right?
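Under that assumption, each tensor-parallel rank on each machine can write its own shard into a rank-specific directory. A sketch of such a layout; `get_weight_cache_dir` is a hypothetical helper, and only `tp_cache_dir` / `cache.pdparams` appear in the actual diff:

```python
import os


def get_weight_cache_dir(model_cache_path: str, tp_rank: int) -> str:
    """Return a cache directory unique to this tensor-parallel rank."""
    tp_cache_dir = os.path.join(model_cache_path, f"tp_rank_{tp_rank}")
    os.makedirs(tp_cache_dir, exist_ok=True)
    return tp_cache_dir
```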

```python
    _save_model(model.state_dict(), os.path.join(tp_cache_dir, "cache.pdparams"))
    logger.info(f"Saving model to {cache_dir}")
else:
    logger.warning("skip saving")
```
Collaborator

What is skipping the saving? Log messages should be more complete; as written, probably nobody but you can understand this one.

Collaborator Author

done
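For example, the warning could spell out what is skipped and why; the wording below is illustrative, and the real skip condition lives in the surrounding code:

```python
logger.warning(
    f"Weight cache already exists under {cache_dir}; skipping the save of quantized weights."
)
```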

@bukejiyu bukejiyu merged commit e52ce1c into PaddlePaddle:develop Sep 7, 2025
65 of 72 checks passed