Add ServerlessLLM Support by drunkcoding · Pull Request #1 · ServerlessLLM/vllm

drunkcoding · 2024-07-03T09:29:23Z

This PR adds a new loader that uses the ServerlessLLM interface.

vllm/model_executor/model_loader/loader.py

future-xy · 2024-07-04T01:56:32Z

vllm/model_executor/model_loader/loader.py

Can we call save models instead of save_tensors?

We are saving and loading a state dict, not a model, so we need to use save_tensors directly.

future-xy · 2024-07-04T01:58:38Z

vllm/model_executor/model_loader/loader.py

+                model = _initialize_model(model_config, self.load_config,
+                                          lora_config, vision_language_config,
+                                          cache_config)
+            state_dict = self._filter_subtensors(model.state_dict())


Will this create another copy of the model parameters?

This is copied exactly from the vLLM implementation; we inherit its save behaviour.

future-xy · 2024-07-04T01:59:50Z

vllm/model_executor/model_loader/loader.py

Essentially, we have one unique rank, right? So why do we need this rank here?

It is used with the get_cuda_memory_handles API.

future-xy · 2024-07-04T02:00:09Z

vllm/model_executor/model_loader/loader.py

What is this 0?

Every tensor has its own base address, so the GPU offset is always 0.
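A toy illustration, in plain Python, of why the offset is 0 when each tensor has its own base address. The `Placement` class, both helper functions, and the integer addresses are invented for this example; nothing here reflects the actual get_cuda_memory_handles API or the ServerlessLLM storage format.

```python
from dataclasses import dataclass


@dataclass
class Placement:
    base: int    # start address of the allocation that holds the tensor
    offset: int  # byte offset of the tensor inside that allocation

    @property
    def address(self) -> int:
        return self.base + self.offset


def pack_into_one_buffer(base: int, sizes: dict) -> dict:
    """One shared allocation: tensors sit back to back, so offsets grow."""
    placements, cursor = {}, 0
    for name, size in sizes.items():
        placements[name] = Placement(base, cursor)
        cursor += size
    return placements


def one_allocation_per_tensor(bases: dict) -> dict:
    """Each tensor owns its allocation: it starts at the base, so offset is 0."""
    return {name: Placement(base, 0) for name, base in bases.items()}


# Hypothetical tensor names, sizes, and addresses.
sizes = {"embed": 256, "lm_head": 128}
packed = pack_into_one_buffer(0x1000, sizes)
separate = one_allocation_per_tensor({"embed": 0x1000, "lm_head": 0x2000})
```

In the packed layout the second tensor needs a non-zero offset; with one allocation per tensor the base address alone identifies where the data starts.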

vllm/model_executor/model_loader/loader.py

SiyangShao · 2024-07-04T05:26:00Z

vllm/executor/distributed_gpu_executor.py

Could the save_serverless_llm_state method be copied into the normal GPUExecutor, so that users on a single GPU do not need to specify a backend?

I think neither our save nor our load method relies on a specific executor, right?

SiyangShao · 2024-07-08T12:49:08Z

vllm/model_executor/model_loader/loader.py

client is not defined yet at this point.

SiyangShao · 2024-07-08T12:50:09Z

vllm/model_executor/model_loader/loader.py

The local path here seems to differ from the one used in the load part. I guess it should be os.path.join(path, f"rank_{rank}")?
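One way to keep the save and load paths from drifting apart is to build the per-rank directory in a single place. The sketch below is only an illustration: `rank_dir`, `save_location`, and `load_location` are hypothetical names, not functions from the PR; only the os.path.join(path, f"rank_{rank}") expression comes from the comment above.

```python
import os


def rank_dir(path: str, rank: int) -> str:
    """Hypothetical helper: the single place that defines the per-rank layout."""
    return os.path.join(path, f"rank_{rank}")


def save_location(path: str, rank: int) -> str:
    # Save side: where this rank writes its tensors.
    return rank_dir(path, rank)


def load_location(path: str, rank: int) -> str:
    # Load side: resolves to the same directory as the save side.
    return rank_dir(path, rank)
```

Because both sides call the same helper, a change to the layout cannot be applied to one side and forgotten on the other.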

SiyangShao · 2024-07-08T13:08:07Z

vllm/model_executor/model_loader/loader.py

-        with open(os.path.join(rank_path, "tensor_index.json"), "w") as f:
-            json.dump(tensor_index, f)
+
+        save_dict(state_dict, os.path.join(path, f"rank_{rank}"))


I couldn't find the save_dict function under serverless_llm_store. Where can I build the latest version that includes save_dict?

Use the latest xly/fix-docker-build from ServerlessLLM.

SiyangShao · 2024-07-10T07:07:37Z

vllm/model_executor/model_loader/loader.py

+        # move all tensors to CPU
+        for key, tensor in state_dict.items():
+            state_dict[key] = tensor.cpu().contiguous()
+


We may need to add

os.makedirs(os.path.join(path, f"rank_{rank}"), exist_ok=True)

here; otherwise saving may fail with a failed-to-open-file error:

Failed to open file ./models/opt-125m/rank_0/tensor.data_0
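A minimal sketch of the failure mode and the suggested fix, using only the standard library. The helper name `write_rank_file`, the temporary directory, and the dummy payload are assumptions made for illustration; the rank_{rank} layout and the tensor.data_0 file name are taken from the comment and error message above.

```python
import os
import tempfile


def write_rank_file(path: str, rank: int, payload: bytes) -> str:
    """Hypothetical helper: write payload under <path>/rank_<rank>/."""
    rank_path = os.path.join(path, f"rank_{rank}")
    # Create the per-rank directory first; exist_ok=True makes a re-save harmless.
    os.makedirs(rank_path, exist_ok=True)
    file_path = os.path.join(rank_path, "tensor.data_0")
    with open(file_path, "wb") as f:
        f.write(payload)
    return file_path


root = tempfile.mkdtemp()

# Without the directory, opening the file for writing fails.
missing = os.path.join(root, "opt-125m", "rank_0", "tensor.data_0")
try:
    with open(missing, "wb"):
        pass
    failed_without_dir = False
except FileNotFoundError:
    failed_without_dir = True

# With os.makedirs in place, the same write succeeds.
written = write_rank_file(os.path.join(root, "opt-125m"), 0, b"\x00" * 4)
```

Opening a file in write mode creates the file but not its parent directories, which is why the directory has to exist before the tensors are written.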

xly added 3 commits June 30, 2024 22:01

add serverless-llm loader

1817c84

custom load api

dccd493

save load sllm

aee2969

drunkcoding requested a review from future-xy July 3, 2024 09:29

future-xy reviewed Jul 4, 2024


future-xy requested a review from SiyangShao July 4, 2024 03:18

SiyangShao reviewed Jul 4, 2024


early load cpu & single gpu save

a478560

SiyangShao reviewed Jul 8, 2024


SiyangShao reviewed Jul 8, 2024


This comment was marked as resolved.


update loader

ff2b521

SiyangShao reviewed Jul 8, 2024


Leyang Xue added 2 commits July 9, 2024 15:40

vllm loader new API

c8fda58

set empty tensor

fc6a6e7

SiyangShao suggested changes Jul 10, 2024


future-xy mentioned this pull request Jul 22, 2024

Model loader implementation for vllm ServerlessLLM/ServerlessLLM#26

Closed

clean up loader

89fb6d2

SiyangShao mentioned this pull request Aug 7, 2024

Does ServerlessLLM needs to modify the vLLM to support fast checkpoint loading? ServerlessLLM/ServerlessLLM#57

Closed

future-xy merged commit ccd1d8c into main Aug 9, 2024

future-xy mentioned this pull request Sep 13, 2024

Fy/sllm checkpoint #2

Open
