Conversation
Could we save models instead of calling save_tensors?
We are saving/loading the state dict, not the model, so we need to use save_tensors directly.
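For context, a minimal illustration of the distinction using plain torch.save (the helper names in this PR differ, but the state-dict-vs-model split is the same):

```python
import torch

model = torch.nn.Linear(4, 4)  # stand-in for the vLLM model

# Saving the state dict persists only the named tensors, which is what
# this loader works with:
torch.save(model.state_dict(), "state_dict.pt")

# Saving the model itself pickles the whole nn.Module object, which is
# not what the save/load path here does:
torch.save(model, "model.pt")
```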
```python
model = _initialize_model(model_config, self.load_config,
                          lora_config, vision_language_config,
                          cache_config)
state_dict = self._filter_subtensors(model.state_dict())
```
Will this create another copy of the model parameters?
This is copied exactly from the vLLM implementation; we inherit its save behaviour.
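As a quick sanity check (a minimal sketch, not the PR's code): model.state_dict() returns references to the live parameters, so no copy is made at this point; a copy only happens later when the tensors are moved to the CPU.

```python
import torch

linear = torch.nn.Linear(4, 4)
sd = linear.state_dict()

# The state-dict entry aliases the parameter's storage; no copy yet:
print(sd["weight"].data_ptr() == linear.weight.data_ptr())  # True
```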
Essentially, we have one unique rank, right? So why do we need this rank here?
It is used with the get_cuda_memory_handles API.
Every tensor has its own base address, so the GPU offset is always 0.
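A minimal sketch of what "offset 0" means here, assuming each tensor passed to get_cuda_memory_handles owns a fresh allocation (the calls below are standard PyTorch, not the PR's helpers; requires a CUDA device):

```python
import torch

t = torch.randn(8, device="cuda")

# Byte offset of the tensor's data within its backing storage; a
# standalone allocation starts at its storage's base address:
offset = t.data_ptr() - t.untyped_storage().data_ptr()
print(offset)  # 0
```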
Could the save_serverless_llm_state method be copied into the normal GPUExecutor so that, for a single GPU, users do not need to specify the backend?
I think both our save and load methods do not rely on a specific executor, right?
`client` here is not defined yet.
The local path seems different from the load part. I guess it should be os.path.join(path, f"rank_{rank}")?
```python
with open(os.path.join(rank_path, "tensor_index.json"), "w") as f:
    json.dump(tensor_index, f)

save_dict(state_dict, os.path.join(path, f"rank_{rank}"))
```
I didn't find the save_dict function under serverless_llm_store. Where can I build the latest version with the save_dict function?
Use the latest xly/fix-docker-build branch from serverlessllm.
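For reference, a hypothetical usage sketch; the import path is an assumption based on the package name mentioned above and may differ on that branch:

```python
import os

# Assumed import path; verify against the xly/fix-docker-build branch:
from serverless_llm_store import save_dict

rank = 0
path = "./models/opt-125m"
state_dict = {}  # name -> torch.Tensor, already moved to CPU

save_dict(state_dict, os.path.join(path, f"rank_{rank}"))
```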
```python
# move all tensors to CPU
for key, tensor in state_dict.items():
    state_dict[key] = tensor.cpu().contiguous()
```
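A minimal sketch of why .contiguous() matters here (assuming the store writes tensor memory byte-for-byte, so it needs dense host buffers):

```python
import torch

t = torch.randn(4, 4).t()   # a transposed view: non-contiguous
print(t.is_contiguous())    # False

c = t.contiguous()          # dense copy, safe to serialize byte-for-byte
print(c.is_contiguous())    # True
```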
We may need to add
os.makedirs(os.path.join(path, f"rank_{rank}"), exist_ok=True)
here, or it may fail with a file-open error:
Failed to open file ./models/opt-125m/rank_0/tensor.data_0
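A minimal sketch of the suggested fix in context (path, rank, state_dict, and save_dict come from this thread; rank_path is introduced here only for clarity):

```python
import os

rank_path = os.path.join(path, f"rank_{rank}")
# Create ./models/opt-125m/rank_0/ (and parents) before the store
# tries to open tensor.data_0 inside it:
os.makedirs(rank_path, exist_ok=True)

save_dict(state_dict, rank_path)
```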
This PR creates a new loader that uses the ServerlessLLM interface.