Description
Background: previously, in the model-csi-driver, we provided an HTTP API to mount models into a pod:
curl --unix-socket $volume_dir/csi/csi.sock \
-H "Content-Type: application/json" \
-X POST http://localhost/api/v1/volumes/$volume_name/mounts \
-d '{
"mount_id": "$mount_id",
"reference": "$reference",
"exclude_model_weights": true
}'
When exclude_model_weights = true, the mount skips fetching model weight files. This is intended for scenarios like rfork: in the target sglang instance, the model mount only needs to provide the non-weight files before weight loading, while the weights are later loaded from a seed sglang instance via GPU-Direct RDMA.
Initially, exclude_model_weights = true worked by excluding files whose mediaType is application/vnd.cncf.model.weight.v1.*. However, for models like kimi k2, sglang loads tiktoken.model to initialize the tokenizer before loading weights. Because tiktoken.model is classified as a weight-type file under modelspec, it gets excluded, and sglang fails to boot.
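To make the failure mode concrete, here is a minimal sketch of a mediaType-only filter. The layer list, the `.raw` suffix, and the non-weight media type are illustrative assumptions, not taken from a real model image or from the driver's code; only the `application/vnd.cncf.model.weight.v1.` prefix comes from the description above.

```python
# Prefix taken from the issue text; everything else below is illustrative.
WEIGHT_MEDIA_TYPE_PREFIX = "application/vnd.cncf.model.weight.v1."


def keep_non_weight_files(layers):
    """Return the paths that survive a mediaType-only weight filter."""
    return [
        layer["path"]
        for layer in layers
        if not layer["mediaType"].startswith(WEIGHT_MEDIA_TYPE_PREFIX)
    ]


# Hypothetical layer descriptors for a model whose tokenizer file
# was tagged as a weight when the image was built.
layers = [
    {"path": "config.json",
     "mediaType": "application/vnd.cncf.model.weight.config.v1.raw"},
    {"path": "model-00001.safetensors",
     "mediaType": "application/vnd.cncf.model.weight.v1.raw"},
    {"path": "tiktoken.model",
     "mediaType": "application/vnd.cncf.model.weight.v1.raw"},
]

mounted = keep_non_weight_files(layers)
print(mounted)
```

With these assumed descriptors, tiktoken.model is dropped together with the real weights, which is the situation described above.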
One option is to improve modctl’s weight-file detection (it cannot rely only on file extensions and may need smarter heuristics), but this cannot fix already-built model images.
Another option is to adjust the model csi API parameters as follows:
- Keep exclude_model_weights = true unchanged: it still excludes files with mediaType: application/vnd.cncf.model.weight.v1.*.
- Add exclude_file_patterns: [] (.gitignore-compatible syntax) to let users exclude/include specific files by filename patterns.
This enables usage like:
{
"exclude_model_weights": true,
"exclude_file_patterns": ["model.safetensors.index.json", "!tiktoken.model"]
}
or fully controlled by the user, such as:
{
"exclude_file_patterns": ["*.safetensors", "model.safetensors.index.json"]
}
Define the precedence as exclude_file_patterns > exclude_model_weights, so users can flexibly control which files are included or excluded during mounting. This addresses on-demand loading issues like the kimi k2 case.
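One possible reading of this precedence rule is sketched below, purely as an illustration of the proposed semantics rather than the driver's implementation. The function name `is_excluded` is hypothetical, and the matching is a deliberate simplification of .gitignore syntax: patterns are matched against the file's basename with `fnmatch`, a leading `!` negates, and the last matching pattern wins. Directory patterns, `**`, and anchoring are not handled.

```python
from fnmatch import fnmatchcase
from posixpath import basename

# Prefix taken from the issue text.
WEIGHT_MEDIA_TYPE_PREFIX = "application/vnd.cncf.model.weight.v1."


def is_excluded(path, media_type, exclude_model_weights=False,
                exclude_file_patterns=()):
    """Decide whether a file is skipped during mounting (hypothetical helper).

    Patterns take precedence over exclude_model_weights: if any pattern
    matches, the last matching one decides. Only when no pattern matches
    does the mediaType-based rule apply.
    """
    decision = None
    for pattern in exclude_file_patterns:
        negated = pattern.startswith("!")
        glob = pattern[1:] if negated else pattern
        if fnmatchcase(basename(path), glob):
            # "!pattern" re-includes the file, a plain pattern excludes it.
            decision = not negated
    if decision is not None:
        return decision
    return exclude_model_weights and media_type.startswith(WEIGHT_MEDIA_TYPE_PREFIX)


# The first example request from above, with an assumed weight mediaType.
weight_type = WEIGHT_MEDIA_TYPE_PREFIX + "raw"
patterns = ["model.safetensors.index.json", "!tiktoken.model"]
print(is_excluded("tiktoken.model", weight_type, True, patterns))
print(is_excluded("model-00001.safetensors", weight_type, True, patterns))
```

Under these assumptions, tiktoken.model is mounted even though its mediaType marks it as a weight, while files that match no pattern still fall back to the exclude_model_weights rule.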
WDYT?