Provide a more flexible API to specify exclude / include file patterns #17

@imeoer

Background: previously, in the model-csi-driver, we provided an HTTP API to mount models into a pod:

curl --unix-socket "$volume_dir/csi/csi.sock" \
  -H "Content-Type: application/json" \
  -X POST "http://localhost/api/v1/volumes/$volume_name/mounts" \
  -d @- <<EOF
{
  "mount_id": "$mount_id",
  "reference": "$reference",
  "exclude_model_weights": true
}
EOF

When exclude_model_weights = true, the mount skips fetching model weight files. This is intended for scenarios like rfork: in the target sglang instance, the model mount only needs to provide the non-weight files before weight loading, while the weights are later loaded from a seed sglang instance via GPU-Direct RDMA.

Initially, exclude_model_weights = true worked by excluding files whose mediaType matches application/vnd.cncf.model.weight.v1.*. However, for models like kimi k2, sglang loads tiktoken.model to initialize the tokenizer before loading weights. Because modelspec classifies tiktoken.model as a weight-type file, it gets excluded, and sglang fails to boot.
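To make the failure concrete, here is a minimal Python sketch of a filter that looks only at the mediaType. The layer list, the file names other than tiktoken.model, the concrete mediaType strings, and the function name `fetched_files` are illustrative assumptions, not taken from the driver's code.

```python
# Prefix corresponding to application/vnd.cncf.model.weight.v1.* from the issue.
WEIGHT_MEDIA_TYPE_PREFIX = "application/vnd.cncf.model.weight.v1."

# Hypothetical layer listing as (file name, mediaType); the values are illustrative.
LAYERS = [
    ("config.json", "application/vnd.cncf.model.weight.config.v1.raw"),
    ("model-00001-of-00002.safetensors", "application/vnd.cncf.model.weight.v1.raw"),
    ("tiktoken.model", "application/vnd.cncf.model.weight.v1.raw"),
]


def fetched_files(layers, exclude_model_weights):
    """Return the file names that would still be fetched at mount time."""
    return [
        name
        for name, media_type in layers
        if not (exclude_model_weights
                and media_type.startswith(WEIGHT_MEDIA_TYPE_PREFIX))
    ]


# tiktoken.model is dropped together with the real weights.
print(fetched_files(LAYERS, exclude_model_weights=True))
```

With a mediaType-only check, nothing distinguishes the tokenizer file from the actual weight shards, which is why a filename-based override is needed.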

One option is to improve modctl’s weight-file detection (it cannot rely only on file extensions and may need smarter heuristics), but this cannot fix already-built model images.

Another option is to adjust the model csi API parameters as follows:

  1. Keep exclude_model_weights = true unchanged: it still excludes files with mediaType: application/vnd.cncf.model.weight.v1.*.
  2. Add exclude_file_patterns: [] (.gitignore-compatible syntax, where a leading ! re-includes a file) to let users exclude/include specific files by filename patterns.

This enables usage like:

{
  "exclude_model_weights": true,
  "exclude_file_patterns": ["model.safetensors.index.json", "!tiktoken.model"]
}

or fully controlled by the user, such as:

{
  "exclude_file_patterns": ["*.safetensors", "model.safetensors.index.json"]
}

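Define the precedence as exclude_file_patterns > exclude_model_weights: when a pattern matches a file, the pattern decides whether it is excluded; otherwise exclude_model_weights applies. This lets users control exactly which files are included or excluded during mounting, and it addresses on-demand loading issues like the kimi k2 case.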
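One possible reading of this precedence, sketched in Python. This is not the driver's implementation: the function names are hypothetical, and the matcher covers only a simplified subset of .gitignore syntax (last matching pattern wins, a leading ! negates, patterns without a slash match the file's base name). Real .gitignore matching has more rules, for example `*` not crossing `/`.

```python
from fnmatch import fnmatchcase
from posixpath import basename

WEIGHT_MEDIA_TYPE_PREFIX = "application/vnd.cncf.model.weight.v1."


def pattern_verdict(path, patterns):
    """Return True (exclude), False (include), or None if no pattern matched.

    Simplified .gitignore semantics: later patterns override earlier ones.
    """
    verdict = None
    for pattern in patterns:
        negated = pattern.startswith("!")
        glob = pattern[1:] if negated else pattern
        # Patterns without a slash are matched against the base name only.
        target = path if "/" in glob else basename(path)
        if fnmatchcase(target, glob.lstrip("/")):
            verdict = not negated
    return verdict


def is_excluded(path, media_type, exclude_model_weights=False,
                exclude_file_patterns=()):
    """Apply exclude_file_patterns first, then fall back to the weights flag."""
    verdict = pattern_verdict(path, exclude_file_patterns)
    if verdict is not None:
        return verdict
    return bool(exclude_model_weights
                and media_type.startswith(WEIGHT_MEDIA_TYPE_PREFIX))


if __name__ == "__main__":
    weight = WEIGHT_MEDIA_TYPE_PREFIX + "raw"  # illustrative mediaType
    patterns = ["model.safetensors.index.json", "!tiktoken.model"]
    for name in ("tiktoken.model", "model-00001.safetensors"):
        print(name, is_excluded(name, weight, True, patterns))
```

Under this reading, `!tiktoken.model` keeps the tokenizer file even though its mediaType marks it as a weight, while files that no pattern matches still follow exclude_model_weights.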

WDYT?
