Merged
1 change: 0 additions & 1 deletion config/models/kustomization.yaml
@@ -3,7 +3,6 @@ kind: Kustomization

resources:
- meta/Llama-3.3-70B-instruct.yaml
- - meta/Llama-3.3-70B-instruct-FP8-Dynamic.yaml
- meta/llama-4-maverick-17b-128e-instruct-fp8.yaml
- meta/llama-4-scout-17b-16e-instruct.yaml
- intfloat/e5-mistral-7b-instruct.yaml
12 changes: 0 additions & 12 deletions config/models/meta/Llama-3.3-70B-instruct-FP8-Dynamic.yaml

This file was deleted.

44 changes: 39 additions & 5 deletions config/runtimes/srt/deepseek-rdma-pd-rt.yaml
@@ -63,7 +63,7 @@ spec:
MC_TE_METRIC=true;
SGLANG_TBO_DEBUG=1;
python3 -m sglang.launch_server
- --port 30000
+ --port 8080
--host 0.0.0.0
--model-path ${MODEL_PATH}
--disaggregation-ib-device mlx5_0,mlx5_1,mlx5_3,mlx5_4
@@ -158,7 +158,7 @@ spec:
--dist-init-addr $(LWS_LEADER_ADDRESS):5000
--nnodes ${LWS_GROUP_SIZE}
--node-rank ${LWS_WORKER_INDEX}
- --port 30000
+ --port 8080
--trust-remote-code
--ep-num-redundant-experts 32
--moe-dense-tp-size 1
@@ -214,7 +214,7 @@ spec:
- -c
- >
python3 -m sglang.launch_server
- --port 30000
+ --port 8080
--host 0.0.0.0
--chunked-prefill-size 262144
--page-size 64
@@ -302,7 +302,7 @@ spec:
--dist-init-addr $(LWS_LEADER_ADDRESS):5000
--nnodes ${LWS_GROUP_SIZE}
--node-rank ${LWS_WORKER_INDEX}
- --port 30000
+ --port 8080
--decode-log-interval 1
--host 0.0.0.0
--trust-remote-code
@@ -325,4 +325,38 @@ spec:
- name: SGL_ENABLE_JIT_DEEPGEMM
value: "1"
- name: GLOO_SOCKET_IFNAME
- value: eth0
+ value: eth0
+ routerConfig:
+ runner:
+ name: router
+ image: ghcr.io/moirai-internal/sgl-router:0.1.4.30f2a44
+ resources:
+ limits:
+ cpu: "1"
+ memory: "2Gi"
+ ports:
+ - containerPort: 8080
+ name: http
+ command:
+ - sh
+ - -c
+ - >
+ python3 -m sglang_router.launch_router
+ --host 0.0.0.0
+ --port 8080
+ --pd-disaggregation
+ --policy power_of_two
+ --service-discovery
+ --service-discovery-namespace "${NAMESPACE}"
+ --service-discovery-port 8080
+ --prefill-selector component=engine ome.io/inferenceservice=${INFERENCESERVICE_NAME}
+ --decode-selector component=decoder ome.io/inferenceservice=${INFERENCESERVICE_NAME}
+ env:
+ - name: NAMESPACE
+ valueFrom:
+ fieldRef:
+ fieldPath: metadata.namespace
+ - name: INFERENCESERVICE_NAME
+ valueFrom:
+ fieldRef:
+ fieldPath: metadata.labels['ome.io/inferenceservice']
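
Note on `--policy power_of_two`: the diff does not show how the router implements this policy. As a rough illustration only, the general "power of two choices" technique samples two candidate workers at random and sends the request to the less loaded one. The sketch below assumes a hypothetical `loads` mapping from worker name to in-flight request count; the function name, the worker names, and the load metric are all assumptions, and the real sgl-router logic may differ.

```python
import random


def pick_power_of_two(loads, rng=random):
    """Pick a worker using the generic power-of-two-choices rule.

    `loads` is a hypothetical mapping of worker name -> current load
    (for example, in-flight requests). This is an illustrative sketch,
    not the router's actual implementation.
    """
    if not loads:
        raise ValueError("no workers available")
    if len(loads) == 1:
        # Only one candidate, so there is nothing to compare.
        return next(iter(loads))
    # Sample two distinct workers uniformly at random.
    first, second = rng.sample(sorted(loads), 2)
    # Route to whichever of the two currently has the lower load.
    return first if loads[first] <= loads[second] else second


if __name__ == "__main__":
    # Hypothetical prefill workers discovered via the label selector.
    prefill_loads = {"prefill-0": 5, "prefill-1": 1, "prefill-2": 9}
    print(pick_power_of_two(prefill_loads))
```

Comparing only two random candidates keeps each routing decision cheap while still steering traffic away from the busiest workers; in this sketch the most loaded worker is never chosen when at least two workers exist.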
2 changes: 1 addition & 1 deletion config/runtimes/srt/llama-3-1-70b-instruct-pd-rt.yaml
@@ -10,7 +10,7 @@ spec:
version: "4.42.3"
modelFormat:
name: safetensors
- version: "1"
+ version: "1.0.0"
modelArchitecture: LlamaForCausalLM
autoSelect: false
priority: 1
226 changes: 0 additions & 226 deletions config/runtimes/srt/llama-3-2-11b-vision-instruct-pd-rt.yaml

This file was deleted.

2 changes: 1 addition & 1 deletion config/runtimes/srt/llama-3-2-1b-instruct-pd-rt.yaml
@@ -10,7 +10,7 @@ spec:
version: "4.45.0.dev0"
modelFormat:
name: safetensors
- version: "1"
+ version: "1.0.0"
modelArchitecture: LlamaForCausalLM
autoSelect: false
priority: 1
2 changes: 1 addition & 1 deletion config/runtimes/srt/llama-3-2-3b-instruct-pd-rt.yaml
@@ -10,7 +10,7 @@ spec:
version: "4.45.0.dev0"
modelFormat:
name: safetensors
- version: "1"
+ version: "1.0.0"
modelArchitecture: LlamaForCausalLM
autoSelect: false
priority: 1