Add docs for Model.create, update default values and fix per_worker concurrency #332
Conversation
clients/python/llmengine/model.py
Outdated
```diff
  checkpoint_path (`Optional[str]`):
-     Path to the checkpoint for the LLM. For now we only support loading a tar file from AWS S3.
+     Path to the checkpoint for the LLM. Can be either a folder (preferred since there's no untar) or a tar file.
```
> Can be either a folder

I think we want to be precise and consistent around our guidance relating to remote files/directories. Would suggest cross-checking with the Files API and making sure that the explanation around this makes sense; e.g. we may want to clarify that this is meant to be some remote path that's accessible by the LLM Engine deployment, etc.

> Can be either a folder (preferred since there's no untar)

Might be good to explain why skipping the untar is desirable - is it cold start time?
Checking the code, I don't think we only support creating endpoints from files created with the Files API.
Yes, skipping the untar is good for cold start time.
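For concreteness, here's a minimal usage sketch of creating an endpoint from a checkpoint folder. All values are illustrative placeholders; the exact signature should be checked against the `Model.create` docs this PR adds:

```python
from llmengine import Model

response = Model.create(
    name="llama-2-7b.2023-07-18",
    model="llama-2-7b",
    inference_framework_image_tag="0.9.4",
    # A folder is preferred over a tar file: the checkpoint loads directly
    # with no untar step, which helps cold start time. The path must be
    # remote storage reachable by the LLM Engine deployment.
    checkpoint_path="s3://my-bucket/checkpoints/llama-2-7b/",
    num_shards=4,
    cpus=32,
    memory="160Gi",
    storage="96Gi",
    gpus=4,
    gpu_type="nvidia-ampere-a10",
    min_workers=1,
    max_workers=12,
    per_worker=10,
    endpoint_type="streaming",
    labels={"team": "my-team", "product": "my-product"},
)
print(response.json())
```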
clients/python/llmengine/model.py
Outdated
```diff
-    throughput requirements. 2. Determine a value for the maximum number of
-    concurrent requests in the workload. Divide this number by ``max_workers``. Doing
-    this ensures that the number of workers will "climb" to ``max_workers``.
+    Number of Uvicorn workers per pod. Recommendation is set to 2.
```
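For reference, here is the arithmetic the removed guidance prescribed (the numbers are made up for illustration):

```python
# Old (removed) guidance for choosing per_worker: divide the workload's
# peak concurrent requests by max_workers, so that under peak load the
# worker count "climbs" to max_workers.
max_concurrent_requests = 120  # illustrative peak concurrency
max_workers = 12               # illustrative endpoint setting

per_worker = max_concurrent_requests // max_workers
print(per_worker)  # 10
```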
Might want to hide details of Uvicorn.
Also I don't think this is correct? IIRC per_worker is basically a scaling sensitivity parameter. cc @seanshi-scale
Hmm, this actually decides both the number of workers and the HPA target concurrency: https://github.com/scaleapi/llm-engine/blob/main/model-engine/model_engine_server/infra/gateways/k8s_resource_parser.py#L65
I'll make some updates.
I didn't understand why we want to set some ratio between the HPA target and per_worker, so I'm removing the ratio.
```diff
 import re
 from typing import Union

-MAX_CONCURRENCY_TO_TARGET_CONCURRENCY_RATIO = 2.0
```
Wouldn't this technically change our autoscaling sensitivity? cc @seanshi-scale
Which might be fine; just want us to be aware of production behavior changes.
Yes, it would, but I don't think people are aware of this ratio at all (e.g. a user specifies a per-worker concurrency target of 10, and the HPA actually uses 5).
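A minimal sketch of the behavior change under discussion (the function names here are hypothetical; the real logic lives in the k8s_resource_parser.py linked above):

```python
MAX_CONCURRENCY_TO_TARGET_CONCURRENCY_RATIO = 2.0  # the constant this PR removes

def hpa_target_before(per_worker: int) -> float:
    # Before this PR: the HPA scaled on per_worker divided by the ratio,
    # so per_worker=10 silently autoscaled at a target concurrency of 5.
    return per_worker / MAX_CONCURRENCY_TO_TARGET_CONCURRENCY_RATIO

def hpa_target_after(per_worker: int) -> float:
    # After this PR: the HPA target matches what the user specified.
    return float(per_worker)

print(hpa_target_before(10))  # 5.0
print(hpa_target_after(10))   # 10.0
```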
yixu34
left a comment
One more high-level comment: after
yixu34
left a comment
The only things I had left were around the examples:
- framework version tag
- labels