
Conversation

@yunfeng-scale (Contributor) commented Oct 18, 2023:


@yunfeng-scale yunfeng-scale requested a review from a team October 18, 2023 20:22
checkpoint_path (`Optional[str]`):
-    Path to the checkpoint for the LLM. For now we only support loading a tar file from AWS S3.
+    Path to the checkpoint for the LLM. Can be either a folder (preferred since there's no untar) or a tar file.
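The two accepted forms of `checkpoint_path` can be illustrated with a small sketch. The helper below is hypothetical (not part of the llm-engine API), and the `s3://` scheme check is an assumption based on the AWS S3 wording in the docstring:

```python
def classify_checkpoint_path(path: str) -> str:
    """Return "folder" or "tar" for a remote checkpoint path.

    Hypothetical helper for illustration only. A folder is preferred:
    the deployment can pull files directly, with no untar step slowing
    down cold start. Either way, the path must be remotely accessible
    to the LLM Engine deployment (e.g. an AWS S3 location).
    """
    if not path.startswith("s3://"):
        raise ValueError("expected a remote S3 path, e.g. s3://bucket/model/")
    return "tar" if path.endswith(".tar") else "folder"
```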
Member commented:

Can be either a folder

I think we want to be precise and consistent around our guidance relating to remote files/directories. Would suggest cross-checking with the Files API and making sure that the explanation around this makes sense. e.g. we may want to clarify that this is meant to be some remote path that's accessible by the LLM Engine deployment, etc.

Can be either a folder (preferred since there's no untar)
Might be good to explain why skipping the untar is desirable - is it cold start time?

Contributor Author replied:

Checking the code, I don't think we only support creating endpoints from files created with the Files API.

Yes, skipping the untar is good for cold start time.

throughput requirements. 2. Determine a value for the maximum number of
concurrent requests in the workload. Divide this number by ``max_workers``. Doing
this ensures that the number of workers will "climb" to ``max_workers``.
Number of Uvicorn workers per pod. The recommended value is 2.
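The sizing guidance quoted above can be sketched with made-up numbers (the workload values here are assumptions for illustration, not recommendations):

```python
import math

# Illustrative values only: peak concurrency expected in the workload,
# and the configured upper bound on autoscaled workers.
max_concurrent_requests = 1000
max_workers = 50

# Per the guidance: divide peak concurrency by max_workers. Choosing
# per_worker this way means that at peak load the autoscaler "climbs"
# all the way to max_workers rather than plateauing early.
per_worker = math.ceil(max_concurrent_requests / max_workers)
print(per_worker)  # 20
```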
Member commented:

Might want to hide details of Uvicorn.

Also I don't think this is correct? IIRC per_worker is basically a scaling sensitivity parameter. cc @seanshi-scale

Contributor Author replied:

Hmm, this actually decides both the number of workers and the HPA target concurrency: https://github.com/scaleapi/llm-engine/blob/main/model-engine/model_engine_server/infra/gateways/k8s_resource_parser.py#L65
I'll make some updates.

Contributor Author added:

I didn't understand why we want to set some ratio between the HPA target and per_worker, so I'm removing the ratio.

@yunfeng-scale yunfeng-scale requested a review from yixu34 October 18, 2023 22:54
import re
from typing import Union

MAX_CONCURRENCY_TO_TARGET_CONCURRENCY_RATIO = 2.0
Member commented:

Wouldn't this technically change our autoscaling sensitivity? cc @seanshi-scale

Member added:

Which might be fine, just being aware of production behavior changes.

Contributor Author replied:

Yes, this would, but I think people are not aware of this ratio at all (e.g. users specify a per-worker concurrency target of 10 and the HPA uses 5).
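The surprising behavior being described can be sketched in a few lines. This is a simplified illustration of the pre-change behavior (the function name is an assumption; the real computation lives in k8s_resource_parser.py in the repo):

```python
# Constant from the diff under review.
MAX_CONCURRENCY_TO_TARGET_CONCURRENCY_RATIO = 2.0

def hpa_target_concurrency(per_worker: int) -> float:
    """Simplified sketch: the HPA scales on per_worker divided by the
    ratio, so a user-specified per_worker of 10 yields an effective
    HPA target of 5 -- the mismatch users were unaware of."""
    return per_worker / MAX_CONCURRENCY_TO_TARGET_CONCURRENCY_RATIO

print(hpa_target_concurrency(10))  # 5.0
```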

@yunfeng-scale yunfeng-scale requested a review from yixu34 October 19, 2023 03:54
@yunfeng-scale yunfeng-scale changed the title Add docs for Model.create Add docs for Model.create, update default values and fix per_worker concurrency Oct 19, 2023
@yixu34 (Member) left a comment:

One more high-level comment: after

@yunfeng-scale yunfeng-scale requested a review from yixu34 October 19, 2023 18:28
@yixu34 (Member) left a comment:

The only things I had left were around the examples:

  1. framework version tag
  2. labels

@yunfeng-scale yunfeng-scale requested a review from yixu34 October 20, 2023 03:03
@yunfeng-scale yunfeng-scale enabled auto-merge (squash) October 20, 2023 20:22
@yunfeng-scale yunfeng-scale merged commit 271156c into main Oct 20, 2023
@yunfeng-scale yunfeng-scale deleted the yunfeng-retry-cds branch October 20, 2023 20:48