
Conversation

@yunfeng-scale (Contributor) commented Oct 19, 2023

Hardcode the http forwarder to use 2 workers:

  1. we're only assigning 500m CPU to the forwarder
  2. given our traffic, I don't think we need more than 2

@yunfeng-scale yunfeng-scale requested a review from a team October 19, 2023 21:31
```diff
 {{- end }}
 - --num-workers
-- "${PER_WORKER}"
+- "${FORWARDER_WORKER_COUNT}"
```
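The rendered args above feed a `--num-workers` flag on the forwarder. A minimal sketch of how such a flag with the hardcoded value of 2 might be parsed (the function and help text here are illustrative, not taken from the repo):

```python
import argparse

def parse_forwarder_args(argv=None):
    # Hypothetical CLI sketch: --num-workers defaults to the hardcoded
    # value of 2 chosen in this PR (FORWARDER_WORKER_COUNT).
    parser = argparse.ArgumentParser(description="http forwarder (sketch)")
    parser.add_argument(
        "--num-workers",
        type=int,
        default=2,  # 500m CPU doesn't justify more worker processes
        help="number of HTTP forwarder worker processes",
    )
    return parser.parse_args(argv)
```

With this shape, the Helm template can pass either a templated value or the new hardcoded count without the forwarder code changing.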
@yunfeng-scale (Contributor, Author) commented:
We always run just one celery worker, and PER_WORKER is used to determine its concurrency: https://github.com/scaleapi/llm-engine/blob/main/model-engine/model_engine_server/inference/forwarding/celery_forwarder.py#L141
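A sketch of that arrangement, assuming PER_WORKER arrives as an environment variable (the actual wiring in celery_forwarder.py may differ):

```python
import os

def celery_worker_argv(env=None):
    # Sketch: a single celery worker whose --concurrency comes from
    # PER_WORKER; the worker process count itself stays at 1.
    env = env if env is not None else os.environ
    concurrency = int(env.get("PER_WORKER", "1"))
    return ["worker", "--concurrency", str(concurrency)]
```

So scaling PER_WORKER raises parallelism inside the one worker rather than adding worker processes.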

A reviewer (Contributor) commented:
IMO we could look into ways to increase the number of celery tasks in flight to raise async task throughput, which may involve increasing the number of workers; that's probably not necessary right now, though.
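As a rough capacity sketch of that tradeoff: with Celery's default prefetch behavior, the number of tasks a deployment will reserve scales with workers × concurrency × prefetch multiplier, so in-flight throughput can be raised on any of those axes (the numbers below are illustrative):

```python
def max_in_flight(workers: int, concurrency: int, prefetch_multiplier: int = 4) -> int:
    # Each worker reserves up to concurrency * prefetch_multiplier tasks;
    # 4 is Celery's default worker_prefetch_multiplier.
    return workers * concurrency * prefetch_multiplier

# One worker with PER_WORKER=2 concurrency:
print(max_in_flight(workers=1, concurrency=2))  # 8
# Doubling workers doubles the ceiling:
print(max_in_flight(workers=2, concurrency=2))  # 16
```

This is why raising concurrency (or the prefetch multiplier) can increase async throughput without adding worker processes.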

@yunfeng-scale yunfeng-scale enabled auto-merge (squash) October 20, 2023 00:19
@yunfeng-scale yunfeng-scale merged commit 1a3b5e0 into main Oct 20, 2023
@yunfeng-scale yunfeng-scale deleted the yunfeng-hardcode-forwarder-count branch October 20, 2023 00:43