LLM batch completions API #418
Conversation
model-engine/model_engine_server/domain/use_cases/llm_model_endpoint_use_cases.py (outdated review thread, resolved)
model-engine/model_engine_server/inference/batch_inference/vllm_batch.py (outdated review thread, resolved)
yixu34 left a comment:
Not in scope for this PR per se, but given that we're looking at Azure support, we may want to start thinking about how to support Azure in PRs going forward.
Need to take another pass at the actual batch job script.
| """ | ||
| Path to the checkpoint to load the model from. | ||
| """ | ||
| labels: Dict[str, str] |
Should we make labels required for external users?
Since this would mostly be used internally, I think it's okay to sacrifice some ergonomics; plus, external users can simply use {}.
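For reference, a minimal sketch of how the labels field could sit in the request schema, assuming a pydantic model; the class name CreateBatchCompletionsRequest and any fields other than those visible in the diff above are illustrative assumptions:

```python
from typing import Dict, Optional

from pydantic import BaseModel


class CreateBatchCompletionsRequest(BaseModel):  # class name assumed for illustration
    checkpoint_path: Optional[str] = None
    """
    Path to the checkpoint to load the model from.
    """
    labels: Dict[str, str]
    """
    Labels attached to the batch job. Required, but callers with nothing to
    attach can pass an empty dict ({}), per the discussion above.
    """


# Usage: an external caller with no labels simply passes {}.
request = CreateBatchCompletionsRequest(checkpoint_path=None, labels={})
```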
model-engine/model_engine_server/domain/use_cases/llm_model_endpoint_use_cases.py (outdated review thread, resolved)
model-engine/model_engine_server/inference/batch_inference/Dockerfile_vllm (review thread resolved)
yixu34 left a comment:
Looks pretty neat!
```python
def infer_hardware_from_model_name(
    self, model_name: str
) -> CreateDockerImageBatchJobResourceRequests:
```
Gah I just realized this should be CreateDockerImageBatchJobResourceRequest, oh well.
Possible to rename, but I don't want to put that in this PR.
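Purely as an illustration of what a function with this signature might do, a sketch that parses a rough parameter count out of the model name (simplified to a free function with a stand-in dataclass); the thresholds, field names, and resource values are assumptions, not the PR's actual mapping:

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class CreateDockerImageBatchJobResourceRequests:
    # Stand-in for the real domain entity; field names here are assumptions.
    cpus: Optional[int] = None
    memory: Optional[str] = None
    gpus: Optional[int] = None
    gpu_type: Optional[str] = None


def infer_hardware_from_model_name(model_name: str) -> CreateDockerImageBatchJobResourceRequests:
    # Illustrative only: pull a rough parameter count from names like
    # "llama-2-70b" and pick GPU resources from it.
    match = re.search(r"(\d+)b", model_name.lower())
    num_params_b = int(match.group(1)) if match else 7
    gpus = 4 if num_params_b >= 40 else 2 if num_params_b >= 13 else 1
    return CreateDockerImageBatchJobResourceRequests(
        cpus=gpus * 10,
        memory=f"{gpus * 80}Gi",
        gpus=gpus,
        gpu_type="nvidia-ampere-a100",
    )


print(infer_hardware_from_model_name("llama-2-70b"))  # gpus=4 under these assumed thresholds
```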
```python
if job_index == 0:
    wait_for_all_chunks(request)
    combine_all_chunks(request)
    if request.output_data_path.startswith("s3://"):
```
Should we have this in a finally?
I think this `finally` should be at the end of the job instead of at the end of worker 0; otherwise, if worker 0 fails early, it may not be able to remove the other workers' output chunks.
For now I think it's fine not to remove them and to leave some traces for debugging.
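A rough sketch of the structure being discussed, reusing the helper names from the diff above; run_inference and delete_s3_chunks are hypothetical names used only to show where a `finally` could sit:

```python
def batch_inference(request, job_index):
    try:
        run_inference(request, job_index)  # hypothetical per-worker inference step
        if job_index == 0:
            wait_for_all_chunks(request)
            combine_all_chunks(request)
    finally:
        # A `finally` here runs even if worker 0 fails early, but as noted
        # above, a failed worker 0 may still be unable to clean up chunks
        # from other workers, and leaving them around can help debugging.
        if job_index == 0 and request.output_data_path.startswith("s3://"):
            delete_s3_chunks(request)  # hypothetical cleanup helper
```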
ian-scale left a comment:
Looks good to me! Just a few nits / questions so I can learn more.
model-engine/model_engine_server/domain/use_cases/llm_model_endpoint_use_cases.py (review thread resolved)
model-engine/model_engine_server/inference/batch_inference/requirements.txt (review thread resolved)
model-engine/model_engine_server/inference/batch_inference/vllm_batch.py (review thread resolved)
saiatmakuri left a comment:
some nits on unit test style but lgtm otherwise
```python
    mock_process,
):
    # Mock the necessary objects and data
    mock_popen.return_value = mock_process
```
nit: you can create a new fixture that mocks the return value of subprocess.Popen using the mock_process fixture, so that you don't need to make this declaration at the start of each unit test. It'll clean up the logic here:

```python
def test(mock1, mock2):
    mock2.return_value = mock1
```

becomes

```python
def test(mock2):
    ...
```

You can do that for several of the mocks done at the start of each unit test.
I think you're proposing to use `@patch("subprocess.Popen", mock_process)`. I think there's some ordering problem with patch; not sure, but I can't get it to work quickly. Will skip this for now.
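For reference, one way the fixture-based approach could be wired up with pytest's monkeypatch rather than the @patch decorator (which sidesteps decorator-ordering issues); fixture and test names here are illustrative:

```python
import subprocess
from unittest.mock import MagicMock

import pytest


@pytest.fixture
def mock_process():
    # Stand-in for the existing mock_process fixture.
    process = MagicMock()
    process.returncode = 0
    return process


@pytest.fixture
def mock_popen(mock_process, monkeypatch):
    # Patch subprocess.Popen once here so individual tests no longer need
    # `mock_popen.return_value = mock_process` at the top of each test.
    popen = MagicMock(return_value=mock_process)
    monkeypatch.setattr(subprocess, "Popen", popen)
    return popen


def test_runs_subprocess(mock_popen, mock_process):
    # Example usage: the patched Popen already returns mock_process.
    proc = subprocess.Popen(["echo", "hello"])
    assert proc is mock_process
    mock_popen.assert_called_once()
```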
Pull Request Summary
Batch completions create API, which currently only uses vLLM to run batch inference with Kubernetes jobs. Feature highlights:
Test Plan and Usage Guide
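To make the shape of the new API concrete, a hypothetical usage sketch; the endpoint path, payload fields, and authentication scheme shown here are assumptions for illustration, not the actual API contract:

```python
import requests

# Hypothetical payload; field names are illustrative only.
payload = {
    "model_config": {
        "model": "llama-2-7b",
        "checkpoint_path": "s3://my-bucket/checkpoints/llama-2-7b",
    },
    "input_data_path": "s3://my-bucket/batch-inference/input.json",
    "output_data_path": "s3://my-bucket/batch-inference/output.json",
    "labels": {},            # external users can pass an empty dict, per the review discussion
    "data_parallelism": 2,   # number of workers / chunks (assumed parameter name)
}

# Endpoint path and auth scheme are assumptions, not the documented API.
resp = requests.post(
    "https://<model-engine-host>/v1/llm/batch-completions",
    json=payload,
    auth=("<api-key>", ""),
)
resp.raise_for_status()
print(resp.json())  # e.g. a job id to poll for results
```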
Note: during testing with ARC Challenge, we found that 1 worker and 2 workers do not always return the same results.
A single difference across 1k requests:
1 worker:
2 workers: