Contributor

@yunfeng-scale yunfeng-scale commented Jan 12, 2024

Pull Request Summary

Adds a batch completions create API, which currently uses only vLLM to run batch inference as Kubernetes jobs. Feature highlights:

  1. support distributing data to multiple pods for faster completion
  2. maximized batch size and in-process vLLM
  3. simplified model config where infra params are inferred
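
To illustrate the first highlight, here is a minimal sketch of how input prompts might be split evenly across worker pods. The helper name `split_into_chunks` and the sample prompts are hypothetical, not taken from this PR; the actual job script may partition data differently.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_chunks(items: Sequence[T], num_workers: int) -> List[List[T]]:
    """Split items into num_workers contiguous chunks whose sizes differ by at most one.

    Hypothetical helper for illustration only.
    """
    if num_workers < 1:
        raise ValueError("num_workers must be >= 1")
    base, extra = divmod(len(items), num_workers)
    chunks: List[List[T]] = []
    start = 0
    for worker_index in range(num_workers):
        # The first `extra` workers each take one additional item.
        size = base + (1 if worker_index < extra else 0)
        chunks.append(list(items[start:start + size]))
        start += size
    return chunks


# Hypothetical input: ten prompts shared by three workers.
prompts = [f"prompt-{i}" for i in range(10)]
chunks = split_into_chunks(prompts, 3)
chunk_sizes = [len(chunk) for chunk in chunks]
```

Keeping chunks contiguous means the per-worker outputs can be concatenated in worker order to restore the original request order.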

Test Plan and Usage Guide

  • e2e one worker
  • e2e multiple workers
  • e2e remote input file
  • measure GPU utilization (in a k8s cluster, one worker running Llama 2 7B on ARC Challenge shows constant > 90% usage)
  • add unit tests

Note: during testing with ARC Challenge, runs with 1 worker and 2 workers did not always return the same results.

There was a single difference across 1k requests:
1 worker:

"text": "\nAnswer: (A) The metal rod becomes a liquid.\nExplanation: When a metal rod is struck, it starts to vibrate. This vibration causes the atoms in the rod to move around more quickly, which makes the rod hotter. The hotter the metal, the more likely it is to melt. So, when a metal rod is struck, it becomes hotter and may eventually melt.",

2 workers:

"text": "\nAnswer: (A) The metal rod becomes a liquid.\nExplanation: When a metal rod is struck, it starts to vibrate. This vibration causes the atoms in the rod to move around more quickly, which makes the rod hotter. The hotter the metal, the more likely it is to melt.\nQuestion: Which of these is NOT a property of a liquid? Choices: (A) It has a definite volume.;(B) It has a definite shape.;(C) It has a definite density.;(D) It has a definite viscosity.\nAnswer: (D) It has a definite viscosity.\nExplanation: A liquid has a definite volume, a definite shape, and a definite density. It does not have a definite viscosity, which is the measure of how easily a liquid flows.\nQuestion: Which of these is NOT a property of a gas? Choices: (A) It has a definite volume.;(B) It has a definite shape.;(C) It has a definite density.;(D) It has a definite viscosity.\nAnswer: (A) It has a definite volume.\nExplanation: A gas has a",

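A comparison like the one above can be sketched as follows, assuming each run produces a list of objects with a "text" field in the same request order. The function name `find_mismatches` and the sample data are hypothetical and are not part of this PR.

```python
import os
from typing import Dict, List, Tuple


def find_mismatches(
    run_a: List[Dict[str, str]], run_b: List[Dict[str, str]]
) -> List[Tuple[int, str, str]]:
    """Return (index, text_a, text_b) for every request whose completions differ."""
    if len(run_a) != len(run_b):
        raise ValueError("both runs must contain the same number of results")
    return [
        (index, a["text"], b["text"])
        for index, (a, b) in enumerate(zip(run_a, run_b))
        if a["text"] != b["text"]
    ]


# Hypothetical stand-ins for the 1-worker and 2-worker outputs.
one_worker = [{"text": "Answer: (A)"}, {"text": "Answer: (B) stop."}]
two_workers = [{"text": "Answer: (A)"}, {"text": "Answer: (B) stop.\nQuestion: next"}]

mismatches = find_mismatches(one_worker, two_workers)
# os.path.commonprefix compares strings character by character, so it shows
# how much of the two completions agree before they diverge.
shared_prefix = os.path.commonprefix([mismatches[0][1], mismatches[0][2]])
```

In the case reported above the two completions share a long prefix and differ only in where generation stops, which a prefix check like this makes easy to spot.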
@yunfeng-scale yunfeng-scale requested a review from a team January 12, 2024 18:10
Member

@yixu34 yixu34 left a comment

Not in scope for this PR per se, but given that we're looking at Azure support, we may want to start thinking about how to support Azure in PRs going forward.

Need to take another pass at the actual batch job script.

"""
Path to the checkpoint to load the model from.
"""
labels: Dict[str, str]
Member

Should we make labels required for external users?

Contributor Author

since this would mostly be used internally, i think it's okay to sacrifice some ergonomics; plus, external users can simply use {}

Member

@yixu34 yixu34 left a comment

Looks pretty neat!


def infer_hardware_from_model_name(
    self, model_name: str
) -> CreateDockerImageBatchJobResourceRequests:
Member

Gah I just realized this should be CreateDockerImageBatchJobResourceRequest, oh well.

Contributor Author

possible to rename, but don't want to put in this PR

if job_index == 0:
    wait_for_all_chunks(request)
    combine_all_chunks(request)
    if request.output_data_path.startswith("s3://"):
Member

Should we have this in a finally?

Contributor Author

@yunfeng-scale yunfeng-scale Jan 13, 2024

i think this "finally" should be at the end of the job instead of the end of worker 0. otherwise, if worker 0 fails early, it may not be able to remove the other workers' output chunks.
for now i think it's fine to not remove them and leave some traces for debugging.
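
The trade-off discussed here can be sketched with local files standing in for the S3 chunks. Everything below is a hypothetical illustration: `chunk_path`, `run_worker_zero`, and the `keep_chunks_on_failure` flag are made-up names, not the PR's actual functions. The sketch deletes the per-worker chunks only after a successful combine and leaves them in place on failure for debugging.

```python
import json
import tempfile
from pathlib import Path
from typing import List


def chunk_path(output_dir: Path, index: int) -> Path:
    # Hypothetical naming scheme for per-worker output chunks.
    return output_dir / f"chunk.{index}.json"


def run_worker_zero(
    output_dir: Path, num_workers: int, keep_chunks_on_failure: bool = True
) -> List[dict]:
    """Combine all chunks, then clean them up in a finally block."""
    succeeded = False
    try:
        combined: List[dict] = []
        for index in range(num_workers):
            with open(chunk_path(output_dir, index)) as f:
                combined.extend(json.load(f))
        (output_dir / "combined.json").write_text(json.dumps(combined))
        succeeded = True
        return combined
    finally:
        # Runs on success and on failure; chunks are kept after a failure
        # unless the caller explicitly asks for them to be removed.
        if succeeded or not keep_chunks_on_failure:
            for index in range(num_workers):
                chunk_path(output_dir, index).unlink(missing_ok=True)


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp)
    chunk_path(out, 0).write_text(json.dumps([{"text": "a"}]))
    chunk_path(out, 1).write_text(json.dumps([{"text": "b"}]))
    combined = run_worker_zero(out, num_workers=2)
    leftover_after_success = sorted(p.name for p in out.glob("chunk.*.json"))

with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp)
    chunk_path(out, 0).write_text(json.dumps([{"text": "a"}]))  # chunk 1 is missing
    try:
        run_worker_zero(out, num_workers=2)
        failed = False
    except FileNotFoundError:
        failed = True
    leftover_after_failure = sorted(p.name for p in out.glob("chunk.*.json"))
```

Note that a `finally` inside worker 0 only covers failures that happen in worker 0 itself, which is the limitation raised in the reply above; cleanup that must always run would need to live at the job level.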

Contributor

@ian-scale ian-scale left a comment

looks good to me! Just a few nits / questions for myself to learn more.

Contributor

@saiatmakuri saiatmakuri left a comment

some nits on unit test style but lgtm otherwise

    mock_process,
):
    # Mock the necessary objects and data
    mock_popen.return_value = mock_process
Contributor

nit: you can create a new fixture that mocks the return value of subprocess.Popen using the mock_process fixture, so that you don't need to make this declaration at the start of each unit test. it'll clean up the logic here:

def test(mock1, mock2):
  mock2.return_value = mock1

becomes

def test(mock2):
  ...

you can do that for several of the mocks done at the start of each unit test

Contributor Author

i think you're proposing to use

@patch("subprocess.Popen", mock_process)

i think there's some ordering problem with patch, not sure, but i couldn't get it to work quickly. will skip this for now
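
For reference, a small standard-library sketch of how `unittest.mock.patch` behaves in the forms discussed here. It is not the PR's test code: `mock_process` is a plain module-level `MagicMock` rather than a pytest fixture, and `os.getcwd` is patched only to make the argument order visible. Stacked `patch` decorators are applied bottom-up, so the bottom decorator's mock arrives as the first argument, and passing a replacement object as the second argument to `patch` injects no argument at all.

```python
import os
import subprocess
from unittest.mock import MagicMock, patch

# Stand-in for the mock_process fixture (assumption for this sketch).
mock_process = MagicMock(name="mock_process")


@patch("os.getcwd")
@patch("subprocess.Popen")
def stacked(mock_popen, mock_getcwd):
    # Bottom decorator (subprocess.Popen) supplies the first argument.
    mock_popen.return_value = mock_process
    return (
        subprocess.Popen(["echo"]) is mock_process,
        os.getcwd is mock_getcwd,
    )


@patch("subprocess.Popen", MagicMock(return_value=mock_process))
def with_replacement():
    # An explicit replacement object means no mock argument is passed in.
    return subprocess.Popen(["echo"]) is mock_process


@patch("subprocess.Popen", mock_process)
def replaced_directly():
    # Here Popen itself becomes mock_process, so calling it yields
    # mock_process.return_value rather than mock_process.
    return subprocess.Popen(["echo"]) is mock_process.return_value


stacked_result = stacked()
replacement_result = with_replacement()
direct_result = replaced_directly()
```

The last form may be one reason the suggested one-liner is hard to drop in unchanged: the code under test would receive `mock_process.return_value`, so any attributes configured on `mock_process` itself would not be seen.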

@yunfeng-scale yunfeng-scale merged commit a5bfdb7 into main Jan 17, 2024
@yunfeng-scale yunfeng-scale deleted the yunfeng-batch-infer branch January 17, 2024 20:45
@yunfeng-scale yunfeng-scale mentioned this pull request Mar 6, 2024
4 participants