Conversation

Contributor

@yunfeng-scale yunfeng-scale commented Feb 28, 2024

Pull Request Summary

  1. Add batch inference to the LLM Engine guides.
  2. Cache the vLLM batch inference Docker image.
  3. Use lower GPU memory utilization to avoid random OOMs, and log a warning if there is abnormal GPU memory usage at startup.
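Point 3 is not shown in this excerpt. Below is a minimal, hypothetical sketch of the startup check it describes; the 5% threshold and the MiB figures are illustrative assumptions, not values from the PR (the knob vLLM itself exposes for the first half of the point is its gpu_memory_utilization engine argument).

```shell
# Hypothetical sketch of the startup check (threshold and numbers are
# illustrative, not taken from the PR): warn if GPU memory already in
# use at startup exceeds a small fraction of the total, which suggests
# a stale process is still holding the device.
check_startup_gpu_memory() {
    local used_mib=$1 total_mib=$2 threshold_pct=${3:-5}
    if [ $(( used_mib * 100 / total_mib )) -gt "$threshold_pct" ]; then
        echo "WARNING: ${used_mib}MiB of ${total_mib}MiB GPU memory in use at startup" >&2
        return 1
    fi
}

# In a real worker the numbers would come from nvidia-smi, e.g.:
#   nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits
check_startup_gpu_memory 2048 24576 || echo "abnormal GPU memory usage detected"
```

With the illustrative inputs above, 2048 MiB of 24576 MiB is about 8%, which exceeds the 5% threshold and triggers the warning.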

Test Plan and Usage Guide

How did you validate that your PR works correctly? How do you run or demo the code? Provide enough detail so a reviewer can reasonably reproduce the testing procedure. Paste example command line invocations if applicable.

@yunfeng-scale yunfeng-scale requested a review from a team February 28, 2024 01:59

RUN apt-get update && \
-    apt-get install -y dumb-init && \
+    apt-get install -y dumb-init psmisc && \
Contributor

just curious, what does psmisc do?

Contributor Author

psmisc provides the fuser command, which identifies (and can signal) the processes holding a given file or device open.
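For context on the reply above, here is a self-contained demo of fuser. The GPU-device comment at the end is an assumption about why the PR wants it; the excerpt does not show the actual invocation.

```shell
# fuser comes from the psmisc package (apt-get install -y psmisc).
# Demo: open a file in a background process, then ask fuser who holds it.
tmp=$(mktemp)
tail -f "$tmp" > /dev/null 2>&1 &
sleep 0.2

fuser -v "$tmp"       # verbose: lists the tail process holding the file
fuser -k "$tmp"       # sends SIGKILL to every process holding it open

rm -f "$tmp"

# In a GPU serving context the same idea applies to device files, e.g.
# "fuser -k /dev/nvidia0" to clear stale processes before restarting a
# worker (illustrative; the PR excerpt does not show this usage).
```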

@yunfeng-scale yunfeng-scale merged commit 39ef7c4 into main Mar 2, 2024
@yunfeng-scale yunfeng-scale deleted the yunfeng-batch-infer-improv branch March 2, 2024 02:17
@yunfeng-scale yunfeng-scale mentioned this pull request Mar 6, 2024


3 participants