Skip to content

Conversation

@dmchoiboi
Copy link
Collaborator

Pull Request Summary

What is this PR changing? Why is this change being made? Any caveats you'd like to highlight? Link any relevant documents, links, or screenshots here if applicable.

Enforce that model checkpoints are used when creating bundles to prevent outages/issues in prod where a hf repo becomes unavailable

Test Plan and Usage Guide

How did you validate that your PR works correctly? How do you run or demo the code? Provide enough detail so a reviewer can reasonably reproduce the testing procedure. Paste example command line invocations if applicable.

if not checkpoint_path:
raise InvalidRequestException(f"No checkpoint path found for model {model_name}")

if checkpoint_path.startswith("s3://"):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: change the check to be if not ... and throw error so you can avoid a level of nesting. then honestly you could create a helper fn to validate checkpoint_path with both checks there

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thinking more about this, ig we make an assumption that the URLs are S3, which is not true in the multi tenant world. since it'll be a lot more code change, no need to do here, but would be good to make a note to check infra_config().cloud_provider being "aws" later on

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 about not being cloud-specific (anymore). @squeakymouse is in the process of refactoring this, would be good for her to review.

Copy link
Collaborator Author

@dmchoiboi dmchoiboi Apr 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, this check triggered me a bit as well, but i wanted to hold off on any big refactors around "cloud agnosticism" until I had a fuller picture on what's needed to achieve it (e.g. we'd need to refactor ModelInfo so it doesn't hardcode s3_repo, and i'm sure there other things we'd need to change)

@yixu34 yixu34 requested a review from squeakymouse April 26, 2024 04:52
@dmchoiboi dmchoiboi force-pushed the dmchoi/disable-hf-repo branch from 56eedc4 to 1d200f2 Compare April 26, 2024 05:20
@dmchoiboi dmchoiboi requested a review from saiatmakuri April 26, 2024 05:30
)
else:

if not checkpoint_path:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should also use get_checkpoint_path here too imo. it's technically net new functionality but should be welcome for tensorrt too

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to @seanshi-scale, trt-llm provider is different in that we won't need model_name on creation so we won't necessarily have it in the control flow

@dmchoiboi dmchoiboi merged commit 0079f7e into main Apr 26, 2024
@dmchoiboi dmchoiboi deleted the dmchoi/disable-hf-repo branch April 26, 2024 19:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants