NeMo 2.0 In-framework deployment support #11233

Closed
oyilmaz-nvidia wants to merge 16 commits into main from onur/nemo2_inframework_support

Conversation

@oyilmaz-nvidia
Collaborator

What does this PR do?

Adds in-framework deployment support and log probability output from TRT-LLM.

hemildesai and others added 5 commits November 7, 2024 13:19
Signed-off-by: Hemil Desai <hemild@nvidia.com>
Signed-off-by: Hemil Desai <hemild@nvidia.com>
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
Signed-off-by: Hemil Desai <hemild@nvidia.com>
Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>
Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>
Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>
Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>
Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>
Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>
Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>
oyilmaz-nvidia marked this pull request as ready for review on November 11, 2024 20:41
oyilmaz-nvidia and others added 2 commits November 11, 2024 15:44
Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>
Signed-off-by: oyilmaz-nvidia <oyilmaz-nvidia@users.noreply.github.com>

```python
@staticmethod
def get_deployable(
    nemo_checkpoint_filepath: str = None,
```

Collaborator

Typing should be Optional[str] here
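
A minimal sketch of the suggested annotation, with the class reduced to a stub (the real signature has more parameters and a full body):

```python
from typing import Optional


class MegatronLLMDeployableNemo2:
    """Stub for illustration only."""

    @staticmethod
    def get_deployable(
        nemo_checkpoint_filepath: Optional[str] = None,  # Optional[str] instead of bare str
        num_devices: int = 1,
        num_nodes: int = 1,
    ):
        # Hypothetical body; the real method constructs and returns the deployable model.
        return nemo_checkpoint_filepath, num_devices, num_nodes
```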

```python
    nemo_checkpoint_filepath: str = None,
    num_devices: int = 1,
    num_nodes: int = 1,
    tensor_model_parallel_size: int = 1,
```

Collaborator

Params num_devices and num_nodes are quite tied to tensor_model_parallel_size and pipeline_model_parallel_size.

How about making tensor_model_parallel_size and pipeline_model_parallel_size optional and defaulting them to num_devices and num_nodes, respectively (if they are None)?

An assertion for consistency, such as num_devices * num_nodes == tensor_model_parallel_size * pipeline_model_parallel_size, would also be useful (see the sketch below).

(Not sure how context_parallel_size changes all this.)
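
A sketch of how the defaults and the consistency check could be wired up (standalone helper for illustration only; the real code would do this inside get_deployable, and context parallelism is ignored here):

```python
from typing import Optional, Tuple


def resolve_parallel_sizes(
    num_devices: int = 1,
    num_nodes: int = 1,
    tensor_model_parallel_size: Optional[int] = None,
    pipeline_model_parallel_size: Optional[int] = None,
) -> Tuple[int, int]:
    # Default TP to the per-node device count and PP to the node count.
    if tensor_model_parallel_size is None:
        tensor_model_parallel_size = num_devices
    if pipeline_model_parallel_size is None:
        pipeline_model_parallel_size = num_nodes

    # Consistency check suggested above (ignores context_parallel_size).
    world_size = num_devices * num_nodes
    model_parallel_size = tensor_model_parallel_size * pipeline_model_parallel_size
    assert world_size == model_parallel_size, (
        f"num_devices * num_nodes ({world_size}) must equal tensor_model_parallel_size * "
        f"pipeline_model_parallel_size ({model_parallel_size})"
    )
    return tensor_model_parallel_size, pipeline_model_parallel_size
```

For example, resolve_parallel_sizes(num_devices=8, num_nodes=2) would yield (8, 2), while passing an explicit (4, 2) with the same device count would trip the assertion.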



```python
class MegatronLLMDeployableNemo2(ITritonDeployable):
    """Triton inference server compatible deploy class for a .nemo model file"""
```

Collaborator

It is no longer a .nemo file I think, just a folder.
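
A possible docstring wording along these lines (class stub only; the exact phrasing is a suggestion, not taken from the PR):

```python
class MegatronLLMDeployableNemo2:
    """Triton inference server compatible deploy class for a NeMo 2.0 checkpoint directory."""
```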

```python
    else:
        output_log_probs.append(lp)
    output_infer["log_probs"] = np.array(output_log_probs)
except Exception as error:
```

Collaborator

This possibly hides errors by inserting them into the output dictionary. Can we just raise them right away, i.e. remove the try/except clause?
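
A toy stand-in for the simplified path, with the try/except dropped so failures surface immediately (log_probs_per_token is an assumed input, not a name from the PR):

```python
import numpy as np


def collect_log_probs(log_probs_per_token):
    """Gather per-token log probabilities into the output dict without a
    blanket try/except; any error propagates to the caller."""
    output_infer = {}
    output_log_probs = list(log_probs_per_token)
    output_infer["log_probs"] = np.array(output_log_probs)
    return output_infer
```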

if "sentences" in result_dict.keys():
output = result_dict["sentences"]
else:
return "Unknown output keyword."

Collaborator

Should this be an error?
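
If this should indeed fail loudly, one possible shape is to raise instead of returning a sentinel string (error type and message are suggestions):

```python
def extract_sentences(result_dict: dict):
    """Toy illustration: raise instead of returning 'Unknown output keyword.'"""
    if "sentences" in result_dict:
        return result_dict["sentences"]
    raise KeyError(
        f"Expected 'sentences' in the inference output, got keys: {list(result_dict.keys())}"
    )
```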

@janekl
Collaborator

janekl commented Nov 19, 2024

Left some minor comments, overall LGTM

@github-actions
Contributor

github-actions bot commented Dec 4, 2024

This PR is stale because it has been open for 14 days with no activity. Remove the stale label, comment, or update the PR, or it will be closed in 7 days.

github-actions bot added the stale label on Dec 4, 2024
Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>
@github-actions
Contributor

github-actions bot commented Dec 4, 2024

beep boop 🤖: 🙏 The following files have warnings. In case you are familiar with these, please try helping us to improve the code base.


Your code was analyzed with PyLint. The following annotations have been identified:

************* Module nemo.deploy.nlp.megatronllm_deployable
nemo/deploy/nlp/megatronllm_deployable.py:266:0: C0301: Line too long (144/119) (line-too-long)
nemo/deploy/nlp/megatronllm_deployable.py:270:0: C0301: Line too long (248/119) (line-too-long)
nemo/deploy/nlp/megatronllm_deployable.py:297:0: C0301: Line too long (120/119) (line-too-long)
nemo/deploy/nlp/megatronllm_deployable.py:302:0: C0301: Line too long (122/119) (line-too-long)
nemo/deploy/nlp/megatronllm_deployable.py:412:0: C0301: Line too long (158/119) (line-too-long)
nemo/deploy/nlp/megatronllm_deployable.py:52:0: C0116: Missing function or method docstring (missing-function-docstring)
nemo/deploy/nlp/megatronllm_deployable.py:102:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/deploy/nlp/megatronllm_deployable.py:106:0: C0115: Missing class docstring (missing-class-docstring)
nemo/deploy/nlp/megatronllm_deployable.py:109:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/deploy/nlp/megatronllm_deployable.py:407:4: C0116: Missing function or method docstring (missing-function-docstring)
************* Module nemo.deploy.nlp.query_llm
nemo/deploy/nlp/query_llm.py:29:0: C0115: Missing class docstring (missing-class-docstring)
nemo/deploy/nlp/query_llm.py:16:0: W0611: Unused abstractmethod imported from abc (unused-import)
************* Module nemo.deploy.utils
nemo/deploy/utils.py:30:0: C0116: Missing function or method docstring (missing-function-docstring)
nemo/deploy/utils.py:63:0: C0116: Missing function or method docstring (missing-function-docstring)
nemo/deploy/utils.py:75:0: C0116: Missing function or method docstring (missing-function-docstring)
nemo/deploy/utils.py:80:0: C0116: Missing function or method docstring (missing-function-docstring)
nemo/deploy/utils.py:87:0: C0116: Missing function or method docstring (missing-function-docstring)
nemo/deploy/utils.py:92:0: C0116: Missing function or method docstring (missing-function-docstring)
************* Module nemo.export.trt_llm.tensorrt_llm_run
nemo/export/trt_llm/tensorrt_llm_run.py:506:0: C0301: Line too long (125/119) (line-too-long)
nemo/export/trt_llm/tensorrt_llm_run.py:510:0: C0301: Line too long (136/119) (line-too-long)
nemo/export/trt_llm/tensorrt_llm_run.py:514:0: C0301: Line too long (123/119) (line-too-long)
nemo/export/trt_llm/tensorrt_llm_run.py:557:0: C0301: Line too long (181/119) (line-too-long)
nemo/export/trt_llm/tensorrt_llm_run.py:841:0: C0301: Line too long (153/119) (line-too-long)
nemo/export/trt_llm/tensorrt_llm_run.py:524:0: C0116: Missing function or method docstring (missing-function-docstring)
nemo/export/trt_llm/tensorrt_llm_run.py:533:0: C0116: Missing function or method docstring (missing-function-docstring)
nemo/export/trt_llm/tensorrt_llm_run.py:591:0: C0116: Missing function or method docstring (missing-function-docstring)
nemo/export/trt_llm/tensorrt_llm_run.py:33:0: W0611: Unused Mapping imported from tensorrt_llm.mapping (unused-import)
************* Module scripts.deploy.nlp.deploy_inframework_triton
scripts/deploy/nlp/deploy_inframework_triton.py:31:0: C0116: Missing function or method docstring (missing-function-docstring)
scripts/deploy/nlp/deploy_inframework_triton.py:52:0: C0116: Missing function or method docstring (missing-function-docstring)
scripts/deploy/nlp/deploy_inframework_triton.py:59:0: C0116: Missing function or method docstring (missing-function-docstring)
************* Module scripts.deploy.nlp.query_inframework
scripts/deploy/nlp/query_inframework.py:21:0: C0116: Missing function or method docstring (missing-function-docstring)
scripts/deploy/nlp/query_inframework.py:42:0: C0116: Missing function or method docstring (missing-function-docstring)
scripts/deploy/nlp/query_inframework.py:65:0: C0116: Missing function or method docstring (missing-function-docstring)

-----------------------------------
Your code has been rated at 9.78/10

Thank you for improving NeMo's documentation!

github-actions bot removed the stale label on Dec 5, 2024
@oyilmaz-nvidia
Collaborator Author

Closing this since there is a newer version of this PR: #11523
