NeMo 2.0 In-framework deployment support #11233
Conversation
Signed-off-by: Hemil Desai <hemild@nvidia.com>
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>
Signed-off-by: oyilmaz-nvidia <oyilmaz-nvidia@users.noreply.github.com>
    @staticmethod
    def get_deployable(
        nemo_checkpoint_filepath: str = None,
Typing should be Optional[str] here
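A minimal sketch of the suggested typing fix; the class and parameter names are taken from the diff, the body is stubbed out for illustration:

```python
from typing import Optional


class MegatronLLMDeployableNemo2:
    @staticmethod
    def get_deployable(
        nemo_checkpoint_filepath: Optional[str] = None,  # Optional[str] instead of a bare str default of None
        num_devices: int = 1,
        num_nodes: int = 1,
        tensor_model_parallel_size: int = 1,
    ):
        ...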
        nemo_checkpoint_filepath: str = None,
        num_devices: int = 1,
        num_nodes: int = 1,
        tensor_model_parallel_size: int = 1,
Params num_devices & num_nodes are quite tied to tensor_model_parallel_size & pipeline_model_parallel_size.
How about making tensor_model_parallel_size & pipeline_model_parallel_size optional and setting them by default to num_devices & num_nodes, respectively (if they are None)?
Some assertion for consistency like num_devices * num_nodes == tensor_model_parallel_size * pipeline_model_parallel_size would also be useful.
(Not sure how context_parallel_size changes all this).
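A sketch of what the suggested defaulting and consistency check could look like; this is illustrative only, not this PR's implementation, and context_parallel_size is left out:

```python
from typing import Optional


def get_deployable(
    nemo_checkpoint_filepath: Optional[str] = None,
    num_devices: int = 1,
    num_nodes: int = 1,
    tensor_model_parallel_size: Optional[int] = None,
    pipeline_model_parallel_size: Optional[int] = None,
):
    # Default the model-parallel sizes to the launch topology when not given.
    if tensor_model_parallel_size is None:
        tensor_model_parallel_size = num_devices
    if pipeline_model_parallel_size is None:
        pipeline_model_parallel_size = num_nodes

    # Consistency check suggested in the review: the model-parallel grid
    # must account for every rank.
    assert num_devices * num_nodes == tensor_model_parallel_size * pipeline_model_parallel_size, (
        "num_devices * num_nodes must equal "
        "tensor_model_parallel_size * pipeline_model_parallel_size"
    )
    ...
```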
    class MegatronLLMDeployableNemo2(ITritonDeployable):
        """Triton inference server compatible deploy class for a .nemo model file"""
It is no longer a .nemo file I think, just a folder.
        else:
            output_log_probs.append(lp)
        output_infer["log_probs"] = np.array(output_log_probs)
    except Exception as error:
This possibly hides errors by inserting them into the output dictionary. Can we just raise them right away, i.e. remove the try/except clause?
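A sketch of the suggested simplification, with no try/except so inference failures propagate to the caller instead of being written into the response; the helper name and arguments are illustrative, not the PR's actual code:

```python
import numpy as np


def collect_log_probs(per_token_log_probs, output_infer):
    # Illustrative helper: gather log probs without a try/except wrapper,
    # so any error is raised immediately rather than serialized into the output dict.
    output_log_probs = []
    for lp in per_token_log_probs:
        output_log_probs.append(lp)
    output_infer["log_probs"] = np.array(output_log_probs)
    return output_infer
```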
| if "sentences" in result_dict.keys(): | ||
| output = result_dict["sentences"] | ||
| else: | ||
| return "Unknown output keyword." |
Left some minor comments, overall LGTM
Closing this since there is a newer version of this PR: #11523 |
What does this PR do?
Adds in-framework deployment support for NeMo 2.0 checkpoints and log probability output from TRT-LLM.
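A rough usage sketch of the in-framework path added here, based on the class shown in the diff; the import paths and the DeployPyTriton wiring follow the existing NeMo deploy utilities and may differ in the merged follow-up (#11523):

```python
from nemo.deploy import DeployPyTriton
from nemo.deploy.nlp import MegatronLLMDeployableNemo2  # assumed module path

# Load a NeMo 2.0 checkpoint directory for in-framework (PyTorch) inference.
model = MegatronLLMDeployableNemo2(
    nemo_checkpoint_filepath="/path/to/nemo2_checkpoint",  # a checkpoint directory, not a .nemo file
    num_devices=1,
    num_nodes=1,
)

# Serve the model behind a Triton endpoint via PyTriton.
nm = DeployPyTriton(model=model, triton_model_name="megatron_llm")
nm.deploy()
nm.serve()
```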