NeMo 2.0 In-framework deployment support #11233
Conversation
Signed-off-by: Hemil Desai <hemild@nvidia.com>
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>
Signed-off-by: oyilmaz-nvidia <oyilmaz-nvidia@users.noreply.github.com>
    @staticmethod
    def get_deployable(
        nemo_checkpoint_filepath: str = None,
Typing should be Optional[str] here
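A minimal sketch of the suggested typing fix; the class and parameter names are taken from the diff, the body is stubbed out for illustration:

```python
from typing import Optional


class MegatronLLMDeployableNemo2:
    @staticmethod
    def get_deployable(
        nemo_checkpoint_filepath: Optional[str] = None,  # Optional[str] instead of a bare str default of None
        num_devices: int = 1,
        num_nodes: int = 1,
        tensor_model_parallel_size: int = 1,
    ):
        ...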
        nemo_checkpoint_filepath: str = None,
        num_devices: int = 1,
        num_nodes: int = 1,
        tensor_model_parallel_size: int = 1,
Params num_devices & num_nodes are quite tied to tensor_model_parallel_size & pipeline_model_parallel_size.
How about making tensor_model_parallel_size & pipeline_model_parallel_size optional and setting them by default to num_devices & num_nodes, respectively (if they are None)?
Some assertion for consistency like num_devices * num_nodes == tensor_model_parallel_size * pipeline_model_parallel_size would also be useful.
(Not sure how context_parallel_size changes all this).
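A sketch of what the suggested defaulting and consistency check could look like; this is illustrative only, not this PR's implementation, and context_parallel_size is left out:

```python
from typing import Optional


def get_deployable(
    nemo_checkpoint_filepath: Optional[str] = None,
    num_devices: int = 1,
    num_nodes: int = 1,
    tensor_model_parallel_size: Optional[int] = None,
    pipeline_model_parallel_size: Optional[int] = None,
):
    # Default the model-parallel sizes to the launch topology when not given.
    if tensor_model_parallel_size is None:
        tensor_model_parallel_size = num_devices
    if pipeline_model_parallel_size is None:
        pipeline_model_parallel_size = num_nodes

    # Consistency check suggested in the review: the model-parallel grid
    # must account for every rank.
    assert num_devices * num_nodes == tensor_model_parallel_size * pipeline_model_parallel_size, (
        "num_devices * num_nodes must equal "
        "tensor_model_parallel_size * pipeline_model_parallel_size"
    )
    ...
```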
    class MegatronLLMDeployableNemo2(ITritonDeployable):
        """Triton inference server compatible deploy class for a .nemo model file"""
It is no longer a .nemo file I think, just a folder.
        else:
            output_log_probs.append(lp)
        output_infer["log_probs"] = np.array(output_log_probs)
    except Exception as error:
This possibly hides errors by inserting them into the output dictionary. Can we just raise them right away, i.e. remove the try/except clause?
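A sketch of the suggested simplification, with no try/except so inference failures propagate to the caller instead of being written into the response; the helper name and arguments are illustrative, not the PR's actual code:

```python
import numpy as np


def collect_log_probs(per_token_log_probs, output_infer):
    # Illustrative helper: gather log probs without a try/except wrapper,
    # so any error is raised immediately rather than serialized into the output dict.
    output_log_probs = []
    for lp in per_token_log_probs:
        output_log_probs.append(lp)
    output_infer["log_probs"] = np.array(output_log_probs)
    return output_infer
```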
| if "sentences" in result_dict.keys(): | ||
| output = result_dict["sentences"] | ||
| else: | ||
| return "Unknown output keyword." |
Left some minor comments, overall LGTM
Closing this since there is a newer version of this PR: #11523 |
What does this PR do?
Adds in-framework deployment support for NeMo 2.0 checkpoints and log probability output from TRT-LLM.
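A rough usage sketch of the in-framework path added here, based on the class shown in the diff; the import paths and the DeployPyTriton wiring follow the existing NeMo deploy utilities and may differ in the merged follow-up (#11523):

```python
from nemo.deploy import DeployPyTriton
from nemo.deploy.nlp import MegatronLLMDeployableNemo2  # assumed module path

# Load a NeMo 2.0 checkpoint directory for in-framework (PyTorch) inference.
model = MegatronLLMDeployableNemo2(
    nemo_checkpoint_filepath="/path/to/nemo2_checkpoint",  # a checkpoint directory, not a .nemo file
    num_devices=1,
    num_nodes=1,
)

# Serve the model behind a Triton endpoint via PyTriton.
nm = DeployPyTriton(model=model, triton_model_name="megatron_llm")
nm.deploy()
nm.serve()
```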