Integrate lm-eval-harness for evaluations in NeMo by athitten · Pull Request #10621 · NVIDIA-NeMo/NeMo

athitten · 2024-09-25T18:28:31Z

What does this PR do?

Integrates lm-evaluation-harness into NeMo to run evaluations on standard academic benchmarks.
Evaluations run in two steps: first deploy the model on a PyTriton server with the deploy method, then call the evaluate method to perform the actual evaluation.

Collection: This is an independent module and does not affect any collection.

Changelog

Refactors the deploy method in nemo/collections/llm/api.py by moving utility functions to nemo/collections/llm/evaluation/eval_utils.py. The deploy method exports the NeMo model to TensorRT-LLM (trtllm) and deploys it on a PyTriton server.
The deploy method also starts a REST service with uvicorn, which runs a FastAPI application exposing the OpenAI-style endpoint /v1/completions.
The FastAPI application is defined in nemo/deploy/service/rest_model_api.py; it forwards requests to the PyTriton server, which runs the actual model inference behind the /v1/completions endpoint.
Adds an evaluate method to nemo/collections/llm/api.py that uses lm-evaluation-harness to evaluate a NeMo model deployed on the PyTriton server (via trtllm).
The evaluate method takes as input the REST service URL "http://rest_service_http_address:rest_service_port/v1", evaluation params such as the eval task, num_fewshot and limit, and inference params such as temperature, top_p, top_k and max_tokens_to_generate.
The evaluate method instantiates the NeMoFWLMEval class defined in nemo/collections/llm/evaluation/eval_utils.py, a wrapper that passes eval prompts from lm-eval-harness to the model deployed on PyTriton.
NeMoFWLMEval supports two types of tasks: generate_until tasks (e.g. gsm8k) and loglikelihood tasks (e.g. mmlu).
The PR also exposes the gather_context_logits and gather_generation_logits flags for building the trtllm engine with logits. Both are False by default; the eval code needs the generation logits to compute the log-probability of the actual output token or label, so they are enabled for evaluation.
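The two task types and the role of the logits can be illustrated with a minimal pure-Python sketch. This is not the PR's NeMoFWLMEval code: `truncate_at_stop` and `continuation_logprob` are hypothetical helpers, and the sketch assumes the per-position logits are already available as plain lists (how they come back from the trtllm engine is not shown). For generate_until, lm-eval-harness supplies stop strings (the `until` list in the request's generation kwargs) and the continuation is cut at the earliest one. For loglikelihood, each target token's log-probability is read from the log-softmax of its logit row and summed, together with a flag saying whether greedy decoding would have produced the same tokens.

```python
import math
from typing import List, Sequence, Tuple


def truncate_at_stop(text: str, until: Sequence[str]) -> str:
    """generate_until: cut generated text at the earliest stop sequence."""
    cut = len(text)
    for stop in until:
        if stop:  # skip empty strings, which would truncate everything
            idx = text.find(stop)
            if idx != -1:
                cut = min(cut, idx)
    return text[:cut]


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary row."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def continuation_logprob(step_logits: Sequence[Sequence[float]],
                         target_ids: Sequence[int]) -> Tuple[float, bool]:
    """loglikelihood: score a fixed continuation token by token.

    step_logits[i] is the logit row that predicts target_ids[i]. Returns
    (sum of log-probs, is_greedy), the pair lm-eval-harness expects.
    """
    if len(step_logits) != len(target_ids):
        raise ValueError("need exactly one logit row per target token")
    total, is_greedy = 0.0, True
    for row, tok in zip(step_logits, target_ids):
        total += log_softmax(row)[tok]
        # greedy means the target token is the argmax at every step
        if max(range(len(row)), key=row.__getitem__) != tok:
            is_greedy = False
    return total, is_greedy


answer = truncate_at_stop("The answer is 42.\n\nQuestion: next", ["\n\n", "Question:"])
lp, greedy = continuation_logprob([[2.0, 1.0, 0.0], [0.0, 3.0, 0.0]], [0, 1])
```

Summing log-softmax values rather than multiplying probabilities avoids underflow on long continuations, and subtracting the row maximum inside log_softmax keeps exp() from overflowing.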

Usage

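A hedged sketch of the request side, assuming the REST service started by the deploy step listens on a hypothetical host and port (0.0.0.0:8080). `EvalParams`, `InferenceParams`, `rest_service_url` and `build_completions_request` are illustrative names, not NeMo APIs, and the JSON field names are an assumption modelled on the OpenAI completions format. The request is only built, not sent, so the snippet runs without a server; the exact signatures of NeMo's deploy and evaluate methods are not reproduced here.

```python
import json
import urllib.request
from dataclasses import dataclass
from typing import Optional


@dataclass
class EvalParams:
    # Hypothetical grouping of the evaluation params named in this PR.
    task: str = "gsm8k"
    num_fewshot: Optional[int] = None
    limit: Optional[int] = None  # max samples to evaluate; None means all


@dataclass
class InferenceParams:
    # Hypothetical grouping of the inference params named in this PR.
    temperature: float = 0.0
    top_p: float = 0.0
    top_k: int = 1
    max_tokens_to_generate: int = 256


def rest_service_url(host: str, port: int) -> str:
    """Base URL in the form the evaluate method takes: http://host:port/v1."""
    return f"http://{host}:{port}/v1"


def build_completions_request(base_url: str, prompt: str,
                              params: InferenceParams) -> urllib.request.Request:
    """Build (but do not send) a POST to <base_url>/completions.

    The field names below are an assumption about what the FastAPI app
    in rest_model_api.py accepts.
    """
    payload = {
        "prompt": prompt,
        "max_tokens": params.max_tokens_to_generate,
        "temperature": params.temperature,
        "top_p": params.top_p,
        "top_k": params.top_k,
    }
    return urllib.request.Request(
        base_url.rstrip("/") + "/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


url = rest_service_url("0.0.0.0", 8080)  # hypothetical host and port
eval_params = EvalParams(task="mmlu", num_fewshot=5, limit=100)
req = build_completions_request(url, "The capital of France is", InferenceParams())
# Sending requires the deployed REST service:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read()))
```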

GitHub Actions CI

The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.

The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI, remove and add the label again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".

Before your PR is "Ready for review"

Pre checks:

Make sure you have read and followed the Contributor guidelines
Did you write any new necessary tests?
Did you add or update any necessary documentation?
Does the PR affect components that are optional to install? (e.g. Numba, Pynini, Apex)
- Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

New Feature
Bugfix
Documentation

If you haven't finished some of the above items you can still open "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
Contributor guidelines contains specific people who can review PRs to various areas.

Additional Information

Related to # (issue)

nemo/collections/llm/api.py

github-actions · 2024-10-29T01:59:59Z

This PR is stale because it has been open for 14 days with no activity. Remove stale label or comment or update or this will be closed in 7 days.

nemo/export/tensorrt_llm.py

nemo/collections/llm/api.py

nemo/export/tensorrt_llm.py

nemo/collections/llm/evaluation/eval_utils.py

github-actions · 2024-11-08T05:15:09Z

[🤖]: Hi @athitten 👋,

We wanted to let you know that a CICD pipeline for this PR just finished successfully

So it might be time to merge this PR or get some approvals

I'm just a bot so I'll leave it to you what to do next.

//cc @pablo-garay @ko3n1g

athitten · 2024-11-08T06:33:21Z

nemo/collections/llm/api.py

-    with open("nemo/deploy/service/config.json", "w") as f:
-        json.dump(args_dict, f)
-
-


get_trtllm_deployable() has been moved to nemo/collections/llm/evaluation/eval_utils.py

store_args_to_json() has been removed to avoid creating a new file just to store the args. Instead, the Triton args are stored as environment variables in the deploy() method so that rest_model_api.py can read them.
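A minimal sketch of passing the Triton args through environment variables instead of a config file. The variable names, defaults and helper functions are hypothetical; the PR does not show which names deploy() sets or rest_model_api.py reads.

```python
import os
from typing import Tuple

# Hypothetical variable names: the PR does not show which names deploy() uses.
ADDRESS_VAR = "TRITON_HTTP_ADDRESS"
PORT_VAR = "TRITON_PORT"


def store_triton_args(address: str, port: int) -> None:
    """deploy() side: publish the Triton args in this process's environment."""
    os.environ[ADDRESS_VAR] = address
    os.environ[PORT_VAR] = str(port)  # environment values must be strings


def load_triton_args() -> Tuple[str, int]:
    """REST-service side: read the args back, falling back to defaults."""
    address = os.environ.get(ADDRESS_VAR, "0.0.0.0")
    port = int(os.environ.get(PORT_VAR, "8000"))
    return address, port


store_triton_args("127.0.0.1", 8123)
address, port = load_triton_args()
```

Environment variables are visible in the current process and inherited by child processes started afterwards, so this approach only works if the REST service is launched from deploy() (in-process or as a child) after the variables are set; a separately launched process would not see them.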

github-actions · 2024-11-09T08:05:56Z

[🤖]: Hi @athitten 👋,

We wanted to let you know that a CICD pipeline for this PR just finished successfully

So it might be time to merge this PR or get some approvals

I'm just a bot so I'll leave it to you what to do next.

//cc @pablo-garay @ko3n1g

hemildesai

Just a few minor comments

nemo/deploy/nlp/query_llm.py

scripts/export/convert_nemo2_for_export.py

nemo/lightning/pytorch/callbacks/debugging.py

nemo/collections/llm/evaluation/eval_utils.py

nemo/collections/llm/evaluation/base.py

+        self.add_bos = add_bos
+        super().__init__()
+
+    def _generate_tokens_logits(self, payload, return_text: bool = False, return_logits: bool = False):


oyilmaz-nvidia

Can you please check this PR #11233 and make sure the export and deploy updates don't overlap with it?