AMMO Integration with Llama2 Post-Training Quantization Example and Tests #8444
ericharper merged 33 commits into main
Conversation
```python
"Once upon a time, in the middle of a dense forest, there was a small house, where lived a pretty little girl "
"named Little Red Riding Hood.",
```
Check warning (Code scanning / CodeQL): Implicit string concatenation in a list
```python
"Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore "
"magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea "
"commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat "
"nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit "
"anim id est laborum...",
```
Check warning (Code scanning / CodeQL): Implicit string concatenation in a list
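The CodeQL warning flagged here concerns adjacent string literals inside a list. A minimal illustration of why the pattern is risky, and one explicit alternative (this example is illustrative and not taken from the PR):

```python
# A missing comma between adjacent string literals silently merges two
# intended list elements into one; this is what CodeQL warns about.
items_buggy = [
    "alpha",
    "beta"  # missing comma: "beta" and "gamma" fuse into one element
    "gamma",
]
assert items_buggy == ["alpha", "betagamma"]

# When the concatenation IS intended (a long text split across lines),
# an explicit "+" makes the intent unambiguous and silences the warning:
prompt = (
    "Once upon a time, in the middle of a dense forest, there was a small house, "
    + "where lived a pretty little girl named Little Red Riding Hood."
)
assert prompt.endswith("Little Red Riding Hood.")
```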
Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
reinstall.sh (outdated)
```diff
  python -m build --no-isolation --wheel
  DIST_FILE=$(find ./dist -name "*.whl" | head -n 1)
- ${PIP} install "${DIST_FILE}[all]"
+ ${PIP} install --extra-index-url https://pypi.nvidia.com "${DIST_FILE}[all]"
```
Review comment: Don't add the --extra-index-url here; install AMMO separately.
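Following the reviewer's suggestion, the NVIDIA index would be used only for a separate AMMO install step rather than for the main wheel install. A sketch of what that split might look like; the nvidia-ammo package name is an assumption, not confirmed by this PR:

```shell
# Build and install the NeMo wheel without touching the NVIDIA index.
python -m build --no-isolation --wheel
DIST_FILE=$(find ./dist -name "*.whl" | head -n 1)
${PIP} install "${DIST_FILE}[all]"

# Install AMMO as a separate, optional step from the NVIDIA index.
# The package name below is an assumption for illustration.
${PIP} install --extra-index-url https://pypi.nvidia.com nvidia-ammo
```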
```python
    return checkpoint_dir


def save_artifacts(model, output_dir: str, use_abspath: bool = False) -> None:
```
Review comment: Do we need this, or can we use the existing implementation, i.e. the save/restore connector?
Review comment: Should be able to use register_artifact for this.
Reply: I took time to revisit this. This helper just copies artifacts from a source NeMo model (tar or directory) to a folder with the quantized weights. I would prefer using the helper for two main reasons:
- Most importantly, the quantized model is actually a directory produced by the AMMO export step here: https://github.com/NVIDIA/NeMo/blob/b56ff60381b80d0add4456297dab0fb52b30cf1e/nemo/export/quantize/quantizer.py#L184-L191 as opposed to a NeMo model offering the register_artifact method.
- Artifacts saved with register_artifact are prefixed with an MD5 hash. On the other hand, utilities in the NeMo Inference container typically assume hardcoded "plain" paths like tokenizer.model instead of 449ae6fd76d84842bf152e4ae4701764_tokenizer.model (for example). So I would need to perform an extra operation to remove this prefix somewhere.

Defining the save_artifacts helper gives me the flexibility I need. Are you OK with this?
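As a rough illustration of what such a helper does, copying artifacts from a model given as either a tarball or an extracted directory: this sketch is not the actual NeMo implementation, and the function name and behavior are assumptions.

```python
import os
import shutil
import tarfile


def copy_model_artifacts(model_path: str, output_dir: str) -> None:
    """Copy auxiliary artifacts (e.g. tokenizer files) from a source model,
    given as a tarball or an extracted directory, into output_dir alongside
    the quantized weights. Illustrative sketch only."""
    os.makedirs(output_dir, exist_ok=True)
    if os.path.isdir(model_path):
        # Directory case: copy top-level files as-is, keeping plain names.
        for name in os.listdir(model_path):
            src = os.path.join(model_path, name)
            if os.path.isfile(src):
                shutil.copy(src, os.path.join(output_dir, name))
    else:
        # Tarball case: extract members directly into the output directory.
        with tarfile.open(model_path, "r:*") as tar:
            tar.extractall(path=output_dir)
```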
nemo/export/quantize/quantizer.py (outdated)
```python
import tarfile
from typing import List, Optional

import ammo.torch.quantization as atq
```
nemo/export/quantize/quantizer.py (outdated)
```python
import ammo.torch.quantization as atq
import torch.distributed as dist
from ammo.torch.export import export_model_config
```
nemo/export/quantize/quantizer.py (outdated)
```
1. Loading a Nemo model from disk using appropriate parallelism strategy
2. Calibrating the model to obtain appropriate algorithm-specific scaling factors
3. Producing .qnemo tarball with model config (JSON), quantized weights (safetensors)
```
Review comment: We use extracted .nemo files for LLMs, i.e. just directories, so the idea of a .qnemo tarball probably doesn't make sense.
Reply: We could enable producing either a directory or a tarball depending on user choice via model_save.endswith(".qnemo"). The ".qnemo" extension was an initial suggestion for what to pass to a NeMo Inference container. I agree that directories are more convenient to work with.
Reply: Both options are enabled now via 3a7f07e
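The directory-or-tarball choice described above can be sketched as follows. This is an illustrative stand-in, not the actual quantizer code; save_quantized is a hypothetical name and the archiving details are assumptions.

```python
import os
import tarfile


def save_quantized(export_dir: str, model_save: str) -> None:
    """Package an export directory either as a .qnemo tarball or leave it as
    a plain directory, based on the target path's extension.
    Illustrative sketch; the real logic lives in nemo/export/quantize."""
    if model_save.endswith(".qnemo"):
        # Tarball requested: archive the directory contents at the root,
        # so consumers see plain paths like config.json after extraction.
        with tarfile.open(model_save, "w:gz") as tar:
            tar.add(export_dir, arcname="./")
    elif model_save != export_dir:
        # Directory requested: just move the export directory into place.
        os.rename(export_dir, model_save)
```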
nemo/export/quantize/quantizer.py (outdated)
```
3. Producing .qnemo tarball with model config (JSON), quantized weights (safetensors)
   and tokenizer config (yaml).

The .qnemo file produced is intended to be consumed by the TensorRT-LLM toolbox for inference.
```
Review comment: We use extracted .nemo files for LLMs, i.e. just directories, so the idea of a .qnemo tarball probably doesn't make sense.
Reply: Both are enabled; addressed in #8444 (comment) above.
jenkins
jenkins
jenkins
```python
def get_calib_dataloader(data="cnn_dailymail", batch_size=64, calib_size=512, max_sequence_length=512):
    if data == "pileval":
        dataset = load_dataset("json", data_files="https://the-eye.eu/public/AI/pile/val.jsonl.zst", split="train")
```
Review comment: This link doesn't work. This one should be okay: https://huggingface.co/datasets/monology/pile-uncopyrighted
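Independent of which dataset source is used, the calibration loader above boils down to truncating and batching a fixed number of text samples. A dataset-agnostic sketch; the function name and exact behavior are illustrative, not the PR's actual code:

```python
def get_calib_batches(texts, batch_size=64, calib_size=512, max_sequence_length=512):
    """Yield batches of truncated text samples for PTQ calibration.
    Generic sketch: any iterable of strings stands in for the real dataset."""
    # Take only calib_size samples and cap each at max_sequence_length chars.
    samples = [t[:max_sequence_length] for t in texts[:calib_size]]
    # Slice the samples into fixed-size batches for the calibration loop.
    for i in range(0, len(samples), batch_size):
        yield samples[i : i + batch_size]
```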
ericharper left a comment:
LGTM. Thanks! Please send a follow-up PR with documentation.
AMMO Integration with Llama2 Post-Training Quantization Example and Tests (NVIDIA-NeMo#8444)
* AMMO integration with Llama2 PTQ example and tests
* Jenkins megatron_llama_quantization.py test setup
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* License headers
* Add AMMO to requirements_nlp.txt with --extra-index-url for pip install
* Bump AMMO version to latest
* Guards workaround on spec definition
* Save artifacts and tokenizer config at once
* Extend nemo.utils package with new tools
* Reorganize & reformat
* Tests for FP8 and INT4 AWQ
* Add load_config helper function
* Unused import removal
* Fix FP8 Jenkins test
* Fix TP=2 test cont'd: no need to use mpirun
* Allow for patches in AMMO versioning
* Drop AWQ test for now (need to debug)
* Allow for patches in AMMO versioning cont'd
* Use AMMO spec from MCore as it has been published
* Make AMMO optional dependency and properly import guard it
* Add Llama2 AWQ test and update some paths
* Enable specifying quantization.algorithm=null for baseline accuracy checks
* Enable exporting qnemo tarball or just to a directory
* Drop AWQ testing for now
* Test case for export.inference_tensor_parallel=2
* Flag to export TRT-LLM config.json

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Agoniii <815244047@qq.com>
What does this PR do?
This PR integrates the AMMO library into the project and provides utilities for quantizing models, with a Llama2 PTQ example. Several quantization algorithms are available, including INT8 SmoothQuant, INT4 AWQ, and FP8.
The main class Quantizer from the nemo.export.quantize submodule produces a directory or a .qnemo tarball to be consumed by the TensorRT-LLM toolbox for efficient inference. This will be a part of the NeMo Framework Inference Container.
Collection: [NLP]
Changelog
- New nemo.export.quantize submodule for quantizing models
- New tests.setup module to facilitate Jenkins setup
Usage
Example for the INT8 SmoothQuant method:
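The concrete example command was not preserved in this excerpt. A plausible sketch of such an invocation, assuming Hydra-style overrides: the script name megatron_llama_quantization.py and the option names quantization.algorithm, model_save, and export.inference_tensor_parallel appear elsewhere in this PR, while the file names, the model_file option name, and the int8_sq value are assumptions for illustration only.

```shell
# Hypothetical invocation of the Llama2 PTQ example (names are assumptions).
python examples/nlp/language_modeling/megatron_llama_quantization.py \
    model_file=llama2-7b.nemo \
    quantization.algorithm=int8_sq \
    export.inference_tensor_parallel=1 \
    model_save=llama2-7b-int8-sq.qnemo
```

Passing a model_save path without the .qnemo extension would produce a plain directory instead of a tarball.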
Jenkins CI
To run Jenkins, a NeMo User with write access must comment jenkins on the PR.
Before your PR is "Ready for review"
Pre checks:
PR Type:
If you haven't finished some of the above items, you can still open a "Draft" PR.
Who can review?
Anyone in the community is free to review the PR once the checks have passed.
The contributor guidelines contain specific people who can review PRs to various areas.
Additional Information
For a more transparent and easier review process, some components were isolated into individual MRs: