AMMO Integration with Llama2 Post-Training Quantization Example and Tests #8444

Merged
ericharper merged 33 commits into main from jlasek/ammo_integration
Mar 13, 2024
Conversation

@janekl
Collaborator

janekl commented Feb 16, 2024

What does this PR do ?

This PR integrates the AMMO library into the project and provides utilities for quantizing models, along with a Llama2 PTQ example.

Different quantization algorithms are available including INT8 SmoothQuant, INT4 AWQ, and FP8.

The main class Quantizer from the nemo.export.quantize submodule produces a directory or a .qnemo tarball to be consumed by the TensorRT-LLM toolbox for efficient inference. This will be part of the NeMo Framework Inference Container.

Collection: [NLP]

Changelog

  • Adding nvidia-ammo package to requirements
  • Adding nemo.export.quantize submodule for quantizing models
  • Adding tests.setup module to facilitate Jenkins setup
  • Adding PTQ test to Jenkins

Usage

Example for INT8 SmoothQuant method:

python examples/nlp/language_modeling/megatron_llama_quantization.py \
    model_file=llama2-7b-fp16.nemo \
    model_save=llama2-7b-int8_sq.qnemo \
    quantization.algorithm=int8_sq \
    export.decoder_type=llama \
    export.inference_tensor_parallel=1

Jenkins CI

To run Jenkins, a NeMo User with write access must comment jenkins on the PR.

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items you can still open a "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs to various areas.

Additional Information

To make the review process more transparent and easier, some components were isolated into individual MRs:

Comment on lines +19 to +34
"Once upon a time, in the middle of a dense forest, there was a small house, where lived a pretty little girl "
"named Little Red Riding Hood.",

Check warning

Code scanning / CodeQL

Implicit string concatenation in a list

Implicit string concatenation. Maybe missing a comma?
Comment on lines +21 to +39
"Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore "
"magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea "
"commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat "
"nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit "
"anim id est laborum...",

Check warning

Code scanning / CodeQL

Implicit string concatenation in a list

Implicit string concatenation. Maybe missing a comma?
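The two warnings above can be reproduced with a minimal snippet (literals illustrative): Python implicitly concatenates adjacent string literals, which is intentional here for wrapping long prompts, but is indistinguishable from a missing comma between list elements.

```python
# The flagged pattern: adjacent literals inside a list are implicitly
# concatenated. Intentional here (wrapping a long prompt), but CodeQL cannot
# tell this apart from an accidentally missing comma.
prompts = [
    "Once upon a time, in the middle of a dense forest, there was a small house, "
    "where lived a pretty little girl named Little Red Riding Hood.",
    "A second, unrelated prompt.",
]
assert len(prompts) == 2  # three literals were written, only two elements exist

# Wrapping each long literal in parentheses keeps line lengths short while
# making the concatenation unmistakably deliberate:
prompts_explicit = [
    (
        "Once upon a time, in the middle of a dense forest, there was a small house, "
        "where lived a pretty little girl named Little Red Riding Hood."
    ),
    "A second, unrelated prompt.",
]
assert prompts_explicit[0] == prompts[0]
```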
@github-actions github-actions bot added core Changes to NeMo Core TTS ASR common labels Feb 19, 2024
@janekl janekl force-pushed the jlasek/ammo_integration branch from 5b2744a to ec47227 on February 19, 2024 09:56
@github-actions github-actions bot removed core Changes to NeMo Core TTS ASR common labels Feb 19, 2024
@janekl janekl force-pushed the jlasek/ammo_integration branch 5 times, most recently from ceebcb4 to 69305d5 on February 23, 2024 08:14
@janekl janekl force-pushed the jlasek/ammo_integration branch from 16a17f9 to b151a2a on February 29, 2024 11:59
janekl and others added 12 commits March 4, 2024 10:55
Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
reinstall.sh Outdated
python -m build --no-isolation --wheel
DIST_FILE=$(find ./dist -name "*.whl" | head -n 1)
${PIP} install "${DIST_FILE}[all]"
${PIP} install --extra-index-url https://pypi.nvidia.com "${DIST_FILE}[all]"
Collaborator
Don't add the --extra-index-url here, install ammo separately

Collaborator Author
done

return checkpoint_dir


def save_artifacts(model, output_dir: str, use_abspath: bool = False) -> None:
Collaborator
Do we need this or can we use the existing implementation? I.e. the save/restore connector?

Collaborator
Should be able to use register_artifact for this

Collaborator Author

I took time to revisit this. This helper actually just copies artifacts from a source Nemo model (tar or dir) to a folder with quantized weights.

I would prefer using the helper because of two main reasons:

  1. Most importantly, the quantized model is actually a directory produced with the AMMO export step here: https://github.com/NVIDIA/NeMo/blob/b56ff60381b80d0add4456297dab0fb52b30cf1e/nemo/export/quantize/quantizer.py#L184-L191, as opposed to a Nemo model offering the register_artifact method.
  2. Artifacts saved with register_artifact are prefixed with an MD5 hash. On the other hand, utils in the Nemo Inference container typically assume hardcoded "plain" paths like tokenizer.model instead of 449ae6fd76d84842bf152e4ae4701764_tokenizer.model (for example). So I would need to perform an extra operation to remove this prefix somewhere.

Defining the save_artifacts helper gives me the flexibility I need. Are you OK with this?
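The helper's role can be illustrated with a minimal sketch (hypothetical code, not the PR's actual implementation; the `.ckpt` filter and function name are assumptions): copy auxiliary artifacts from a source Nemo checkpoint, whether tarball or extracted directory, into the quantized-model folder under plain file names.

```python
import os
import shutil
import tarfile


def copy_nemo_artifacts(model_path: str, output_dir: str) -> None:
    """Copy auxiliary artifacts (e.g. tokenizer.model) from a source .nemo
    checkpoint -- which may be a tarball or an extracted directory -- into
    the folder holding the quantized weights, keeping plain file names
    (no MD5-hash prefixes)."""
    os.makedirs(output_dir, exist_ok=True)
    if os.path.isdir(model_path):
        for name in os.listdir(model_path):
            if not name.endswith(".ckpt"):  # skip the original weights
                shutil.copy2(os.path.join(model_path, name), output_dir)
    else:
        with tarfile.open(model_path, "r:*") as tar:
            members = [m for m in tar.getmembers() if not m.name.endswith(".ckpt")]
            tar.extractall(output_dir, members=members)
```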

import tarfile
from typing import List, Optional

import ammo.torch.quantization as atq
Collaborator
import guard this

Collaborator Author
done


import ammo.torch.quantization as atq
import torch.distributed as dist
from ammo.torch.export import export_model_config
Collaborator
import guard this

Collaborator Author
ok
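The requested guard can be sketched as follows (the flag name, helper, and error message are illustrative, not the PR's exact code): probe the optional dependency once at import time, and fail with an actionable error only when a code path actually needs it.

```python
# Probe the optional nvidia-ammo dependency once; record availability
# instead of letting the module fail to import.
try:
    import ammo.torch.quantization as atq  # noqa: F401

    HAVE_AMMO = True
except (ImportError, ModuleNotFoundError):
    HAVE_AMMO = False


def require_ammo():
    """Raise a clear, actionable error when AMMO-dependent code runs
    without the library installed."""
    if not HAVE_AMMO:
        raise RuntimeError(
            "nvidia-ammo is required for quantization. "
            "Install it from the NVIDIA PyPI index."
        )
```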


1. Loading a Nemo model from disk using appropriate parallelism strategy
2. Calibrating the model to obtain appropriate algorithm-specific scaling factors
3. Producing .qnemo tarball with model config (JSON), quantized weights (safetensors)
Collaborator
We use extracted .nemo files for llm, i.e. just directories, the idea of a .qnemo tarball probably doesn't make sense

Collaborator Author

We could enable producing either a directory or a tarball depending on user choice via model_save.endswith(".qnemo"). This ".qnemo" was an initial suggestion for what to pass to a Nemo Inference container.

I agree that directories are more convenient to work with.

CC @oyilmaz-nvidia

Collaborator Author
@janekl janekl Mar 12, 2024

Both options are enabled now via 3a7f07e

3. Producing .qnemo tarball with model config (JSON), quantized weights (safetensors)
and tokenizer config (yaml).

The .qnemo file produced is intended to be consumed by the TensorRT-LLM toolbox for inference.
Collaborator
We use extracted .nemo files for llm, i.e. just directories, the idea of a .qnemo tarball probably doesn't make sense

Collaborator Author

Both are enabled -- addressed in #8444 (comment) above
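The resulting dual export path can be sketched as follows (logic inferred from the discussion, not the commit's actual code; the function name and return values are illustrative): the suffix of model_save selects between packing a tarball and copying a plain directory.

```python
import shutil
import tarfile


def save_quantized(export_dir: str, model_save: str) -> str:
    """Pack the AMMO export directory into a .qnemo tarball when model_save
    ends with ".qnemo"; otherwise copy it out as a plain directory."""
    if model_save.endswith(".qnemo"):
        with tarfile.open(model_save, "w:gz") as tar:
            tar.add(export_dir, arcname="./")
        return "tarball"
    shutil.copytree(export_dir, model_save, dirs_exist_ok=True)
    return "directory"
```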

janekl and others added 2 commits March 8, 2024 14:56
@janekl
Collaborator Author

janekl commented Mar 11, 2024

jenkins

janekl added 3 commits March 12, 2024 09:49
Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
…hecks

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
@janekl janekl force-pushed the jlasek/ammo_integration branch from a14f766 to 3a7f07e Compare March 12, 2024 13:48
@janekl
Collaborator Author

janekl commented Mar 12, 2024

jenkins

janekl added 4 commits March 12, 2024 21:53
Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
@janekl
Collaborator Author

janekl commented Mar 12, 2024

jenkins


def get_calib_dataloader(data="cnn_dailymail", batch_size=64, calib_size=512, max_sequence_length=512):
    if data == "pileval":
        dataset = load_dataset("json", data_files="https://the-eye.eu/public/AI/pile/val.jsonl.zst", split="train")
Collaborator
This link doesn't work. This one should be okay: https://huggingface.co/datasets/monology/pile-uncopyrighted
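Independent of which data source is used, the shape of what get_calib_dataloader yields can be sketched as follows (an illustrative stand-in, not the PR's code; the function name and truncation-by-characters are assumptions):

```python
def iter_calib_batches(texts, batch_size=64, calib_size=512, max_chars=2048):
    """Yield lists of up to batch_size truncated texts until calib_size
    samples have been consumed -- the shape of a PTQ calibration loader."""
    batch = []
    for text in texts[:calib_size]:
        batch.append(text[:max_chars])
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final partial batch
        yield batch
```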

Collaborator
@ericharper ericharper left a comment

LGTM. Thanks!

Please send a follow up PR with documentation.

@ericharper ericharper merged commit cb3f2bc into main Mar 13, 2024
@ericharper ericharper deleted the jlasek/ammo_integration branch March 13, 2024 15:28
Agoniii pushed a commit to Agoniii/NeMo that referenced this pull request Mar 15, 2024
…ests (NVIDIA-NeMo#8444)

* AMMO integration with Llama2 PTQ example and tests

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>

* Jenkins megatron_llama_quantization.py test setup

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* License headers

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>

* Add AMMO to requirements_nlp.txt with --extra-index-url for pip install

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>

* Bump AMMO version to latest

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>

* Guards workaround on spec definition

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>

* Save artifacts and tokenizer config at once

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>

* Extend nemo.utils package with new tools

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Reorganize & reformat

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>

* Tests for FP8 and INT4 AWQ

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add load_config helper function

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Unused import removal

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>

* Fix FP8 Jenkins test

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>

* Fix TP=2 test cont'd: no need to use mpirun

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>

* Allow for patches in AMMO versioning

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>

* Drop AWQ test for now (need to debug)

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>

* Allow for patches in AMMO versioning cont'd

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>

* Use AMMO spec from MCore as it has been published

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>

* Make AMMO optional dependency and properly import guard it

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add Llama2 AWQ test and update some paths

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>

* Enable specifying quantization.algorithm=null for baseline accuracy checks

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>

* Enable exporting qnemo tarball or just to a directory

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>

* Drop AWQ testing for now

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>

* Test case for export.inference_tensor_parallel=2

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>

* Flag to export TRT-LLM config.json

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>

---------

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Agoniii <815244047@qq.com>
JRD971000 pushed a commit that referenced this pull request Mar 15, 2024
Signed-off-by: ataghibakhsh <ataghibakhsh@nvidia.com>
pablo-garay pushed a commit that referenced this pull request Mar 19, 2024
Signed-off-by: Pablo Garay <pagaray@nvidia.com>
@janekl janekl mentioned this pull request Mar 19, 2024
rohitrango pushed a commit to rohitrango/NeMo that referenced this pull request Jun 25, 2024
@janekl janekl mentioned this pull request Dec 20, 2024