Integrate Automated QDQ placement tool - Part 3 #703
willg-nv wants to merge 1 commit into NVIDIA:main
Conversation
@vishalpandya1990 could you help me review this PR? thanks!
Sorry for the delay. Added Ajinkya for review.
Signed-off-by: Will Guo <willg@nvidia.com>
```python
if needs_fp8_conversion:
    logger.debug("Converting INT8 to FP8")
    model = int8_to_fp8(model)
```
Is this conversion function needed or can we insert Q/DQ nodes already at the correct precision?
Tried the following test code:

```python
def test_export_quantized_model(self):
    """Test exporting quantized model with Q/DQ."""
    model = create_simple_conv_model()
    autotuner = QDQAutotuner(model)
    config = self._create_test_config()
    autotuner.initialize(config)
    with open("/tmp/autotuner_model.quant.onnx", "w") as f:  # tempfile.NamedTemporaryFile(suffix=".onnx", delete=False) as f:
        output_path = f.name
    try:
        autotuner.export_onnx(output_path, insert_qdq=True)
        # Verify the file was created
        assert os.path.exists(output_path)
        # Verify it's a valid ONNX model
        exported_model = onnx.load(output_path)
        assert exported_model is not None
        # Verify that it contains Q/DQ nodes
        qdq_nodes = [n for n in exported_model.graph.node if n.op_type in ["QuantizeLinear", "DequantizeLinear"]]
        assert qdq_nodes, "Q/DQ nodes not found in quantized model"
        print("✓ QDQAutotuner export quantized model")
    finally:
        print()
        # if os.path.exists(output_path):
        #     os.unlink(output_path)
```

But the simple Conv->Relu model didn't get quantized. Is this expected?
```
[modelopt][onnx] - DEBUG - Region 0 (level 0)
[modelopt][onnx] - DEBUG - → Pattern signature: Conv->Relu
[modelopt][onnx] - DEBUG - → No scheme available, skipping
[modelopt][onnx] - DEBUG - Matched 0/1 regions, total 0 unique insertion points
[modelopt][onnx] - DEBUG - Inserting 0 Q/DQ pairs into graph
[modelopt][onnx] - DEBUG - Serializing to ONNX format
[modelopt][onnx] - INFO - Exported INT8 model with 0 Q/DQ pairs → /tmp/autotuner_model.quant.onnx
✓ QDQAutotuner export quantized model
```
I think the above result is expected, because export_onnx(insert_qdq=True) means using the autotuner's insertion points to insert Q/DQ. Since the regions in the autotuner have not been tuned, no Q/DQ nodes are inserted.
For model = int8_to_fp8(model): I don't know how to create an FP8 QDQ ONNX model natively, so I use INT8 Q/DQ nodes and convert them to FP8.
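To illustrate the idea behind such a rewrite, here is a minimal sketch that switches INT8 Q/DQ nodes to FP8 over a plain-dict stand-in for the graph. This is not the actual `int8_to_fp8` implementation: a real version would rewrite the zero-point initializers of `QuantizeLinear`/`DequantizeLinear` nodes to an FP8 element type (e.g. FLOAT8E4M3FN) using the onnx library, and the dict-based graph below is purely illustrative.

```python
# Schematic INT8 -> FP8 Q/DQ rewrite over a toy dict-based graph.
# The real tool operates on onnx.GraphProto; this stand-in only
# demonstrates the "find Q/DQ nodes and retype them" pattern.

def int8_to_fp8_sketch(graph):
    """Return a copy of `graph` with Q/DQ quant dtypes switched to FP8."""
    converted = []
    for node in graph:
        node = dict(node)  # shallow copy; don't mutate the input graph
        if node["op_type"] in ("QuantizeLinear", "DequantizeLinear"):
            if node.get("quant_dtype") == "INT8":
                node["quant_dtype"] = "FLOAT8E4M3FN"
        converted.append(node)
    return converted

graph = [
    {"op_type": "QuantizeLinear", "quant_dtype": "INT8"},
    {"op_type": "Conv"},
    {"op_type": "DequantizeLinear", "quant_dtype": "INT8"},
]
fp8_graph = int8_to_fp8_sketch(graph)
print([n.get("quant_dtype") for n in fp8_graph])
# → ['FLOAT8E4M3FN', None, 'FLOAT8E4M3FN']
```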
```python
scheme_idx = autotuner.generate()

# Should return a valid index (>= 0) or -1 if no more unique schemes
assert isinstance(scheme_idx, int)
```
What's the expected scheme_idx for create_simple_conv_model()? Please update this assert accordingly. Thanks.
Can we add a test file for
```python
# TensorRT Benchmark
trt_group = parser.add_argument_group("TensorRT Benchmark")
trt_group.add_argument(
    "--use_trtexec",
```
The following CLI fails to perform benchmark / quantize the model (this uses TensorRTPyBenchmark):

```shell
$ python -m modelopt.onnx.quantization.autotune --onnx_path=conv_relu.onnx
```

Error:

```
[modelopt][onnx] - ERROR - Benchmark instance not initialized
[modelopt][onnx] - INFO - Results: 3.73 ms → failed (invalid measurement)
```
This failure happens because pycuda was not installed. After installing that dependency, no error is thrown but the model is not quantized.
- @ajrasane should we create another optional_dep in setup.py with autotune's dependencies?
If --use_trtexec is used, autotune does not fail, but it also doesn't generate a quantized model.
This is because Latency is used as the measurement instead of GPU Compute Time.
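As a hedged sketch of the fix being suggested, a benchmark wrapper could parse the GPU Compute Time summary (rather than end-to-end Latency) from trtexec's log. The sample log line below is only representative; the exact formatting varies across TensorRT versions, so the regex is an assumption, not the tool's actual parser.

```python
import re

# Representative trtexec-style summary line (format is an assumption):
LOG = "[I] GPU Compute Time: min = 0.512 ms, max = 1.204 ms, mean = 0.731 ms, median = 0.729 ms"

def parse_gpu_compute_mean_ms(log_text):
    """Extract the mean GPU Compute Time in ms from a trtexec-style log."""
    m = re.search(r"GPU Compute Time:.*?mean\s*=\s*([\d.]+)\s*ms", log_text)
    if m is None:
        raise ValueError("GPU Compute Time not found in log")
    return float(m.group(1))

mean_ms = parse_gpu_compute_mean_ms(LOG)
print(mean_ms)  # → 0.731
```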
If it is just pycuda, we can probably just include it in the modelopt ONNX dependencies. But if we have more dependencies, it would be better to create a new section in setup.py with the autotune dependencies.
@willg-nv how should we approach the tensorrt / trtexec requirements for autotune? Are we just adding a disclaimer for the user in the README or adding that in setup.py?
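For reference, an optional-dependency section of the kind discussed above might look like the following. The extra's name ("onnx-autotune") and its dependency list are assumptions for illustration, not the repository's actual packaging.

```python
# Hypothetical extras_require entry for setup.py. Both the extra's
# name and the pinned packages are assumed, not taken from the repo.
extras_require = {
    "onnx-autotune": [
        "pycuda",    # needed by TensorRTPyBenchmark
        "tensorrt",  # engine build + benchmarking
    ],
}

print(sorted(extras_require["onnx-autotune"]))  # → ['pycuda', 'tensorrt']
```

Users would then opt in with something like `pip install nvidia-modelopt[onnx-autotune]` (the package name here is also an assumption).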
Suggestion for
## What does this PR do?

**Type of change:** new feature

**Overview:** This PR integrates an automatic QDQ placement tool into ModelOpt. This PR is part 1 of 4 and contains the following changes:

1. Defines common types: Region, RegionType, error types
2. Defines InsertionPoints (the logical locations to place QDQ pairs) and InsertionScheme (a set of insertion points)
3. Unit tests for the new types

Part 1: #701
Part 2: #702
Part 3: #703
Part 4: #704

## Usage

```python
# Region type usage:
region = Region(region_id=1, level=0, region_type=RegionType.LEAF)
assert region.get_id() == 1
assert region.get_level() == 0
region.add_node(1)  # 1 is the index of an ONNX graph node
...
point = NodeInputInsertionPoint(node_index=0, input_index=2)
assert point.node_index == 0  # relative node index in the region
assert point.input_index == 2  # relative input tensor index in the specific node
resolved = point.resolve(region, graph)
...
```

## Testing

Implemented unit tests; all tests pass.

## Before your PR is "*Ready for review*"

- **Make sure you read and follow [Contributor guidelines](https://github.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md)** and your commits are signed.
- **Is this change backward compatible?**: Yes
- **Did you write any new necessary tests?**: Yes
- **Did you add or update any necessary documentation?**: No, documentation changes will be included in part 4.
- **Did you update [Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?**: No, this can be done once all parts of the change are merged.

## Additional Information

## Summary by CodeRabbit

* **New Features**
  * Added foundational autotuner infrastructure for quantization optimization, including region hierarchies and insertion scheme management.
  * Introduced an insertion point system for managing quantize/dequantize operation placement across ONNX graph regions.
  * Added utility functions for tensor consumer mapping and boolean operation identification.
* **Tests**
  * Added comprehensive test coverage for autotuner components, insertion points, and region management.

Signed-off-by: Will Guo <willg@nvidia.com>
What does this PR do?
Type of change: new feature
Overview: This PR integrates an automated QDQ placement tool into ModelOpt. This PR is part 3 of 4 and contains the following changes:
Part 1: #701
Part 2: #702
Part 3: #703
Part 4: #704
Usage
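The usage snippet was elided in this copy of the PR body. As a hedged sketch only, the autotune loop discussed in this conversation (generate a candidate scheme, benchmark it, keep the fastest) might look like the following; QDQAutotuner's API details are assumed from the review comments (generate() returns a scheme index, or -1 once no unique schemes remain), and the benchmark is stubbed out.

```python
# Schematic autotune loop: keep generating candidate Q/DQ insertion
# schemes and retain the one with the lowest measured latency.
# Both callables are stand-ins for the real QDQAutotuner/benchmark.

def autotune(generate, benchmark):
    best_idx, best_ms = -1, float("inf")
    while True:
        idx = generate()
        if idx < 0:  # no more unique schemes
            break
        ms = benchmark(idx)
        if ms < best_ms:
            best_idx, best_ms = idx, ms
    return best_idx, best_ms

# Stub: three schemes with fake latencies, then exhaustion (-1).
schemes = iter([0, 1, 2, -1])
latencies = {0: 3.7, 1: 2.9, 2: 3.1}
result = autotune(lambda: next(schemes), latencies.__getitem__)
print(result)  # → (1, 2.9)
```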
Testing
Implemented unit tests for QDQAutotuner and Config classes.
Before your PR is "Ready for review"
Additional Information