[OVQuantizer] Apply Fixes and Integrate into the Llama Example Workflow #9
Conversation
```python
def _get_weight_edge(
    target_node: torch.fx.Node,
    nncf_graph: NNCFGraph,
):
    """
    :param target_node: FX node representing a weighted operation (e.g., Linear, Conv).
    :param nncf_graph: NNCFGraph used to determine weight port indices.
    """
```
Suggested change:

```diff
-):
+) -> tuple[torch.fx.Node, torch.fx.Node]:
```
```python
    :param graph: The underlying FX graph.
    :param nncf_graph: The corresponding NNCF graph.
    :param node_vs_torch_annotation: A mapping of FX nodes to quantization annotations.
```

```python
    model: torch.fx.GraphModule,
    graph: torch.fx.Graph,
    nncf_graph: NNCFGraph,
    node_vs_torch_annotation: DefaultDict[torch.fx.Node, QuantizationAnnotation],
```
Could you please create the defaultdicts in each function separately and remove the node_vs_torch_annotation parameter?
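The suggested refactor might look roughly like the sketch below: each annotation function builds its own `defaultdict` instead of receiving it as a parameter. `QuantizationAnnotation` and the node list are hypothetical stand-ins for the FX/NNCF types in the diff.

```python
from collections import defaultdict
from typing import DefaultDict

# Hypothetical stand-in for torch.ao's QuantizationAnnotation.
class QuantizationAnnotation:
    pass

def annotate(model_nodes) -> DefaultDict[object, QuantizationAnnotation]:
    # Create the mapping locally instead of threading it through
    # as a node_vs_torch_annotation parameter, as the review asks.
    node_vs_torch_annotation: DefaultDict[object, QuantizationAnnotation] = defaultdict(
        QuantizationAnnotation
    )
    for node in model_nodes:
        node_vs_torch_annotation[node]  # default_factory creates an annotation
    return node_vs_torch_annotation

annotations = annotate(["linear_0", "conv_1"])
print(len(annotations))  # 2
```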
```python
    else:
        return INT8SymmetricWeightsDecompressor(scale, original_weight.dtype)
```

Suggested change:

```diff
-    else:
-        return INT8SymmetricWeightsDecompressor(scale, original_weight.dtype)
+    return INT8SymmetricWeightsDecompressor(scale, original_weight.dtype)
```
```python
    q_weight: torch.Tensor,
    original_weight: torch.Tensor,
) -> BaseWeightsDecompressor:
    if zero_point is not None:
```
What if we invert the condition here? IMHO `is None` is clearer than `is not None` :)
Suggested change:

```diff
-    if zero_point is not None:
+    if zero_point is None:
```
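Combining both suggestions (the inverted `is None` check and dropping the trailing `else`), the factory might read as sketched below. The decompressor classes here are hypothetical stand-ins for the NNCF ones in the diff, with tensors replaced by plain values.

```python
# Hypothetical stand-ins for the NNCF decompressor classes in the diff.
class INT8SymmetricWeightsDecompressor:
    def __init__(self, scale, dtype):
        self.scale, self.dtype = scale, dtype

class INT8AsymmetricWeightsDecompressor:
    def __init__(self, scale, zero_point, dtype):
        self.scale, self.zero_point, self.dtype = scale, zero_point, dtype

def make_decompressor(scale, zero_point, dtype):
    # Early return on "is None" puts the simple symmetric case first
    # and removes the need for an else branch.
    if zero_point is None:
        return INT8SymmetricWeightsDecompressor(scale, dtype)
    return INT8AsymmetricWeightsDecompressor(scale, zero_point, dtype)

print(type(make_decompressor(0.5, None, "float32")).__name__)
```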
```python
    q_weight: torch.Tensor,
    original_weight: torch.Tensor,
) -> BaseWeightsDecompressor:
    if zero_point is not None:
```
The same comment as above regarding the condition.
```python
    observer: Type[UniformQuantizationObserverBase]
    extra_args: Dict[str, Any] = {}
```
Let's use `wc_param` as an actual keyword argument here. A dict is not needed.
Suggested change:

```diff
-    observer: Type[UniformQuantizationObserverBase]
-    extra_args: Dict[str, Any] = {}
+    observer: Type[WeightObserverBase]
```
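The point of the suggestion can be sketched as below: an intermediate `extra_args` dict hides which arguments actually reach the observer, while an explicit `wc_param` keyword makes the call site self-documenting. `fake_observer` is a hypothetical stand-in for the observer constructor.

```python
# Before: an intermediate dict obscures which argument is actually passed.
def build_spec_with_dict(observer_ctr, wc_param):
    extra_args = {"wc_param": wc_param}
    return observer_ctr(**extra_args)

# After: wc_param is an explicit keyword argument, per the review comment.
def build_spec(observer_ctr, wc_param):
    return observer_ctr(wc_param=wc_param)

def fake_observer(wc_param=None):
    # Stand-in that just records what it was given.
    return {"wc_param": wc_param}

print(build_spec(fake_observer, "int4"))  # {'wc_param': 'int4'}
```

Both variants behave identically; the second simply removes a layer of indirection.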
```python
    )
    return QuantizationSpec(
        dtype=dtype,
        observer_or_fake_quant_ctr=observer.with_args(**extra_args),
```
Can we call the constructor directly here?
```python
    return qnn_quantizer, quant_dtype
```

```python
def get_ov_quantizer(
```
The ignored scope in this function is very model-specific. I suggest naming this function `get_ov_quantizer_for_modelname` and adding a small docstring to it.
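A minimal sketch of what the renamed, documented function could look like. The actual OpenVINO quantizer construction and ignored-scope API are not shown in this excerpt, so the return value and the layer names below are hypothetical placeholders.

```python
def get_ov_quantizer_for_llama(weight_bits: int = 4) -> dict:
    """Build an OpenVINO quantizer configuration for the Llama example.

    The ignored scope below is specific to the Llama model graph, which
    is why the function name carries the model name. The pattern names
    here are hypothetical placeholders, not the real ignored scope.
    """
    return {
        "weight_bits": weight_bits,
        # Layers kept in higher precision for this particular model.
        "ignored_scope": ["lm_head", "embed_tokens"],
    }

cfg = get_ov_quantizer_for_llama()
print(cfg["weight_bits"])  # 4
```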
Co-authored-by: Daniil Lyakhov <daniil.lyakhov@intel.com>

Merged commit 21c43fe into cavusmustafa:openvino_llama_support
Summary
The OpenVINO Quantizer is refactored, and mixed precision via a manually specified ignored scope is added.
To use this OpenVINO quantizer path, `--pt2e_quantize openvino_8da4w` can be used for INT4 weight compression and `--pt2e_quantize openvino_8da8w` for INT8 weight compression.