
Add onnx subfunctions ci test#43

Closed
vbaddi wants to merge 3 commits into main from add_onnx_subfunctions_ci_v1

Conversation

@vbaddi
Collaborator

@vbaddi vbaddi commented Nov 3, 2025

No description provided.

qraniumcitest and others added 3 commits October 29, 2025 10:00
Signed-off-by: qraniumcitest <rmakar@qti.qualcomm.com>
Signed-off-by: vbaddi <quic_vbaddi@quicinc.com>
Signed-off-by: vbaddi <quic_vbaddi@quicinc.com>
@qraniumcitest
Owner

From GHES (comment) by @RBagla

Code Assistant

Reviewed Commits: cb7da87, 6b2a01b, bd0d9ef, 7c53cdd

  • cb7da87: Updated the correct code with updated syntax, removed device_group parameter in model.compile()

Signed-off-by: Sharvari Medhe smedhe@qti.qualcomm.com

  • 6b2a01b: Create Mirror_Fork_PRs_to_GHES.yml

Signed-off-by: qraniumcitest rmakar@qti.qualcomm.com

  • bd0d9ef: fix: add support for enabling onnx subfunction

Signed-off-by: vbaddi quic_vbaddi@quicinc.com

  • 7c53cdd: nit: add onnxslim package

Signed-off-by: vbaddi quic_vbaddi@quicinc.com

Pull Request Overview

This PR introduces significant enhancements to the ONNX export pipeline, including support for ONNX functions, custom operations, and improved model transformations. The changes span multiple core modules, including export logic, cache utilities, model transformations, and configuration constants.

Files Changed Summary

| File | Lines Changed | Issues Found | Highest Severity |
| --- | --- | --- | --- |
| .github/workflows/Mirror_Fork_PRs_to_GHES.yml | +230 | 2 | Medium |
| QEfficient/__init__.py | +6 | 0 | - |
| QEfficient/base/modeling_qeff.py | +11 | 1 | Low |
| QEfficient/base/onnx_transforms.py | +162 | 1 | Medium |
| QEfficient/transformers/cache_utils.py | +26 | 1 | Low |
| QEfficient/transformers/models/gemma3/modeling_gemma3.py | +1 | 0 | - |
| QEfficient/transformers/models/modeling_auto.py | +14 | 0 | - |
| QEfficient/transformers/models/pytorch_transforms.py | +26 | 0 | - |
| QEfficient/utils/constants.py | +2 | 0 | - |
| QEfficient/utils/patches.py | +121 | 1 | Medium |
| notebooks/QEfficientGPT2.ipynb | +5 | 0 | - |
| notebooks/QEfficientMPT.ipynb | +5 | 0 | - |
| pyproject.toml | +1 | 0 | - |

Critical Issues Identified

  1. [Security] Potential Token Exposure - GitHub workflow contains debug logging that could leak information about sensitive tokens (Medium severity)
  2. [Functionality] Missing Error Handling - Custom operation registration lacks validation and error handling (Medium severity)
  3. [Performance] Debug Print Statement - Production code contains debug print that should be removed (Low severity)
  4. [Maintainability] Incomplete Environment Variable Check - Cache utility function doesn't validate environment variable usage (Low severity)

Key Accomplishments

  • Added ONNX functions support with decoder layer export capabilities
  • Implemented custom operation transforms (RMSNorm, CtxScatter, CtxGather)
  • Introduced ONNX slim transformations for model optimization
  • Added torch patches for ONNX export compatibility
  • Updated ONNX opset version from 13 to 17
  • Enhanced KV cache handling with dynamic invalid index values

[Security] Potential Token Exposure in Debug Logging - Medium Severity

The GitHub Actions workflow contains debug logging statements that print the length of the GHES_PAT token and make test API calls with the token. While the token itself isn't directly printed, these debug statements could leak information about the token in workflow logs and should be removed before production use.

Issue Details:

  • Lines 202-205 contain debug code that logs token length and makes test API calls
  • This information could be used by attackers to infer token characteristics
  • Debug code should not be present in production workflows

Fixed Code Snippet:

RESP="$(curl -sS -H "Authorization: token ${GHES_PAT}" \
              -H "Accept: application/vnd.github+json" \
              "${API}/pulls?state=open&head=${GHES_OWNER}:${BRANCH}" \
              -w "\n%{http_code}")"
HTTP_CODE="$(printf '%s\n' "$RESP" | tail -n1)"
JSON="$(printf '%s\n' "$RESP" | sed '$d')"
echo "HTTP_CODE=${HTTP_CODE}"

[Functionality] Missing Error Handling in Custom Operation Registration - Medium Severity

The CustomOpTransform.register_custom_op() method in onnx_transforms.py lacks validation and error handling. It doesn't check if an operation with the same name already exists, which could lead to silent overwrites of previously registered operations. Additionally, there's no validation of the input parameters.

Issue Details:

  • No check for duplicate operation names before registration
  • No validation that func_class and onnxscript_func are valid/compatible
  • Silent failures could occur if invalid operations are registered
  • Could lead to unexpected behavior during ONNX export

Fixed Code Snippet:

@classmethod
def register_custom_op(cls, op_name: str, func_class: Any, onnxscript_func: Any):
    """Register a custom operation."""
    if not op_name or not isinstance(op_name, str):
        raise ValueError(f"Invalid op_name: {op_name}. Must be a non-empty string.")
    
    if op_name in cls._custom_ops:
        logger.warning(f"Custom operation '{op_name}' is already registered. Overwriting.")
    
    if not callable(onnxscript_func):
        raise TypeError(f"onnxscript_func for '{op_name}' must be callable")
    
    cls._custom_ops[op_name] = (func_class, onnxscript_func)
    logger.debug(f"Registered custom operation: {op_name}")

[Performance] Debug Print Statement in Production Code - Low Severity

The _get_invalid_idx_value() function in cache_utils.py contains a debug print statement at line 441 that should be removed from production code. This print statement will output to console during every cache update operation, which could impact performance and clutter logs.

Issue Details:

  • Line 441 contains print(f"value of INVALID IDX VALUE is {invalid_idx_value}")
  • This is clearly debug code that was left in accidentally
  • Will execute on every KV cache update, potentially thousands of times per inference
  • Should use proper logging instead if needed for debugging

Fixed Code Snippet:

invalid_mask = ctx_indices > gather_limit
invalid_idx_value = _get_invalid_idx_value()
ctx_indices = torch.where(invalid_mask, invalid_idx_value, ctx_indices)

[Maintainability] Incomplete Environment Variable Validation - Low Severity

The _get_invalid_idx_value() function in cache_utils.py checks the QEFF_USE_ONNX_FUNCTIONS environment variable but doesn't validate its value beyond a simple string comparison. The function uses .lower() == "true" which means any value other than exactly "true" (case-insensitive) will be treated as false, including typos or other truthy values.

Issue Details:

  • No validation for invalid environment variable values
  • Silent fallback behavior could mask configuration errors
  • Users might set the variable to "1", "yes", "True" (with different casing) and expect it to work
  • Consider adding logging or warnings for unexpected values

Fixed Code Snippet:

def _get_invalid_idx_value():
    """
    Get the appropriate invalid index value for CtxGather operations.

    For ONNX export with functions, we use 0 to avoid INT32_MAX constants
    that cause issues when functions are inlined at runtime.

    Returns:
        int: Invalid index value (0 for ONNX functions, INT32_MAX otherwise)
    """
    if torch.onnx.is_in_onnx_export():
        # Check if ONNX functions are being used
        use_onnx_functions_str = os.environ.get("QEFF_USE_ONNX_FUNCTIONS", "false").lower()
        use_onnx_functions = use_onnx_functions_str in ("true", "1", "yes")
        
        if use_onnx_functions_str not in ("true", "false", "1", "0", "yes", "no"):
            import logging
            logger = logging.getLogger(__name__)
            logger.warning(f"Invalid value for QEFF_USE_ONNX_FUNCTIONS: '{use_onnx_functions_str}'. Using default (false).")
        
        if use_onnx_functions:
            # For ONNX functions: use 0 to avoid function inlining issues
            return 0
        else:
            # For regular ONNX export: use INT32_MAX as before
            return torch.iinfo(torch.int32).max
    else:
        # For runtime: use 0
        return 0
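The accepted-values check can also be factored into a small reusable helper for any boolean flag; this is a sketch with a hypothetical helper name, not existing QEfficient API:

```python
import logging
import os

logger = logging.getLogger(__name__)

_TRUTHY = ("true", "1", "yes")
_FALSY = ("false", "0", "no")

def parse_bool_env(name: str, default: bool = False) -> bool:
    """Parse a boolean environment flag, warning on unrecognized values."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    value = raw.strip().lower()
    if value in _TRUTHY:
        return True
    if value in _FALSY:
        return False
    logger.warning("Invalid value for %s: %r. Using default (%s).", name, raw, default)
    return default

# "Yes", "1", and "true" are all accepted, case-insensitively.
os.environ["QEFF_USE_ONNX_FUNCTIONS"] = "Yes"
assert parse_bool_env("QEFF_USE_ONNX_FUNCTIONS") is True
```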


Source: https://github.qualcomm.com/qranium/efficient-transformers/pull/42#issuecomment-1007487

@qraniumcitest
Owner

From GHES (comment) by @RBagla


Critical Issues Identified

  1. [Security] Potential Token Exposure - GitHub workflow contains debug logging that could leak information about sensitive tokens (Medium severity)
  2. [Functionality] Missing Error Handling - ONNX transform operations lack proper exception handling for edge cases (Medium severity)
  3. [Performance] Debug Print Statement - Production code contains debug print that should be removed (Low severity)
  4. [Maintainability] Incomplete Patch Implementation - Monkey patch may not handle all edge cases properly (Medium severity)

[Security - Medium] Potential Token Exposure in GitHub Workflow Debug Logging

The GitHub Actions workflow contains debug logging statements that print the length of the GHES_PAT token and make test API calls. While the token itself isn't directly printed, these debug statements could leak information about the token in workflow logs and should be removed before production use.

Lines 202-205 contain debugging code that:

  • Prints the token length which could help attackers estimate token format
  • Makes test API calls that could be logged
  • Should be removed or placed behind a debug flag

Fixed Code Snippet:

          HTTP_CODE="$(printf '%s\n' "$RESP" | tail -n1)"
          JSON="$(printf '%s\n' "$RESP" | sed '$d')"
          echo "HTTP_CODE=${HTTP_CODE}"
          if [ "${HTTP_CODE}" != "200" ]; then

[Functionality - Medium] Missing Error Handling in ONNX Transform Operations

The OnnxSlimTransform.apply() method raises a RuntimeError when temp_onnx_path is not provided, but this error handling is inconsistent with other transforms. The method should handle this more gracefully or ensure the parameter is always provided through the calling chain.

Additionally, the onnxslim.slim() operation on line 132 could fail for various reasons (malformed ONNX, unsupported operations, etc.), but there is no try/except block to handle these failures gracefully.

Fixed Code Snippet:

        transformed = False
        onnx_slim_transform = True  # kwargs.get("enable_onnx_slim_transform", False)
        temp_onnx_path = kwargs.get("temp_onnx_path", None)
        if not temp_onnx_path:
            err_str = "temp_onnx_path is required for onnx-slim transform."
            raise RuntimeError(err_str)
        if onnx_slim_transform:
            try:
                transformed = True
                slimmed_model = onnxslim.slim(model)
                onnx.save(slimmed_model, temp_onnx_path)
                return slimmed_model, transformed
            except Exception as e:
                logger.warning(f"ONNX slim transformation failed: {e}. Returning original model.")
                return model, False
        return model, transformed
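The fail-soft pattern in the snippet generalizes to any best-effort transform; a self-contained sketch (the helper name is hypothetical, not existing QEfficient API):

```python
import logging
from typing import Any, Callable, Tuple

logger = logging.getLogger(__name__)

def apply_optional_transform(model: Any, transform: Callable[[Any], Any]) -> Tuple[Any, bool]:
    """Run a best-effort transform; on any failure, keep the original model.

    Returns (model, True) when the transform succeeded, (original, False) otherwise.
    """
    try:
        return transform(model), True
    except Exception as exc:
        logger.warning("Transform failed: %s. Returning original model.", exc)
        return model, False
```

The `(model, transformed)` tuple matches the return shape the transform chain already uses, so callers can log or skip downstream steps based on the flag.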

[Performance - Low] Debug Print Statement in Production Code

Line 441 in cache_utils.py contains a debug print statement that outputs the invalid index value. This print statement should be removed from production code as it:

  • Adds unnecessary I/O overhead during inference
  • Clutters logs with debug information
  • Was likely added for debugging and forgotten

Fixed Code Snippet:

            invalid_mask = ctx_indices > gather_limit
            invalid_idx_value = _get_invalid_idx_value()
            ctx_indices = torch.where(invalid_mask, invalid_idx_value, ctx_indices)

[Maintainability - Medium] Incomplete Monkey Patch Implementation May Not Handle All Edge Cases

The monkey patch in patches.py (lines 38-40) unconditionally sets onnx_attrs to an empty dictionary after retrieving it from the module. The comment indicates this is a fix for transformers v4.55+, but this approach:

  1. Discards any actual attributes that were collected in the pre-hook
  2. May cause issues if the original code expects these attributes
  3. Doesn't provide a version-specific check to apply the fix only when needed

This could lead to unexpected behavior with different versions of transformers or in edge cases where module attributes are actually needed.

Fixed Code Snippet:

            graph = tracing_state.graph()
            onnx_attrs = {}
            if hasattr(module, attr_name):
                onnx_attrs = getattr(module, attr_name)
                delattr(module, attr_name)
            # FIX: use empty dict to avoid type mismatch with _jit_pass_onnx_track_scope_attributes
            # Observed in transformers v4.55 and above
            # Only apply fix if onnx_attrs contains non-serializable types
            try:
                _C._jit_pass_onnx_track_scope_attributes(graph, onnx_attrs)
            except (TypeError, RuntimeError):
                # Fallback to empty dict if type mismatch occurs
                _C._jit_pass_onnx_track_scope_attributes(graph, {})
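The try-first, fall-back-to-empty approach can be expressed as a small wrapper that keeps the attributes whenever the pass accepts them; names here are hypothetical, with `pass_fn` standing in for the torch `_C._jit_pass_onnx_track_scope_attributes` binding:

```python
import logging
from typing import Any, Callable, Dict

logger = logging.getLogger(__name__)

def track_attrs_with_fallback(pass_fn: Callable[[Any, Dict], None],
                              graph: Any, onnx_attrs: Dict) -> Dict:
    """Try the pass with the collected attributes; retry with {} on type mismatch.

    Returns the attribute dict that was actually accepted by the pass.
    """
    try:
        pass_fn(graph, onnx_attrs)
        return onnx_attrs
    except (TypeError, RuntimeError) as exc:
        logger.warning("Attribute tracking rejected attrs (%s); retrying with empty dict.", exc)
        pass_fn(graph, {})
        return {}
```

This keeps attribute information on transformers versions where the pass still accepts it, while degrading to the empty-dict workaround only when the mismatch actually occurs.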


Source: https://github.qualcomm.com/qranium/efficient-transformers/pull/42#issuecomment-1007532

@qraniumcitest
Owner

From GHES (comment) by @qgeniecodeassistant[bot]


Source: https://github.qualcomm.com/qranium/efficient-transformers/pull/47#issuecomment-1010632
