
Add onnx subfunctions ci test#43

Closed
vbaddi wants to merge 3 commits into main from add_onnx_subfunctions_ci_v1

Conversation

@vbaddi
Collaborator

@vbaddi vbaddi commented Nov 3, 2025

No description provided.

qraniumcitest and others added 3 commits October 29, 2025 10:00
Signed-off-by: qraniumcitest <rmakar@qti.qualcomm.com>
Signed-off-by: vbaddi <quic_vbaddi@quicinc.com>
Signed-off-by: vbaddi <quic_vbaddi@quicinc.com>
@qraniumcitest
Owner

From GHES (comment) by @RBagla

Code Assistant

Reviewed Commits: cb7da87, 6b2a01b, bd0d9ef, 7c53cdd

  • cb7da87: Updated the correct code with updated syntax, removed device_group parameter in model.compile()

Signed-off-by: Sharvari Medhe smedhe@qti.qualcomm.com

  • 6b2a01b: Create Mirror_Fork_PRs_to_GHES.yml

Signed-off-by: qraniumcitest rmakar@qti.qualcomm.com

  • bd0d9ef: fix: add support for enabling onnx subfunction

Signed-off-by: vbaddi quic_vbaddi@quicinc.com

  • 7c53cdd: nit: add onnxslim package

Signed-off-by: vbaddi quic_vbaddi@quicinc.com

Pull Request Overview

This PR introduces significant enhancements to the ONNX export pipeline, including support for ONNX functions, custom operations, and improved model transformations. The changes span multiple core modules, including export logic, cache utilities, model transformations, and configuration constants.

Files Changed Summary

| File | Lines Changed | Issues Found | Highest Severity |
| --- | --- | --- | --- |
| .github/workflows/Mirror_Fork_PRs_to_GHES.yml | +230 | 2 | Medium |
| QEfficient/__init__.py | +6 | 0 | - |
| QEfficient/base/modeling_qeff.py | +11 | 1 | Low |
| QEfficient/base/onnx_transforms.py | +162 | 1 | Medium |
| QEfficient/transformers/cache_utils.py | +26 | 1 | Low |
| QEfficient/transformers/models/gemma3/modeling_gemma3.py | +1 | 0 | - |
| QEfficient/transformers/models/modeling_auto.py | +14 | 0 | - |
| QEfficient/transformers/models/pytorch_transforms.py | +26 | 0 | - |
| QEfficient/utils/constants.py | +2 | 0 | - |
| QEfficient/utils/patches.py | +121 | 1 | Medium |
| notebooks/QEfficientGPT2.ipynb | +5 | 0 | - |
| notebooks/QEfficientMPT.ipynb | +5 | 0 | - |
| pyproject.toml | +1 | 0 | - |

Critical Issues Identified

  1. [Security] Potential Token Exposure - GitHub workflow contains debug logging that could leak information about sensitive tokens (Medium severity)
  2. [Functionality] Missing Error Handling - Custom operation registration lacks validation and error handling (Medium severity)
  3. [Performance] Debug Print Statement - Production code contains debug print that should be removed (Low severity)
  4. [Maintainability] Incomplete Environment Variable Check - Cache utility function doesn't validate environment variable usage (Low severity)

Key Accomplishments

  • Added ONNX functions support with decoder layer export capabilities
  • Implemented custom operation transforms (RMSNorm, CtxScatter, CtxGather)
  • Introduced ONNX slim transformations for model optimization
  • Added torch patches for ONNX export compatibility
  • Updated ONNX opset version from 13 to 17
  • Enhanced KV cache handling with dynamic invalid index values

[Security] Potential Token Exposure in Debug Logging - Medium Severity

The GitHub Actions workflow contains debug logging statements that print the length of the GHES_PAT token and make test API calls with the token. While the token itself isn't directly printed, these debug statements could leak information about the token in workflow logs and should be removed before production use.

Issue Details:

  • Lines 202-205 contain debug code that logs token length and makes test API calls
  • This information could be used by attackers to infer token characteristics
  • Debug code should not be present in production workflows

Fixed Code Snippet:

RESP="$(curl -sS -H "Authorization: token ${GHES_PAT}" \
              -H "Accept: application/vnd.github+json" \
              "${API}/pulls?state=open&head=${GHES_OWNER}:${BRANCH}" \
              -w "\n%{http_code}")"
HTTP_CODE="$(printf '%s\n' "$RESP" | tail -n1)"
JSON="$(printf '%s\n' "$RESP" | sed '$d')"
echo "HTTP_CODE=${HTTP_CODE}"

[Functionality] Missing Error Handling in Custom Operation Registration - Medium Severity

The CustomOpTransform.register_custom_op() method in onnx_transforms.py lacks validation and error handling. It doesn't check if an operation with the same name already exists, which could lead to silent overwrites of previously registered operations. Additionally, there's no validation of the input parameters.

Issue Details:

  • No check for duplicate operation names before registration
  • No validation that func_class and onnxscript_func are valid/compatible
  • Silent failures could occur if invalid operations are registered
  • Could lead to unexpected behavior during ONNX export

Fixed Code Snippet:

@classmethod
def register_custom_op(cls, op_name: str, func_class: Any, onnxscript_func: Any):
    """Register a custom operation."""
    if not op_name or not isinstance(op_name, str):
        raise ValueError(f"Invalid op_name: {op_name}. Must be a non-empty string.")
    
    if op_name in cls._custom_ops:
        logger.warning(f"Custom operation '{op_name}' is already registered. Overwriting.")
    
    if not callable(onnxscript_func):
        raise TypeError(f"onnxscript_func for '{op_name}' must be callable")
    
    cls._custom_ops[op_name] = (func_class, onnxscript_func)
    logger.debug(f"Registered custom operation: {op_name}")

[Performance] Debug Print Statement in Production Code - Low Severity

The _get_invalid_idx_value() function in cache_utils.py contains a debug print statement at line 441 that should be removed from production code. This print statement will output to console during every cache update operation, which could impact performance and clutter logs.

Issue Details:

  • Line 441 contains print(f"value of INVALID IDX VALUE is {invalid_idx_value}")
  • This is clearly debug code that was left in accidentally
  • Will execute on every KV cache update, potentially thousands of times per inference
  • Should use proper logging instead if needed for debugging

Fixed Code Snippet:

invalid_mask = ctx_indices > gather_limit
invalid_idx_value = _get_invalid_idx_value()
ctx_indices = torch.where(invalid_mask, invalid_idx_value, ctx_indices)

[Maintainability] Incomplete Environment Variable Validation - Low Severity

The _get_invalid_idx_value() function in cache_utils.py checks the QEFF_USE_ONNX_FUNCTIONS environment variable but doesn't validate its value beyond a simple string comparison. The function uses .lower() == "true" which means any value other than exactly "true" (case-insensitive) will be treated as false, including typos or other truthy values.

Issue Details:

  • No validation for invalid environment variable values
  • Silent fallback behavior could mask configuration errors
  • Users might set the variable to "1", "yes", "True" (with different casing) and expect it to work
  • Consider adding logging or warnings for unexpected values

Fixed Code Snippet:

def _get_invalid_idx_value():
    """
    Get the appropriate invalid index value for CtxGather operations.

    For ONNX export with functions, we use 0 to avoid INT32_MAX constants
    that cause issues when functions are inlined at runtime.

    Returns:
        int: Invalid index value (0 for ONNX functions, INT32_MAX otherwise)
    """
    if torch.onnx.is_in_onnx_export():
        # Check if ONNX functions are being used
        use_onnx_functions_str = os.environ.get("QEFF_USE_ONNX_FUNCTIONS", "false").lower()
        use_onnx_functions = use_onnx_functions_str in ("true", "1", "yes")
        
        if use_onnx_functions_str not in ("true", "false", "1", "0", "yes", "no"):
            import logging
            logger = logging.getLogger(__name__)
            logger.warning(f"Invalid value for QEFF_USE_ONNX_FUNCTIONS: '{use_onnx_functions_str}'. Using default (false).")
        
        if use_onnx_functions:
            # For ONNX functions: use 0 to avoid function inlining issues
            return 0
        else:
            # For regular ONNX export: use INT32_MAX as before
            return torch.iinfo(torch.int32).max
    else:
        # For runtime: use 0
        return 0
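The accepted-values check can also be factored into a small reusable helper for any boolean flag; this is a sketch with a hypothetical helper name, not existing QEfficient API:

```python
import logging
import os

logger = logging.getLogger(__name__)

_TRUTHY = ("true", "1", "yes")
_FALSY = ("false", "0", "no")

def parse_bool_env(name: str, default: bool = False) -> bool:
    """Parse a boolean environment flag, warning on unrecognized values."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    value = raw.strip().lower()
    if value in _TRUTHY:
        return True
    if value in _FALSY:
        return False
    logger.warning("Invalid value for %s: %r. Using default (%s).", name, raw, default)
    return default

# "Yes", "1", and "true" are all accepted, case-insensitively.
os.environ["QEFF_USE_ONNX_FUNCTIONS"] = "Yes"
assert parse_bool_env("QEFF_USE_ONNX_FUNCTIONS") is True
```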


Source: https://github.qualcomm.com/qranium/efficient-transformers/pull/42#issuecomment-1007487

@qraniumcitest
Owner

From GHES (comment) by @RBagla


Critical Issues Identified

  1. [Security] Potential Token Exposure - GitHub workflow contains debug logging that could leak information about sensitive tokens (Medium severity)
  2. [Functionality] Missing Error Handling - ONNX transform operations lack proper exception handling for edge cases (Medium severity)
  3. [Performance] Debug Print Statement - Production code contains debug print that should be removed (Low severity)
  4. [Maintainability] Incomplete Patch Implementation - Monkey patch may not handle all edge cases properly (Medium severity)

[Security - Medium] Potential Token Exposure in GitHub Workflow Debug Logging

The GitHub Actions workflow contains debug logging statements that print the length of the GHES_PAT token and make test API calls. While the token itself isn't directly printed, these debug statements could leak information about the token in workflow logs and should be removed before production use.

Lines 202-205 contain debugging code that:

  • Prints the token length which could help attackers estimate token format
  • Makes test API calls that could be logged
  • Should be removed or placed behind a debug flag

Fixed Code Snippet:

          HTTP_CODE="$(printf '%s\n' "$RESP" | tail -n1)"
          JSON="$(printf '%s\n' "$RESP" | sed '$d')"
          echo "HTTP_CODE=${HTTP_CODE}"
          if [ "${HTTP_CODE}" != "200" ]; then

[Functionality - Medium] Missing Error Handling in ONNX Transform Operations

The OnnxSlimTransform.apply() method raises a RuntimeError when temp_onnx_path is not provided, but this error handling is inconsistent with other transforms. The method should handle this more gracefully or ensure the parameter is always provided through the calling chain.

Additionally, the onnxslim.slim() operation on line 132 could fail for various reasons (malformed ONNX, unsupported operations, etc.), but there is no try/except block to handle these failures gracefully.

Fixed Code Snippet:

        transformed = False
        onnx_slim_transform = True  # kwargs.get("enable_onnx_slim_transform", False)
        temp_onnx_path = kwargs.get("temp_onnx_path", None)
        if not temp_onnx_path:
            err_str = "temp_onnx_path is required for onnx-slim transform."
            raise RuntimeError(err_str)
        if onnx_slim_transform:
            try:
                transformed = True
                slimmed_model = onnxslim.slim(model)
                onnx.save(slimmed_model, temp_onnx_path)
                return slimmed_model, transformed
            except Exception as e:
                logger.warning(f"ONNX slim transformation failed: {e}. Returning original model.")
                return model, False
        return model, transformed
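The fail-soft pattern in the snippet generalizes to any best-effort transform; a self-contained sketch (the helper name is hypothetical, not existing QEfficient API):

```python
import logging
from typing import Any, Callable, Tuple

logger = logging.getLogger(__name__)

def apply_optional_transform(model: Any, transform: Callable[[Any], Any]) -> Tuple[Any, bool]:
    """Run a best-effort transform; on any failure, keep the original model.

    Returns (model, True) when the transform succeeded, (original, False) otherwise.
    """
    try:
        return transform(model), True
    except Exception as exc:
        logger.warning("Transform failed: %s. Returning original model.", exc)
        return model, False
```

The `(model, transformed)` tuple matches the return shape the transform chain already uses, so callers can log or skip downstream steps based on the flag.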

[Performance - Low] Debug Print Statement in Production Code

Line 441 in cache_utils.py contains a debug print statement that outputs the invalid index value. This print statement should be removed from production code as it:

  • Adds unnecessary I/O overhead during inference
  • Clutters logs with debug information
  • Was likely added for debugging and forgotten

Fixed Code Snippet:

            invalid_mask = ctx_indices > gather_limit
            invalid_idx_value = _get_invalid_idx_value()
            ctx_indices = torch.where(invalid_mask, invalid_idx_value, ctx_indices)

[Maintainability - Medium] Incomplete Monkey Patch Implementation May Not Handle All Edge Cases

The monkey patch in patches.py (lines 38-40) unconditionally sets onnx_attrs to an empty dictionary after retrieving it from the module. The comment indicates this is a fix for transformers v4.55+, but this approach:

  1. Discards any actual attributes that were collected in the pre-hook
  2. May cause issues if the original code expects these attributes
  3. Doesn't provide a version-specific check to apply the fix only when needed

This could lead to unexpected behavior with different versions of transformers or in edge cases where module attributes are actually needed.

Fixed Code Snippet:

            graph = tracing_state.graph()
            onnx_attrs = {}
            if hasattr(module, attr_name):
                onnx_attrs = getattr(module, attr_name)
                delattr(module, attr_name)
            # FIX: use empty dict to avoid type mismatch with _jit_pass_onnx_track_scope_attributes
            # Observed in transformers v4.55 and above
            # Only apply fix if onnx_attrs contains non-serializable types
            try:
                _C._jit_pass_onnx_track_scope_attributes(graph, onnx_attrs)
            except (TypeError, RuntimeError):
                # Fallback to empty dict if type mismatch occurs
                _C._jit_pass_onnx_track_scope_attributes(graph, {})
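The try-first, fall-back-to-empty approach can be expressed as a small wrapper that keeps the attributes whenever the pass accepts them; names here are hypothetical, with `pass_fn` standing in for the torch `_C._jit_pass_onnx_track_scope_attributes` binding:

```python
import logging
from typing import Any, Callable, Dict

logger = logging.getLogger(__name__)

def track_attrs_with_fallback(pass_fn: Callable[[Any, Dict], None],
                              graph: Any, onnx_attrs: Dict) -> Dict:
    """Try the pass with the collected attributes; retry with {} on type mismatch.

    Returns the attribute dict that was actually accepted by the pass.
    """
    try:
        pass_fn(graph, onnx_attrs)
        return onnx_attrs
    except (TypeError, RuntimeError) as exc:
        logger.warning("Attribute tracking rejected attrs (%s); retrying with empty dict.", exc)
        pass_fn(graph, {})
        return {}
```

This keeps attribute information on transformers versions where the pass still accepts it, while degrading to the empty-dict workaround only when the mismatch actually occurs.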


Source: https://github.qualcomm.com/qranium/efficient-transformers/pull/42#issuecomment-1007532

@qraniumcitest
Owner

From GHES (comment) by @qgeniecodeassistant[bot]


Source: https://github.qualcomm.com/qranium/efficient-transformers/pull/47#issuecomment-1010632
