fix: add support for enabling onnx subfunction ci #42

Closed

vbaddi wants to merge 2 commits into main from add_onnx_subfunctions_ci

Conversation

@vbaddi (Collaborator) commented Nov 3, 2025

Jenkins-Ready

qraniumcitest and others added 2 commits October 29, 2025 10:00
Signed-off-by: qraniumcitest <rmakar@qti.qualcomm.com>
Signed-off-by: vbaddi <quic_vbaddi@quicinc.com>
@vbaddi (Collaborator, Author) commented Nov 3, 2025

Jenkins-Ready

@qraniumcitest (Owner) commented

From GHES (comment) by @qgeniecodeassistant[bot]

Code Assistant

Reviewed Commits: cb7da87, 6b2a01b, bd0d9ef

  • cb7da87: Updated the code to the current syntax; removed the device_group parameter from model.compile()
    Signed-off-by: Sharvari Medhe <smedhe@qti.qualcomm.com>
  • 6b2a01b: Create Mirror_Fork_PRs_to_GHES.yml
    Signed-off-by: qraniumcitest <rmakar@qti.qualcomm.com>
  • bd0d9ef: fix: add support for enabling onnx subfunction
    Signed-off-by: vbaddi <quic_vbaddi@quicinc.com>

PR Overview

This PR introduces significant enhancements to the ONNX export functionality, including support for exporting decoder layers as ONNX functions, custom operation transforms, ONNX graph optimization via onnx-slim, and torch patches for compatibility. It also adds a new GitHub Actions workflow for mirroring fork PRs to GHES.

Files Changed Summary

| File | Lines Changed | Issues Found | Highest Severity |
| --- | --- | --- | --- |
| .github/workflows/Mirror_Fork_PRs_to_GHES.yml | +230 | 2 | Medium |
| QEfficient/__init__.py | +6 | 0 | - |
| QEfficient/base/modeling_qeff.py | +11 | 1 | Low |
| QEfficient/base/onnx_transforms.py | +162 | 1 | Medium |
| QEfficient/transformers/cache_utils.py | +26 | 1 | Low |
| QEfficient/transformers/models/gemma3/modeling_gemma3.py | +1 | 0 | - |
| QEfficient/transformers/models/modeling_auto.py | +14 | 0 | - |
| QEfficient/transformers/models/pytorch_transforms.py | +26 | 0 | - |
| QEfficient/utils/constants.py | +2 | 0 | - |
| QEfficient/utils/patches.py | +121 | 1 | Medium |
| notebooks/QEfficientGPT2.ipynb | +6 | 0 | - |
| notebooks/QEfficientMPT.ipynb | +6 | 0 | - |

Critical Issues

  1. [Security] Hardcoded Secrets in Workflow - The GitHub Actions workflow contains hardcoded repository and organization names that could be security-sensitive
  2. [Functionality] Debug Print Statement Left in Production Code - A debug print statement in cache_utils.py should be removed
  3. [Maintainability] Empty Dictionary Override - The patches.py file intentionally overrides onnx_attrs with an empty dict, which may cause issues
  4. [Error Handling] Missing Error Handling in ONNX Transform - The OnnxSlimTransform lacks proper error handling for onnxslim.slim() failures

[Security - Medium] Hardcoded credentials and sensitive information in GitHub workflow

The GitHub Actions workflow file contains hardcoded repository names, organization names, and base URLs that could be security-sensitive. While these may be intentional for the specific use case, it's better practice to use repository variables or secrets for such configuration.

Fixed Code Snippet:

```yaml
env:
  GHES_BASE_URL: ${{ vars.GHES_BASE_URL }}
  GHES_OWNER: ${{ vars.GHES_OWNER }}
  GHES_REPO: ${{ vars.GHES_REPO }}
  FORK_OWNER: ${{ vars.FORK_OWNER }}
  FORK_REPO: ${{ vars.FORK_REPO }}
```

[Functionality - Medium] Debug print statement left in production code

A debug print statement is present in the production code at line 441 of cache_utils.py. This appears to be leftover debugging code that should be removed before merging to production.

Fixed Code Snippet:

```python
invalid_idx_value = _get_invalid_idx_value()
ctx_indices = torch.where(invalid_mask, invalid_idx_value, ctx_indices)
```
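For context, the torch.where masking pattern used here can be illustrated with a toy tensor (the values and ctx_len below are hypothetical, not taken from cache_utils.py):

```python
import torch

# Hypothetical gather indices; positions at or past ctx_len are invalid.
ctx_indices = torch.tensor([0, 5, 2, 7], dtype=torch.int32)
ctx_len = 4
invalid_mask = ctx_indices >= ctx_len

# Outside ONNX export the sentinel index is simply 0.
invalid_idx_value = 0
safe_indices = torch.where(invalid_mask, invalid_idx_value, ctx_indices)
print(safe_indices.tolist())  # [0, 0, 2, 0]
```

Invalid positions are redirected to a safe index instead of raising an out-of-range error during the subsequent gather.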

[Maintainability - Medium] Intentional empty dictionary override may cause issues

In patches.py at line 40, the code intentionally overrides onnx_attrs with an empty dictionary after retrieving it. The comment indicates this is a fix for a type mismatch issue with transformers v4.55+, but this approach discards potentially important module attributes. This could lead to loss of metadata during ONNX export.

Consider investigating if there's a better way to handle the type mismatch that doesn't discard the attributes entirely, or add comprehensive testing to ensure this doesn't break functionality for models that rely on these attributes.

Fixed Code Snippet:

```python
# FIX: Convert to a compatible type to avoid a mismatch with _jit_pass_onnx_track_scope_attributes
# Observed in transformers v4.55 and above
if onnx_attrs and not isinstance(onnx_attrs, dict):
    onnx_attrs = {}
_C._jit_pass_onnx_track_scope_attributes(graph, onnx_attrs)
```

[Error Handling - Low] Missing error handling in ONNX transform

The OnnxSlimTransform.apply() method at line 132 calls onnxslim.slim(model) without proper error handling. If onnxslim fails (e.g., due to unsupported operations or memory issues), the exception will propagate without context.

Fixed Code Snippet:

```python
if onnx_slim_transform:
    try:
        slimmed_model = onnxslim.slim(model)
        onnx.save(slimmed_model, temp_onnx_path)
    except Exception as e:
        logger.warning(f"ONNX slim optimization failed: {e}. Continuing with original model.")
        return model, False
    return slimmed_model, True
```

[Performance - Low] Redundant environment variable check in _get_invalid_idx_value

The function _get_invalid_idx_value() in cache_utils.py checks the environment variable QEFF_USE_ONNX_FUNCTIONS on every call. Since this is called frequently during cache operations, consider caching this value at module level to avoid repeated environment lookups.

Fixed Code Snippet:

```python
# At module level, so the environment is read once at import time
_USE_ONNX_FUNCTIONS = os.environ.get("QEFF_USE_ONNX_FUNCTIONS", "false").lower() == "true"


def _get_invalid_idx_value():
    """
    Get the appropriate invalid index value for CtxGather operations.
    """
    if torch.onnx.is_in_onnx_export():
        if _USE_ONNX_FUNCTIONS:
            return 0
        return torch.iinfo(torch.int32).max
    return 0
```
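A module-level constant cannot be changed by tests once the module is imported. An alternative is to memoize the lookup with functools.lru_cache, which keeps the single-read behavior while giving tests an explicit reset hook via cache_clear(). A minimal sketch, assuming the same QEFF_USE_ONNX_FUNCTIONS variable (use_onnx_functions is a hypothetical helper name, not part of the PR):

```python
import os
from functools import lru_cache


@lru_cache(maxsize=1)
def use_onnx_functions() -> bool:
    # The environment is read on the first call; later calls hit the cache.
    return os.environ.get("QEFF_USE_ONNX_FUNCTIONS", "false").lower() == "true"


os.environ["QEFF_USE_ONNX_FUNCTIONS"] = "true"
first = use_onnx_functions()       # True (reads the environment)
os.environ["QEFF_USE_ONNX_FUNCTIONS"] = "false"
second = use_onnx_functions()      # still True (cached result)
use_onnx_functions.cache_clear()   # tests can reset the cache explicitly
third = use_onnx_functions()       # False (re-reads the environment)
print(first, second, third)
```

The trade-off is one extra function-call indirection per use, in exchange for testability.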

**Version 1.3.6**


Source: https://github.qualcomm.com/qranium/efficient-transformers/pull/43#issuecomment-1009227
