Closed
60 commits
924326b
[Qwen2_5_vl] - Onboarding Qwen2_5_vl model in QEfficient (#560)
mohiso22 Oct 16, 2025
8c96a4d
Olmo2 Bug fix (#589)
qcdipankar Oct 17, 2025
7ad6365
updated notebooks (#543)
smedhe Oct 23, 2025
c9e417a
Qwen2.5_VL Example Script Update (#598)
mohiso22 Oct 31, 2025
d27fe98
Extend On-Device Sampling Support to more Causal Language Models (#553)
quic-sanising Nov 1, 2025
120698f
[QEff. Finetune]: Added fix for pad_to_max_length in tokenization. (#…
quic-meetkuma Nov 3, 2025
e8cc0f7
Enable CB for vlms with multiple images and multiple prompts (#583)
quic-mamta Nov 4, 2025
848dc6e
Modeling fix (#605)
mohiso22 Nov 4, 2025
171af20
New PR for GPTOSS decode-only model (#603)
ochougul Nov 5, 2025
9b3164e
Update Qeff Documentation to indicate vLLM Support in Validated Model…
quic-vargupt Nov 5, 2025
e6ac655
Adding support to load checkpoints from epoch (#606)
tchawada Nov 5, 2025
1d3eebf
"[QEff. Finetune]: Support for resuming checkpoints using Epoch" (#614)
tchawada Nov 11, 2025
9d53571
[Upgradation]: onnx opset version updated from 13 to 17 (#587)
abukhoy Nov 13, 2025
435895f
[Docs]: Readme Fix (#617)
abukhoy Nov 14, 2025
c7494ce
Adding Compute-Context-Length (CCL) (#576)
vjanfaza Nov 14, 2025
ed6bb1f
Fix for <end_of_turn> token during inference (#622)
quic-akuruvil Nov 19, 2025
a607dff
Add ONNX Sub Functions Export Feature for AutoModelForCausalLM (#621)
abhishek-singh591 Nov 19, 2025
8353831
Example scripts revamp (#615)
quic-rishinr Nov 20, 2025
aab6fac
Example walk through on how to onboard a Causal LM on Qefficient Tran…
quic-dhirajku Nov 20, 2025
9a3c49a
[QEff. Finetune]: Added initial folder structure and files for HF tra…
quic-meetkuma Nov 21, 2025
bde0cda
Updated to mermaid diagram (#631)
quic-rishinr Nov 21, 2025
75065e9
Added Decoder layer class in Qeff for granite (#628)
abhishek-singh591 Nov 21, 2025
8fc86d6
[CI-FIX]: qnn and vllm downstream jobs are disabled (#639)
abukhoy Nov 26, 2025
037e0c4
Installation guide for installing release branches (#637)
quic-rishinr Nov 26, 2025
a380c7a
Added Continuous Batching (CB) Support for Subfunctions (#642)
abhishek-singh591 Nov 26, 2025
3dffc65
[QEff. Finetune]: Added logger and its test cases. (#644)
quic-meetkuma Nov 28, 2025
31fe21f
[QEff. Finetune]: Added component registry and factory functionality.…
quic-meetkuma Nov 28, 2025
5e9d760
[QEff. Finetune]: Adding optimizer registry and its test cases (#649)
tchawada Dec 5, 2025
92e4436
[QEff. Finetune]: Added Base dataset class and SFT dataset classes al…
quic-dhirajku Dec 5, 2025
57cb01a
[QEff.finetune] WIP - Adding TrainerClass and tests for init checks.
quic-dhirajku Dec 8, 2025
cc62a78
Minor changes to the trainer class registration was done.
quic-dhirajku Dec 9, 2025
6138659
Addressed comments. Added the modification to test on custom num_laye…
quic-dhirajku Dec 18, 2025
cafb00c
Rebased to update the branch with mainline.
quic-dhirajku Jan 2, 2026
ec845cb
[QEff.Finetuning]CI enablement for Fine-Tuning (#629)
quic-akuruvil Dec 1, 2025
22390a6
[BUGFIX] Patch for issues with export via replicate_kv_heads script C…
quic-dhirajku Dec 2, 2025
5a8bd0b
Add custom op examples and documentation (#638)
quic-rishinr Dec 2, 2025
7c50f75
Added torchvision (#650)
quic-rishinr Dec 3, 2025
19d8498
removed platform sdk dependency (#609)
smedhe Dec 4, 2025
be97622
Added memory and time optimization for onnx transforms (#640)
abhishek-singh591 Dec 4, 2025
f7b33b3
Adding support for BlockedKV attention in CasualLM models (#618)
vaibverm Dec 4, 2025
66c9f9b
Continuous Batching for VLMs (#610)
asmigosw Dec 5, 2025
9e3546b
[Jenkins]: jenkins Timeout increased (#654)
abukhoy Dec 8, 2025
71f0a64
Adding ccl_enabled flag during model loading and passing CCL lists du…
vjanfaza Dec 8, 2025
1a8ec9d
Diffusers support (#604)
quic-amitraj Dec 22, 2025
2f8c7af
Subfunction fixes for KV cache transform (#655)
abhishek-singh591 Dec 10, 2025
28a4b66
[Test]: subfunction test moved to qaic Test Stage (#665)
abukhoy Dec 11, 2025
d6070aa
Prefill+decode gpt oss (#608)
ochougul Dec 14, 2025
418ac4f
Updated tests of onnx_sunfunction (#668)
quic-amitraj Dec 22, 2025
0af4ebd
Extend on-device sampling support for dual QPC VLMs (#597)
quic-xiyushi Dec 17, 2025
86efef6
test: Verify ONNX subfunction usage through model inspection instead …
vbaddi Dec 17, 2025
358842e
HOTFIX: Testing the Finetune base CI failure by installing pytorch2.9…
quic-dhirajku Dec 18, 2025
8ab31ae
Add Support for Guided Decoding to On Device Sampling (#624)
quic-sanising Dec 18, 2025
56e8b10
Adding memory profiling (#674)
quic-rishinr Dec 19, 2025
4d504c8
HOTFIX: Modified replicate_kv_heads.py script to not run ONNXRT infer…
quic-dhirajku Dec 19, 2025
bfc439d
Add automatic CCL list generation for prefill and decode when user do…
vjanfaza Dec 19, 2025
c79a04d
Adding WAN Lightning support (#669)
tv-karthikeya Dec 20, 2025
1233bda
Added blocking support to flux (#679)
quic-amitraj Dec 22, 2025
812c236
fixed new NPI for changed ONNX names (#684)
ochougul Dec 22, 2025
4af6ffc
Updated compile command for subfunction (#681)
quic-amitraj Dec 22, 2025
792063f
Disagg hotfix gpt oss (#689)
ochougul Dec 23, 2025
158 changes: 148 additions & 10 deletions CONTRIBUTING.md
@@ -1,22 +1,160 @@
## Contributing to PROJECT

Hi there!
Were thrilled that youd like to contribute to this project.
We're thrilled that you'd like to contribute to this project.
Your help is essential for keeping this project great and for making it better.

## Branching Strategy

In general, contributors should develop on branches based off of `main` and pull requests should be made against `main`.
## Submitting Your Contribution

## Submitting a pull request
Follow these steps to submit your contribution to the QEfficient repository:

1. Please read our [code of conduct](CODE-OF-CONDUCT.md) and [license](LICENSE).
1. Fork and clone the repository.
1. Create a new branch based on `main`: `git checkout -b <my-branch-name> main`.
1. Make your changes, add tests, and make sure the tests still pass.
1. Commit your changes using the [DCO](http://developercertificate.org/). You can attest to the DCO by commiting with the **-s** or **--signoff** options or manually adding the "Signed-off-by".
1. Push to your fork and submit a pull request from your branch to `main`.
1. Pat yourself on the back and wait for your pull request to be reviewed.

### 1. Fork and Clone the Repository

First, fork the repository to your GitHub account, then clone your fork:

```bash
# Fork the repository on GitHub (click the "Fork" button)
# Then clone your fork
git clone git@github.com:YOUR_USERNAME/efficient-transformers.git
cd efficient-transformers

# Add upstream remote to keep your fork in sync
git remote add upstream git@github.com:quic/efficient-transformers.git
```

### 2. Create a Feature Branch

Create a descriptive branch for your changes:

```bash
# Update your main branch
git checkout main
git pull upstream main

# Create a new branch
git checkout -b <branch-name>
```

### 3. Make Your Changes

When making changes to the codebase:

- **Follow Existing Design Patterns**
- Review similar implementations before creating new code
- Maintain consistency with the project's architecture and coding style
- Reuse existing utilities and base classes where applicable

- **Onboarding New Models**
- For adding new model support, refer to the comprehensive guide: `examples/onboarding_guide/causallm/`
- Follow the step-by-step process with code examples provided

- **Testing is Mandatory**
- Add tests for all new features in the appropriate `tests/` subdirectory
- Run tests locally before pushing: `pytest tests/path/to/your/test.py -v`
- For model additions, verify all 4 pipeline stages (PyTorch HF → KV → ORT → AI 100) and make sure the generated tokens match the reference PyTorch HF output

- **Documentation**
- **For New Features/Flags:**
- Document usage in `docs/source/<appropriate-page>` with feature description and usage examples
- Ensure documentation is clear enough for others to understand and use the feature
- **For New Models:**
- Test with basic inference scripts in the `examples/` folder
- If specific changes are needed, create a dedicated example file
- Update `docs/source/validate.md` with the model's HuggingFace card name and relevant details


- **Code Quality Checks**
- Pre-commit hooks, DCO sign-off, and CI checks are covered in the following steps
- Ensure you complete steps 4-8 before finalizing your PR
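
The token-matching check described above can be sketched as a plain assertion. The snippet below is a hypothetical illustration only — `reference_generate` and `qeff_generate` are stand-ins for the real PyTorch HF and QEfficient inference calls, not QEfficient APIs:

```python
# Hypothetical sketch of a token-matching test; the two generate functions
# are placeholders for the real HF reference and compiled-pipeline calls.


def reference_generate(prompt: str) -> list[int]:
    # Placeholder for PyTorch HF generation.
    return [101, 2023, 2003, 1037, 3231, 102]


def qeff_generate(prompt: str) -> list[int]:
    # Placeholder for the exported/compiled pipeline's generation.
    return [101, 2023, 2003, 1037, 3231, 102]


def test_tokens_match_reference():
    prompt = "This is a test"
    assert qeff_generate(prompt) == reference_generate(prompt), (
        "Generated tokens diverge from the reference PyTorch HF output"
    )


if __name__ == "__main__":
    test_tokens_match_reference()
```

A real test would live under `tests/` and be run with `pytest -v`, comparing token IDs stage by stage.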

### 4. Run Pre-commit Checks

Before committing, ensure your code passes all quality checks:

```bash
# Install pre-commit and ruff if not already installed
pip install pre-commit
pip install ruff

# Run pre-commit on your changed files
pre-commit run --files path/to/your/file1.py path/to/your/file2.py

# Run Ruff check
ruff check
```

**Important:** If pre-commit reports any failures:
- Some issues will be auto-fixed (formatting, trailing whitespace, etc.)
- For issues that aren't auto-fixed, manually correct them
- Re-run `pre-commit run --files <files>` or `ruff check` until all checks pass

### 5. Commit with Sign-off (DCO)

All commits must be signed off to comply with the Developer Certificate of Origin (DCO):

```bash
# Stage your changes
git add examples/your_domain/your_example.py
git add examples/your_domain/README.md

# Commit with sign-off
git commit -s --author "Your Name <your.email@example.com>" -m "Add [model-name] support

- Implements inference for [model-name]
- Includes documentation and usage examples
- Tested with [specific configurations]"
```

**Commit Message Guidelines:**
- Use a clear, descriptive title
- Add a blank line, then detailed description if needed
- Always include the `-s` flag for DCO sign-off
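
If you forget the sign-off, the most recent commit can be amended instead of redone (standard git behavior; `--no-edit` keeps the existing message, and `HEAD~3` below is just an example depth):

```shell
# Add the Signed-off-by trailer to the last commit without changing its message
git commit --amend -s --no-edit

# Re-sign the last few unsigned commits in one go
git rebase --signoff HEAD~3
```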

### 6. Push to Your Fork

Push your branch to your forked repository:

```bash
git push origin <branch-name>
```

### 7. Create a Pull Request

1. Go to your fork on GitHub
2. Click "Compare & pull request" for your branch
3. Fill out the PR template with:
- **Title:** Clear, descriptive title (e.g., "Add Llama-3.2-Vision Support" or "Fix memory leak in KV cache")
- **Description:**
- What changes were made and why
- What problem it solves or feature it adds
- Any special considerations or breaking changes
- Links to relevant documentation, issues, or model cards (if applicable)
- **Testing:** Describe how you tested your changes

### 8. Ensure CI Checks Pass

After creating the PR, verify that all automated checks pass:

- ✅ **DCO Check:** Ensures all commits are signed off
- ✅ **Lint Check:** Code style and formatting validation
- ✅ **Tests:** Automated test suite (if applicable)

If any checks fail:
1. Review the error messages in the PR
2. Make necessary fixes in your local branch
3. Commit and push the fixes (with sign-off)
4. The PR will automatically update and re-run checks

### 9. Address Review Feedback

Maintainers will review your PR and may request changes:
- Make requested changes in your local branch
- Commit with sign-off and push to update the PR
- Respond to comments to facilitate discussion


Here are a few things you can do that will increase the likelihood that your pull request will be accepted:

88 changes: 47 additions & 41 deletions QEfficient/__init__.py
@@ -6,23 +6,64 @@
# -----------------------------------------------------------------------------

import os
import warnings

import QEfficient.utils.model_registery # noqa: F401
from QEfficient.utils import custom_format_warning
from QEfficient.utils.logging_utils import logger

# ----------------------------------------------------------------------------- #
# For faster downloads via hf_transfer
# This code is put above import statements as this needs to be executed before
# hf_transfer is imported (will happen on line 15 via leading imports)
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"
# DO NOT ADD ANY CODE ABOVE THIS LINE
# Please contact maintainers if you must edit this file above this line.
# ----------------------------------------------------------------------------- #
# Placeholder for all non-transformer models registered in QEfficient
import warnings # noqa: I001

import QEfficient.utils.model_registery # noqa: F401
from QEfficient.base import (
QEFFAutoModel,
QEFFAutoModelForCausalLM,
QEFFAutoModelForCTC,
QEFFAutoModelForImageTextToText,
QEFFAutoModelForSpeechSeq2Seq,
QEFFCommonLoader,
)
from QEfficient.compile.compile_helper import compile
from QEfficient.diffusers.pipelines.flux.pipeline_flux import QEffFluxPipeline
from QEfficient.diffusers.pipelines.wan.pipeline_wan import QEffWanPipeline
from QEfficient.exporter.export_hf_to_cloud_ai_100 import qualcomm_efficient_converter
from QEfficient.generation.text_generation_inference import cloud_ai_100_exec_kv
from QEfficient.peft import QEffAutoPeftModelForCausalLM
from QEfficient.transformers.transform import transform
from QEfficient.utils import custom_format_warning
from QEfficient.utils.logging_utils import logger

# custom warning for the better logging experience
warnings.formatwarning = custom_format_warning


# Users can use QEfficient.export for exporting models to ONNX
export = qualcomm_efficient_converter
__all__ = [
"transform",
"export",
"compile",
"cloud_ai_100_exec_kv",
"QEFFAutoModel",
"QEFFAutoModelForCausalLM",
"QEFFAutoModelForCTC",
"QEffAutoPeftModelForCausalLM",
"QEFFAutoModelForImageTextToText",
"QEFFAutoModelForSpeechSeq2Seq",
"QEFFCommonLoader",
"QEffFluxPipeline",
"QEffWanPipeline",
]


# Conditionally import QAIC-related modules if the SDK is installed
__version__ = "0.0.1.dev0"


def check_qaic_sdk():
"""Check if QAIC SDK is installed"""
try:
@@ -37,40 +78,5 @@ def check_qaic_sdk():
return False


# Conditionally import QAIC-related modules if the SDK is installed
__version__ = "0.0.1.dev0"

if check_qaic_sdk():
from QEfficient.base import (
QEFFAutoModel,
QEFFAutoModelForCausalLM,
QEFFAutoModelForCTC,
QEFFAutoModelForImageTextToText,
QEFFAutoModelForSpeechSeq2Seq,
QEFFCommonLoader,
)
from QEfficient.compile.compile_helper import compile
from QEfficient.exporter.export_hf_to_cloud_ai_100 import qualcomm_efficient_converter
from QEfficient.generation.text_generation_inference import cloud_ai_100_exec_kv
from QEfficient.peft import QEffAutoPeftModelForCausalLM
from QEfficient.transformers.transform import transform

# Users can use QEfficient.export for exporting models to ONNX
export = qualcomm_efficient_converter

__all__ = [
"transform",
"export",
"compile",
"cloud_ai_100_exec_kv",
"QEFFAutoModel",
"QEFFAutoModelForCausalLM",
"QEFFAutoModelForCTC",
"QEffAutoPeftModelForCausalLM",
"QEFFAutoModelForImageTextToText",
"QEFFAutoModelForSpeechSeq2Seq",
"QEFFCommonLoader",
]

else:
if not check_qaic_sdk():
logger.warning("QAIC SDK is not installed, eager mode features won't be available!")
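
The `warnings.formatwarning = custom_format_warning` assignment in the diff swaps Python's default multi-line warning format for a custom one. A minimal sketch of how such a formatter plugs in — this `custom_format_warning` body is illustrative only; the real implementation lives in `QEfficient.utils`:

```python
import warnings


def custom_format_warning(message, category, filename, lineno, line=None):
    # Illustrative single-line format; the actual QEfficient formatter may differ.
    return f"[QEff Warning] {category.__name__}: {message}\n"


# Replace Python's default warning formatter, as QEfficient/__init__.py does
warnings.formatwarning = custom_format_warning

formatted = warnings.formatwarning("example message", UserWarning, "x.py", 1)
print(formatted, end="")  # → [QEff Warning] UserWarning: example message
```

Because `warnings.formatwarning` is a module-level hook, every subsequent `warnings.warn(...)` anywhere in the process is rendered through it — which is why the assignment sits in the package `__init__`.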