From d25c35a01144ac5cda8c7f8c6984f48a6c899efc Mon Sep 17 00:00:00 2001
From: Erik Lundell
Date: Wed, 17 Sep 2025 08:26:00 +0200
Subject: [PATCH 001/395] Arm backend: Split Arm tutorial into ethosu and vgf
 (#14299)

Align with the minimal examples with regard to content and code.

Signed-off-by: Erik Lundell
---
 docs/source/index.md                |   4 +-
 docs/source/tutorial-arm-ethos-u.md | 220 +++++++++++++
 docs/source/tutorial-arm-vgf.md     | 220 +++++++++++++
 docs/source/tutorial-arm.md         | 467 ----------------------------
 4 files changed, 443 insertions(+), 468 deletions(-)
 create mode 100644 docs/source/tutorial-arm-ethos-u.md
 create mode 100644 docs/source/tutorial-arm-vgf.md
 delete mode 100644 docs/source/tutorial-arm.md

diff --git a/docs/source/index.md b/docs/source/index.md
index 8afe4e85d78..1c2fdbcc110 100644
--- a/docs/source/index.md
+++ b/docs/source/index.md
@@ -149,7 +149,8 @@ using-executorch-faqs
 
 Building an ExecuTorch Android Demo App
 Building an ExecuTorch iOS Demo App
-tutorial-arm.md
+tutorial-arm-ethos-u
+tutorial-arm-vgf
 ```
 
 ```{toctree}
@@ -164,6 +165,7 @@ backends-coreml
 backends-mps
 backends-vulkan
 backends-arm-ethos-u
+backends-arm-vgf
 backends-qualcomm
 backends-mediatek
 backends-cadence

diff --git a/docs/source/tutorial-arm-ethos-u.md b/docs/source/tutorial-arm-ethos-u.md
new file mode 100644
index 00000000000..b856e7ade75
--- /dev/null
+++ b/docs/source/tutorial-arm-ethos-u.md
@@ -0,0 +1,220 @@
+# Arm Ethos-U NPU Backend Tutorial
+
+
+::::{grid} 2
+
+:::{grid-item-card} Tutorials we recommend you complete before this:
+:class-card: card-prerequisites
+* [Introduction to ExecuTorch](intro-how-it-works.md)
+* [Getting Started](getting-started.md)
+* [Building ExecuTorch with CMake](using-executorch-building-from-source.md)
+:::
+
+:::{grid-item-card} What you will learn in this tutorial:
+:class-card: card-prerequisites
+In this tutorial you will learn how to export a simple PyTorch model for the ExecuTorch Ethos-U backend.
+:::
+
+::::
+
+```{warning}
+This delegate is under active development; for best results, please use a recent version.
+The TOSA and Ethos-U backend support is reasonably mature and used in production by some users.
+You may still encounter rough edges, and some features may be documented or planned but not yet implemented; please refer to the in-tree documentation for the latest status of features.
+```
+
+```{tip}
+If you are already familiar with this delegate, you may want to jump directly to the examples:
+* [Examples in the ExecuTorch repository](https://github.com/pytorch/executorch/tree/main/examples/arm)
+* [A commandline compiler for example models](https://github.com/pytorch/executorch/blob/main/examples/arm/aot_arm_compiler.py)
+```
+
+This tutorial serves as an introduction to using ExecuTorch to deploy PyTorch models on Arm® Ethos™-U targets. It is based on `ethos_u_minimal_example.ipynb`, provided in Arm’s examples folder.
+
+## Prerequisites
+
+### Hardware
+
+To successfully complete this tutorial, you will need a Linux machine with an aarch64 or x86_64 processor architecture, or a macOS™ machine with Apple® Silicon.
+
+To enable development without a specific development board, we will be using a [Fixed Virtual Platform (FVP)](https://www.arm.com/products/development-tools/simulation/fixed-virtual-platforms), simulating [Arm® Corstone™-300](https://developer.arm.com/Processors/Corstone-300) (cs300) and [Arm® Corstone™-320](https://developer.arm.com/Processors/Corstone-320) (cs320) systems. Think of it as virtual hardware.
+
+### Software
+
+First, you will need to install ExecuTorch. Please follow the recommended tutorials to set up a working ExecuTorch development environment.
+
+In addition to this, you need to install a number of SDK dependencies for generating Ethos-U command streams. Scripts to automate this are available in the main [ExecuTorch repository](https://github.com/pytorch/executorch/tree/main/examples/arm/).
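As an optional sanity check before pulling in the backend-specific dependencies, you can confirm that the base ExecuTorch package is visible from your Python environment. This snippet is an editorial aside, not part of the setup scripts, and only checks package discoverability:

```python
import importlib.util

# Look up the ExecuTorch package without importing it; find_spec returns
# None when the package is not installed in the active environment.
spec = importlib.util.find_spec("executorch")
status = "found" if spec is not None else "not installed"
print("executorch:", status)
```

If this prints `not installed`, revisit the ExecuTorch setup tutorials linked above before continuing.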
+
+To install the Ethos-U dependencies, run
+```bash
+./examples/arm/setup.sh --i-agree-to-the-contained-eula
+```
+This will install:
+- [TOSA Serialization Library](https://www.mlplatform.org/tosa/software.html) for serializing the Exir IR graph into TOSA IR.
+- [Ethos-U Vela graph compiler](https://pypi.org/project/ethos-u-vela/) for compiling TOSA flatbuffers into an Ethos-U command stream.
+- [Arm GNU Toolchain](https://developer.arm.com/Tools%20and%20Software/GNU%20Toolchain) for cross-compilation.
+- [Corstone SSE-300 FVP](https://developer.arm.com/documentation/100966/1128/Arm--Corstone-SSE-300-FVP) for testing on the Ethos-U55 reference design.
+- [Corstone SSE-320 FVP](https://developer.arm.com/documentation/109760/0000/SSE-320-FVP) for testing on the Ethos-U85 reference design.
+
+## Set Up the Developer Environment
+
+The `setup.sh` script generates a `setup_path.sh` script that you need to source whenever you restart your shell. Run:
+
+```bash
+source examples/arm/ethos-u-scratch/setup_path.sh
+```
+
+As a simple check that your environment is set up correctly, run `which FVP_Corstone_SSE-320` and make sure that the executable is located where you expect, in the `examples/arm` tree.
+
+## Build
+
+### Ahead-of-Time (AOT) components
+
+The ExecuTorch Ahead-of-Time (AOT) pipeline takes a PyTorch model (a `torch.nn.Module`) and produces a `.pte` binary file, which is then consumed by the ExecuTorch Runtime. This [document](getting-started-architecture.md) goes into much more depth about the ExecuTorch software stack for both AOT and Runtime.
+
+The example below shows how to quantize a model consisting of a single addition, and export it through the AOT flow using the Ethos-U backend. For more details, see `examples/arm/ethos_u_minimal_example.ipynb`.
+
+```python
+import torch
+
+class Add(torch.nn.Module):
+    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
+        return x + y
+
+example_inputs = (torch.ones(1, 1, 1, 1), torch.ones(1, 1, 1, 1))
+
+model = Add()
+model = model.eval()
+exported_program = torch.export.export(model, example_inputs)
+graph_module = exported_program.module()
+
+
+from executorch.backends.arm.ethosu import EthosUCompileSpec
+from executorch.backends.arm.quantizer import (
+    EthosUQuantizer,
+    get_symmetric_quantization_config,
+)
+from torchao.quantization.pt2e.quantize_pt2e import convert_pt2e, prepare_pt2e
+
+# Create a compilation spec describing the target for configuring the quantizer.
+# Some args are used by the Arm Vela graph compiler later in the example. Refer to the Arm Vela documentation for an
+# explanation of its flags: https://gitlab.arm.com/artificial-intelligence/ethos-u/ethos-u-vela/-/blob/main/OPTIONS.md
+compile_spec = EthosUCompileSpec(
+    target="ethos-u55-128",
+    system_config="Ethos_U55_High_End_Embedded",
+    memory_mode="Shared_Sram",
+    extra_flags=["--output-format=raw", "--debug-force-regor"],
+)
+
+# Create and configure the quantizer to use a symmetric quantization config globally on all nodes
+quantizer = EthosUQuantizer(compile_spec)
+operator_config = get_symmetric_quantization_config()
+quantizer.set_global(operator_config)
+
+# Post-training quantization
+quantized_graph_module = prepare_pt2e(graph_module, quantizer)
+quantized_graph_module(*example_inputs)  # Calibrate the graph module with the example input
+quantized_graph_module = convert_pt2e(quantized_graph_module)
+
+
+# Create a new exported program using the quantized_graph_module
+quantized_exported_program = torch.export.export(quantized_graph_module, example_inputs)
+from executorch.backends.arm.ethosu import EthosUPartitioner
+from executorch.exir import (
+    EdgeCompileConfig,
+    ExecutorchBackendConfig,
+    to_edge_transform_and_lower,
+)
+from executorch.extension.export_util.utils import save_pte_program
+
+# Create the partitioner from the compile spec
+partitioner = EthosUPartitioner(compile_spec)
+
+# Lower the exported program to the Ethos-U backend
+edge_program_manager = to_edge_transform_and_lower(
+    quantized_exported_program,
+    partitioner=[partitioner],
+    compile_config=EdgeCompileConfig(
+        _check_ir_validity=False,
+    ),
+)
+
+# Convert the edge program to an ExecuTorch program
+executorch_program_manager = edge_program_manager.to_executorch(
+    config=ExecutorchBackendConfig(extract_delegate_segments=False)
+)
+
+
+# Save the .pte file
+save_pte_program(executorch_program_manager, "ethos_u_minimal_example.pte")
+```
+
+
+```{tip}
+For a quick start, you can use the script `examples/arm/aot_arm_compiler.py` to produce the `.pte` file.
+To produce a `.pte` file equivalent to the one above, run
+`python -m examples.arm.aot_arm_compiler --model_name=add --delegate --quantize --output=ethos_u_minimal_example.pte`
+```
+
+### Runtime
+
+After the AOT compilation flow is done, the runtime can be cross-compiled and linked to the produced `.pte` file using the Arm cross-compilation toolchain. This is done in two steps:
+
+First, build and install the ExecuTorch libraries and the EthosUDelegate:
+```bash
+# In ExecuTorch top-level, with sourced setup_path.sh
+cmake -DCMAKE_BUILD_TYPE=Release --preset arm-baremetal -B cmake-out-arm .
+cmake --build cmake-out-arm --target install -j$(nproc)
+```
+Second, build and link the `arm_executor_runner` and generate kernel bindings for any non-delegated ops. This is the actual program that will run on target.
+
+```bash
+# In ExecuTorch top-level, with sourced setup_path.sh
+cmake -DCMAKE_TOOLCHAIN_FILE=`pwd`/examples/arm/ethos-u-setup/arm-none-eabi-gcc.cmake \
+    -DCMAKE_BUILD_TYPE=Release \
+    -DET_PTE_FILE_PATH=ethos_u_minimal_example.pte \
+    -DTARGET_CPU=cortex-m55 \
+    -DETHOSU_TARGET_NPU_CONFIG=ethos-u55-128 \
+    -DMEMORY_MODE=Shared_Sram \
+    -DSYSTEM_CONFIG=Ethos_U55_High_End_Embedded \
+    -Bethos_u_minimal_example \
+    examples/arm/executor_runner
+cmake --build ethos_u_minimal_example -j$(nproc) -- arm_executor_runner
+```
+
+```{tip}
+For a quick start, you can use the script `backends/arm/scripts/build_executor_runner.sh` to build the runner.
+To build a runner equivalent to the one above, run
+`./backends/arm/scripts/build_executor_runner.sh --pte=ethos_u_minimal_example.pte`
+```
+
+The block diagram below shows, at a high level, how the various build artifacts are generated and linked together to produce the final bare-metal executable.
+
+![](arm-delegate-runtime-build.svg)
+
+
+## Running on Corstone FVP Platforms
+
+Finally, use the `backends/arm/scripts/run_fvp.sh` utility script to run the `.elf` file on simulated Arm hardware.
+```bash
+backends/arm/scripts/run_fvp.sh --elf=$(find ethos_u_minimal_example -name arm_executor_runner) --target=ethos-u55-128
+```
+The example application is by default built with an input of ones, so the expected result of the quantized addition should be close to 2.
+
+
+## Takeaways
+
+In this tutorial you have learned how to use ExecuTorch to export a PyTorch model to an executable that can run on an embedded target, and then run that executable on simulated hardware.
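As a closing aside on the expected output above: the quantized addition returns a value close to, but usually not exactly, 2 because values pass through an int8 representation. The sketch below illustrates this with symmetric int8 quantization and a hypothetical scale value; the real scale is chosen by the quantizer during the calibration run, so the exact numbers will differ on target:

```python
# Hypothetical scale, assuming calibration saw values in roughly [-2, 2].
# The actual scale used on target comes from the calibration step.
scale = 2.0 / 127.0

def quantize(x: float) -> int:
    """Symmetric int8 quantization with saturation."""
    return max(-128, min(127, round(x / scale)))

def dequantize(q: int) -> float:
    return q * scale

q_one = quantize(1.0)   # 1.0 maps to 64 in the int8 domain
acc = q_one + q_one     # the addition happens on integers
result = dequantize(acc)
print(result)           # close to 2, but not exact
```

The small deviation from 2.0 is quantization rounding, which is why the runner's output is described as "close to 2" rather than exactly 2.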
+To learn more, check out these learning paths:
+
+* <https://learn.arm.com/learning-paths/embedded-and-microcontrollers/rpi-llama3/>
+* <https://learn.arm.com/learning-paths/embedded-and-microcontrollers/visualizing-ethos-u-performance/>
+
+## FAQs
+
+If you encountered any bugs or issues while following this tutorial, please file an issue on [GitHub](https://github.com/pytorch/executorch/issues/new).
+
+
+```
+Arm is a registered trademark of Arm Limited (or its subsidiaries or affiliates).
+```
\ No newline at end of file
diff --git a/docs/source/tutorial-arm-vgf.md b/docs/source/tutorial-arm-vgf.md
new file mode 100644
index 00000000000..5c723053e63
--- /dev/null
+++ b/docs/source/tutorial-arm-vgf.md
@@ -0,0 +1,220 @@
+# Arm VGF Backend Tutorial
+
+
+::::{grid} 2
+
+:::{grid-item-card} Tutorials we recommend you complete before this:
+:class-card: card-prerequisites
+* [Introduction to ExecuTorch](intro-how-it-works.md)
+* [Getting Started](getting-started.md)
+* [Building ExecuTorch with CMake](using-executorch-building-from-source.md)
+:::
+
+:::{grid-item-card} What you will learn in this tutorial:
+:class-card: card-prerequisites
+In this tutorial you will learn how to export a simple PyTorch model for the ExecuTorch VGF backend.
+:::
+
+::::
+
+```{warning}
+This delegate is under active development; for best results, please use a recent version.
+The VGF backend support is in early development and you may encounter issues.
+Some features may be documented or planned but not yet implemented; please refer to the in-tree documentation for the latest status of features.
+```
+
+```{tip}
+If you are already familiar with this delegate, you may want to jump directly to the examples:
+* [Examples in the ExecuTorch repository](https://github.com/pytorch/executorch/tree/main/examples/arm)
+* [A commandline compiler for example models](https://github.com/pytorch/executorch/blob/main/examples/arm/aot_arm_compiler.py)
+```
+
+This tutorial serves as an introduction to using ExecuTorch to deploy PyTorch models on VGF targets. It is based on `vgf_minimal_example.ipynb`, provided in Arm®'s examples folder.
+
+## Prerequisites
+
+### Hardware
+
+To successfully complete this tutorial, you will need a Linux machine with an aarch64 or x86_64 processor architecture, or a macOS™ machine with Apple® Silicon.
+
+To enable development without a specific development board, we will be using the [ML SDK for Vulkan®](https://github.com/arm/ai-ml-sdk-for-vulkan/) to emulate the program consumer.
+
+### Software
+
+First, you will need to install ExecuTorch. Please follow the recommended tutorials if you haven't already, to set up a working ExecuTorch development environment. For the VGF backend it is recommended that you [install from source](https://docs.pytorch.org/executorch/stable/using-executorch-building-from-source.html), or from a [nightly build](https://download.pytorch.org/whl/nightly/executorch/).
+
+Additionally, you need to install a number of SDK dependencies for generating VGF files. Prefer installing `glslc` via your package manager; if that is not possible, it will be installed via the Vulkan SDK by the setup scripts. Scripts to automate the installation of these dependencies are available in the main [ExecuTorch repository](https://github.com/pytorch/executorch/tree/main/examples/arm/).
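To see whether `glslc` is already available (for example from a package-manager install), you can query the `PATH` from Python. This helper is an editorial aside, not part of the setup scripts:

```python
import shutil

# shutil.which mirrors the shell's `which`: it returns the full path of the
# first matching executable on PATH, or None if nothing is found.
glslc_path = shutil.which("glslc")
if glslc_path is None:
    print("glslc not found on PATH")
else:
    print("glslc found at", glslc_path)
```

If `glslc` is not found, the setup script below will provide it via the Vulkan SDK.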
+
+To install the VGF dependencies, run
+```bash
+./examples/arm/setup.sh --i-agree-to-the-contained-eula --disable-ethos-u-deps --enable-mlsdk-deps
+```
+This will install:
+- [TOSA Serialization Library](https://www.mlplatform.org/tosa/software.html) for serializing the Exir IR graph into TOSA IR.
+- [ML SDK Model Converter](https://github.com/arm/ai-ml-sdk-model-converter) for converting TOSA flatbuffers to VGF files.
+- [Vulkan API (if needed)](https://www.vulkan.org), which should be set up locally for GPU execution support.
+- [ML Emulation Layer for Vulkan](https://github.com/arm/ai-ml-emulation-layer-for-vulkan) for testing on the Vulkan API.
+
+
+## Set Up the Developer Environment
+
+The `setup.sh` script has generated a `setup_path.sh` script that you need to source whenever you restart your shell. Do this by running
+
+`source examples/arm/ethos-u-scratch/setup_path.sh`
+
+As a simple check that your environment is set up correctly, run
+
+```bash
+which model-converter
+```
+Make sure the executable is located where you expect, in the `examples/arm` tree.
+
+## Build
+
+### Ahead-of-Time (AOT) components
+
+The ExecuTorch Ahead-of-Time (AOT) pipeline takes a PyTorch model (a `torch.nn.Module`) and produces a `.pte` binary file, which is then typically consumed by the ExecuTorch Runtime. This [document](getting-started-architecture.md) goes into much more depth about the ExecuTorch software stack for both AOT and Runtime.
+
+The example below shows how to quantize a model consisting of a single addition, and export it through the AOT flow using the VGF backend. For more details, see `examples/arm/vgf_minimal_example.ipynb`.
+
+```python
+import torch
+
+class Add(torch.nn.Module):
+    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
+        return x + y
+
+example_inputs = (torch.ones(1, 1, 1, 1), torch.ones(1, 1, 1, 1))
+
+model = Add()
+model = model.eval()
+exported_program = torch.export.export_for_training(model, example_inputs)
+graph_module = exported_program.module()
+
+
+from executorch.backends.arm.vgf import VgfCompileSpec
+from executorch.backends.arm.quantizer import (
+    VgfQuantizer,
+    get_symmetric_quantization_config,
+)
+from torchao.quantization.pt2e.quantize_pt2e import convert_pt2e, prepare_pt2e
+
+# Create a compilation spec describing the target for configuring the quantizer
+compile_spec = VgfCompileSpec("TOSA-1.0+INT")
+
+# Create and configure the quantizer to use a symmetric quantization config globally on all nodes
+quantizer = VgfQuantizer(compile_spec)
+operator_config = get_symmetric_quantization_config(is_per_channel=False)
+quantizer.set_global(operator_config)
+
+# Post-training quantization
+quantized_graph_module = prepare_pt2e(graph_module, quantizer)
+quantized_graph_module(*example_inputs)  # Calibrate the graph module with the example input
+quantized_graph_module = convert_pt2e(quantized_graph_module)
+
+
+# Create a new exported program using the quantized_graph_module
+quantized_exported_program = torch.export.export(quantized_graph_module, example_inputs)
+import os
+from executorch.backends.arm.vgf import VgfPartitioner
+from executorch.exir import (
+    EdgeCompileConfig,
+    ExecutorchBackendConfig,
+    to_edge_transform_and_lower,
+)
+from executorch.extension.export_util.utils import save_pte_program
+
+# Create the partitioner from the compile spec
+partitioner = VgfPartitioner(compile_spec)
+
+# Lower the exported program to the VGF backend
+edge_program_manager = to_edge_transform_and_lower(
+    quantized_exported_program,
+    partitioner=[partitioner],
+    compile_config=EdgeCompileConfig(
+        _check_ir_validity=False,
+    ),
+)
+
+# Convert the edge program to an ExecuTorch program
+executorch_program_manager = edge_program_manager.to_executorch(
+    config=ExecutorchBackendConfig(extract_delegate_segments=False)
+)
+
+
+# Save the .pte file
+cwd_dir = os.getcwd()
+pte_base_name = "simple_example"
+pte_name = pte_base_name + ".pte"
+pte_path = os.path.join(cwd_dir, pte_name)
+save_pte_program(executorch_program_manager, pte_name)
+assert os.path.exists(pte_path), "Build failed; no .pte file found"
+```
+
+
+```{tip}
+For a quick start, you can use the script `examples/arm/aot_arm_compiler.py` to produce the `.pte` file.
+To produce a `.pte` file equivalent to the one above, run
+`python -m examples.arm.aot_arm_compiler --model_name=add --delegate --quantize --output=simple_example.pte --target=vgf`
+```
+
+### Runtime
+
+After the AOT compilation flow is done, we can build the executor runner target. For this tutorial, the default runner can be used. Build it with the following configuration:
+
+```bash
+# In ExecuTorch top-level, with sourced setup_path.sh
+cmake \
+    -DCMAKE_INSTALL_PREFIX=cmake-out \
+    -DCMAKE_BUILD_TYPE=Debug \
+    -DEXECUTORCH_BUILD_EXTENSION_DATA_LOADER=ON \
+    -DEXECUTORCH_BUILD_EXTENSION_MODULE=ON \
+    -DEXECUTORCH_BUILD_EXTENSION_FLAT_TENSOR=ON \
+    -DEXECUTORCH_BUILD_EXTENSION_TENSOR=ON \
+    -DEXECUTORCH_BUILD_KERNELS_QUANTIZED=ON \
+    -DEXECUTORCH_BUILD_XNNPACK=OFF \
+    -DEXECUTORCH_BUILD_VULKAN=ON \
+    -DEXECUTORCH_BUILD_VGF=ON \
+    -DEXECUTORCH_ENABLE_LOGGING=ON \
+    -DPYTHON_EXECUTABLE=python \
+    -Bcmake-out .
+
+cmake --build cmake-out --target executor_runner
+```
+
+
+The block diagram below shows, at a high level, how the various build artifacts are generated and linked together to produce the final executable.
+
+![](arm-delegate-runtime-build.svg)
+
+
+## Deploying and running on device
+
+Since we are using the Vulkan emulation layer, we can run the executor runner with the VGF delegate on the host machine:
+
+```bash
+./cmake-out/executor_runner --model_path simple_example.pte
+```
+
+The example application is by default built with an input of ones, so the expected result of the quantized addition should be close to 2.
+
+## Takeaways
+
+In this tutorial you have learned how to use ExecuTorch to export a PyTorch model to an executable targeting the VGF backend, and then run that executable on emulated hardware.
+
+
+## FAQs
+
+*glslc is not found when configuring the executor runner.*
+
+The Vulkan SDK is likely not on your `PATH`. Check whether `setup_path.sh` contains something like
+`export PATH=$(pwd)/examples/arm/ethos-u-scratch/vulkan_sdk/1.4.321.1/x86_64/bin:$PATH`.
+If not, add it and source the file.
+
+If you encountered any bugs or issues while following this tutorial, please file an issue on [GitHub](https://github.com/pytorch/executorch/issues/new).
+
+```
+Arm is a registered trademark of Arm Limited (or its subsidiaries or affiliates).
+```
\ No newline at end of file
diff --git a/docs/source/tutorial-arm.md b/docs/source/tutorial-arm.md
deleted file mode 100644
index 0692b631154..00000000000
--- a/docs/source/tutorial-arm.md
+++ /dev/null
@@ -1,467 +0,0 @@
-# Arm® Backend Tutorial
-
-
-::::{grid} 2
-
-:::{grid-item-card} Tutorials we recommend you complete before this:
-:class-card: card-prerequisites
-* [Introduction to ExecuTorch](intro-how-it-works.md)
-* [Getting Started](getting-started.md)
-* [Building ExecuTorch with CMake](using-executorch-building-from-source.md)
-:::
-
-:::{grid-item-card} What you will learn in this tutorial:
-:class-card: card-prerequisites
-In this tutorial you will learn how to export a simple PyTorch model for ExecuTorch Arm backends.
-::: - -:::: - -```{warning} -This delegate is under active development, to get best results please use a recent version. -The TOSA and Ethos(tm) backend support is reasonably mature and used in production by some users. -The VGF backend support is in early development and you may encounter issues. -You may encounter some rough edges and features which may be documented or planned but not implemented, please refer to the in-tree documentation for the latest status of features. -``` - -```{tip} -If you are already familiar with this delegate, you may want to jump directly to the examples: -* [Examples in the ExecuTorch repository](https://github.com/pytorch/executorch/tree/main/examples/arm) -* [Compilation for Ethos-U](https://github.com/pytorch/executorch/blob/main/examples/arm/ethos_u_minimal_example.ipynb) -* [A commandline compiler for example models](https://github.com/pytorch/executorch/blob/main/examples/arm/aot_arm_compiler.py) -``` - -## Prerequisites - -Let's make sure you have everything you need before you get started. - -### Hardware - -To successfully complete this tutorial, you will need a Linux or MacOS host machine with Arm aarch64 or x86_64 processor architecture. - -The target device will be an emulated platform to enable development without a specific development board. This tutorial has guidance for both Ethos-U targets and VGF via the ML SDK for Vulkan®. - -For Ethos-U and Cortex-M, We will be using a [Fixed Virtual Platform (FVP)](https://www.arm.com/products/development-tools/simulation/fixed-virtual-platforms), simulating [Corstone-300](https://developer.arm.com/Processors/Corstone-300)(cs300) and [Corstone-320](https://developer.arm.com/Processors/Corstone-320)(cs320)systems. Since we will be using the FVP (think of it as virtual hardware), we won't be requiring any real embedded hardware for this tutorial. - -For VGF we will be using the [ML SDK for Vulkan(R)](https://github.com/arm/ai-ml-sdk-for-vulkan/)) to emulate the program consumer. 
- -### Software - -First, you will need to install ExecuTorch. Please follow the recommended tutorials if you haven't already, to set up a working ExecuTorch development environment. For the VGF backend it's recommended you [install from source](https://docs.pytorch.org/executorch/stable/using-executorch-building-from-source.html), or from a [nightly](https://download.pytorch.org/whl/nightly/executorch/). - -In addition to this, you need to install a number of SDK dependencies for generating Ethos-U command streams or VGF files. There are scripts which automate this, which are found in the main [ExecuTorch repository](https://github.com/pytorch/executorch/tree/main/examples/arm/). - -## Set Up the Developer Environment - -In this section, we will do a one-time setup of the platform support files needed to run ExecuTorch programs in this tutorial. It is recommended to run the script in a conda or venv environment. - -With a checkout of the ExecuTorch repository, we will use the `examples/arm/setup.sh` script to pull each item in an automated fashion. - -For Ethos-U run: -```bash -./examples/arm/setup.sh --i-agree-to-the-contained-eula -``` - -For VGF run: -```bash -./examples/arm/setup.sh --i-agree-to-the-contained-eula --disable-ethos-u-deps --enable-mlsdk-deps -``` -It is possible to install both sets of dependencies if you omit the disable options. - - -### Notes: - -```{warning} -The `setup.sh` script has generated a `setup_path.sh` script that you need to source whenever you restart your shell. -``` - -i.e. run -`source executorch/examples/arm/ethos-u-scratch/setup_path.sh` - - -To confirm your environment is set up correctly and will enable you to generate .pte's for your target: - -For Ethos-U run: -```bash -# Check for Vela, which converts TOSA to Ethos-U command streams. -which vela -``` - -For VGF run: -```bash -# Check for model-converter, which converts TOSA to ML-SDK VGF format. 
-which model-converter -``` - -To ensure there's no environment pollution you should confirm these binaries reside within your executorch checkout, under the examples/arm tree. Other versions may present compatibility issues, so this should be corrected by modifying your environment variables such as ${PATH} appropriately. - - -## Convert the PyTorch Model to the `.pte` File - -`.pte` is a binary file produced by ExecuTorch Ahead-of-Time (AoT) pipeline by taking in a PyTorch Model (a torch.nn.Module), exporting it, running a variety of passes, and finally serializing it to a `.pte` file format. This binary file is typically consumed by the ExecuTorch Runtime. This [document](https://github.com/pytorch/executorch/blob/main/docs/source/getting-started-architecture.md) goes in much more depth about the ExecuTorch software stack for both AoT as well as Runtime. - -In this section, we will primarily focus on the AoT flow with the end goal of producing a `.pte` file. There are a set of export configurations to target different backends at runtime. For each, the AoT flow will produce a unique `.pte` file. We will explore a couple of different configurations producing different `.pte` files, particularly interesting for our Corstone-300 system and available processing elements. - -Before we get started, let's first talk about the PyTorch modules we will be using. - -### PyTorch Example Modules -We will use a couple of simple PyTorch Modules to explore the end-to-end flow. These modules will be used in various different ways throughout the tutorial, referring to them by their ``. - -#### SoftmaxModule -This is a very simple PyTorch module with just one [Softmax](https://pytorch.org/docs/stable/generated/torch.nn.Softmax.html#torch.nn.Softmax) operator. 
- -```python -import torch - -class SoftmaxModule(torch.nn.Module): - def __init__(self): - super().__init__() - self.softmax = torch.nn.Softmax() - - def forward(self, x): - z = self.softmax(x) - return z -``` - -Running it using the Python environment (on the same development Linux machine), you get the expected output. - -```python ->>> m = SoftmaxModule() ->>> m(torch.ones(2,2)) -tensor([[0.5000, 0.5000], - [0.5000, 0.5000]]) -``` - -#### AddModule -Let's write another simple PyTorch module with just one [Add](https://pytorch.org/docs/stable/generated/torch.add.html#torch.add) operator. - -```python -class AddModule(torch.nn.Module): - def __init__(self): - super().__init__() - - def forward(self, x): - return x + x -``` - -Running it in python shows that 1 + 1 produces 2 as exepected: - -```python ->>> m = AddModule() ->>> m(torch.ones(5, dtype=torch.int32)) # integer types for non-quantized Ethos-U delegation -tensor([2, 2, 2, 2, 2], dtype=torch.int32) -``` -Keep the inputs and outputs to these modules in mind. When you will lower and run this through alternate means as opposed to running on this Linux machine, you will use the same inputs, and expect the outputs to match with the one shown here. - -```{tip} -you need to be aware of data types for running networks on the Ethos-U as it is an integer only co-processor. For this example you use integer types explicitly, for typical use of such a flow networks are built and trained in floating point, and then are quantized from floating point to integer for efficient inference. -``` - -#### MobileNetV2 Module -[MobileNetV2](https://arxiv.org/abs/1801.04381) is a commonly used network for edge and mobile devices. -It's also available as a default model in [torchvision](https://github.com/pytorch/vision), so you can load it with the sample code below. 
-``` -from torchvision.models import mobilenet_v2 # @manual -from torchvision.models.mobilenetv2 import MobileNet_V2_Weights - -mv2 = mobilenet_v2(weights=MobileNet_V2_Weights.DEFAULT) -``` -For more details, refer to the code snippet [here](https://github.com/pytorch/executorch/blob/2354945d47f67f60d9a118ea1a08eef8ba2364b5/examples/models/mobilenet_v2/model.py#L18). - -### Non-delegated Workflow - -In the ExecuTorch AoT pipeline, one of the options is to select a backend. ExecuTorch offers a variety of different backends. Selecting backend is optional, it is typically done to target a particular mode of acceleration or hardware for a given model compute requirements. Without any backends, ExecuTorch runtime will fallback to using, available by default, a highly portable set of operators. - -It's expected that on platforms with dedicated acceleration like the Ethos-U55, that the non-delegated flow is used for two primary cases: -1. When the network is designed to be very small and best suited to run on the Cortex-M alone. -2. When the network has a mix of operations that can target the NPU and those that can't, e.g. the Ethos-U55 supports integer operations and so floating point softmax will fall back to execute on the CPU. - -In this flow, without any backend delegates, to illustrate the portability of the ExecuTorch runtime, as well as of the operator library you will skip specifying the backend during the `.pte` generation. - -Following script will serve as a helper utility to help generating the `.pte` file. This is available in the `examples/arm` directory. - -```bash -python3 -m examples.arm.aot_arm_compiler --model_name="softmax" -# This should produce ./softmax_arm_ethos-u55-128.pte -``` - -### Delegated Workflow - -Working with Arm, you introduced a new Arm backend delegate for ExecuTorch. This backend is under active development and has a limited set of features available as of writing this. 
- -By including a following step during the ExecuTorch AoT export pipeline to generate the `.pte` file, you can enable this backend delegate. - -```python -from executorch.backends.arm.arm_backend import generate_ethosu_compile_spec - -graph_module_edge.exported_program = to_backend( - model.exported_program, - ArmPartitioner(generate_ethosu_compile_spec("ethos-u55-128"))) -``` - -Similar to the non-delegate flow, the same script will server as a helper utility to help generate the `.pte` file. Notice the `--delegate` option to enable the `to_backend` call. - -For Ethos targets: -```bash -python3 -m examples.arm.aot_arm_compiler --model_name="add" --delegate -# This targets the default of ethos-u55-128, see --help for further targets -# should produce ./add_arm_delegate_ethos-u55-128.pte -``` - -For basic post-training quantization: -```bash -python3 -m examples.arm.aot_arm_compiler --model_name="mv2" --delegate --quantize -# This targets the default of ethos-u55-128, see --help for further targets -# should produce ./mv2_arm_delegate_ethos-u55-128.pte -``` - - -For VGF targets: -```bash -python3 -m examples.arm.aot_arm_compiler --model_name="add" --target=vgf --delegate -# should produce ./add_arm_delegate_vgf.pte -``` - -For basic post-training quantization: -```bash -python3 -m examples.arm.aot_arm_compiler --model_name="mv2" --target=vgf --delegate --quantize -# should produce ./mv2_arm_delegate_vgf.pte -``` - -To capture intermediates such as VGF for lower level integration, invoke with the "-i" option: -```bash -python3 -m examples.arm.aot_arm_compiler --model_name="mv2" --target=vgf --delegate --quantize -i ./mv2_output -# should produce ./mv2_arm_delegate_vgf.pte and intermediates in ./mv2_out/ -``` - -
-
-At the end of this, you should have a number of different `.pte` files:
-
-- the SoftmaxModule, without any backend delegates.
-- the AddModule, targeting the Arm Ethos-U backend.
-- the Quantized MV2Model, targeting the Arm Ethos-U backend.
-- the AddModule, targeting the VGF backend.
-- the Quantized MV2Model, targeting the VGF backend.
-
-Now let's try to run these `.pte` files on a target.
-
-## Getting a Bare-Metal Executable
-
-In this section, you will go over the steps needed to build the runtime application, which will then run on the target device. The executorch repository contains a functioning script that performs these exact steps. It is located at `executorch/examples/arm/run.sh`. You will use it to build the necessary pieces and finally run the previously generated PTE file on an FVP.
-
-By default, `run.sh` will use `arm_test/` as the build and output folder, and you will find the build artifacts under it. This can be controlled/overridden with the `--et_build_root` and `--output` flags if needed.
-
-E.g., running `examples/arm/run.sh --model_name=add --target=ethos-u85-128` will produce a `.pte` and an elf file like this:
-
-```bash
-arm_test/add/add_arm_delegate_ethos-u85-128.pte
-arm_test/add/cmake-out/arm_executor_runner
-```
-Also, before you get started, make sure that you have completed the ExecuTorch CMake build setup and followed the instructions to set up the development environment described [earlier](#set-up-the-developer-environment).
-
-The block diagram below demonstrates, at a high level, how the various build artifacts are generated and linked together to produce the final bare-metal executable.
-
-![](arm-delegate-runtime-build.svg)
-
-```{tip}
-The `generate_pte_file` function in the `run.sh` script produces the `.pte` files based on the models provided through the `--model_name` input argument.
-```
-
-### Generating ExecuTorch Libraries
-
-ExecuTorch's CMake build system produces a set of build pieces which are critical to building the ExecuTorch runtime within the bare-metal environment provided for the Corstone FVPs by the Ethos-U SDK.
-
-[This](using-executorch-building-from-source.md) document provides a detailed overview of each individual build piece. For running either variant of the `.pte` file, you will need a core set of libraries. Here is a list:
-
-- `libexecutorch.a`
-- `libportable_kernels.a`
-- `libportable_ops_lib.a`
-
-To run a `.pte` file with the Arm backend delegate call instructions, you will also need the Arm backend delegate runtime library, that is:
-
-- `libexecutorch_delegate_ethos_u.a`
-
-These libraries are generated by the `backends/arm/scripts/build_executorch.sh` script called from the `run.sh` script.
-
-### Building the executor_runner Bare-Metal Application
-
-The SDK directory is the same one prepared [earlier](#setup-the-arm-ethos-u-software-development), and you will pass in the `.pte` file (any one of them) generated above.
-
-Note that you have to generate a new `executor_runner` binary if you want to change the model or the `.pte` file. This constraint comes from the constrained bare-metal runtime environment of the Corstone-300/Corstone-320 platforms. The build also generates a kernel registration library for the relevant operators that could not be delegated to the Ethos-U; see the [Kernel Library Selective Build documentation](https://docs.pytorch.org/executorch/stable/kernel-library-selective-build.html).
-
-This step is executed by the `build_executor_runner.sh` script, which is invoked from `run.sh` in the `backends/arm/scripts` folder.
-
-```{tip}
-The `run.sh` script takes a `--target` option, which provides a way to specify a particular target: Corstone-300 (ethos-u55-128) or Corstone-320 (ethos-u85-128).
-```
-
-## Running on Corstone FVP Platforms
-
-Once the elf is prepared, regardless of which `.pte` file variant was used to generate the bare-metal elf, `run.sh` will run the FVP for you via the `backends/arm/scripts/run_fvp.sh` script.
-
-#### Automatic FVP Selection
-
-- To run a specific test model with the compiler flags and target:
-```bash
-./run.sh --model_name=mv2 --delegate --quantize --target=ethos-u85-128
-```
-
-- To run a specific test model and target:
-```bash
-./run.sh --model_name=mv2 --delegate --target=ethos-u85-128
-```
-
-- To run all the test models iteratively in a loop, simply run:
-```bash
-./run.sh
-```
-
-Note that you can use the `build_executor_runner.sh` and `run_fvp.sh` scripts in tandem; by passing the relevant `--target` argument (e.g., `--target=ethos-u55-128`), the correct FVP binary will be chosen automatically. For more details, see the [section on Runtime Integration](https://docs.pytorch.org/executorch/main/backends-arm-ethos-u.html#runtime-integration).
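The automatic selection described above amounts to mapping the `--target` string onto an FVP simulator binary. A minimal Python sketch of that mapping, using the simulator paths from this tutorial (the actual selection logic lives in `backends/arm/scripts/run_fvp.sh` and may differ in detail):

```python
FVP_ROOT = "examples/arm/ethos-u-scratch"

def fvp_binary(target: str) -> str:
    """Pick an FVP binary for a given --target string (illustrative sketch only)."""
    if target.startswith("ethos-u55"):
        return f"{FVP_ROOT}/FVP-corstone300/models/Linux64_GCC-9.3/FVP_Corstone_SSE-300_Ethos-U55"
    if target.startswith("ethos-u65"):
        return f"{FVP_ROOT}/FVP-corstone300/models/Linux64_GCC-9.3/FVP_Corstone_SSE-300_Ethos-U65"
    if target.startswith("ethos-u85"):
        return f"{FVP_ROOT}/FVP-corstone320/models/Linux64_GCC-9.3/FVP_Corstone_SSE-320"
    raise ValueError(f"Unknown target: {target}")

print(fvp_binary("ethos-u85-128").rsplit("/", 1)[-1])  # FVP_Corstone_SSE-320
```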
-
-
-#### Manual FVP Binary Selection
-
-- If you build for the Ethos-U55/U65 delegate target (e.g., using `--target=ethos-u55-128` or `--target=ethos-u65-256` with `build_executor_runner.sh` and `run_fvp.sh`), you should use the corresponding FVP binary:
-  - For U55:
-    ```bash
-    examples/arm/ethos-u-scratch/FVP-corstone300/models/Linux64_GCC-9.3/FVP_Corstone_SSE-300_Ethos-U55
-    ```
-  - For U65:
-    ```bash
-    examples/arm/ethos-u-scratch/FVP-corstone300/models/Linux64_GCC-9.3/FVP_Corstone_SSE-300_Ethos-U65
-    ```
-- If you are not building for an Ethos-U55/U65 target, use:
-  ```bash
-  examples/arm/ethos-u-scratch/FVP-corstone320/models/Linux64_GCC-9.3/FVP_Corstone_SSE-320
-  ```
-
-The following is an example usage:
-
-```bash
-ethos_u_build_dir=examples/arm/executor_runner/
-
-elf=$(find ${ethos_u_build_dir} -name "arm_executor_runner")
-
-FVP_Corstone_SSE-320 \
-    -C mps4_board.subsystem.ethosu.num_macs=128 \
-    -C mps4_board.visualisation.disable-visualisation=1 \
-    -C vis_hdlcd.disable_visualisation=1 \
-    -C mps4_board.telnetterminal0.start_telnet=0 \
-    -C mps4_board.uart0.out_file='-' \
-    -C mps4_board.uart0.shutdown_on_eot=1 \
-    -a "${elf}" \
-    --timelimit 120 || true # seconds, after which the simulator will kill itself
-```
-
-#### Verification of Successful FVP Execution
-After running the FVP command, either automatically or manually, you should see output similar to the following on your shell if the execution is successful:
-
-```console
-I [executorch:arm_executor_runner.cpp:364] Model in 0x70000000 $
-I [executorch:arm_executor_runner.cpp:366] Model PTE file loaded. Size: 4425968 bytes.
-I [executorch:arm_executor_runner.cpp:376] Model buffer loaded, has 1 methods
-I [executorch:arm_executor_runner.cpp:384] Running method forward
-I [executorch:arm_executor_runner.cpp:395] Setup Method allocator pool. Size: 62914560 bytes.
-I [executorch:arm_executor_runner.cpp:412] Setting up planned buffer 0, size 752640.
-I [executorch:ArmBackendEthosU.cpp:79] ArmBackend::init 0x70000070 -I [executorch:arm_executor_runner.cpp:445] Method loaded. -I [executorch:arm_executor_runner.cpp:447] Preparing inputs... -I [executorch:arm_executor_runner.cpp:461] Input prepared. -I [executorch:arm_executor_runner.cpp:463] Starting the model execution... -I [executorch:ArmBackendEthosU.cpp:118] ArmBackend::execute 0x70000070 -I [executorch:ArmBackendEthosU.cpp:298] Tensor input/output 0 will be permuted -I [executorch:arm_perf_monitor.cpp:120] NPU Inferences : 1 -I [executorch:arm_perf_monitor.cpp:121] Profiler report, CPU cycles per operator: -I [executorch:arm_perf_monitor.cpp:125] ethos-u : cycle_cnt : 1498202 cycles -I [executorch:arm_perf_monitor.cpp:132] Operator(s) total: 1498202 CPU cycles -I [executorch:arm_perf_monitor.cpp:138] Inference runtime: 6925114 CPU cycles total -I [executorch:arm_perf_monitor.cpp:140] NOTE: CPU cycle values and ratio calculations require FPGA and identical CPU/NPU frequency -I [executorch:arm_perf_monitor.cpp:149] Inference CPU ratio: 99.99 % -I [executorch:arm_perf_monitor.cpp:153] Inference NPU ratio: 0.01 % -I [executorch:arm_perf_monitor.cpp:162] cpu_wait_for_npu_cntr : 729 CPU cycles -I [executorch:arm_perf_monitor.cpp:167] Ethos-U PMU report: -I [executorch:arm_perf_monitor.cpp:168] ethosu_pmu_cycle_cntr : 5920305 -I [executorch:arm_perf_monitor.cpp:171] ethosu_pmu_cntr0 : 359921 -I [executorch:arm_perf_monitor.cpp:171] ethosu_pmu_cntr1 : 0 -I [executorch:arm_perf_monitor.cpp:171] ethosu_pmu_cntr2 : 0 -I [executorch:arm_perf_monitor.cpp:171] ethosu_pmu_cntr3 : 503 -I [executorch:arm_perf_monitor.cpp:178] Ethos-U PMU Events:[ETHOSU_PMU_EXT0_RD_DATA_BEAT_RECEIVED, ETHOSU_PMU_EXT1_RD_DATA_BEAT_RECEIVED, ETHOSU_PMU_EXT0_WR_DATA_BEAT_WRITTEN, ETHOSU_PMU_NPU_IDLE] -I [executorch:arm_executor_runner.cpp:470] model_pte_loaded_size: 4425968 bytes. 
-I [executorch:arm_executor_runner.cpp:484] method_allocator_used: 1355722 / 62914560 free: 61558838 ( used: 2 % )
-I [executorch:arm_executor_runner.cpp:491] method_allocator_planned: 752640 bytes
-I [executorch:arm_executor_runner.cpp:493] method_allocator_loaded: 966 bytes
-I [executorch:arm_executor_runner.cpp:494] method_allocator_input: 602116 bytes
-I [executorch:arm_executor_runner.cpp:495] method_allocator_executor: 0 bytes
-I [executorch:arm_executor_runner.cpp:498] temp_allocator_used: 0 / 1048576 free: 1048576 ( used: 0 % )
-I [executorch:arm_executor_runner.cpp:152] Model executed successfully.
-I [executorch:arm_executor_runner.cpp:156] 1 outputs:
-Output[0][0]: -0.749744
-Output[0][1]: -0.019224
-Output[0][2]: 0.134570
-...(Skipped)
-Output[0][996]: -0.230691
-Output[0][997]: -0.634399
-Output[0][998]: -0.115345
-Output[0][999]: 1.576386
-I [executorch:arm_executor_runner.cpp:177] Program complete, exiting.
-I [executorch:arm_executor_runner.cpp:179]
-```
-
-```{note}
-The `run.sh` script provides various options to select a particular FVP target, use desired models, and select portable kernels; these can be explored using the `--help` argument.
-```
-
-## Running on the VGF backend with the standard executor_runner for Linux
-
-Follow the typical [Building ExecuTorch with CMake](using-executorch-building-from-source.md) flow to build the Linux target, ensuring that the VGF delegate is enabled:
-
-```bash
--DEXECUTORCH_BUILD_VGF=ON
-```
-
-A full example build line is:
-```bash
-cmake \
-    -DCMAKE_INSTALL_PREFIX=cmake-out \
-    -DCMAKE_BUILD_TYPE=Release \
-    -DEXECUTORCH_BUILD_EXTENSION_DATA_LOADER=ON \
-    -DEXECUTORCH_BUILD_EXTENSION_MODULE=ON \
-    -DEXECUTORCH_BUILD_EXTENSION_FLAT_TENSOR=ON \
-    -DEXECUTORCH_BUILD_EXTENSION_TENSOR=ON \
-    -DEXECUTORCH_BUILD_XNNPACK=OFF \
-    -DEXECUTORCH_BUILD_VULKAN=ON \
-    -DEXECUTORCH_BUILD_VGF=ON \
-    -DEXECUTORCH_ENABLE_LOGGING=ON \
-    -DEXECUTORCH_BUILD_EXTENSION_RUNNER_UTIL=ON \
-    -DPYTHON_EXECUTABLE=python \
-    -Bcmake-out .
-cmake --build cmake-out -j25 --target install --config Release
-```
-
-You can then invoke the executor runner on the host machine, which will use the VGF delegate and requires the Vulkan layer drivers installed with `setup.sh`.
-
-```bash
-./cmake-out/executor_runner -model_path add_arm_delegate_vgf.pte
-```
-
-
-## Takeaways
-In this tutorial you have learnt how to use the ExecuTorch software to both export a standard model from PyTorch and to run it on the compact and fully functional ExecuTorch runtime, enabling a smooth path for offloading models from PyTorch to Arm-based platforms.
-
-To recap, there are two major flows:
- * A direct flow which offloads work onto the Cortex-M using libraries built into ExecuTorch.
- * A delegated flow which partitions the graph into sections for the Cortex-M and sections which can be offloaded and accelerated on the Ethos-U hardware.
-
-Both of these flows continue to evolve, enabling more use-cases and better performance.
-
-## FAQs
-
-
-If you encounter any bugs or issues following this tutorial, please file a bug/issue here on [Github](https://github.com/pytorch/executorch/issues/new).

From bc18834e38cfb8ce558754e56c151f5c2d6c6572 Mon Sep 17 00:00:00 2001
From: Mengwei Liu
Date: Wed, 17 Sep 2025 01:10:04 -0700
Subject: [PATCH 002/395] [multimodal] Allow float32 image input (#14359)

Lets the `Image` class support both `uint8_t` and `float` data types, and changes the `MultimodalPrefiller` class to support text, image, and audio modalities with error checking and modularity.

**Image Data Handling and Type Safety:**

* Refactored the `Image` class in `image.h` from a simple struct to a class that uses a `std::variant` to support both `uint8_t` and `float` image data, providing type-safe accessors and a `toTensor` method for conversion to tensors.
* Updated `load_image` in Llava `main.cpp` to construct `Image` objects using the new class interface and move semantics, ensuring correct data layout and encapsulation.
* Added a runtime check in `LlavaImagePrefiller` to ensure only `uint8_t` images are processed, using the new type-checking methods. **Multimodal Prefill Logic and Flexibility:** * Updated the `MultimodalPrefiller` class in `multimodal_prefiller.h` to dynamically check input types, validate tensor types against model expectations, and handles encoder/decoder execution with improved error handling and modularity. --- examples/models/llava/main.cpp | 16 +-- extension/android/jni/jni_layer_llama.cpp | 2 +- .../Exported/ExecuTorchLLMMultimodalRunner.mm | 12 +- extension/llm/runner/image.h | 103 +++++++++++++- extension/llm/runner/multimodal_prefiller.cpp | 40 +++++- .../llm/runner/test/test_multimodal_input.cpp | 133 ++++++++---------- 6 files changed, 205 insertions(+), 101 deletions(-) diff --git a/examples/models/llava/main.cpp b/examples/models/llava/main.cpp index 6cb84aa088e..3946a629ade 100644 --- a/examples/models/llava/main.cpp +++ b/examples/models/llava/main.cpp @@ -81,24 +81,20 @@ void load_image(const std::string& image_path, Image& image) { new_height, 0, channels); - // transpose to CHW - image.data.resize(channels * new_width * new_height); + std::vector chw_data(channels * new_width * new_height); for (int i = 0; i < new_width * new_height; ++i) { for (int c = 0; c < channels; ++c) { - image.data[c * new_width * new_height + i] = - resized_data[i * channels + c]; + chw_data[c * new_width * new_height + i] = resized_data[i * channels + c]; } } - image.width = new_width; - image.height = new_height; - image.channels = channels; + image = Image(std::move(chw_data), new_width, new_height, channels); // convert to tensor ET_LOG( Info, "image Channels: %" PRId32 ", Height: %" PRId32 ", Width: %" PRId32, - image.channels, - image.height, - image.width); + image.channels(), + image.height(), + image.width()); stbi_image_free(data); } diff --git a/extension/android/jni/jni_layer_llama.cpp b/extension/android/jni/jni_layer_llama.cpp index 
23686f01ee7..cabf30c42e4 100644 --- a/extension/android/jni/jni_layer_llama.cpp +++ b/extension/android/jni/jni_layer_llama.cpp @@ -268,7 +268,7 @@ class ExecuTorchLlmJni : public facebook::jni::HybridClass { for (int i = 0; i < image_size; i++) { image_data[i] = image_data_jint[i]; } - llm::Image image_runner{image_data, width, height, channels}; + llm::Image image_runner{std::move(image_data), width, height, channels}; prefill_inputs_.emplace_back( llm::MultimodalInput{std::move(image_runner)}); } diff --git a/extension/llm/apple/ExecuTorchLLM/Exported/ExecuTorchLLMMultimodalRunner.mm b/extension/llm/apple/ExecuTorchLLM/Exported/ExecuTorchLLMMultimodalRunner.mm index dcc5dc98806..b95e480aded 100644 --- a/extension/llm/apple/ExecuTorchLLM/Exported/ExecuTorchLLMMultimodalRunner.mm +++ b/extension/llm/apple/ExecuTorchLLM/Exported/ExecuTorchLLMMultimodalRunner.mm @@ -172,12 +172,12 @@ - (BOOL)generate:(NSArray *)inputs case ExecuTorchLLMMultimodalInputTypeImage: { ExecuTorchLLMImage *image = input.image; std::vector data((uint8_t *)image.data.bytes, (uint8_t *)image.data.bytes + image.data.length); - nativeInputs.emplace_back(llm::MultimodalInput(llm::Image{ - .data = std::move(data), - .width = (int32_t)image.width, - .height = (int32_t)image.height, - .channels = (int32_t)image.channels - })); + nativeInputs.emplace_back(llm::MultimodalInput(llm::Image( + std::move(data), + (int32_t)image.width, + (int32_t)image.height, + (int32_t)image.channels + ))); break; } default: { diff --git a/extension/llm/runner/image.h b/extension/llm/runner/image.h index 67fb8939518..dbdba273536 100644 --- a/extension/llm/runner/image.h +++ b/extension/llm/runner/image.h @@ -10,19 +10,112 @@ #pragma once #include +#include #include +#include #include +#include +#include + namespace executorch { namespace extension { namespace llm { -struct ET_EXPERIMENTAL Image { +class ET_EXPERIMENTAL Image { + public: + // Default constructor + Image() : width_(0), height_(0), channels_(0) {} + + // 
Constructor for uint8_t data + Image( + std::vector&& data, + int32_t width, + int32_t height, + int32_t channels) + : data_(std::move(data)), + width_(width), + height_(height), + channels_(channels) {} + + // Constructor for float data + Image( + std::vector&& data, + int32_t width, + int32_t height, + int32_t channels) + : data_(std::move(data)), + width_(width), + height_(height), + channels_(channels) {} + + // Getters + int32_t width() const { + return width_; + } + int32_t height() const { + return height_; + } + int32_t channels() const { + return channels_; + } + + // Data access + bool is_uint8() const { + return std::holds_alternative>(data_); + } + + bool is_float() const { + return std::holds_alternative>(data_); + } + + const std::vector& get_uint8_data() const& { + return std::get>(data_); + } + + std::vector& get_uint8_data() & { + return std::get>(data_); + } + + const std::vector& get_float_data() const& { + return std::get>(data_); + } + + std::vector& get_float_data() & { + return std::get>(data_); + } + + executorch::runtime::Result toTensor( + bool with_batch = false) const { + // Note: This creates a 3D tensor (CHW). The model might expect a 4D + // tensor (NCHW). The caller should handle reshaping if needed. 
+ std::vector sizes = { + channels(), height(), width()}; + if (with_batch) { + sizes.insert(sizes.begin(), 1); + } + if (is_float()) { + return executorch::extension::from_blob( + const_cast(get_float_data().data()), + sizes, + ::executorch::aten::ScalarType::Float); + } else if (is_uint8()) { + return executorch::extension::from_blob( + const_cast(get_uint8_data().data()), + sizes, + ::executorch::aten::ScalarType::Byte); + } + ET_LOG( + Error, "Image data is not initialized with uint8_t or float vector."); + return ::executorch::runtime::Error::NotSupported; + } + + private: // Assuming NCHW format - std::vector data; - int32_t width; - int32_t height; - int32_t channels; + std::variant, std::vector> data_; + int32_t width_; + int32_t height_; + int32_t channels_; }; } // namespace llm diff --git a/extension/llm/runner/multimodal_prefiller.cpp b/extension/llm/runner/multimodal_prefiller.cpp index 2705a9eadff..3f8777d4acf 100644 --- a/extension/llm/runner/multimodal_prefiller.cpp +++ b/extension/llm/runner/multimodal_prefiller.cpp @@ -41,10 +41,42 @@ Result MultimodalPrefiller::prefill( ::executorch::runtime::EValue encoder_output; if (input.is_image()) { Image image = input.get_image(); - auto image_tensor = executorch::extension::from_blob( - image.data.data(), - {3, image.height, image.width}, - ::executorch::aten::ScalarType::Byte); + + auto method_meta = ET_UNWRAP( + module_->method_meta(kImageEncoderMethod), + "Failed to get method_meta for %s", + kImageEncoderMethod); + + ET_CHECK_MSG( + method_meta.num_inputs() > 0, + "Image encoder should have at least 1 input"); + auto input_meta = ET_UNWRAP( + method_meta.input_tensor_meta(0), + "Cannot get input tensor meta at index 0"); + auto expected_dtype = input_meta.scalar_type(); + + if (expected_dtype == ::executorch::aten::ScalarType::Float) { + ET_CHECK_MSG( + image.is_float(), + "Model expects float image data, but image has uint8_t data."); + } else if (expected_dtype == 
::executorch::aten::ScalarType::Byte) { + ET_CHECK_MSG( + image.is_uint8(), + "Model expects uint8_t image data, but image has float data."); + } else { + ET_LOG( + Error, + "Unsupported image encoder input dtype: %s", + ::executorch::runtime::toString(expected_dtype)); + return ::executorch::runtime::Error::NotSupported; + } + + // The model might expect a 4D tensor (NCHW), but toTensor() returns a 3D + // tensor (CHW). Add a batch dimension of 1 if needed. + auto expected_dims = input_meta.sizes(); + auto image_tensor = ET_UNWRAP( + image.toTensor(/*with_batch*/ expected_dims.size() == 4), + "Failed to convert image to tensor"); // Run image encoder auto image_encoder_outputs = diff --git a/extension/llm/runner/test/test_multimodal_input.cpp b/extension/llm/runner/test/test_multimodal_input.cpp index 97b9cc1379e..486515175e8 100644 --- a/extension/llm/runner/test/test_multimodal_input.cpp +++ b/extension/llm/runner/test/test_multimodal_input.cpp @@ -16,7 +16,6 @@ using executorch::extension::llm::make_image_input; using executorch::extension::llm::make_text_input; using executorch::extension::llm::MultimodalInput; -namespace { class MultimodalInputTest : public Test { protected: std::string createTestText() { @@ -28,21 +27,13 @@ class MultimodalInputTest : public Test { } Image createTestImage() { - Image img; - img.width = 224; - img.height = 224; - img.channels = 3; - img.data = std::vector(224 * 224 * 3, 128); // Fill with gray - return img; + std::vector data(224 * 224 * 3, 128); // Fill with gray + return Image(std::move(data), 224, 224, 3); } Image createTestImageSmall() { - Image img; - img.width = 32; - img.height = 32; - img.channels = 1; - img.data = std::vector(32 * 32, 255); // Fill with white - return img; + std::vector data(32 * 32, 255); // Fill with white + return Image(std::move(data), 32, 32, 1); } }; @@ -76,28 +67,28 @@ TEST_F(MultimodalInputTest, ImageConstructorFromImage) { EXPECT_FALSE(input.is_text()); EXPECT_TRUE(input.is_image()); 
EXPECT_EQ(input.get_type(), MultimodalInput::Type::IMAGE); - EXPECT_EQ(input.get_image().width, 224); - EXPECT_EQ(input.get_image().height, 224); - EXPECT_EQ(input.get_image().channels, 3); - EXPECT_EQ(input.get_image().data.size(), 224 * 224 * 3); + EXPECT_EQ(input.get_image().width(), 224); + EXPECT_EQ(input.get_image().height(), 224); + EXPECT_EQ(input.get_image().channels(), 3); + EXPECT_EQ(input.get_image().get_uint8_data().size(), 224 * 224 * 3); } TEST_F(MultimodalInputTest, ImageConstructorFromRvalueImage) { Image img = createTestImage(); - int width = img.width; - int height = img.height; - int channels = img.channels; - size_t data_size = img.data.size(); + int width = img.width(); + int height = img.height(); + int channels = img.channels(); + size_t data_size = img.get_uint8_data().size(); MultimodalInput input(std::move(img)); EXPECT_FALSE(input.is_text()); EXPECT_TRUE(input.is_image()); EXPECT_EQ(input.get_type(), MultimodalInput::Type::IMAGE); - EXPECT_EQ(input.get_image().width, width); - EXPECT_EQ(input.get_image().height, height); - EXPECT_EQ(input.get_image().channels, channels); - EXPECT_EQ(input.get_image().data.size(), data_size); + EXPECT_EQ(input.get_image().width(), width); + EXPECT_EQ(input.get_image().height(), height); + EXPECT_EQ(input.get_image().channels(), channels); + EXPECT_EQ(input.get_image().get_uint8_data().size(), data_size); } // Test copy constructor and assignment @@ -129,10 +120,10 @@ TEST_F(MultimodalInputTest, CopyConstructorImage) { MultimodalInput copy(original); EXPECT_TRUE(copy.is_image()); - EXPECT_EQ(copy.get_image().width, 224); - EXPECT_EQ(copy.get_image().height, 224); - EXPECT_EQ(copy.get_image().channels, 3); - EXPECT_EQ(original.get_image().width, 224); // Original should be unchanged + EXPECT_EQ(copy.get_image().width(), 224); + EXPECT_EQ(copy.get_image().height(), 224); + EXPECT_EQ(copy.get_image().channels(), 3); + EXPECT_EQ(original.get_image().width(), 224); // Original should be unchanged } 
TEST_F(MultimodalInputTest, CopyAssignmentImage) { @@ -143,10 +134,10 @@ TEST_F(MultimodalInputTest, CopyAssignmentImage) { copy = original; EXPECT_TRUE(copy.is_image()); - EXPECT_EQ(copy.get_image().width, 224); - EXPECT_EQ(copy.get_image().height, 224); - EXPECT_EQ(copy.get_image().channels, 3); - EXPECT_EQ(original.get_image().width, 224); // Original should be unchanged + EXPECT_EQ(copy.get_image().width(), 224); + EXPECT_EQ(copy.get_image().height(), 224); + EXPECT_EQ(copy.get_image().channels(), 3); + EXPECT_EQ(original.get_image().width(), 224); // Original should be unchanged } // Test move constructor and assignment @@ -174,32 +165,32 @@ TEST_F(MultimodalInputTest, MoveAssignmentText) { TEST_F(MultimodalInputTest, MoveConstructorImage) { Image img = createTestImage(); - int width = img.width; - int height = img.height; - int channels = img.channels; + int width = img.width(); + int height = img.height(); + int channels = img.channels(); MultimodalInput original(std::move(img)); MultimodalInput moved(std::move(original)); EXPECT_TRUE(moved.is_image()); - EXPECT_EQ(moved.get_image().width, width); - EXPECT_EQ(moved.get_image().height, height); - EXPECT_EQ(moved.get_image().channels, channels); + EXPECT_EQ(moved.get_image().width(), width); + EXPECT_EQ(moved.get_image().height(), height); + EXPECT_EQ(moved.get_image().channels(), channels); } TEST_F(MultimodalInputTest, MoveAssignmentImage) { Image img = createTestImage(); - int width = img.width; - int height = img.height; - int channels = img.channels; + int width = img.width(); + int height = img.height(); + int channels = img.channels(); MultimodalInput original(std::move(img)); MultimodalInput moved(createTestText()); // Start with different type moved = std::move(original); EXPECT_TRUE(moved.is_image()); - EXPECT_EQ(moved.get_image().width, width); - EXPECT_EQ(moved.get_image().height, height); - EXPECT_EQ(moved.get_image().channels, channels); + EXPECT_EQ(moved.get_image().width(), width); + 
EXPECT_EQ(moved.get_image().height(), height); + EXPECT_EQ(moved.get_image().channels(), channels); } // Test getter methods with correct types @@ -227,16 +218,13 @@ TEST_F(MultimodalInputTest, GetImageWithImageInput) { // Test const lvalue reference version const MultimodalInput& const_input = input; - EXPECT_EQ(const_input.get_image().width, 224); - - // Test mutable lvalue reference version - Image& mutable_image = input.get_image(); - mutable_image.width = 448; - EXPECT_EQ(input.get_image().width, 448); + EXPECT_EQ(const_input.get_image().width(), 224); + EXPECT_EQ(const_input.get_image().height(), 224); + EXPECT_EQ(const_input.get_image().channels(), 3); // Test rvalue reference version Image moved_image = std::move(input).get_image(); - EXPECT_EQ(moved_image.width, 448); + EXPECT_EQ(moved_image.width(), 224); } // Test getter methods with wrong types (should throw) @@ -296,18 +284,14 @@ TEST_F(MultimodalInputTest, TryGetImageWithImageInput) { const MultimodalInput& const_input = input; const Image* image_ptr = const_input.try_get_image(); ASSERT_NE(image_ptr, nullptr); - EXPECT_EQ(image_ptr->width, 224); - EXPECT_EQ(image_ptr->height, 224); - EXPECT_EQ(image_ptr->channels, 3); + EXPECT_EQ(image_ptr->width(), 224); + EXPECT_EQ(image_ptr->height(), 224); + EXPECT_EQ(image_ptr->channels(), 3); // Test mutable version Image* mutable_image_ptr = input.try_get_image(); ASSERT_NE(mutable_image_ptr, nullptr); - EXPECT_EQ(mutable_image_ptr->width, 224); - - // Modify through pointer - mutable_image_ptr->width = 448; - EXPECT_EQ(input.get_image().width, 448); + EXPECT_EQ(mutable_image_ptr->width(), 224); } TEST_F(MultimodalInputTest, TryGetImageWithTextInput) { @@ -344,22 +328,22 @@ TEST_F(MultimodalInputTest, MakeImageInputFromImage) { MultimodalInput input = make_image_input(img); EXPECT_TRUE(input.is_image()); - EXPECT_EQ(input.get_image().width, 224); - EXPECT_EQ(input.get_image().height, 224); - EXPECT_EQ(input.get_image().channels, 3); + 
EXPECT_EQ(input.get_image().width(), 224); + EXPECT_EQ(input.get_image().height(), 224); + EXPECT_EQ(input.get_image().channels(), 3); } TEST_F(MultimodalInputTest, MakeImageInputFromRvalueImage) { Image img = createTestImage(); - int width = img.width; - int height = img.height; - int channels = img.channels; + int width = img.width(); + int height = img.height(); + int channels = img.channels(); MultimodalInput input = make_image_input(std::move(img)); EXPECT_TRUE(input.is_image()); - EXPECT_EQ(input.get_image().width, width); - EXPECT_EQ(input.get_image().height, height); - EXPECT_EQ(input.get_image().channels, channels); + EXPECT_EQ(input.get_image().width(), width); + EXPECT_EQ(input.get_image().height(), height); + EXPECT_EQ(input.get_image().channels(), channels); } // Test with different image sizes @@ -368,10 +352,10 @@ TEST_F(MultimodalInputTest, DifferentImageSizes) { MultimodalInput input(small_img); EXPECT_TRUE(input.is_image()); - EXPECT_EQ(input.get_image().width, 32); - EXPECT_EQ(input.get_image().height, 32); - EXPECT_EQ(input.get_image().channels, 1); - EXPECT_EQ(input.get_image().data.size(), 32 * 32); + EXPECT_EQ(input.get_image().width(), 32); + EXPECT_EQ(input.get_image().height(), 32); + EXPECT_EQ(input.get_image().channels(), 1); + EXPECT_EQ(input.get_image().get_uint8_data().size(), 32 * 32); } // Test with empty text @@ -424,11 +408,10 @@ TEST_F(MultimodalInputTest, AssignmentBetweenTypes) { // Assign image to text input input = MultimodalInput(img); EXPECT_TRUE(input.is_image()); - EXPECT_EQ(input.get_image().width, 224); + EXPECT_EQ(input.get_image().width(), 224); // Assign text back to image input input = MultimodalInput(text); EXPECT_TRUE(input.is_text()); EXPECT_EQ(input.get_text(), text); } -} // namespace From facf35d953a1b691847d78a8bdde757711c98613 Mon Sep 17 00:00:00 2001 From: winskuo-quic <143469905+winskuo-quic@users.noreply.github.com> Date: Thu, 18 Sep 2025 00:13:29 +0800 Subject: [PATCH 003/395] Qualcomm AI Engine Direct - 
Cat Fix (#14325) ### Summary Fix op cat to retrieve the right node. ### Test plan CI pass --- backends/qualcomm/builders/op_cat.py | 17 +++++++++-------- backends/qualcomm/tests/models.py | 9 +++++++++ backends/qualcomm/tests/test_qnn_delegate.py | 4 ++-- 3 files changed, 20 insertions(+), 10 deletions(-) diff --git a/backends/qualcomm/builders/op_cat.py b/backends/qualcomm/builders/op_cat.py index 9f6eb6676cf..644b087ab9c 100644 --- a/backends/qualcomm/builders/op_cat.py +++ b/backends/qualcomm/builders/op_cat.py @@ -29,14 +29,15 @@ def define_node( node: torch.fx.Node, nodes_to_wrappers: Dict[torch.fx.Node, PyQnnWrapper.TensorWrapper], ) -> PyQnnWrapper.PyQnnOpWrapper: - list_of_tensors = cast(List[torch.fx.Node], node.args[0]) - list_of_tensor_wrappers = [] + input_nodes = cast(List[torch.fx.Node], node.args[0]) + input_tensor_wrappers = [] - for tensor_input in list_of_tensors: - input_tensor = self.get_tensor(self.get_node(tensor_input), node) - list_of_tensor_wrappers.append( + for input_node in input_nodes: + source_input_node = self.get_node(input_node) + input_tensor = self.get_tensor(source_input_node, node) + input_tensor_wrappers.append( self.define_tensor( - tensor_input, + source_input_node, node, input_tensor, PyQnnWrapper.Qnn_TensorType_t.QNN_TENSOR_TYPE_NATIVE, @@ -44,7 +45,7 @@ def define_node( ) ) - if len(list_of_tensors) != len(list_of_tensor_wrappers): + if len(input_nodes) != len(input_tensor_wrappers): warnings.warn( "[QNN Delegate Op Builder]: The number or input tensors is not equal to the number of input tensor wrappers.", stacklevel=1, @@ -76,7 +77,7 @@ def define_node( QNN_OP_PACKAGE_NAME_QTI_AISW, OpConcat.op_name, ) - concat_op.AddInputTensors(list_of_tensor_wrappers) + concat_op.AddInputTensors(input_tensor_wrappers) concat_op.AddOutputTensors([output_tensor_wrapper]) concat_op.AddScalarParam( diff --git a/backends/qualcomm/tests/models.py b/backends/qualcomm/tests/models.py index 77ff1be4562..2de2cd098aa 100644 --- 
a/backends/qualcomm/tests/models.py +++ b/backends/qualcomm/tests/models.py @@ -274,6 +274,15 @@ def forward(self, x, y): return torch.cat((y, y, x, x), axis=2) +class Cat5(torch.nn.Module): + def __init__(self): + super().__init__() + self.const_tensor = torch.randn(1, 1, 2, 2) + + def forward(self, x, y): + return torch.cat((x, y, self.const_tensor), axis=2) + + class CausalMask(torch.nn.Module): def __init__(self): super().__init__() diff --git a/backends/qualcomm/tests/test_qnn_delegate.py b/backends/qualcomm/tests/test_qnn_delegate.py index 5a86d5f286d..0e75cf2844a 100644 --- a/backends/qualcomm/tests/test_qnn_delegate.py +++ b/backends/qualcomm/tests/test_qnn_delegate.py @@ -232,7 +232,7 @@ def test_qnn_backend_cast(self): self.lower_module_and_test_output(module, sample_input) def test_qnn_backend_cat(self): - modules = [Cat2(), Cat3(), Cat4()] # noqa: F405 + modules = [Cat2(), Cat3(), Cat4(), Cat5()] # noqa: F405 sample_input = (torch.randn(1, 1, 2, 2), torch.randn(1, 1, 4, 2)) for i, module in enumerate(modules): with self.subTest(i=i): @@ -1699,7 +1699,7 @@ def test_qnn_backend_cast(self): self.lower_module_and_test_output(module, sample_input) def test_qnn_backend_cat(self): - modules = [Cat2(), Cat3(), Cat4()] # noqa: F405 + modules = [Cat2(), Cat3(), Cat4(), Cat5()] # noqa: F405 sample_input = (torch.randn(1, 1, 2, 2), torch.randn(1, 1, 4, 2)) for i, module in enumerate(modules): with self.subTest(i=i): From 56659e4b72021121f809e80f4a5f2ca7fc8e6b79 Mon Sep 17 00:00:00 2001 From: lucylq Date: Wed, 17 Sep 2025 09:24:31 -0700 Subject: [PATCH 004/395] Revert "Quantized Softmax Kernel" (#14364) This reverts commit 94f62b7a5a0eb5b7f0066ec35c0263f1258b0952. 
Not landed internally and failing internal tests here:
[D82596569](https://www.internalfb.com/diff/D82596569), causing a fix-up patch.
---
 backends/cadence/aot/ops_registrations.py     | 39 ---------
 backends/cadence/aot/quantizer/fusion_pass.py | 79 +------------------
 backends/cadence/aot/quantizer/patterns.py    | 22 ------
 backends/cadence/aot/quantizer/quantizer.py   | 29 -------
 4 files changed, 1 insertion(+), 168 deletions(-)

diff --git a/backends/cadence/aot/ops_registrations.py b/backends/cadence/aot/ops_registrations.py
index bd2bf32834d..efb22a9e7d6 100644
--- a/backends/cadence/aot/ops_registrations.py
+++ b/backends/cadence/aot/ops_registrations.py
@@ -324,19 +324,6 @@
     "rope.out(Tensor input, Tensor sin_tensor, Tensor cos_tensor, Tensor? pos, *, Tensor(a!) out) -> Tensor(a!)"
 )
 
-lib.define(
-    "quantized_softmax(Tensor input, Tensor mask, int dim, Tensor in_scale, Tensor in_zero_point, Tensor out_scale, Tensor out_zero_point) -> (Tensor out)"
-)
-lib.define(
-    "quantized_softmax.per_tensor(Tensor input, Tensor mask, int dim, float in_scale, int in_zero_point, float out_scale, int out_zero_point) -> (Tensor out)"
-)
-lib.define(
-    "quantized_softmax.out(Tensor input, Tensor mask, int dim, Tensor in_scale, Tensor in_zero_point, Tensor out_scale, Tensor out_zero_point, *, Tensor(a!) out) -> Tensor (a!)"
-)
-lib.define(
-    "quantized_softmax.per_tensor_out(Tensor input, Tensor mask, int dim, float in_scale, int in_zero_point, float out_scale, int out_zero_point, *, Tensor(a!) out) -> Tensor (a!)"
-)
-
 # Load/store with iDMA. These only exist before memory planning.
 # Post memory planning, we check that outputs/inputs for the load/store are in
 # DTCM and replace idma_load/idma_store with idma_copy.
@@ -2342,29 +2329,3 @@ def softmax_f32_f32_meta( half_to_float: Optional[bool] = None, ) -> torch.Tensor: return self.new_empty(self.size(), dtype=self.dtype) - - -@register_fake("cadence::quantized_softmax") -def quantized_softmax_meta( - input: torch.Tensor, - mask: torch.Tensor, - dim: int, - in_scale: torch.Tensor, - in_zero_point: torch.Tensor, - out_scale: torch.Tensor, - out_zero_point: torch.Tensor, -) -> torch.Tensor: - return input.new_empty(input.size(), dtype=input.dtype) - - -@register_fake("cadence::quantized_softmax.per_tensor") -def quantized_softmax_per_tensor_meta( - input: torch.Tensor, - mask: torch.Tensor, - dim: int, - in_scale: float, - in_zero_point: int, - out_scale: float, - out_zero_point: int, -) -> torch.Tensor: - return input.new_empty(input.size(), dtype=input.dtype) diff --git a/backends/cadence/aot/quantizer/fusion_pass.py b/backends/cadence/aot/quantizer/fusion_pass.py index ed14574a8c8..8f106a815ac 100644 --- a/backends/cadence/aot/quantizer/fusion_pass.py +++ b/backends/cadence/aot/quantizer/fusion_pass.py @@ -6,10 +6,9 @@ # pyre-strict -from typing import Any, cast, Dict, List, Tuple +from typing import Any, Dict, List, Tuple import torch -from executorch.backends.cadence.aot.compiler_utils import get_shape from executorch.backends.cadence.aot.quantizer.patterns import ( AddmmPattern, AddPattern, @@ -26,7 +25,6 @@ MatmulPattern, ReluPattern0, ReluPattern1, - SoftmaxPattern, ) from executorch.backends.cadence.aot.quantizer.utils import ( check_out_zero_point_is_min_range, @@ -390,73 +388,6 @@ def get_args_and_kwargs_relu( return args, kwargs -def get_args_and_kwargs_softmax( - graph_module: GraphModule, - inputs_inputs: List[fx.Node], - dequants_inputs: List[fx.Node], - quant_node: fx.Node, - op_node: fx.Node, -) -> Tuple[Tuple[ArgsType, ...], Dict[str, ArgsType]]: - # Make a dummy mask tensor - mask_shape = get_shape(graph_module, cast(fx.Node, quant_node.args[0])) - mask_shape = list(mask_shape) if mask_shape else [] - 
mask_shape[-1] = mask_shape[-1] // 16 - mask_tensor = graph_module.graph.call_function( - torch.ops.aten.full.default, - ( - mask_shape, - 0.0, - ), - {"dtype": torch.int32}, - ) - # Make the scale and zero_point tensors - in_scale_tensor = graph_module.graph.call_function( - torch.ops.aten.full.default, - ( - [1], - dequants_inputs[0].args[1], - ), - {"dtype": torch.float32}, - ) - in_zero_point_tensor = graph_module.graph.call_function( - torch.ops.aten.full.default, - ( - [1], - dequants_inputs[0].args[2], - ), - {"dtype": torch.int32}, - ) - out_scale_tensor = graph_module.graph.call_function( - torch.ops.aten.full.default, - ( - [1], - quant_node.args[1], - ), - {"dtype": torch.float32}, - ) - out_zero_point_tensor = graph_module.graph.call_function( - torch.ops.aten.full.default, - ( - [1], - quant_node.args[2], - ), - {"dtype": torch.int32}, - ) - - # Make the args and kwargs for the replacement op - args = ( - inputs_inputs[0], - mask_tensor, - op_node.args[1], - in_scale_tensor, - in_zero_point_tensor, - out_scale_tensor, - out_zero_point_tensor, - ) - kwargs = {} - return args, kwargs - - class QuantFusion(ExportPass): # pyre-ignore[2]: Parameter `patterns` has no type specified def __init__(self, patterns) -> None: @@ -612,14 +543,6 @@ def call(self, graph_module: fx.GraphModule) -> PassResult: # noqa: C901 dequants_inputs, quant_node, ) - elif isinstance(pattern, SoftmaxPattern): - args, kwargs = get_args_and_kwargs_softmax( - graph_module, - inputs_inputs, - dequants_inputs, - quant_node, - anchor_output_node, - ) fused = graph_module.graph.call_function( pattern.replacement_op(), args, diff --git a/backends/cadence/aot/quantizer/patterns.py b/backends/cadence/aot/quantizer/patterns.py index 33b476f5120..b653be27e8f 100644 --- a/backends/cadence/aot/quantizer/patterns.py +++ b/backends/cadence/aot/quantizer/patterns.py @@ -485,25 +485,3 @@ def partition_types(self) -> List[OpOverload]: class Conv2dReluPattern1(ConvReluBasePattern): def 
partition_types(self) -> List[OpOverload]: return [torch.ops.aten.conv2d.default, torch.ops.aten.relu_.default] - - -class SoftmaxPattern(QuantizationPattern): - - def partition_types(self) -> List[OpOverload]: - return [torch.ops.aten._softmax.default] - - def get_anchors( - self, gm: fx.GraphModule, fused_partition: List[fx.GraphModule] - ) -> PartitionAnchors: - # pyre-fixme[29]: `Union[BoundMethod[typing.Callable(torch._C.TensorBase.__ge... - softmax_node = fused_partition[0].nodes[-1] - - return PartitionAnchors( - inputs=[(softmax_node, 0)], - weights=[], - biases=[], - output=[(softmax_node,)], - ) - - def replacement_op(self) -> OpOverload: - return torch.ops.cadence.quantized_softmax.default diff --git a/backends/cadence/aot/quantizer/quantizer.py b/backends/cadence/aot/quantizer/quantizer.py index ad5f935173e..cce7c207a6b 100644 --- a/backends/cadence/aot/quantizer/quantizer.py +++ b/backends/cadence/aot/quantizer/quantizer.py @@ -27,7 +27,6 @@ QuantizationPattern, ReluPattern0, ReluPattern1, - SoftmaxPattern, ) from executorch.backends.cadence.aot.quantizer.utils import ( find_sequential_partitions_aten, @@ -59,15 +58,6 @@ observer_or_fake_quant_ctr=HistogramObserver.with_args(eps=2**-12), ) -act_qspec_asym16s = QuantizationSpec( - dtype=torch.int16, - quant_min=-32768, - quant_max=32767, - qscheme=torch.per_tensor_affine, - is_dynamic=False, - observer_or_fake_quant_ctr=HistogramObserver.with_args(eps=2**-12), -) - wgt_qspec_asym8s = QuantizationSpec( dtype=torch.int8, quant_min=-128, @@ -102,13 +92,6 @@ None, ) -qconfig_A16 = QuantizationConfig( - act_qspec_asym16s, - act_qspec_asym16s, - wgt_qspec_asym8s, - None, -) - class CadenceAtenQuantizer(Quantizer): def __init__( @@ -300,15 +283,3 @@ def __init__(self, quantizers: Optional[list[Quantizer]] = None) -> None: quantizers.append(CadenceAtenQuantizer(AddPattern(), qconfig_A8W8)) quantizers.append(CadenceAtenQuantizer(CatPattern(), qconfig_A8W8)) super().__init__(quantizers) - - -class 
CadenceWithSoftmaxQuantizer(CadenceQuantizer): - """ - Quantizer including A16 softmax - """ - - def __init__(self, quantizers: Optional[list[Quantizer]] = None) -> None: - if quantizers is None: - quantizers = get_cadence_default_quantizers() - quantizers.append(CadenceAtenQuantizer(SoftmaxPattern(), qconfig_A16)) - super().__init__(quantizers) From f9264f2c80a47d0c9ffffaf03b901c92784ffb5f Mon Sep 17 00:00:00 2001 From: Rohan Joshi Date: Wed, 17 Sep 2025 10:21:43 -0700 Subject: [PATCH 005/395] Fix eval_llama_qnn script (#14379) Fix eval_llama_qnn.py after recent changes and use eval utils --- .../oss_scripts/llama/eval_llama_qnn.py | 346 ++++++++++++------ 1 file changed, 225 insertions(+), 121 deletions(-) diff --git a/examples/qualcomm/oss_scripts/llama/eval_llama_qnn.py b/examples/qualcomm/oss_scripts/llama/eval_llama_qnn.py index b25e0cbdc7d..5fa0cd3fedf 100644 --- a/examples/qualcomm/oss_scripts/llama/eval_llama_qnn.py +++ b/examples/qualcomm/oss_scripts/llama/eval_llama_qnn.py @@ -4,44 +4,40 @@ # This source code is licensed under the BSD-style license found in the # LICENSE file in the root directory of this source tree. -import argparse +""" Utilities for running fast evals (using prefill mode version of model) on eager-quantized model and QDQ model, for experimentation purposes. 
""" + import json import logging import sys import types -from functools import partial import torch -from executorch.backends.qualcomm.quantizer.custom_annotation import ( - annotate_kv_8bit, - annotate_output_16a8w, - annotate_qkv_proj_sha, - StaticLLMQuantConfig, -) from executorch.backends.qualcomm.quantizer.observers.per_channel_param_observer import ( PerChannelParamObserver, ) from executorch.backends.qualcomm.quantizer.qconfig import ( _derived_bias_quant_spec, - get_ptq_per_channel_quant_config, QuantizationConfig, ) from executorch.backends.qualcomm.quantizer.quantizer import QuantDtype from executorch.backends.qualcomm.utils.utils import convert_linear_to_conv2d -from executorch.examples.models.llama.eval_llama_lib import ( - build_args_parser, - GraphModuleEvalWrapper, +from executorch.examples.models.llama.eval_llama_lib import build_args_parser +from executorch.examples.models.llama.hf_download import ( + download_and_convert_hf_checkpoint, ) from executorch.examples.models.llama.source_transformation.quantize import ( get_quant_embedding_transform, ) +from executorch.examples.qualcomm.oss_scripts.llama import SUPPORTED_LLM_MODELS -from executorch.examples.qualcomm.oss_scripts.llama.decoder_utils import calibrate +from executorch.examples.qualcomm.oss_scripts.llama.decoder_utils import ( + graph_module_inference, +) from executorch.examples.qualcomm.oss_scripts.llama.model.static_llama import ( LlamaModel, @@ -55,13 +51,17 @@ WrappedLlamaModel, ) from lm_eval.evaluator import simple_evaluate - -from pytorch_tokenizers import get_tokenizer +from pytorch_tokenizers import get_tokenizer, TiktokenTokenizer +from pytorch_tokenizers.llama2c import Llama2cTokenizer as SentencePieceTokenizer +from torchao.prototype.quantization.module_swap.module_swap import ( + QuantizationRecipe, + quantize_module_swap, +) from torchao.prototype.spinquant import apply_spinquant -from torchao.quantization.pt2e import MinMaxObserver from 
torchao.quantization.pt2e.quantize_pt2e import convert_pt2e, prepare_pt2e from torchao.quantization.pt2e.quantizer import QuantizationSpec +from transformers import AutoTokenizer sys.setrecursionlimit(4096) @@ -97,13 +97,58 @@ def add_mse_weight_observer(quant_dtype, quantizer): ) -def prepare_model(model_name, args): - with open(args.params) as f: +def prepare_tokenizer(args): + runtime_tokenizer_path = "" + if args.decoder_model in {"stories110m", "stories260k"}: + tokenizer = get_tokenizer(args.tokenizer_model) + assert isinstance( + tokenizer, SentencePieceTokenizer + ), "Wrong tokenizer provided for stories." + assert ( + args.tokenizer_bin is not None + ), "Please provide tokenizer_bin for stories." + runtime_tokenizer_path = args.tokenizer_bin + elif args.decoder_model == "llama3_2": + tokenizer = get_tokenizer(args.tokenizer_model) + assert isinstance( + tokenizer, TiktokenTokenizer + ), "Wrong tokenizer provided for llama3_2." + runtime_tokenizer_path = args.tokenizer_model + elif args.decoder_model == "phi_4_mini": + model_id = SUPPORTED_LLM_MODELS[args.decoder_model].repo_id + tokenizer = AutoTokenizer.from_pretrained(model_id) + runtime_tokenizer_path = tokenizer.save_pretrained(args.artifact)[-1] + tokenizer = get_tokenizer(runtime_tokenizer_path) + with open(runtime_tokenizer_path, "r+") as file: + data = json.load(file) + # TODO: Encountered the following error during runtime, so switched behavior for now. + # Error: libc++abi: terminating due to uncaught exception of type std::runtime_error: invert=true is not supported for Split PreTokenizer. Only invert=false is supported. 
+ data["pre_tokenizer"]["pretokenizers"][-2]["invert"] = False + file.seek(0) + json.dump(data, file, indent=4) + file.truncate() + elif args.decoder_model in SUPPORTED_LLM_MODELS: + model_id = SUPPORTED_LLM_MODELS[args.decoder_model].repo_id + tokenizer = AutoTokenizer.from_pretrained(model_id) + runtime_tokenizer_path = tokenizer.save_pretrained(args.artifact)[-1] + tokenizer = get_tokenizer(runtime_tokenizer_path) + else: + raise RuntimeError(f"Unknown decoder_model: {args.decoder_model}.") + return tokenizer + + +def prepare_model(args): + if args.params: + params_path = args.params + else: + params_path = SUPPORTED_LLM_MODELS[args.decoder_model].params_path + with open(params_path) as f: prefill_config = ModelArgs(**json.load(f)) - # TODO: support batch inputs if necessary - prefill_config.max_batch_size = 1 - prefill_config.max_seq_len = args.max_seq_length - prefill_config.use_kv_cache = False + # TODO: support batch inputs if necessary + prefill_config.max_batch_size = 1 + prefill_config.max_seq_len = args.max_seq_length + prefill_config.use_kv_cache = False + prefill_config.enable_r3 = args.r3 use_i64_token = args.embedding_quantize is not None model = LlamaModel( prefill_config, @@ -112,47 +157,69 @@ def prepare_model(model_name, args): output_cache=False, use_i64_token=use_i64_token, ) - state_dict = torch.load( - args.checkpoint, weights_only=True, map_location=args.device, mmap=True - ) - - # Change to HuggingFace weight to improve the performance of RoPE in HTP backend. 
- def permute(w, heads): - dim_0 = w.size(0) - dim_1 = w.size(1) - return ( - w.view(heads, dim_0 // heads // 2, 2, dim_1) - .transpose(1, 2) - .reshape(dim_0, dim_1) + if args.checkpoint is None: # HF models + checkpoint = download_and_convert_hf_checkpoint( + SUPPORTED_LLM_MODELS[args.decoder_model].repo_id, + SUPPORTED_LLM_MODELS[args.decoder_model].convert_weights.__func__, ) - - n_heads = model.n_heads - n_kv_heads = model.n_kv_heads - n_layers = model.n_layers - - for layer_i in range(n_layers): - state_dict[f"layers.{layer_i}.attention.wq.weight"] = permute( - state_dict[f"layers.{layer_i}.attention.wq.weight"], n_heads + state_dict = torch.load( + checkpoint, weights_only=True, map_location=args.device, mmap=True ) - state_dict[f"layers.{layer_i}.attention.wk.weight"] = permute( - state_dict[f"layers.{layer_i}.attention.wk.weight"], n_kv_heads + transform_weight = SUPPORTED_LLM_MODELS[args.decoder_model].transform_weight + else: + state_dict = torch.load( + args.checkpoint, weights_only=True, map_location=args.device, mmap=True ) + if "model" in state_dict: + state_dict = state_dict["model"] + + if args.decoder_model == "stories260k": + state_dict = {k.replace("_orig_mod.", ""): v for k, v in state_dict.items()} + transform_weight = True + + if transform_weight: + # Change to HuggingFace weight to improve the performance of RoPE in HTP backend. 
+ def permute(w, heads): + dim_0 = w.size(0) + dim_1 = w.size(1) + return ( + w.view(heads, dim_0 // heads // 2, 2, dim_1) + .transpose(1, 2) + .reshape(dim_0, dim_1) + ) + + n_heads = model.n_heads + n_kv_heads = model.n_kv_heads + n_layers = model.n_layers + + for layer_i in range(n_layers): + state_dict[f"layers.{layer_i}.attention.wq.weight"] = permute( + state_dict[f"layers.{layer_i}.attention.wq.weight"], n_heads + ) + state_dict[f"layers.{layer_i}.attention.wk.weight"] = permute( + state_dict[f"layers.{layer_i}.attention.wk.weight"], n_kv_heads + ) + model.load_state_dict( state_dict, strict=True, assign=True, ) + return model, prefill_config - if "model" in state_dict: - state_dict = state_dict["model"] +def prequant_algorithm(model, prefill_config, args): # TODO: use dtype of model checkpoint model = model.to(device=args.device, dtype=torch.float) inputs = model.get_example_inputs(use_kv_cache=False) tokens, atten_mask = inputs + tokens.to(args.device) + for mask in atten_mask.masks: + mask.mask.to(args.device) scales_state_dict = {} + if args.spinquant: config = types.SimpleNamespace( dim=prefill_config.dim, @@ -201,31 +268,55 @@ def permute(w, heads): return model, prefill_config, inputs, scales_state_dict -def gen_eval_wrapper(model_name, args): - tokenizer = get_tokenizer(args.tokenizer_path) - model, config, inputs, scales_state_dict = prepare_model(model_name, args) - tokens, atten_mask = inputs +def eager_eval_quanty( + model, + weight_bits, + act_bits, + embedding_quantization, + dynamic_activations=False, + dynamic_weights=False, +): + """ + Run evaluations where we quantize only linear layers with Quanty (eager-mode module swap quantization flow) + Although when lowering to Qualcomm backend using the PT2E flow we quantize all (not just linear) layers, + Quanty flow is fast and can be used for rapid experimentation. 
+ """ + + recipe = QuantizationRecipe( + weight_bits=weight_bits, + weight_quantization=True, + dynamic_weights=dynamic_weights, + weight_group_size="per_channel", + activation_bits=act_bits, + activation_quantization=True, + activation_group_size="per_tensor", + input_quantization=True, + output_quantization=True, + dynamic_activations=dynamic_activations, + embedding_quantization=embedding_quantization, + ) + + quantized_model = quantize_module_swap(model, recipe) + simple_evaluate( + model=model, + tasks=["wikitext"], + ) + + reverse_quantize_module_swap(quantized_model) + + +def eval_llm(args): + tokenizer = prepare_tokenizer(args) + model, prefill_config = prepare_model(args) + model, config, inputs, scales_state_dict = prequant_algorithm( + model, prefill_config, args + ) use_i64_token = args.embedding_quantize is not None if args.ptq is not None: quant_dtype = getattr(QuantDtype, f"use_{args.ptq}") - - quantization_config_wv_sha_8a4w = get_ptq_per_channel_quant_config( - act_dtype=torch.uint8, - weight_dtype=torch.int4, - act_observer=MinMaxObserver, - act_symmetric=True, - ) - custom_annotations = ( - annotate_kv_8bit, - partial( - annotate_qkv_proj_sha, - qkv_tags={StaticLLMQuantConfig.wv_sha}, - quantization_config=quantization_config_wv_sha_8a4w, - ), - ) - if args.llama_model == "stories110m": - custom_annotations = custom_annotations + (annotate_output_16a8w,) + decoder_model_config = SUPPORTED_LLM_MODELS[args.decoder_model] + custom_annotations = decoder_model_config.custom_annotation quantizer = make_custom_quantizer( quant_dtype, args.range_setting, custom_annotations, args.quant_linear_only @@ -233,7 +324,9 @@ def gen_eval_wrapper(model_name, args): with torch.no_grad(): logging.info("Starting export...") - model = torch.export.export(model, inputs, strict=True).module() + model = torch.export.export( + model, (inputs[0], *inputs[1]), strict=True + ).module() if quant_dtype == QuantDtype.use_16a4w_block: conv_nodes = [n for n in model.graph.nodes 
if "conv" in n.name] block_size_map = {n.name: (1, 64, 1, 1) for n in conv_nodes} @@ -242,16 +335,18 @@ def gen_eval_wrapper(model_name, args): model = prepare_pt2e(model, quantizer) logging.info("Observers added, starting calibration...") - - calibrate( - inputs, - "Once upon a time", - model, + graph_module_inference( + use_kv_cache=False, + get_example_inputs=lambda use_kv_cache=False: inputs, + module=model, tokenizer=tokenizer, - ar_len=args.prefill_ar_len, + ar_len=args.max_seq_len, max_seq_len=args.max_seq_len, - kv_updater=None, + kv_updater=args.kv_updater, + tasks=["wikitext"], + tasks_limit=1, use_i64_token=use_i64_token, + event_name="prepare_pt2e_prompt", ) if args.range_setting == "mse_with_act_loss": @@ -262,61 +357,37 @@ def gen_eval_wrapper(model_name, args): model = convert_pt2e(model) logging.info("Quantization complete! Here is some sample generated text:") - calibrate( - inputs, - "Could you tell me about Facebook?", - model, + graph_module_inference( + use_kv_cache=False, + get_example_inputs=lambda use_kv_cache=False: inputs, + module=model, tokenizer=tokenizer, - ar_len=args.prefill_ar_len, + ar_len=args.max_seq_len, max_seq_len=args.max_seq_len, - kv_updater=None, + kv_updater=args.kv_updater, + prompt="Can you tell me about Facebook?", use_i64_token=use_i64_token, + event_name="convert_pt2e_prompt", ) - model = WrappedLlamaModel( - model, atten_mask, args.use_kv_cache, args.max_seq_length, args.device - ) - - return GraphModuleEvalWrapper( - model=model, + logging.info("Evaluation of QDQ model:") + graph_module_inference( + use_kv_cache=False, + get_example_inputs=lambda use_kv_cache=False: inputs, + module=model, tokenizer=tokenizer, - max_seq_length=args.calibration_seq_length, - use_kv_cache=args.use_kv_cache, - generate_full_logits=args.generate_full_logits, - enable_dynamic_shape=False, + ar_len=args.max_seq_len, + max_seq_len=args.max_seq_len, + kv_updater=args.kv_updater, + tasks=["wikitext"], + use_i64_token=use_i64_token, + 
event_name="convert_pt2e_prompt", ) -def eval_llama( - model_name: str, - args: argparse.Namespace, -) -> None: - # Generate the eval wrapper - eval_wrapper = gen_eval_wrapper(model_name, args) - - # Needed for loading mmlu dataset. - # See https://github.com/EleutherAI/lm-evaluation-harness/pull/1998/files - if args.tasks and "mmlu" in args.tasks: - import datasets - - datasets.config.HF_DATASETS_TRUST_REMOTE_CODE = True - # Evaluate the model - with torch.no_grad(): - eval_results = simple_evaluate( - model=eval_wrapper, - tasks=args.tasks, - num_fewshot=args.num_fewshot, - limit=args.fraction, - ) - - for task, res in eval_results["results"].items(): - print(f"{task}: {res}") - - def main() -> None: seed = 42 torch.manual_seed(seed) - modelname = "llama2" parser = build_args_parser() parser.add_argument( "-P", @@ -344,9 +415,42 @@ def main() -> None: help="if you select this option we quantize linear layers only", action="store_true", ) + parser.add_argument( + "--kv_updater", + help="Choose how to update kv cache during runtime", + choices=["smart_mask", "shift_pointer"], + default="smart_mask", + type=str, + ) + parser.add_argument( + "--decoder_model", + choices=["stories260k", "stories110m", "llama3_2"] + + list(SUPPORTED_LLM_MODELS.keys()), + help=f"The Llama model to export. Current available options are: [stories260k, stories110m, llama3_2] + {SUPPORTED_LLM_MODELS.keys()}", + required=True, + ) + parser.add_argument( + "-a", + "--artifact", + help="path for storing generated artifacts and output by this example. Default ./llama_qnn", + default="./eval_llama_qnn", + type=str, + ) + parser.add_argument( + "--r3", + help="Enable SpinQuant R3 quantization optimization. 
Please notice enable R3 could possibly cause performance drop.", + action="store_true", + default=False, + ) + parser.add_argument( + "--tokenizer_model", + help="Pass llama tokenizer model.", + type=str, + default=None, + ) args = parser.parse_args() - args.llama_model = "llama3_2" + # Overrides this arg, because evaluation requires full logits. args.generate_full_logits = True @@ -357,10 +461,10 @@ def main() -> None: args.use_kv_cache = False args.prefill_ar_len = args.max_seq_length - args.device = "cuda" if torch.cuda.is_available() else "cpu" + args.device = "cuda:0" if torch.cuda.is_available() else "cpu" torch.set_default_device(args.device) - eval_llama(modelname, args) + eval_llm(args) if __name__ == "__main__": From 16d8109d2a4cc29347fdba319eaa9fca02231771 Mon Sep 17 00:00:00 2001 From: Gregory Comer Date: Wed, 17 Sep 2025 11:48:09 -0600 Subject: [PATCH 006/395] [Windows] Reduce trunk model test count (#14348) Reduce the number of models we test on Windows CI in order to reduce runner utilization. I've tried to pick a reasonably representative set. I don't think we're getting much incremental value from the ones we're cutting and will save on CI costs. We can re-evaluate this when we have shared/cached build for Windows CI runs. 
---
 .github/workflows/trunk.yml | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/.github/workflows/trunk.yml b/.github/workflows/trunk.yml
index 975a8ebbb30..629c84847f6 100644
--- a/.github/workflows/trunk.yml
+++ b/.github/workflows/trunk.yml
@@ -1016,8 +1016,8 @@ jobs:
     strategy:
       fail-fast: false
       matrix:
-        model: [linear, add, add_mul, ic3, ic4, mv2, mv3, resnet18, resnet50, vit, w2l, mobilebert, emformer_join, emformer_transcribe]
-        backend: [portable, xnnpack-f32, xnnpack-q8]
+        model: [mv3, resnet50, vit, mobilebert, emformer_transcribe]
+        backend: [portable, xnnpack-q8]
     with:
       submodules: 'recursive'
       ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }}

From 6fa20624b46785db67232286d5dc51d2ace96a34 Mon Sep 17 00:00:00 2001
From: JP <46308822+zonglinpeng@users.noreply.github.com>
Date: Wed, 17 Sep 2025 11:13:23 -0700
Subject: [PATCH 007/395] limit facto tensor size to random 4000 numel

Differential Revision: D82483921

Pull Request resolved: https://github.com/pytorch/executorch/pull/14317
---
 backends/cadence/utils/facto_util.py | 62 +++++++++++++++++++++++++++-
 1 file changed, 60 insertions(+), 2 deletions(-)

diff --git a/backends/cadence/utils/facto_util.py b/backends/cadence/utils/facto_util.py
index 5b204e99fcb..2ab5f731210 100644
--- a/backends/cadence/utils/facto_util.py
+++ b/backends/cadence/utils/facto_util.py
@@ -23,8 +23,66 @@ def apply_tensor_contraints(op_name: str, index: int) -> list[object]:
-    # Constraint to limit tensor size product to < 4000
-    max_size_constraint = cp.Size.Le(lambda deps, r, d: max(1, int((3999) ** (1 / r))))
+    # Constraint to limit tensor size product to < 4000 with fully randomized shapes
+    import random
+
+    # Global cache to store generated shapes per tensor to ensure consistency
+    _shape_cache: dict[str, list[int]] = {}
+
+    def generate_random_shape_with_product_limit(
+        rank: int, max_product: int = 3999, seed_base: int = 42
+    ) -> list[int]:
+        """Generate
a random shape with given rank ensuring product < max_product""" + random.seed(seed_base + rank) + + # Start with all dimensions as 1 + shape = [1] * rank + remaining_product = max_product - 1 # Leave room since we start with product=1 + + # Randomly distribute the remaining capacity across dimensions + for i in range(rank): + if remaining_product <= 1: + break + + # Calculate maximum size this dimension can have without exceeding limit + current_product = 1 + for j in range(rank): + if j != i: + current_product *= shape[j] + + max_size_for_dim = min( + remaining_product // current_product, 50 + ) # Cap at 50 + if max_size_for_dim > shape[i]: + # Randomly choose a size between current and max + new_size = random.randint(shape[i], max_size_for_dim) + shape[i] = new_size + remaining_product = max_product // (current_product * new_size) + remaining_product = max(1, remaining_product) + + # Final random shuffle of the dimensions to make it more random + random.shuffle(shape) + return shape + + def random_size_constraint(deps: object, r: int, d: int) -> int: + """Generate random sizes ensuring total product < 4000""" + # Create a unique key for this tensor configuration + cache_key = f"{r}_{d}" + + if cache_key not in _shape_cache: + # Generate a new random shape for this rank + shape = generate_random_shape_with_product_limit( + r, max_product=3999, seed_base=42 + r * 10 + ) + _shape_cache[cache_key] = shape + + # Return the size for dimension d, ensuring we don't go out of bounds + cached_shape = _shape_cache[cache_key] + return cached_shape[d] if d < len(cached_shape) else 1 + + max_size_constraint = cp.Size.Le( + lambda deps, r, d: random_size_constraint(deps, r, d) + ) tensor_constraints = ( [ From 8d081eda6da0dea543e9a2b58609375527d8ecc9 Mon Sep 17 00:00:00 2001 From: Gasoonjia Date: Wed, 17 Sep 2025 11:14:40 -0700 Subject: [PATCH 008/395] aten mode clone dim order op Differential Revision: D82558256 Pull Request resolved: 
https://github.com/pytorch/executorch/pull/14340 --- kernels/aten/cpu/op__clone_dim_order.cpp | 128 ++++++++++++++++++ kernels/aten/cpu/targets.bzl | 6 + kernels/aten/edge_dialect_aten_op.yaml | 5 + kernels/test/op__clone_dim_order_test.cpp | 3 - kernels/test/targets.bzl | 2 +- .../xplat/executorch/kernels/test/util.bzl | 6 +- 6 files changed, 144 insertions(+), 6 deletions(-) create mode 100644 kernels/aten/cpu/op__clone_dim_order.cpp diff --git a/kernels/aten/cpu/op__clone_dim_order.cpp b/kernels/aten/cpu/op__clone_dim_order.cpp new file mode 100644 index 00000000000..5e6c35d64f9 --- /dev/null +++ b/kernels/aten/cpu/op__clone_dim_order.cpp @@ -0,0 +1,128 @@ +/* + * Copyright (c) Meta Platforms, Inc. and affiliates. + * All rights reserved. + * + * This source code is licensed under the BSD-style license found in the + * LICENSE file in the root directory of this source tree. + */ + +#include +#include + +namespace torch { +namespace executor { +namespace native { + +using Tensor = executorch::aten::Tensor; +using SizesArrayRef = executorch::aten::ArrayRef; +using DimOrderArrayRef = + executorch::aten::ArrayRef; +using MemoryFormat = executorch::aten::MemoryFormat; + +template +using OptionalArrayRef = executorch::aten::OptionalArrayRef; + +template +using Optional = std::optional; + +namespace { +Optional get_memory_format(OptionalArrayRef dim_order) { + if (!dim_order.has_value()) { + return executorch::aten::nullopt; + } + if (is_contiguous_dim_order( + dim_order.value().data(), dim_order.value().size())) { + return MemoryFormat::Contiguous; + } else if (is_channels_last_dim_order( + dim_order.value().data(), dim_order.value().size())) { + return MemoryFormat::ChannelsLast; + } else { + ET_ASSERT_UNREACHABLE(); + } +} + +bool check__clone_dim_order_args( + const Tensor& input, + bool non_blocking, + executorch::aten::OptionalArrayRef dim_order, + Tensor& out) { + // Right now we only support blocking data transfer + ET_LOG_AND_RETURN_IF_FALSE(non_blocking == 
false); + + // Ensure input and output dtype match + ET_LOG_AND_RETURN_IF_FALSE(input.scalar_type() == out.scalar_type()); + + // dim_order is set, the target dim_order will be either contiguous or + // channels_last memory format + if (dim_order.has_value()) { + executorch::aten::ArrayRef dim_order_ref = dim_order.value(); + + // dim order size shall equal to input dim + ET_LOG_AND_RETURN_IF_FALSE(dim_order_ref.size() == input.dim()); + + ET_LOG_AND_RETURN_IF_FALSE( + is_channels_last_dim_order( + dim_order.value().data(), dim_order.value().size()) || + is_contiguous_dim_order( + dim_order.value().data(), dim_order.value().size())); + + // Out Aten tensor shall have same memory format stride as dim_order + const size_t kMaxNumOfDimensions = 16; + ET_LOG_AND_RETURN_IF_FALSE(kMaxNumOfDimensions >= out.dim()); + executorch::aten::StridesType target_strides[kMaxNumOfDimensions]; + dim_order_to_stride_nocheck( + out.sizes().data(), + dim_order_ref.data(), + dim_order_ref.size(), + target_strides); + ET_LOG_AND_RETURN_IF_FALSE(out.dim() == dim_order_ref.size()); + for (size_t i = 0; i < dim_order_ref.size(); i++) { + ET_LOG_AND_RETURN_IF_FALSE(target_strides[i] == out.strides()[i]); + } + + } else { // dim_order is not set, preserve the dim order of input + + auto out_strides = out.strides(); + auto input_strides = input.strides(); + ET_LOG_AND_RETURN_IF_FALSE(input_strides.size() == out_strides.size()); + for (size_t i = 0; i < input_strides.size(); i++) { + ET_LOG_AND_RETURN_IF_FALSE(input_strides[i] == out_strides[i]); + } + } + return true; +} +} // namespace + +// _clone_dim_order.out(Tensor self, *, bool non_blocking=False, int[]? +// dim_order=None, Tensor(a!) out) -> Tensor(a!) 
+Tensor& _clone_dim_order_out( + KernelRuntimeContext& ctx, + const Tensor& self, + bool non_blocking, + OptionalArrayRef dim_order, + Tensor& out) { + // TODO(T181345875): enable sanity check in aten mode + ET_KERNEL_CHECK( + ctx, + check__clone_dim_order_args(self, non_blocking, dim_order, out), + InvalidArgument, + out); + + Optional memory_format = get_memory_format(dim_order); + at::clone_outf(self, memory_format, out); + + return out; +} + +Tensor& _clone_dim_order_out( + const Tensor& self, + bool non_blocking, + OptionalArrayRef dim_order, + Tensor& out) { + KernelRuntimeContext ctx{}; + return _clone_dim_order_out(ctx, self, non_blocking, dim_order, out); +} + +} // namespace native +} // namespace executor +} // namespace torch diff --git a/kernels/aten/cpu/targets.bzl b/kernels/aten/cpu/targets.bzl index bb7083c1f01..e39bbdd144d 100644 --- a/kernels/aten/cpu/targets.bzl +++ b/kernels/aten/cpu/targets.bzl @@ -18,6 +18,12 @@ _EDGE_DIALECT_OPS = ( "//executorch/kernels/aten/cpu/util:copy_ops_util", ], ), + op_target( + name = "op__clone_dim_order", + deps = [ + "//executorch/kernels/aten/cpu/util:copy_ops_util", + ], + ), ) def define_common_targets(): diff --git a/kernels/aten/edge_dialect_aten_op.yaml b/kernels/aten/edge_dialect_aten_op.yaml index d9de3f6dded..1a74b3c71d1 100644 --- a/kernels/aten/edge_dialect_aten_op.yaml +++ b/kernels/aten/edge_dialect_aten_op.yaml @@ -11,3 +11,8 @@ kernels: - arg_meta: null kernel_name: torch::executor::_to_dim_order_copy_out + +- func: dim_order_ops::_clone_dim_order.out(Tensor self, *, bool non_blocking=False, int[]? dim_order=None, Tensor(a!) out) -> Tensor(a!) 
+ kernels: + - arg_meta: null + kernel_name: torch::executor::_clone_dim_order_out diff --git a/kernels/test/op__clone_dim_order_test.cpp b/kernels/test/op__clone_dim_order_test.cpp index d999897cdf3..f009ce1b195 100644 --- a/kernels/test/op__clone_dim_order_test.cpp +++ b/kernels/test/op__clone_dim_order_test.cpp @@ -7,9 +7,6 @@ */ #include -#include -#include -#include #include // Declares the operator. #include diff --git a/kernels/test/targets.bzl b/kernels/test/targets.bzl index a4e681a7be1..7478f190185 100644 --- a/kernels/test/targets.bzl +++ b/kernels/test/targets.bzl @@ -177,7 +177,7 @@ def define_common_targets(): _common_op_test("op__to_dim_order_copy_test", ["aten", "portable"]) _common_op_test("op__empty_dim_order_test", ["aten", "portable"]) - _common_op_test("op__clone_dim_order_test", ["portable"]) + _common_op_test("op__clone_dim_order_test", ["aten", "portable"]) _common_op_test("op_abs_test", ["aten", "portable"]) _common_op_test("op_acos_test", ["aten", "portable"]) _common_op_test("op_acosh_test", ["aten", "portable"]) diff --git a/shim_et/xplat/executorch/kernels/test/util.bzl b/shim_et/xplat/executorch/kernels/test/util.bzl index cefb4fae6f0..0c702d12a18 100644 --- a/shim_et/xplat/executorch/kernels/test/util.bzl +++ b/shim_et/xplat/executorch/kernels/test/util.bzl @@ -21,11 +21,13 @@ def op_test(name, deps = [], kernel_name = "portable", use_kernel_prefix = False if kernel_name == "aten": generated_lib_and_op_deps = [ "//executorch/kernels/aten:generated_lib", - #TODO(T187390274): consolidate all aten ops into one target - "//executorch/kernels/aten/cpu:op__to_dim_order_copy_aten", "//executorch/kernels/aten:generated_lib_headers", "//executorch/kernels/test:supported_features_aten", ] + + if "dim_order" in op_root: + generated_lib_and_op_deps.append("//executorch/kernels/aten/cpu:" + op_root + "_aten") + else: generated_lib_and_op_deps = [ "//executorch/kernels/{}/cpu:{}".format(kernel_name, op_root), From 
75cb986b228abf1da0843ab05fab10ed499051ce Mon Sep 17 00:00:00 2001 From: Scott Roy <161522778+metascroy@users.noreply.github.com> Date: Wed, 17 Sep 2025 11:40:29 -0700 Subject: [PATCH 009/395] Back out "Add extra logging in CoreML (#13890)" Differential Revision: D82581442 Pull Request resolved: https://github.com/pytorch/executorch/pull/14353 --- backends/apple/coreml/runtime/delegate/ETCoreMLModelManager.mm | 3 --- 1 file changed, 3 deletions(-) diff --git a/backends/apple/coreml/runtime/delegate/ETCoreMLModelManager.mm b/backends/apple/coreml/runtime/delegate/ETCoreMLModelManager.mm index 524ceaf7e28..c27b42566dc 100644 --- a/backends/apple/coreml/runtime/delegate/ETCoreMLModelManager.mm +++ b/backends/apple/coreml/runtime/delegate/ETCoreMLModelManager.mm @@ -449,14 +449,12 @@ - (nullable NSURL *)compiledModelURLWithIdentifier:(NSString *)identifier case ModelAssetType::CompiledModel: { // The model is already compiled; no further action needed. // Return the existing model URL. - ETCoreMLLogInfo("The model in the pte file is pre-compiled. Skipping compilation."); return modelURL; } case ModelAssetType::Model: { // The model is not compiled yet. // Compile the model at the specified URL with a maximum wait time of 5 minutes. - ETCoreMLLogInfo("The model in the pte file is not pre-compiled. Compiling with a 5 min timeout."); NSURL *compiledModelURL = [ETCoreMLModelCompiler compileModelAtURL:modelURL maxWaitTimeInSeconds:(5 * 60) error:error]; @@ -492,7 +490,6 @@ - (nullable ETCoreMLAsset *)compiledModelAssetWithMetadata:(const ModelMetadata& error:error]; if (compiledModelURL) { // Move the compiled model to the asset manager to transfer ownership. 
- ETCoreMLLogInfo("Storing compiled asset with identifier=%@ in the asset manager.", identifier); compiledModelAsset = [self.assetManager storeAssetAtURL:compiledModelURL withIdentifier:identifier error:error]; } }]; From 7d5c886e41de5fafe0e647716b53b801d0412201 Mon Sep 17 00:00:00 2001 From: JP <46308822+zonglinpeng@users.noreply.github.com> Date: Wed, 17 Sep 2025 12:12:45 -0700 Subject: [PATCH 010/395] limit facto to 4000 bytes than numel Differential Revision: D82483935 Pull Request resolved: https://github.com/pytorch/executorch/pull/14318 --- backends/cadence/utils/facto_util.py | 60 ++++++++++++++++++++-------- 1 file changed, 44 insertions(+), 16 deletions(-) diff --git a/backends/cadence/utils/facto_util.py b/backends/cadence/utils/facto_util.py index 2ab5f731210..173f543a46e 100644 --- a/backends/cadence/utils/facto_util.py +++ b/backends/cadence/utils/facto_util.py @@ -22,26 +22,50 @@ MAX_CASES = 50 +# Global cache to store generated shapes per tensor to ensure consistency +_shape_cache: dict[str, list[int]] = {} + + def apply_tensor_contraints(op_name: str, index: int) -> list[object]: - # Constraint to limit tensor size product to < 4000 with fully randomized shapes + # Constraint to limit tensor size to < 4000 bytes with fully randomized shapes import random - # Global cache to store generated shapes per tensor to ensure consistency - _shape_cache: dict[str, list[int]] = {} + def get_dtype_bytes(dtype: torch.dtype) -> int: + """Get the number of bytes per element for a given dtype""" + dtype_bytes = { + torch.int8: 1, + torch.uint8: 1, + torch.int16: 2, + torch.uint16: 2, + torch.int32: 4, + torch.float32: 4, + torch.int64: 8, + torch.float64: 8, + torch.bool: 1, + torch.float: 4, # alias for float32 + torch.int: 4, # alias for int32 + torch.long: 8, # alias for int64 + } + return dtype_bytes.get(dtype, 4) # Default to 4 bytes if dtype not found - def generate_random_shape_with_product_limit( - rank: int, max_product: int = 3999, seed_base: int = 42 + 
def generate_random_shape_with_byte_limit( + rank: int, dtype: torch.dtype, max_bytes: int = 3999, seed_base: int = 42 ) -> list[int]: - """Generate a random shape with given rank ensuring product < max_product""" + """Generate a random shape with given rank ensuring total byte size < max_bytes""" random.seed(seed_base + rank) + bytes_per_element = get_dtype_bytes(dtype) + max_elements = max_bytes // bytes_per_element + # Start with all dimensions as 1 shape = [1] * rank - remaining_product = max_product - 1 # Leave room since we start with product=1 + remaining_elements = ( + max_elements - 1 + ) # Leave room since we start with product=1 # Randomly distribute the remaining capacity across dimensions for i in range(rank): - if remaining_product <= 1: + if remaining_elements <= 1: break # Calculate maximum size this dimension can have without exceeding limit @@ -51,28 +75,32 @@ def generate_random_shape_with_product_limit( current_product *= shape[j] max_size_for_dim = min( - remaining_product // current_product, 50 + remaining_elements // current_product, 50 ) # Cap at 50 if max_size_for_dim > shape[i]: # Randomly choose a size between current and max new_size = random.randint(shape[i], max_size_for_dim) shape[i] = new_size - remaining_product = max_product // (current_product * new_size) - remaining_product = max(1, remaining_product) + remaining_elements = max_elements // (current_product * new_size) + remaining_elements = max(1, remaining_elements) # Final random shuffle of the dimensions to make it more random random.shuffle(shape) return shape def random_size_constraint(deps: object, r: int, d: int) -> int: - """Generate random sizes ensuring total product < 4000""" + """Generate random sizes ensuring total byte size < 4000 bytes""" + # Use conservative approach: assume worst case is 4 bytes per element (float32/int32) + # This ensures we never exceed 4000 bytes regardless of actual dtype + worst_case_dtype = torch.float32 # 4 bytes per element + # Create a 
unique key for this tensor configuration - cache_key = f"{r}_{d}" + cache_key = f"{r}_{d}_conservative" if cache_key not in _shape_cache: - # Generate a new random shape for this rank - shape = generate_random_shape_with_product_limit( - r, max_product=3999, seed_base=42 + r * 10 + # Generate a new random shape for this rank using worst-case byte estimation + shape = generate_random_shape_with_byte_limit( + r, worst_case_dtype, max_bytes=3999, seed_base=42 + r * 10 + d ) _shape_cache[cache_key] = shape From eeecd564f58f6a5af4ccdfcbcc708e21b9ce4965 Mon Sep 17 00:00:00 2001 From: Gregory Comer Date: Wed, 17 Sep 2025 13:30:17 -0600 Subject: [PATCH 011/395] [Backend Tester] Add pass rate breakdown by parameterization to markdown summary (#14360) Add a table showing pass rate by test parameters. This gives a breakdown by dtype and dynamic shape on/off for model tests, making it easier to see the pass rate for f32 + static shapes. Also, run on release branches. --- .github/workflows/test-backend-arm.yml | 2 + .github/workflows/test-backend-coreml.yml | 2 + .github/workflows/test-backend-qnn.yml | 2 + .github/workflows/test-backend-vulkan.yml | 2 + .github/workflows/test-backend-xnnpack.yml | 2 + .../test/suite/generate_markdown_summary.py | 231 +++++++++++++----- backends/test/suite/reporting.py | 5 +- backends/test/suite/runner.py | 2 +- backends/test/suite/tests/test_reporting.py | 7 +- 9 files changed, 188 insertions(+), 67 deletions(-) diff --git a/.github/workflows/test-backend-arm.yml b/.github/workflows/test-backend-arm.yml index e57be2704a2..bee74fee172 100644 --- a/.github/workflows/test-backend-arm.yml +++ b/.github/workflows/test-backend-arm.yml @@ -4,6 +4,8 @@ on: schedule: - cron: 0 2 * * * push: + branches: + - release/* tags: - ciflow/nightly/* pull_request: diff --git a/.github/workflows/test-backend-coreml.yml b/.github/workflows/test-backend-coreml.yml index c6970ddff61..247f9576595 100644 --- a/.github/workflows/test-backend-coreml.yml +++ 
b/.github/workflows/test-backend-coreml.yml @@ -4,6 +4,8 @@ on: schedule: - cron: 0 2 * * * push: + branches: + - release/* tags: - ciflow/nightly/* pull_request: diff --git a/.github/workflows/test-backend-qnn.yml b/.github/workflows/test-backend-qnn.yml index 00933d6c74e..907c4d2dac0 100644 --- a/.github/workflows/test-backend-qnn.yml +++ b/.github/workflows/test-backend-qnn.yml @@ -4,6 +4,8 @@ on: schedule: - cron: 0 2 * * * push: + branches: + - release/* tags: - ciflow/nightly/* pull_request: diff --git a/.github/workflows/test-backend-vulkan.yml b/.github/workflows/test-backend-vulkan.yml index f04fdcdd1f1..cb2478fc825 100644 --- a/.github/workflows/test-backend-vulkan.yml +++ b/.github/workflows/test-backend-vulkan.yml @@ -4,6 +4,8 @@ on: schedule: - cron: 0 2 * * * push: + branches: + - release/* tags: - ciflow/nightly/* pull_request: diff --git a/.github/workflows/test-backend-xnnpack.yml b/.github/workflows/test-backend-xnnpack.yml index 2ae423dd99b..086c9625a38 100644 --- a/.github/workflows/test-backend-xnnpack.yml +++ b/.github/workflows/test-backend-xnnpack.yml @@ -4,6 +4,8 @@ on: schedule: - cron: 0 2 * * * push: + branches: + - release/* tags: - ciflow/nightly/* pull_request: diff --git a/backends/test/suite/generate_markdown_summary.py b/backends/test/suite/generate_markdown_summary.py index 73da8fba678..e54fc691723 100644 --- a/backends/test/suite/generate_markdown_summary.py +++ b/backends/test/suite/generate_markdown_summary.py @@ -1,44 +1,69 @@ import argparse import csv +import json import sys -# -# A standalone script to generate a Markdown representation of a test report. -# This is primarily intended to be used with GitHub actions to generate a nice -# representation of the test results when looking at the action run. -# -# Usage: python executorch/backends/test/suite/generate_markdown_summary.py -# Markdown is written to stdout. 
-# +from dataclasses import dataclass, field -def escape_for_markdown(text: str) -> str: +@dataclass +class ResultCounts: """ - Modify a string to properly display in a markdown table cell. + Represents aggregated result counts for each status. """ - if not text: - return text - # Replace newlines with
<br> tags - escaped = text.replace("\n", "<br>
") + total: int = 0 + passes: int = 0 + fails: int = 0 + skips: int = 0 + by_detail: dict[str, int] = field(default_factory=lambda: {}) - # Escape backslashes. - escaped = escaped.replace("\\", "\\\\") + def add_row(self, result_value: str, result_detail: str) -> None: + """ + Update the result counts for the specified row. + """ - # Escape pipe characters that would break table structure - escaped = escaped.replace("|", "\\|") + self.total += 1 - return escaped + if result_value == "Pass": + self.passes += 1 + elif result_value == "Fail": + self.fails += 1 + elif result_value == "Skip": + self.skips += 1 + else: + raise RuntimeError(f"Unknown result value {result_value}") + if result_detail: + if result_detail not in self.by_detail: + self.by_detail[result_detail] = 0 + + self.by_detail[result_detail] += 1 + + +@dataclass +class AggregatedSummary: + """ + Represents aggegrated summary data for the test run. + """ + + counts: ResultCounts + counts_by_params: dict[str, ResultCounts] + failed_tests: list[list[str]] + header: list[str] + + +# +# A standalone script to generate a Markdown representation of a test report. +# This is primarily intended to be used with GitHub actions to generate a nice +# representation of the test results when looking at the action run. +# +# Usage: python executorch/backends/test/suite/generate_markdown_summary.py +# Markdown is written to stdout. +# -def generate_markdown(csv_path: str, exit_code: int = 0): # noqa (C901) - # Print warning if exit code is non-zero - if exit_code != 0: - print("> [!WARNING]") - print( - f"> Exit code {exit_code} was non-zero. Test process may have crashed. 
Check the job logs for more information.\n" - ) +def aggregate_results(csv_path: str) -> AggregatedSummary: with open(csv_path, newline="", encoding="utf-8") as f: reader = csv.reader(f) rows = list(reader) @@ -46,24 +71,28 @@ def generate_markdown(csv_path: str, exit_code: int = 0): # noqa (C901) header = rows[0] data_rows = rows[1:] - # Find the Result and Result Detail column indices - result_column_index = None - result_detail_column_index = None - for i, col in enumerate(header): - if col.lower() == "result": - result_column_index = i - elif col.lower() == "result detail": - result_detail_column_index = i + header_indices_by_name = {n.lower(): i for (i, n) in enumerate(header)} + params_column_index = header_indices_by_name.get("params", None) + result_column_index = header_indices_by_name["result"] + result_detail_column_index = header_indices_by_name["result detail"] # Count results and prepare data - pass_count = 0 - fail_count = 0 - skip_count = 0 + counts = ResultCounts() failed_tests = [] - processed_rows = [] - result_detail_counts = {} + counts_by_param = {} for row in data_rows: + result = row[result_column_index] + result_detail = row[result_detail_column_index] + + counts.add_row(result, result_detail) + + params = row[params_column_index] if params_column_index else None + if params: + if params not in counts_by_param: + counts_by_param[params] = ResultCounts() + counts_by_param[params].add_row(result, result_detail) + # Make a copy of the row to avoid modifying the original processed_row = [escape_for_markdown(cell) for cell in row] @@ -71,54 +100,130 @@ def generate_markdown(csv_path: str, exit_code: int = 0): # noqa (C901) if result_column_index is not None and result_column_index < len(row): result_value = row[result_column_index].strip().lower() if result_value == "pass": - pass_count += 1 processed_row[result_column_index] = ( 'Pass' ) elif result_value == "fail": - fail_count += 1 processed_row[result_column_index] = ( 'Fail' ) 
failed_tests.append(processed_row.copy()) elif result_value == "skip": - skip_count += 1 processed_row[result_column_index] = ( 'Skip' ) - # Count result details (excluding empty ones) - if result_detail_column_index is not None and result_detail_column_index < len( - row - ): - result_detail_value = row[result_detail_column_index].strip() - if result_detail_value: # Only count non-empty result details - if result_detail_value in result_detail_counts: - result_detail_counts[result_detail_value] += 1 - else: - result_detail_counts[result_detail_value] = 1 + return AggregatedSummary( + counts=counts, + failed_tests=failed_tests, + counts_by_params=counts_by_param, + header=header, + ) + + +def escape_for_markdown(text: str) -> str: + """ + Modify a string to properly display in a markdown table cell. + """ + if not text: + return text + + # Replace newlines with
<br> tags + escaped = text.replace("\n", "<br>
") - processed_rows.append(processed_row) + # Escape backslashes. + escaped = escaped.replace("\\", "\\\\") + + # Escape pipe characters that would break table structure + escaped = escaped.replace("|", "\\|") + + return escaped + + +def generate_markdown(csv_path: str, exit_code: int = 0): # noqa (C901) + # Print warning if exit code is non-zero + if exit_code != 0: + print("> [!WARNING]") + print( + f"> Exit code {exit_code} was non-zero. Test process may have crashed. Check the job logs for more information.\n" + ) + + results = aggregate_results(csv_path) # Generate Summary section - total_rows = len(data_rows) print("# Summary\n") - print(f"- **Pass**: {pass_count}/{total_rows}") - print(f"- **Fail**: {fail_count}/{total_rows}") - print(f"- **Skip**: {skip_count}/{total_rows}") + total_excluding_skips = results.counts.passes + results.counts.fails + pass_fraction = results.counts.passes / total_excluding_skips + fail_fraction = results.counts.fails / total_excluding_skips + print( + f"- **Pass**: {results.counts.passes}/{total_excluding_skips} ({pass_fraction*100:.2f}%)" + ) + print( + f"- **Fail**: {results.counts.fails}/{total_excluding_skips} ({fail_fraction*100:.2f}%)" + ) + print(f"- **Skip**: {results.counts.skips}") + + if results.counts_by_params: + print("\n## Results by Parameters\n") + + # Extract all unique parameter keys from the JSON strings + all_param_keys = set() + parsed_params = {} + + for params_str in results.counts_by_params.keys(): + # Parse the JSON string (it's a string representation of a dict) + params_dict = json.loads(params_str) + parsed_params[params_str] = params_dict + all_param_keys.update(params_dict.keys()) + + if parsed_params and len(parsed_params) > 1: + # Sort parameter keys for consistent column ordering + sorted_param_keys = sorted(all_param_keys) + + # Create table header + header_cols = sorted_param_keys + ["Pass", "Fail", "Skip", "Pass %"] + print("| " + " | ".join(header_cols) + " |") + print("|" + "|".join(["---"] 
* len(header_cols)) + "|") + + # Create table rows + for params_str, counts in results.counts_by_params.items(): + if params_str in parsed_params: + params_dict = parsed_params[params_str] + row_values = [] + + # Add parameter values + for key in sorted_param_keys: + value = params_dict.get(key, "") + row_values.append(str(value)) + + pass_fraction = counts.passes / (counts.passes + counts.fails) + + # Add count values + row_values.extend( + [ + str(counts.passes), + str(counts.fails), + str(counts.skips), + f"{pass_fraction*100:.2f}%", + ] + ) + + print("| " + " | ".join(row_values) + " |") + + print() print("## Failure Breakdown:") - total_rows_with_result_detail = sum(result_detail_counts.values()) - for detail, count in sorted(result_detail_counts.items()): + total_rows_with_result_detail = sum(results.counts.by_detail.values()) + for detail, count in sorted(results.counts.by_detail.items()): print(f"- **{detail}**: {count}/{total_rows_with_result_detail}") # Generate Failed Tests section print("# Failed Tests\n") - if failed_tests: - escaped_header = [escape_for_markdown(col) for col in header] + if results.failed_tests: + escaped_header = [escape_for_markdown(col) for col in results.header] print("| " + " | ".join(escaped_header) + " |") - print("|" + "|".join(["---"] * len(header)) + "|") - for row in failed_tests: + print("|" + "|".join(["---"] * len(results.header)) + "|") + for row in results.failed_tests: print("| " + " | ".join(row) + " |") else: print("No failed tests.\n") diff --git a/backends/test/suite/reporting.py b/backends/test/suite/reporting.py index cdf2ce870e1..09e950ab672 100644 --- a/backends/test/suite/reporting.py +++ b/backends/test/suite/reporting.py @@ -1,4 +1,5 @@ import csv +import json from collections import Counter from dataclasses import dataclass, field @@ -343,7 +344,9 @@ def _sum_op_counts(counter: Counter | None) -> int | None: def _serialize_params(params: dict[str, Any] | None) -> str: if params is not None: - return 
str(dict(sorted(params.items()))) + # Convert values to strings - JSON conversion doesn't like dtypes. + str_params = {k: str(v) for k, v in params.items()} + return json.dumps(str_params) else: return "" diff --git a/backends/test/suite/runner.py b/backends/test/suite/runner.py index eeea09e0fc1..a6d7d07bce0 100644 --- a/backends/test/suite/runner.py +++ b/backends/test/suite/runner.py @@ -57,7 +57,7 @@ def _graph_has_unsupported_patterns(program: torch.export.ExportedProgram) -> bo and node.target == exir_ops.edge.aten.convolution.default ): in_rank = node.args[0].meta["val"].dim() - if in_rank != 4: + if in_rank > 4: return True return False diff --git a/backends/test/suite/tests/test_reporting.py b/backends/test/suite/tests/test_reporting.py index 58ff76cba17..e42681fc678 100644 --- a/backends/test/suite/tests/test_reporting.py +++ b/backends/test/suite/tests/test_reporting.py @@ -1,3 +1,4 @@ +import json import unittest from csv import DictReader @@ -102,14 +103,16 @@ def test_csv_report_simple(self): self.assertEqual(records[2]["Test Case"], "test2") self.assertEqual(records[2]["Flow"], "flow1") self.assertEqual(records[2]["Result"], "Pass") - self.assertEqual(records[2]["Params"], str({"dtype": torch.float32})) + self.assertEqual(records[2]["Params"], json.dumps({"dtype": "torch.float32"})) # Validate fourth record: test2, backend2, EXPORT_FAIL with use_dynamic_shapes param self.assertEqual(records[3]["Test ID"], "test2_backend2_flow1") self.assertEqual(records[3]["Test Case"], "test2") self.assertEqual(records[3]["Flow"], "flow1") self.assertEqual(records[3]["Result"], "Skip") - self.assertEqual(records[3]["Params"], str({"use_dynamic_shapes": True})) + self.assertEqual( + records[3]["Params"], json.dumps({"use_dynamic_shapes": "True"}) + ) def test_count_ops(self): """ From b1587531ebd35201a8a9d77d325941e3cf7264e3 Mon Sep 17 00:00:00 2001 From: Martin Pavella Date: Wed, 17 Sep 2025 21:31:20 +0200 Subject: [PATCH 012/395] NXP backend: Add pre-processing 
pass to fuse Linear + Add (#14112) ### Summary Add a pre-processing aten dialect pass, which fuses Linear nodes with subsequent Add nodes. This pass replaces the existing Neutron IR optimization. ### Test plan Unit tests provided. cc @robert-kalmar --- .../aten_passes/fuse_linear_and_add_pass.py | 204 ++++++ .../aten_passes/neutron_aten_pass_manager.py | 4 + backends/nxp/backend/edge_helper.py | 2 +- .../fuse_fully_connected_and_add_operators.py | 80 --- .../backend/ir/tflite_optimizer/optimizer.py | 7 - .../nxp/tests/test_linear_and_add_fusion.py | 644 ++++++++++++++++++ 6 files changed, 853 insertions(+), 88 deletions(-) create mode 100644 backends/nxp/aten_passes/fuse_linear_and_add_pass.py delete mode 100755 backends/nxp/backend/ir/tflite_optimizer/optimizations/fuse_fully_connected_and_add_operators.py create mode 100644 backends/nxp/tests/test_linear_and_add_fusion.py diff --git a/backends/nxp/aten_passes/fuse_linear_and_add_pass.py b/backends/nxp/aten_passes/fuse_linear_and_add_pass.py new file mode 100644 index 00000000000..20a32c1bcac --- /dev/null +++ b/backends/nxp/aten_passes/fuse_linear_and_add_pass.py @@ -0,0 +1,204 @@ +# Copyright 2025 NXP +# +# This source code is licensed under the BSD-style license found in the +# LICENSE file in the root directory of this source tree. + +from typing import Optional + +import torch + +from executorch.backends.nxp.backend.edge_helper import ( + try_get_tensor_constant_from_node, +) +from torch.ao.quantization.fx.utils import get_new_attr_name_with_prefix +from torch.export.unflatten import _assign_attr, _AttrKind +from torch.fx import GraphModule, Node +from torch.fx.passes.infra.pass_base import PassBase, PassResult + + +class FuseLinearAndAddPass(PassBase): + """Replace a sequence of `linear` and `add` nodes in the following pattern by a single `linear` node when possible. 
+ │ + ┌──────▼──────┐ + │ aten.linear │ + └──────┬──────┘ │ + │ replace with ┌──────▼──────┐ + ┌─────▼────┐ ───────────► │ aten.linear │ + │ aten.add │ └──────┬──────┘ + └─────┬────┘ + ▼ + """ + + def _fuse_with_existing_bias( + self, + linear_node: Node, + other_add_input: Node, + graph_module: GraphModule, + alpha: float, + ) -> bool: + """Fuse the `linear` and `add` nodes provided the `linear` already has a bias. + The fusion can only be done if both the "biases" have static data, which can be added together to get a + single bias. + + :return: True, if the nodes were successfully merged. False, otherwise. + """ + + linear_bias = linear_node.args[2] + if other_add_input.meta["val"].shape != linear_bias.meta["val"].shape: + # The biases cannot be added together due to their different shapes. + # Shape broadcasting is not applicable, as the only allowed `linear` bias shape is 1D ([output_features]). + return False + + bias_data = [ + try_get_tensor_constant_from_node(graph_module, linear_bias), + try_get_tensor_constant_from_node(graph_module, other_add_input), + ] + if any(data is None for data in bias_data): + return ( + False # Fusion is not possible because at least 1 bias is not static. + ) + + # Add the bias data together, to obtain the combined bias. Take the `alpha` attribute into account. + combined_bias = bias_data[0] + bias_data[1] * alpha + + # Create a new node containing the combined bias data. + combined_bias_name = get_new_attr_name_with_prefix( + linear_bias.name + "combined" + )(graph_module) + _assign_attr( + torch.nn.Parameter(combined_bias), + graph_module, + combined_bias_name, + _AttrKind.PARAMETER, + ) + with graph_module.graph.inserting_before(linear_node): + new_bias_node = graph_module.graph.get_attr(combined_bias_name) + + # Use the combined bias as the new bias for the `Linear`. 
+ linear_node.args = ( + linear_node.args[:2] + (new_bias_node,) + linear_node.args[3:] + ) + return True + + def _fuse_without_existing_bias( + self, + linear_node: Node, + other_add_input: Node, + graph_module: GraphModule, + alpha: float, + ) -> bool: + """Fuse the `linear` and `add` provided the `linear` does not already have a bias. + + :return: True, if the nodes were successfully merged. False, otherwise. + """ + + # The weights have shape (out_features, in_features). + output_features = linear_node.args[1].meta["val"].shape[0] + new_bias_shape = other_add_input.meta["val"].shape + if list(new_bias_shape) != [output_features]: + return False # The `Add` is adding a tensor with shape that is not supported for the `Linear` bias. + + bias_data = try_get_tensor_constant_from_node(graph_module, other_add_input) + + if bias_data is None: + return False # Neutron doesn't support a dynamic bias, so fusion would be counterproductive. + + # It is possible that the `linear` comes before the `other_add_input` in the graph, so it cannot use it as an + # input directly. If the nodes are ordered as [linear, ..., other_add_input, ... add] (which is valid), using + # `other_add_input` directly as an input to `Linear` would not follow topological order. + # Rearranging the nodes is not trivial, as the graph could be complex (ultimately, the + # `other_add_input` could even originate from the `Linear` node...). + # Since the `other_add_input` has static data, we can create a new node with the data just before the `Linear` + # to ensure topological order. + # Regardless of the node ordering, the `add.Tensor` attribute `alpha` multiplies the second `add` input. If + # `alpha != 1`, we would have to insert a `mul` operator if we wanted to keep the original parameter node. + # Therefore, it is better to create a new static parameter node for the multiplied data in this case as well. 
+ nodes = list(graph_module.graph.nodes) + if nodes.index(linear_node) < nodes.index(other_add_input) or alpha != 1.0: + # Problematic order, or required multiplication. + + # Handle the `aten.add.Tensor` attribute `alpha`. + bias_data *= alpha + + # Create a unique name. + new_bias_name = get_new_attr_name_with_prefix(linear_node.name + "_bias")( + graph_module + ) + _assign_attr(bias_data, graph_module, new_bias_name, _AttrKind.PARAMETER) + with graph_module.graph.inserting_before(linear_node): + new_bias_node = graph_module.graph.get_attr(new_bias_name) + + # Use the added tensor as the new `Linear` bias. + linear_node.args = ( + linear_node.args[:2] + (new_bias_node,) + linear_node.args[2:] + ) + return True + + else: + # Use the `other_add_input` directly as the new bias. + linear_node.args = ( + linear_node.args[:2] + (other_add_input,) + linear_node.args[2:] + ) + return True + + def call(self, graph_module: GraphModule) -> Optional[PassResult]: + def _is_applicable_linear_node(node_: Node): + is_linear = ( + node_.op == "call_function" + and node_.target == torch.ops.aten.linear.default + ) + has_single_user = len(node.users) == 1 + + return is_linear and has_single_user + + def _is_add(node_: Node): + return ( + node_.op == "call_function" + and node_.target == torch.ops.aten.add.Tensor + ) + + made_changes = False + for node in graph_module.graph.nodes: + if not _is_applicable_linear_node( + linear_node := node + ): # Also ensures a single user. + continue + + if not _is_add(add_node := list(linear_node.users.keys())[0]): + continue # Not the `Linear` -> `Add` case. + + if len(add_node.args) != 2: + continue # Unexpected case. + + # The `aten.add.Tensor` carries out the expression `out = input[0] + alpha × input[1]`. + # https://docs.pytorch.org/docs/stable/generated/torch.add.html + alpha = add_node.kwargs.get("alpha", 1.0) + if add_node.args[0] == linear_node: + other_add_input = add_node.args[1] + + else: + # The fusion is not implemented. 
The `other_add_input` would have to be divided by `alpha` before the + # fusion, and a `mul` operator would have to be added after the `linear` to multiply its output by + # `alpha`. + continue + + if len(linear_node.args) > 2: + if not self._fuse_with_existing_bias( + linear_node, other_add_input, graph_module, alpha + ): + continue # The nodes could not be fused. + + else: + # The `Linear` doesn't have a bias yet. + if not self._fuse_without_existing_bias( + linear_node, other_add_input, graph_module, alpha + ): + continue # The nodes could not be fused. + + # Use the output of the `Linear` instead of the `Add`, and remove the now unused `Add` node. + add_node.replace_all_uses_with(linear_node) + graph_module.graph.erase_node(add_node) + + made_changes = True + + return PassResult(graph_module, made_changes) diff --git a/backends/nxp/aten_passes/neutron_aten_pass_manager.py b/backends/nxp/aten_passes/neutron_aten_pass_manager.py index f6e3c374b19..407ebf5da61 100644 --- a/backends/nxp/aten_passes/neutron_aten_pass_manager.py +++ b/backends/nxp/aten_passes/neutron_aten_pass_manager.py @@ -13,6 +13,9 @@ from executorch.backends.nxp.aten_passes.fuse_batch_norm_with_linear_pass import ( FuseBatchNormWithLinearPass, ) +from executorch.backends.nxp.aten_passes.fuse_linear_and_add_pass import ( + FuseLinearAndAddPass, +) from executorch.backends.nxp.aten_passes.remove_nodes_with_known_outputs import ( RemoveNodesWithKnownOutputs, ) @@ -38,6 +41,7 @@ def __init__(self, passes: list[PassType] = None): SplitGroupConvolution(), SplitGRUBasedOnNumLayers(), RemoveNodesWithKnownOutputs(), + FuseLinearAndAddPass(), ] super().__init__(passes) diff --git a/backends/nxp/backend/edge_helper.py b/backends/nxp/backend/edge_helper.py index 061295ead79..60b367c0f39 100644 --- a/backends/nxp/backend/edge_helper.py +++ b/backends/nxp/backend/edge_helper.py @@ -1,4 +1,4 @@ -# Copyright 2024 NXP +# Copyright 2024-2025 NXP # # This source code is licensed under the BSD-style license found 
in the # LICENSE file in the root directory of this source tree. diff --git a/backends/nxp/backend/ir/tflite_optimizer/optimizations/fuse_fully_connected_and_add_operators.py b/backends/nxp/backend/ir/tflite_optimizer/optimizations/fuse_fully_connected_and_add_operators.py deleted file mode 100755 index b6fd5849551..00000000000 --- a/backends/nxp/backend/ir/tflite_optimizer/optimizations/fuse_fully_connected_and_add_operators.py +++ /dev/null @@ -1,80 +0,0 @@ -# Copyright 2024 NXP -# -# This source code is licensed under the BSD-style license found in the -# LICENSE file in the root directory of this source tree. - -from executorch.backends.nxp.backend.ir.lib.tflite.TensorType import TensorType -from executorch.backends.nxp.backend.ir.tflite_optimizer.operator_rules import ( - NoFusedActivationFunction, -) -from executorch.backends.nxp.backend.ir.tflite_optimizer.optimizations.base_optimization import ( - BaseOptimization, -) -from executorch.backends.nxp.backend.ir.tflite_optimizer.pattern_matcher import ( - OneOf, - Op, - PatternMatcher, -) -from executorch.backends.nxp.backend.ir.tflite_optimizer.tensor_rules import ( - RuleAnd, - RuleIf, - RuleOr, - TensorDimensionsMatch, - TensorHasDimensionOfSize, - TensorHasOneConsumer, - TensorHasRank, - TensorHasType, - TensorIsQuantized, -) - - -class FuseFullyConnectedAndAddOperators(BaseOptimization): - - def __call__(self) -> bool: - """ - FullyConnected -> Add sequence can handle more complicated shapes than just FullyConnected with bias - (due to shape broadcasting). - The bias can have shape [N] or [1, N], where N is the first dimension of the FC weights tensor. - It could also have shape [1, ..., 1, N], but then the TFLite FullyConnected removes the leading ones, - even if 'keep_num_dims' is True. In ONNX, the output tensor has the leading ones, - In this case, a Reshape would have to be added, so we do not perform the fusion. 
- - # https://github.com/tensorflow/tensorflow/blob/v2.15.0/tensorflow/lite/kernels/fully_connected.cc#L398 - """ - matcher = PatternMatcher( - self._builder, - [ - # Require exactly 2 inputs. - Op( - ["FullyConnected"], ["x", "w"], ["y"], [NoFusedActivationFunction()] - ), - OneOf([Op(["Add"], ["y", "b"]), Op(["Add"], ["b", "y"])]), - ], - [ - TensorHasOneConsumer("y"), - TensorHasRank("w", 2), - RuleOr( - TensorHasRank("b", 1), - RuleAnd(TensorHasRank("b", 2), TensorHasDimensionOfSize("b", 0, 1)), - ), - TensorDimensionsMatch("w", 0, "b", -1), - RuleIf(TensorIsQuantized("x"), TensorHasType("b", TensorType.INT32)), - ], - ) - - to_remove = [] - for (fc, add), tensor_map, _, _ in matcher.match_patterns(): - b = tensor_map["b"] - fc.tmp_inputs.append(b) - - # Remove the 'Add' operator. - fc.tmp_outputs[0] = add.tmp_outputs[0] - fc.builtin_options.fused_activation_function = ( - add.builtin_options.fused_activation_function - ) - to_remove.append(add) - - for op in to_remove: - self._builder.get_operators().remove(op) - - return len(to_remove) != 0 diff --git a/backends/nxp/backend/ir/tflite_optimizer/optimizer.py b/backends/nxp/backend/ir/tflite_optimizer/optimizer.py index d4a097ca76d..0d429fa9818 100755 --- a/backends/nxp/backend/ir/tflite_optimizer/optimizer.py +++ b/backends/nxp/backend/ir/tflite_optimizer/optimizer.py @@ -17,9 +17,6 @@ from executorch.backends.nxp.backend.ir.tflite_optimizer.optimizations.fuse_activation_functions import ( FuseActivationFunctions, ) -from executorch.backends.nxp.backend.ir.tflite_optimizer.optimizations.fuse_fully_connected_and_add_operators import ( - FuseFullyConnectedAndAddOperators, -) from executorch.backends.nxp.backend.ir.tflite_optimizer.optimizations.move_relu_before_concat import ( MoveActivationBeforeConcatenation, ) @@ -34,7 +31,6 @@ class Optimization(Enum): FUSE_ACTIVATION_FUNCTIONS = 1 - FUSE_FULLY_CONNECTED_AND_ADD = 2 FUSE_TRANSPOSE_OPERATORS = 5 REMOVE_IDENTITY_TRANSPOSE_OPERATORS = 6 @@ -75,9 +71,6 @@ def 
__init__( Optimization.FUSE_ACTIVATION_FUNCTIONS: FuseActivationFunctions( builder, conversion_config ), - Optimization.FUSE_FULLY_CONNECTED_AND_ADD: FuseFullyConnectedAndAddOperators( - builder, conversion_config - ), Optimization.FUSE_TRANSPOSE_OPERATORS: FuseTransposeOperators( builder, conversion_config ), diff --git a/backends/nxp/tests/test_linear_and_add_fusion.py b/backends/nxp/tests/test_linear_and_add_fusion.py new file mode 100644 index 00000000000..16d3c4140a2 --- /dev/null +++ b/backends/nxp/tests/test_linear_and_add_fusion.py @@ -0,0 +1,644 @@ +# Copyright 2025 NXP +# +# This source code is licensed under the BSD-style license found in the +# LICENSE file in the root directory of this source tree. + +import unittest +from copy import deepcopy + +import numpy as np +import torch + +from executorch.backends.nxp.aten_passes.fuse_linear_and_add_pass import ( + FuseLinearAndAddPass, +) +from executorch.backends.nxp.aten_passes.neutron_aten_pass_manager import ( + NeutronAtenPassManager, +) +from executorch.backends.nxp.aten_passes.remove_nodes_with_known_outputs import ( + RemoveNodesWithKnownOutputs, +) +from executorch.backends.nxp.tests.executors import graph_contains_any_of_ops +from parameterized import parameterized + + +class LinearAddModule(torch.nn.Module): + def __init__( + self, + fc_in_features: int, + fc_out_features: int, + bias: bool, + artificial_bias_shape: list[int], + alpha=1.0, + ): + super().__init__() + self.fc_in_features = fc_in_features + self.fc_out_features = fc_out_features + self.bias = bias + self.artificial_bias_shape = artificial_bias_shape + self.alpha = alpha + self.linear = torch.nn.Linear(fc_in_features, fc_out_features, bias=bias) + self.eval() + + def forward(self, x): + artificial_bias = torch.ones(self.artificial_bias_shape, dtype=torch.float32) + x = self.linear(x) + return torch.add(x, artificial_bias, alpha=self.alpha) + + +class LinearAddModuleReverseNodeOrder(torch.nn.Module): + """The `ones` added by the `add` 
are only generated after the `linear` node.""" + + def __init__( + self, + fc_in_features: int, + fc_out_features: int, + bias: bool, + artificial_bias_shape: list[int], + ): + super().__init__() + self.fc_in_features = fc_in_features + self.fc_out_features = fc_out_features + self.bias = bias + self.artificial_bias_shape = artificial_bias_shape + self.linear = torch.nn.Linear(fc_in_features, fc_out_features, bias=bias) + self.eval() + + def forward(self, x): + # The `ones` are generated after the `linear` call. + x = self.linear(x) + artificial_bias = torch.ones(self.artificial_bias_shape, dtype=torch.float32) + return torch.add(x, artificial_bias) + + +class LinearAddModuleReverseInputOrder(torch.nn.Module): + """The `add` has the output of the `linear` as its second input (which is the input multiplied by `alpha`).""" + + def __init__( + self, + fc_in_features: int, + fc_out_features: int, + bias: bool, + artificial_bias_shape: list[int], + alpha=1.0, + ): + super().__init__() + self.fc_in_features = fc_in_features + self.fc_out_features = fc_out_features + self.bias = bias + self.artificial_bias_shape = artificial_bias_shape + self.alpha = alpha + self.linear = torch.nn.Linear(fc_in_features, fc_out_features, bias=bias) + self.eval() + + def forward(self, x): + artificial_bias = torch.ones(self.artificial_bias_shape, dtype=torch.float32) + x = self.linear(x) + return torch.add(artificial_bias, x, alpha=self.alpha) # Reversed input order. + + +class TestLinearAndAddFusing(unittest.TestCase): + __test__ = False # Prevent interfering with PyTest tests. 
+ + @classmethod + def setUpClass(cls): + torch.manual_seed(23) + np.random.seed(42) + + @parameterized.expand( + [ + ["2D", [4, 6]], + ["4D", [4, 6, 8, 10]], + ] + ) + def test_linear_add_fusing__static__no_bias__valid_shape( + self, _, input_shape: list[int] + ): + example_input = (torch.ones(input_shape),) + + module = LinearAddModule(input_shape[-1], 5, False, [5]) + program = torch.export.export(module, example_input, strict=True) + original_module = program.module() + + modified_module = NeutronAtenPassManager( + [ + RemoveNodesWithKnownOutputs(), # Make the added tensor static. + FuseLinearAndAddPass(), + ] + )(deepcopy(program.module())).graph_module + + # Make sure the module wasn't broken. + original_nodes = list(original_module.graph.nodes) + modified_nodes = list(modified_module.graph.nodes) + + assert len(original_nodes) == 6 + assert original_nodes[3].target == torch.ops.aten.linear.default + assert original_nodes[4].target == torch.ops.aten.add.Tensor + + # The `add` has been removed. + assert len(modified_nodes) == 5 + assert modified_nodes[3].target == torch.ops.aten.linear.default + assert len(modified_nodes[3].args) == 3 + assert "ones" in modified_nodes[3].args[2].name + assert not graph_contains_any_of_ops( + modified_module.graph, [torch.ops.aten.add.Tensor] + ) + + # Verify that the behavior has not changed. + input_data = torch.randn(input_shape, dtype=torch.float32) + out1 = original_module(input_data).detach().numpy() + out2 = modified_module(input_data).detach().numpy() + assert np.allclose(out1, out2) + + @parameterized.expand( + [ + ["2D", [8, 10]], + ] + ) + def test_linear_add_fusing__static__no_bias__invalid_shape( + self, _, input_shape: list[int] + ): + example_input = (torch.ones(input_shape),) + + module = LinearAddModule( + input_shape[-1], 5, False, [8, 5] # Unsupported `linear` bias shape. 
+ ) + program = torch.export.export(module, example_input, strict=True) + original_module = program.module() + + modified_module = NeutronAtenPassManager( + [ + RemoveNodesWithKnownOutputs(), # Make the added tensor static. + FuseLinearAndAddPass(), + ] + )(deepcopy(program.module())).graph_module + + # Make sure the module wasn't broken. + original_nodes = list(original_module.graph.nodes) + modified_nodes = list(modified_module.graph.nodes) + + assert len(original_nodes) == 6 + assert original_nodes[3].target == torch.ops.aten.linear.default + assert len(original_nodes[3].args) == 2 + assert original_nodes[4].target == torch.ops.aten.add.Tensor + + # Nothing changed. + assert len(modified_nodes) == 6 + assert modified_nodes[3].target == torch.ops.aten.linear.default + assert modified_nodes[4].target == torch.ops.aten.add.Tensor + + # Verify that the behavior has not changed. + input_data = torch.randn(input_shape, dtype=torch.float32) + out1 = original_module(input_data).detach().numpy() + out2 = modified_module(input_data).detach().numpy() + assert np.allclose(out1, out2) + + @parameterized.expand( + [ + ["2D", [4, 6]], + ["4D", [2, 3, 4, 5]], + ] + ) + def test_linear_add_fusing__static__bias__valid_shape( + self, _, input_shape: list[int] + ): + example_input = (torch.ones(input_shape),) + + module = LinearAddModule(input_shape[-1], 5, True, [5]) + program = torch.export.export(module, example_input, strict=True) + original_module = program.module() + + modified_module = NeutronAtenPassManager( + [ + RemoveNodesWithKnownOutputs(), # Make the added tensor static. + FuseLinearAndAddPass(), + ] + )(deepcopy(program.module())).graph_module + + # Make sure the module wasn't broken. 
+ original_nodes = list(original_module.graph.nodes) + modified_nodes = list(modified_module.graph.nodes) + + assert len(original_nodes) == 7 + assert original_nodes[3].target == torch.ops.aten.ones.default + assert original_nodes[4].target == torch.ops.aten.linear.default + assert len(original_nodes[4].args) == 3 + assert original_nodes[5].target == torch.ops.aten.add.Tensor + + # make sure the `add` and the `ones` were removed. + assert len(modified_nodes) == 5 + assert not graph_contains_any_of_ops( + modified_module.graph, [torch.ops.aten.ones.default] + ) + assert modified_nodes[3].target == torch.ops.aten.linear.default + assert len(modified_nodes[3].args) == 3 + assert "combined" in modified_nodes[3].args[2].name + assert not graph_contains_any_of_ops( + modified_module.graph, [torch.ops.aten.add.Tensor] + ) + + # Verify that the behavior has not changed. + input_data = torch.randn(input_shape, dtype=torch.float32) + out1 = original_module(input_data).detach().numpy() + out2 = modified_module(input_data).detach().numpy() + assert np.allclose(out1, out2) + + def test_linear_add_fusing__static__no_bias__reverse_order(self): + input_shape = [4, 8] + example_input = (torch.ones(input_shape),) + + # Use a module where the `bias` is generated after the `linear` node, which prevents the change. + module = LinearAddModuleReverseNodeOrder(input_shape[-1], 5, False, [5]) + program = torch.export.export(module, example_input, strict=True) + original_module = program.module() + + modified_module = NeutronAtenPassManager( + [ + RemoveNodesWithKnownOutputs(), # Make the added tensor static. + FuseLinearAndAddPass(), + ] + )(deepcopy(program.module())).graph_module + + # Make sure the module wasn't broken. 
+ original_nodes = list(original_module.graph.nodes) + modified_nodes = list(modified_module.graph.nodes) + + assert len(original_nodes) == 6 + assert original_nodes[2].target == torch.ops.aten.linear.default + assert len(original_nodes[2].args) == 2 + assert ( + original_nodes[3].target == torch.ops.aten.ones.default + ) # `ones` after `linear`. + assert original_nodes[4].target == torch.ops.aten.add.Tensor + + # The `add` has been removed. + assert len(modified_nodes) == 5 + assert modified_nodes[3].target == torch.ops.aten.linear.default + assert len(modified_nodes[3].args) == 3 + assert not graph_contains_any_of_ops( + modified_module.graph, [torch.ops.aten.add.Tensor] + ) + + # Verify that the behavior has not changed. + input_data = torch.randn(input_shape, dtype=torch.float32) + out1 = original_module(input_data).detach().numpy() + out2 = modified_module(input_data).detach().numpy() + assert np.allclose(out1, out2) + + def test_linear_add_fusing__static__bias__reverse_order(self): + input_shape = [4, 8] + example_input = (torch.ones(input_shape),) + + # Use a module where the `bias` is generated after the `linear` node, which prevents the change. + module = LinearAddModuleReverseNodeOrder(input_shape[-1], 5, True, [5]) + program = torch.export.export(module, example_input, strict=True) + original_module = program.module() + + modified_module = NeutronAtenPassManager( + [ + RemoveNodesWithKnownOutputs(), # Make the added tensor static. + FuseLinearAndAddPass(), + ] + )(deepcopy(program.module())).graph_module + + # Make sure the module wasn't broken. + original_nodes = list(original_module.graph.nodes) + modified_nodes = list(modified_module.graph.nodes) + + assert len(original_nodes) == 7 + assert original_nodes[3].target == torch.ops.aten.linear.default + assert len(original_nodes[3].args) == 3 + assert ( + original_nodes[4].target == torch.ops.aten.ones.default + ) # `ones` after `linear`. 
+ assert original_nodes[5].target == torch.ops.aten.add.Tensor + + # The `add` and `ones` have been removed. + assert len(modified_nodes) == 5 + assert not graph_contains_any_of_ops( + modified_module.graph, [torch.ops.aten.ones.default] + ) + assert modified_nodes[3].target == torch.ops.aten.linear.default + assert len(modified_nodes[3].args) == 3 + assert not graph_contains_any_of_ops( + modified_module.graph, [torch.ops.aten.add.Tensor] + ) + + # Verify that the behavior has not changed. + input_data = torch.randn(input_shape, dtype=torch.float32) + out1 = original_module(input_data).detach().numpy() + out2 = modified_module(input_data).detach().numpy() + assert np.allclose(out1, out2) + + def test_linear_add_fusing__static__alpha__no_bias(self): + alpha = 2.34 + input_shape = [4, 8] + example_input = (torch.ones(input_shape),) + + module = LinearAddModule(input_shape[-1], 5, False, [5], alpha=alpha) + program = torch.export.export(module, example_input, strict=True) + original_module = program.module() + + modified_module = NeutronAtenPassManager( + [ + RemoveNodesWithKnownOutputs(), # Make the added tensor static. + FuseLinearAndAddPass(), + ] + )(deepcopy(program.module())).graph_module + + # Make sure the module wasn't broken. + original_nodes = list(original_module.graph.nodes) + modified_nodes = list(modified_module.graph.nodes) + + assert len(original_nodes) == 6 + assert original_nodes[2].target == torch.ops.aten.ones.default + assert original_nodes[3].target == torch.ops.aten.linear.default + assert len(original_nodes[3].args) == 2 + assert original_nodes[4].target == torch.ops.aten.add.Tensor + assert original_nodes[4].kwargs["alpha"] == alpha + + # The `add` has been removed. 
+ assert len(modified_nodes) == 5 + assert modified_nodes[3].target == torch.ops.aten.linear.default + assert len(modified_nodes[3].args) == 3 + assert not graph_contains_any_of_ops( + modified_module.graph, [torch.ops.aten.add.Tensor] + ) + + # Verify that the behavior has not changed. + input_data = torch.randn(input_shape, dtype=torch.float32) + out1 = original_module(input_data).detach().numpy() + out2 = modified_module(input_data).detach().numpy() + assert np.allclose(out1, out2) + + def test_linear_add_fusing__static__alpha__bias(self): + alpha = 2.34 + input_shape = [4, 8] + example_input = (torch.ones(input_shape),) + + module = LinearAddModule(input_shape[-1], 5, True, [5], alpha=alpha) + program = torch.export.export(module, example_input, strict=True) + original_module = program.module() + + modified_module = NeutronAtenPassManager( + [ + RemoveNodesWithKnownOutputs(), # Make the added tensor static. + FuseLinearAndAddPass(), + ] + )(deepcopy(program.module())).graph_module + + # Make sure the module wasn't broken. + original_nodes = list(original_module.graph.nodes) + modified_nodes = list(modified_module.graph.nodes) + + assert len(original_nodes) == 7 + assert original_nodes[3].target == torch.ops.aten.ones.default + assert original_nodes[4].target == torch.ops.aten.linear.default + assert len(original_nodes[4].args) == 3 + assert original_nodes[5].target == torch.ops.aten.add.Tensor + assert original_nodes[5].kwargs["alpha"] == alpha + + # The `add` has been removed. + assert len(modified_nodes) == 5 + assert modified_nodes[3].target == torch.ops.aten.linear.default + assert len(modified_nodes[3].args) == 3 + assert not graph_contains_any_of_ops( + modified_module.graph, [torch.ops.aten.add.Tensor] + ) + + # Verify that the behavior has not changed. 
+ input_data = torch.randn(input_shape, dtype=torch.float32) + out1 = original_module(input_data).detach().numpy() + out2 = modified_module(input_data).detach().numpy() + assert np.allclose(out1, out2) + + def test_linear_add_fusing__static__alpha__reversed_add_inputs(self): + alpha = 2.34 + input_shape = [4, 8] + example_input = (torch.ones(input_shape),) + + module = LinearAddModuleReverseInputOrder( + input_shape[-1], 5, True, [5], alpha=alpha + ) + program = torch.export.export(module, example_input, strict=True) + original_module = program.module() + + modified_module = NeutronAtenPassManager( + [ + RemoveNodesWithKnownOutputs(), # Make the added tensor static. + FuseLinearAndAddPass(), + ] + )(deepcopy(program.module())).graph_module + + # Make sure the module wasn't broken. + original_nodes = list(original_module.graph.nodes) + modified_nodes = list(modified_module.graph.nodes) + + assert len(original_nodes) == 7 + assert original_nodes[3].target == torch.ops.aten.ones.default + assert original_nodes[4].target == torch.ops.aten.linear.default + assert len(original_nodes[4].args) == 3 + assert original_nodes[5].target == torch.ops.aten.add.Tensor + assert ( + original_nodes[5].args[1] == original_nodes[4] + ) # `linear` is the second input. + assert original_nodes[5].kwargs["alpha"] == alpha + + # Nothing changed (except the `ones` was replaced by static data). + assert len(modified_nodes) == 7 + assert modified_nodes[4].target == torch.ops.aten.linear.default + assert len(modified_nodes[4].args) == 3 + assert modified_nodes[5].target == torch.ops.aten.add.Tensor + assert ( + modified_nodes[5].args[1] == modified_nodes[4] + ) # `linear` is the second input. + assert modified_nodes[5].kwargs["alpha"] == alpha + + # Verify that the behavior has not changed. 
+ input_data = torch.randn(input_shape, dtype=torch.float32) + out1 = original_module(input_data).detach().numpy() + out2 = modified_module(input_data).detach().numpy() + assert np.allclose(out1, out2) + + @parameterized.expand( + [ + ["2D", [4, 6]], + ] + ) + def test_linear_add_fusing__dynamic__no_bias__valid_shape( + self, _, input_shape: list[int] + ): + example_input = (torch.ones(input_shape),) + + module = LinearAddModule(input_shape[-1], 5, False, [5]) + program = torch.export.export(module, example_input, strict=True) + original_module = program.module() + + modified_module = NeutronAtenPassManager([FuseLinearAndAddPass()])( + deepcopy(program.module()) + ).graph_module + + # Make sure the module wasn't broken. + original_nodes = list(original_module.graph.nodes) + modified_nodes = list(modified_module.graph.nodes) + + assert len(original_nodes) == 6 + assert original_nodes[3].target == torch.ops.aten.linear.default + assert original_nodes[4].target == torch.ops.aten.add.Tensor + + # Nothing changed. + assert len(modified_nodes) == 6 + assert modified_nodes[3].target == torch.ops.aten.linear.default + assert modified_nodes[4].target == torch.ops.aten.add.Tensor + + # Verify that the behavior has not changed. + input_data = torch.randn(input_shape, dtype=torch.float32) + out1 = original_module(input_data).detach().numpy() + out2 = modified_module(input_data).detach().numpy() + assert np.allclose(out1, out2) + + @parameterized.expand( + [ + ["2D", [8, 10]], + ] + ) + def test_linear_add_fusing__dynamic__no_bias__invalid_shape( + self, _, input_shape: list[int] + ): + example_input = (torch.ones(input_shape),) + + module = LinearAddModule( + input_shape[-1], 5, False, [8, 5] # Unsupported `linear` bias shape. 
+ ) + program = torch.export.export(module, example_input, strict=True) + original_module = program.module() + + modified_module = NeutronAtenPassManager([FuseLinearAndAddPass()])( + deepcopy(program.module()) + ).graph_module + + # Make sure the module wasn't broken. + original_nodes = list(original_module.graph.nodes) + modified_nodes = list(modified_module.graph.nodes) + + assert len(original_nodes) == 6 + assert original_nodes[3].target == torch.ops.aten.linear.default + assert original_nodes[4].target == torch.ops.aten.add.Tensor + + # Nothing changed. + assert len(modified_nodes) == 6 + assert modified_nodes[3].target == torch.ops.aten.linear.default + assert modified_nodes[4].target == torch.ops.aten.add.Tensor + + # Verify that the behavior has not changed. + input_data = torch.randn(input_shape, dtype=torch.float32) + out1 = original_module(input_data).detach().numpy() + out2 = modified_module(input_data).detach().numpy() + assert np.allclose(out1, out2) + + @parameterized.expand( + [ + ["2D", [4, 6]], + ] + ) + def test_linear_add_fusing__dynamic__bias__valid_shape( + self, _, input_shape: list[int] + ): + example_input = (torch.ones(input_shape),) + + module = LinearAddModule(input_shape[-1], 5, True, [5]) + program = torch.export.export(module, example_input, strict=True) + original_module = program.module() + + modified_module = NeutronAtenPassManager([FuseLinearAndAddPass()])( + deepcopy(program.module()) + ).graph_module + + # Make sure the module wasn't broken. + original_nodes = list(original_module.graph.nodes) + modified_nodes = list(modified_module.graph.nodes) + + assert len(original_nodes) == 7 + assert original_nodes[3].target == torch.ops.aten.ones.default + assert original_nodes[4].target == torch.ops.aten.linear.default + assert original_nodes[5].target == torch.ops.aten.add.Tensor + + # Nothing has changed, as the second bias is dynamic, so it cannot be added together with the first bias. 
+ assert len(modified_nodes) == 7 + assert modified_nodes[3].target == torch.ops.aten.ones.default + assert modified_nodes[4].target == torch.ops.aten.linear.default + assert modified_nodes[5].target == torch.ops.aten.add.Tensor + + # Verify that the behavior has not changed. + input_data = torch.randn(input_shape, dtype=torch.float32) + out1 = original_module(input_data).detach().numpy() + out2 = modified_module(input_data).detach().numpy() + assert np.allclose(out1, out2) + + def test_linear_add_fusing__dynamic__reverse_order(self): + input_shape = [4, 8] + example_input = (torch.ones(input_shape),) + + # Use a module where the `bias` is generated after the `linear` node, which prevents the change. + module = LinearAddModuleReverseNodeOrder(input_shape[-1], 5, False, [5]) + program = torch.export.export(module, example_input, strict=True) + original_module = program.module() + + modified_module = NeutronAtenPassManager([FuseLinearAndAddPass()])( + deepcopy(program.module()) + ).graph_module + + # Make sure the module wasn't broken. + original_nodes = list(original_module.graph.nodes) + modified_nodes = list(modified_module.graph.nodes) + + assert len(original_nodes) == 6 + assert original_nodes[2].target == torch.ops.aten.linear.default + assert original_nodes[3].target == torch.ops.aten.ones.default + assert original_nodes[4].target == torch.ops.aten.add.Tensor + + # Nothing has changed. + assert len(modified_nodes) == 6 + assert modified_nodes[2].target == torch.ops.aten.linear.default + assert modified_nodes[3].target == torch.ops.aten.ones.default + assert modified_nodes[4].target == torch.ops.aten.add.Tensor + + # Verify that the behavior has not changed. 
+ input_data = torch.randn(input_shape, dtype=torch.float32) + out1 = original_module(input_data).detach().numpy() + out2 = modified_module(input_data).detach().numpy() + assert np.allclose(out1, out2) + + def test_linear_add_fusing__dynamic__alpha(self): + alpha = 2.34 + input_shape = [4, 8] + example_input = (torch.ones(input_shape),) + + module = LinearAddModule(input_shape[-1], 5, False, [5], alpha=alpha) + program = torch.export.export(module, example_input, strict=True) + original_module = program.module() + + modified_module = NeutronAtenPassManager([FuseLinearAndAddPass()])( + deepcopy(program.module()) + ).graph_module + + # Make sure the module wasn't broken. + original_nodes = list(original_module.graph.nodes) + modified_nodes = list(modified_module.graph.nodes) + + assert len(original_nodes) == 6 + assert original_nodes[2].target == torch.ops.aten.ones.default + assert original_nodes[3].target == torch.ops.aten.linear.default + assert original_nodes[4].target == torch.ops.aten.add.Tensor + + # Nothing has changed. + assert len(modified_nodes) == 6 + assert modified_nodes[2].target == torch.ops.aten.ones.default + assert modified_nodes[3].target == torch.ops.aten.linear.default + assert modified_nodes[4].target == torch.ops.aten.add.Tensor + + # Verify that the behavior has not changed. 
+ input_data = torch.randn(input_shape, dtype=torch.float32) + out1 = original_module(input_data).detach().numpy() + out2 = modified_module(input_data).detach().numpy() + assert np.allclose(out1, out2) From 02b39bf943c6a4a6c57fad4457f6e43ac53d3f0a Mon Sep 17 00:00:00 2001 From: Hansong Zhang <107070759+kirklandsign@users.noreply.github.com> Date: Wed, 17 Sep 2025 12:38:19 -0700 Subject: [PATCH 013/395] Android use new llm runner deps (#14361) Use extension_llm_runner instead of old llama_runner llava_runner --- .../qualcomm/oss_scripts/llama/runner/runner.cpp | 3 +-- extension/android/CMakeLists.txt | 15 +-------------- 2 files changed, 2 insertions(+), 16 deletions(-) diff --git a/examples/qualcomm/oss_scripts/llama/runner/runner.cpp b/examples/qualcomm/oss_scripts/llama/runner/runner.cpp index 253e083a80e..fc4ff006a90 100644 --- a/examples/qualcomm/oss_scripts/llama/runner/runner.cpp +++ b/examples/qualcomm/oss_scripts/llama/runner/runner.cpp @@ -182,8 +182,7 @@ Error Runner::load() { eos_ids->insert(tokenizer_->encode("<|eot|>", 0, 0).get()[0]); eos_ids->insert(tokenizer_->encode("<|end_of_text|>", 0, 0).get()[0]); } else { - tokenizer_ = - example::load_llama_tokenizer(tokenizer_path_, Version::Default); + tokenizer_ = llm::load_tokenizer(tokenizer_path_); if (tokenizer_ == nullptr) { ET_LOG( Error, "Failed to load tokenizer with %s", tokenizer_path_.c_str()); diff --git a/extension/android/CMakeLists.txt b/extension/android/CMakeLists.txt index e959e6858dc..2599d202e61 100644 --- a/extension/android/CMakeLists.txt +++ b/extension/android/CMakeLists.txt @@ -168,21 +168,8 @@ endif() if(EXECUTORCH_BUILD_LLAMA_JNI) target_sources(executorch_jni PRIVATE jni/jni_layer_llama.cpp jni/log.cpp) - list(APPEND link_libraries llama_runner) + list(APPEND link_libraries extension_llm_runner) target_compile_definitions(executorch_jni PUBLIC EXECUTORCH_BUILD_LLAMA_JNI=1) - add_subdirectory( - ${EXECUTORCH_ROOT}/examples/models/llama/runner - 
${CMAKE_CURRENT_BINARY_DIR}/../../examples/models/llama/runner - ) - - target_sources( - executorch_jni - PRIVATE ${EXECUTORCH_ROOT}/extension/llm/runner/llm_runner_helper.cpp - ) - - target_include_directories( - executorch_jni PRIVATE ${EXECUTORCH_ROOT}/extension/llm/runner - ) if(QNN_SDK_ROOT) target_sources( From 97e229906b4fa6141ed6388a8089567871fc8381 Mon Sep 17 00:00:00 2001 From: Ethan Ng Date: Wed, 17 Sep 2025 13:35:03 -0700 Subject: [PATCH 014/395] Rename conv -> conv2d, conv1d_nchw -> conv1d_ncl, conv1d_nhwc -> conv1d_nlc Differential Revision: D82329465 Pull Request resolved: https://github.com/pytorch/executorch/pull/14310 --- backends/cadence/aot/TARGETS | 4 +- backends/cadence/aot/functions.yaml | 80 ++++----- backends/cadence/aot/functions_hifi.yaml | 80 ++++----- backends/cadence/aot/ops_registrations.py | 168 +++++++++--------- backends/cadence/aot/quantizer/patterns.py | 6 +- backends/cadence/aot/ref_implementations.py | 84 +++++---- backends/cadence/aot/replace_ops.py | 32 ++-- .../aot/tests/test_ref_implementations.py | 28 +-- .../aot/tests/test_replace_ops_passes.py | 14 +- .../aot/tests/test_type_dispatch_passes.py | 64 +++---- backends/cadence/aot/type_dispatch.py | 22 +-- .../cadence/generic/operators/CMakeLists.txt | 4 +- ..._out.cpp => quantized_conv2d_nchw_out.cpp} | 42 ++--- ..._out.cpp => quantized_conv2d_nhwc_out.cpp} | 42 ++--- .../cadence/generic/operators/targets.bzl | 8 +- .../cadence/hifi/operators/CMakeLists.txt | 4 +- ...cl_asym8sxsym8s_asym8s_per_tensor_out.cpp} | 6 +- ...cl_asym8uxsym8u_asym8u_per_tensor_out.cpp} | 6 +- ...lc_asym8sxsym8s_asym8s_per_tensor_out.cpp} | 6 +- ...lc_asym8uxsym8u_asym8u_per_tensor_out.cpp} | 6 +- ...hw_asym8sxsym8s_asym8s_per_tensor_out.cpp} | 6 +- ...hw_asym8uxsym8u_asym8u_per_tensor_out.cpp} | 6 +- ...se_asym8sxsym8s_asym8s_per_tensor_out.cpp} | 6 +- ...se_asym8uxsym8u_asym8u_per_tensor_out.cpp} | 6 +- ...ed_asym8sxsym8s_asym8s_per_tensor_out.cpp} | 2 +- 
 ...ed_asym8uxsym8u_asym8u_per_tensor_out.cpp} |  2 +-
 ...t.cpp => op_quantized_conv2d_nchw_out.cpp} | 16 +-
 ...wc_asym8sxsym8s_asym8s_per_tensor_out.cpp} |  6 +-
 ...wc_asym8uxsym8u_asym8u_per_tensor_out.cpp} |  6 +-
 ...se_asym8sxsym8s_asym8s_per_tensor_out.cpp} |  6 +-
 ...se_asym8uxsym8u_asym8u_per_tensor_out.cpp} |  6 +-
 ...ed_asym8sxsym8s_asym8s_per_tensor_out.cpp} |  2 +-
 ...ed_asym8uxsym8u_asym8u_per_tensor_out.cpp} |  2 +-
 ...t.cpp => op_quantized_conv2d_nhwc_out.cpp} | 16 +-
 backends/cadence/hifi/operators/operators.h   |  8 +-
 backends/cadence/hifi/operators/targets.bzl   | 36 ++--
 36 files changed, 429 insertions(+), 409 deletions(-)
 rename backends/cadence/generic/operators/{quantized_conv_nchw_out.cpp => quantized_conv2d_nchw_out.cpp} (94%)
 rename backends/cadence/generic/operators/{quantized_conv_nhwc_out.cpp => quantized_conv2d_nhwc_out.cpp} (94%)
 rename backends/cadence/hifi/operators/{op_quantized_conv1d_nchw_asym8sxsym8s_asym8s_per_tensor_out.cpp => op_quantized_conv1d_ncl_asym8sxsym8s_asym8s_per_tensor_out.cpp} (96%)
 rename backends/cadence/hifi/operators/{op_quantized_conv1d_nchw_asym8uxsym8u_asym8u_per_tensor_out.cpp => op_quantized_conv1d_ncl_asym8uxsym8u_asym8u_per_tensor_out.cpp} (96%)
 rename backends/cadence/hifi/operators/{op_quantized_conv1d_nhwc_asym8sxsym8s_asym8s_per_tensor_out.cpp => op_quantized_conv1d_nlc_asym8sxsym8s_asym8s_per_tensor_out.cpp} (95%)
 rename backends/cadence/hifi/operators/{op_quantized_conv1d_nhwc_asym8uxsym8u_asym8u_per_tensor_out.cpp => op_quantized_conv1d_nlc_asym8uxsym8u_asym8u_per_tensor_out.cpp} (95%)
 rename backends/cadence/hifi/operators/{op_quantized_conv_nchw_asym8sxsym8s_asym8s_per_tensor_out.cpp => op_quantized_conv2d_nchw_asym8sxsym8s_asym8s_per_tensor_out.cpp} (97%)
 rename backends/cadence/hifi/operators/{op_quantized_conv_nchw_asym8uxsym8u_asym8u_per_tensor_out.cpp => op_quantized_conv2d_nchw_asym8uxsym8u_asym8u_per_tensor_out.cpp} (97%)
 rename backends/cadence/hifi/operators/{op_quantized_conv_nchw_depthwise_asym8sxsym8s_asym8s_per_tensor_out.cpp => op_quantized_conv2d_nchw_depthwise_asym8sxsym8s_asym8s_per_tensor_out.cpp} (96%)
 rename backends/cadence/hifi/operators/{op_quantized_conv_nchw_depthwise_asym8uxsym8u_asym8u_per_tensor_out.cpp => op_quantized_conv2d_nchw_depthwise_asym8uxsym8u_asym8u_per_tensor_out.cpp} (96%)
 rename backends/cadence/hifi/operators/{op_quantized_conv_nchw_dilated_asym8sxsym8s_asym8s_per_tensor_out.cpp => op_quantized_conv2d_nchw_dilated_asym8sxsym8s_asym8s_per_tensor_out.cpp} (98%)
 rename backends/cadence/hifi/operators/{op_quantized_conv_nchw_dilated_asym8uxsym8u_asym8u_per_tensor_out.cpp => op_quantized_conv2d_nchw_dilated_asym8uxsym8u_asym8u_per_tensor_out.cpp} (98%)
 rename backends/cadence/hifi/operators/{op_quantized_conv_nchw_out.cpp => op_quantized_conv2d_nchw_out.cpp} (98%)
 rename backends/cadence/hifi/operators/{op_quantized_conv_nhwc_asym8sxsym8s_asym8s_per_tensor_out.cpp => op_quantized_conv2d_nhwc_asym8sxsym8s_asym8s_per_tensor_out.cpp} (96%)
 rename backends/cadence/hifi/operators/{op_quantized_conv_nhwc_asym8uxsym8u_asym8u_per_tensor_out.cpp => op_quantized_conv2d_nhwc_asym8uxsym8u_asym8u_per_tensor_out.cpp} (96%)
 rename backends/cadence/hifi/operators/{op_quantized_conv_nhwc_depthwise_asym8sxsym8s_asym8s_per_tensor_out.cpp => op_quantized_conv2d_nhwc_depthwise_asym8sxsym8s_asym8s_per_tensor_out.cpp} (95%)
 rename backends/cadence/hifi/operators/{op_quantized_conv_nhwc_depthwise_asym8uxsym8u_asym8u_per_tensor_out.cpp => op_quantized_conv2d_nhwc_depthwise_asym8uxsym8u_asym8u_per_tensor_out.cpp} (95%)
 rename backends/cadence/hifi/operators/{op_quantized_conv_nhwc_dilated_asym8sxsym8s_asym8s_per_tensor_out.cpp => op_quantized_conv2d_nhwc_dilated_asym8sxsym8s_asym8s_per_tensor_out.cpp} (98%)
 rename backends/cadence/hifi/operators/{op_quantized_conv_nhwc_dilated_asym8uxsym8u_asym8u_per_tensor_out.cpp => op_quantized_conv2d_nhwc_dilated_asym8uxsym8u_asym8u_per_tensor_out.cpp} (98%)
 rename backends/cadence/hifi/operators/{op_quantized_conv_nhwc_out.cpp => op_quantized_conv2d_nhwc_out.cpp} (98%)

diff --git a/backends/cadence/aot/TARGETS b/backends/cadence/aot/TARGETS
index 0ec09bf4f9e..b54f1ac3ba6 100644
--- a/backends/cadence/aot/TARGETS
+++ b/backends/cadence/aot/TARGETS
@@ -153,8 +153,8 @@ executorch_generated_lib(
         "//executorch/backends/cadence/generic/operators:dequantize_per_tensor",
         "//executorch/backends/cadence/generic/operators:quantize_per_tensor",
         "//executorch/backends/cadence/generic/operators:quantized_add_out",
-        "//executorch/backends/cadence/generic/operators:quantized_conv_nchw_out",
-        "//executorch/backends/cadence/generic/operators:quantized_conv_nhwc_out",
+        "//executorch/backends/cadence/generic/operators:quantized_conv2d_nchw_out",
+        "//executorch/backends/cadence/generic/operators:quantized_conv2d_nhwc_out",
         "//executorch/backends/cadence/generic/operators:quantized_fully_connected_out",
         "//executorch/backends/cadence/generic/operators:quantized_layer_norm",
         "//executorch/backends/cadence/generic/operators:quantized_linear_out",
diff --git a/backends/cadence/aot/functions.yaml b/backends/cadence/aot/functions.yaml
index 1c626887649..95c35055e9c 100644
--- a/backends/cadence/aot/functions.yaml
+++ b/backends/cadence/aot/functions.yaml
@@ -190,15 +190,15 @@
     - arg_meta: null
       kernel_name: impl::generic::dequantize_per_tensor_out

-- func: cadence::quantized_conv_nchw.out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, Tensor weight_zero_point, Tensor bias_scale, float out_scale, int out_zero_point, Tensor out_multiplier, Tensor out_shift, *, Tensor(a!) out) -> Tensor(a!)
+- func: cadence::quantized_conv2d_nchw.out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, Tensor weight_zero_point, Tensor bias_scale, float out_scale, int out_zero_point, Tensor out_multiplier, Tensor out_shift, *, Tensor(a!) out) -> Tensor(a!)
   kernels:
     - arg_meta: null
-      kernel_name: impl::generic::quantized_conv_nchw_out
+      kernel_name: impl::generic::quantized_conv2d_nchw_out

-- func: cadence::quantized_conv_nhwc.out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, Tensor weight_zero_point, Tensor bias_scale, float out_scale, int out_zero_point, Tensor out_multiplier, Tensor out_shift, *, Tensor(a!) out) -> Tensor(a!)
+- func: cadence::quantized_conv2d_nhwc.out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, Tensor weight_zero_point, Tensor bias_scale, float out_scale, int out_zero_point, Tensor out_multiplier, Tensor out_shift, *, Tensor(a!) out) -> Tensor(a!)
   kernels:
     - arg_meta: null
-      kernel_name: impl::generic::quantized_conv_nhwc_out
+      kernel_name: impl::generic::quantized_conv2d_nhwc_out

 - func: cadence::quantized_layer_norm.out(Tensor input, Tensor in_scale, Tensor in_zero_point, int[] normalized_shape, Tensor weight, Tensor bias, float eps, float output_scale, int output_zero_point, *, Tensor(a!) out) -> Tensor(a!)
   kernels:
@@ -289,95 +289,95 @@
     - arg_meta: null
       kernel_name: impl::generic::im2row_per_tensor_out

-- func: cadence::quantized_conv_nchw.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, bool channel_last=False, *, Tensor(a!) out) -> Tensor(a!)
+- func: cadence::quantized_conv2d_nchw.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, bool channel_last=False, *, Tensor(a!) out) -> Tensor(a!)
   kernels:
     - arg_meta: null
-      kernel_name: impl::generic::quantized_conv_nchw_per_tensor_out
+      kernel_name: impl::generic::quantized_conv2d_nchw_per_tensor_out

-- func: cadence::quantized_conv_nhwc.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, bool channel_last=False, *, Tensor(a!) out) -> Tensor(a!)
+- func: cadence::quantized_conv2d_nhwc.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, bool channel_last=False, *, Tensor(a!) out) -> Tensor(a!)
   kernels:
     - arg_meta: null
-      kernel_name: impl::generic::quantized_conv_nhwc_per_tensor_out
+      kernel_name: impl::generic::quantized_conv2d_nhwc_per_tensor_out

-- func: cadence::quantized_conv_nchw_asym8sxsym8s_asym8s.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)
+- func: cadence::quantized_conv2d_nchw_asym8sxsym8s_asym8s.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)
   kernels:
     - arg_meta: null
-      kernel_name: impl::generic::quantized_conv_nchw_asym8sxsym8s_asym8s_per_tensor_out
+      kernel_name: impl::generic::quantized_conv2d_nchw_asym8sxsym8s_asym8s_per_tensor_out

-- func: cadence::quantized_conv_nchw_asym8uxsym8u_asym8u.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)
+- func: cadence::quantized_conv2d_nchw_asym8uxsym8u_asym8u.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)
   kernels:
     - arg_meta: null
-      kernel_name: impl::generic::quantized_conv_nchw_asym8uxsym8u_asym8u_per_tensor_out
+      kernel_name: impl::generic::quantized_conv2d_nchw_asym8uxsym8u_asym8u_per_tensor_out

-- func: cadence::quantized_conv_nhwc_asym8sxsym8s_asym8s.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)
+- func: cadence::quantized_conv2d_nhwc_asym8sxsym8s_asym8s.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)
   kernels:
     - arg_meta: null
-      kernel_name: impl::generic::quantized_conv_nhwc_asym8sxsym8s_asym8s_per_tensor_out
+      kernel_name: impl::generic::quantized_conv2d_nhwc_asym8sxsym8s_asym8s_per_tensor_out

-- func: cadence::quantized_conv_nhwc_asym8uxsym8u_asym8u.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)
+- func: cadence::quantized_conv2d_nhwc_asym8uxsym8u_asym8u.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)
   kernels:
     - arg_meta: null
-      kernel_name: impl::generic::quantized_conv_nhwc_asym8uxsym8u_asym8u_per_tensor_out
+      kernel_name: impl::generic::quantized_conv2d_nhwc_asym8uxsym8u_asym8u_per_tensor_out

-- func: cadence::quantized_conv_nchw_dilated_asym8sxsym8s_asym8s.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)
+- func: cadence::quantized_conv2d_nchw_dilated_asym8sxsym8s_asym8s.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)
   kernels:
     - arg_meta: null
-      kernel_name: impl::generic::quantized_conv_nchw_dilated_asym8sxsym8s_asym8s_per_tensor_out
+      kernel_name: impl::generic::quantized_conv2d_nchw_dilated_asym8sxsym8s_asym8s_per_tensor_out

-- func: cadence::quantized_conv_nchw_dilated_asym8uxsym8u_asym8u.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)
+- func: cadence::quantized_conv2d_nchw_dilated_asym8uxsym8u_asym8u.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)
   kernels:
     - arg_meta: null
-      kernel_name: impl::generic::quantized_conv_nchw_dilated_asym8uxsym8u_asym8u_per_tensor_out
+      kernel_name: impl::generic::quantized_conv2d_nchw_dilated_asym8uxsym8u_asym8u_per_tensor_out

-- func: cadence::quantized_conv_nhwc_dilated_asym8sxsym8s_asym8s.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)
+- func: cadence::quantized_conv2d_nhwc_dilated_asym8sxsym8s_asym8s.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)
   kernels:
     - arg_meta: null
-      kernel_name: impl::generic::quantized_conv_nhwc_dilated_asym8sxsym8s_asym8s_per_tensor_out
+      kernel_name: impl::generic::quantized_conv2d_nhwc_dilated_asym8sxsym8s_asym8s_per_tensor_out

-- func: cadence::quantized_conv_nhwc_dilated_asym8uxsym8u_asym8u.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)
+- func: cadence::quantized_conv2d_nhwc_dilated_asym8uxsym8u_asym8u.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)
   kernels:
     - arg_meta: null
-      kernel_name: impl::generic::quantized_conv_nhwc_dilated_asym8uxsym8u_asym8u_per_tensor_out
+      kernel_name: impl::generic::quantized_conv2d_nhwc_dilated_asym8uxsym8u_asym8u_per_tensor_out

-- func: cadence::quantized_conv_nchw_depthwise_asym8sxsym8s_asym8s.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)
+- func: cadence::quantized_conv2d_nchw_depthwise_asym8sxsym8s_asym8s.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)
   kernels:
     - arg_meta: null
-      kernel_name: impl::generic::quantized_conv_nchw_depthwise_asym8sxsym8s_asym8s_per_tensor_out
+      kernel_name: impl::generic::quantized_conv2d_nchw_depthwise_asym8sxsym8s_asym8s_per_tensor_out

-- func: cadence::quantized_conv_nchw_depthwise_asym8uxsym8u_asym8u.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)
+- func: cadence::quantized_conv2d_nchw_depthwise_asym8uxsym8u_asym8u.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)
   kernels:
     - arg_meta: null
-      kernel_name: impl::generic::quantized_conv_nchw_depthwise_asym8uxsym8u_asym8u_per_tensor_out
+      kernel_name: impl::generic::quantized_conv2d_nchw_depthwise_asym8uxsym8u_asym8u_per_tensor_out

-- func: cadence::quantized_conv_nhwc_depthwise_asym8sxsym8s_asym8s.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)
+- func: cadence::quantized_conv2d_nhwc_depthwise_asym8sxsym8s_asym8s.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)
   kernels:
     - arg_meta: null
-      kernel_name: impl::generic::quantized_conv_nhwc_depthwise_asym8sxsym8s_asym8s_per_tensor_out
+      kernel_name: impl::generic::quantized_conv2d_nhwc_depthwise_asym8sxsym8s_asym8s_per_tensor_out

-- func: cadence::quantized_conv_nhwc_depthwise_asym8uxsym8u_asym8u.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)
+- func: cadence::quantized_conv2d_nhwc_depthwise_asym8uxsym8u_asym8u.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)
   kernels:
     - arg_meta: null
-      kernel_name: impl::generic::quantized_conv_nhwc_depthwise_asym8uxsym8u_asym8u_per_tensor_out
+      kernel_name: impl::generic::quantized_conv2d_nhwc_depthwise_asym8uxsym8u_asym8u_per_tensor_out

-- func: cadence::quantized_conv1d_nchw_asym8sxsym8s_asym8s.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)
+- func: cadence::quantized_conv1d_ncl_asym8sxsym8s_asym8s.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)
   kernels:
     - arg_meta: null
-      kernel_name: impl::generic::quantized_conv1d_nchw_asym8sxsym8s_asym8s_per_tensor_out
+      kernel_name: impl::generic::quantized_conv1d_ncl_asym8sxsym8s_asym8s_per_tensor_out

-- func: cadence::quantized_conv1d_nchw_asym8uxsym8u_asym8u.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)
+- func: cadence::quantized_conv1d_ncl_asym8uxsym8u_asym8u.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)
   kernels:
     - arg_meta: null
-      kernel_name: impl::generic::quantized_conv1d_nchw_asym8uxsym8u_asym8u_per_tensor_out
+      kernel_name: impl::generic::quantized_conv1d_ncl_asym8uxsym8u_asym8u_per_tensor_out

-- func: cadence::quantized_conv1d_nhwc_asym8sxsym8s_asym8s.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)
+- func: cadence::quantized_conv1d_nlc_asym8sxsym8s_asym8s.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)
   kernels:
     - arg_meta: null
-      kernel_name: impl::generic::quantized_conv1d_nhwc_asym8sxsym8s_asym8s_per_tensor_out
+      kernel_name: impl::generic::quantized_conv1d_nlc_asym8sxsym8s_asym8s_per_tensor_out

-- func: cadence::quantized_conv1d_nhwc_asym8uxsym8u_asym8u.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)
+- func: cadence::quantized_conv1d_nlc_asym8uxsym8u_asym8u.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)
   kernels:
     - arg_meta: null
-      kernel_name: impl::generic::quantized_conv1d_nhwc_asym8uxsym8u_asym8u_per_tensor_out
+      kernel_name: impl::generic::quantized_conv1d_nlc_asym8uxsym8u_asym8u_per_tensor_out

 - func: cadence::quantized_fully_connected.out(Tensor src, Tensor weight, Tensor bias, int src_zero_point, Tensor weight_zero_point, Tensor out_multiplier, Tensor out_shift, int out_zero_point, Tensor? offset, *, Tensor(a!) out) -> Tensor(a!)
   kernels:
diff --git a/backends/cadence/aot/functions_hifi.yaml b/backends/cadence/aot/functions_hifi.yaml
index a5f3102d600..a0e84d94300 100644
--- a/backends/cadence/aot/functions_hifi.yaml
+++ b/backends/cadence/aot/functions_hifi.yaml
@@ -290,105 +290,105 @@
     - arg_meta: null
      kernel_name: impl::HiFi::dequantize_per_tensor_out

-- func: cadence::quantized_conv_nchw.out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, Tensor weight_zero_point, Tensor bias_scale, float out_scale, int out_zero_point, Tensor out_multiplier, Tensor out_shift, *, Tensor(a!) out) -> Tensor(a!)
+- func: cadence::quantized_conv2d_nchw.out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, Tensor weight_zero_point, Tensor bias_scale, float out_scale, int out_zero_point, Tensor out_multiplier, Tensor out_shift, *, Tensor(a!) out) -> Tensor(a!)
   kernels:
     - arg_meta: null
-      kernel_name: impl::HiFi::quantized_conv_nchw_out
+      kernel_name: impl::HiFi::quantized_conv2d_nchw_out

-- func: cadence::quantized_conv_nhwc.out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, Tensor weight_zero_point, Tensor bias_scale, float out_scale, int out_zero_point, Tensor out_multiplier, Tensor out_shift, *, Tensor(a!) out) -> Tensor(a!)
+- func: cadence::quantized_conv2d_nhwc.out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, Tensor weight_zero_point, Tensor bias_scale, float out_scale, int out_zero_point, Tensor out_multiplier, Tensor out_shift, *, Tensor(a!) out) -> Tensor(a!)
   kernels:
     - arg_meta: null
-      kernel_name: impl::HiFi::quantized_conv_nhwc_out
+      kernel_name: impl::HiFi::quantized_conv2d_nhwc_out

-- func: cadence::quantized_conv_nchw.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)
+- func: cadence::quantized_conv2d_nchw.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)
   kernels:
     - arg_meta: null
-      kernel_name: impl::HiFi::quantized_conv_nchw_per_tensor_out
+      kernel_name: impl::HiFi::quantized_conv2d_nchw_per_tensor_out

-- func: cadence::quantized_conv_nhwc.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)
+- func: cadence::quantized_conv2d_nhwc.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)
   kernels:
     - arg_meta: null
-      kernel_name: impl::HiFi::quantized_conv_nhwc_per_tensor_out
+      kernel_name: impl::HiFi::quantized_conv2d_nhwc_per_tensor_out

-- func: cadence::quantized_conv_nchw_asym8sxsym8s_asym8s.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)
+- func: cadence::quantized_conv2d_nchw_asym8sxsym8s_asym8s.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)
   kernels:
     - arg_meta: null
-      kernel_name: impl::HiFi::quantized_conv_nchw_asym8sxsym8s_asym8s_per_tensor_out
+      kernel_name: impl::HiFi::quantized_conv2d_nchw_asym8sxsym8s_asym8s_per_tensor_out

-- func: cadence::quantized_conv_nchw_asym8uxsym8u_asym8u.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)
+- func: cadence::quantized_conv2d_nchw_asym8uxsym8u_asym8u.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)
   kernels:
     - arg_meta: null
-      kernel_name: impl::HiFi::quantized_conv_nchw_asym8uxsym8u_asym8u_per_tensor_out
+      kernel_name: impl::HiFi::quantized_conv2d_nchw_asym8uxsym8u_asym8u_per_tensor_out

-- func: cadence::quantized_conv_nhwc_asym8sxsym8s_asym8s.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)
+- func: cadence::quantized_conv2d_nhwc_asym8sxsym8s_asym8s.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)
   kernels:
     - arg_meta: null
-      kernel_name: impl::HiFi::quantized_conv_nhwc_asym8sxsym8s_asym8s_per_tensor_out
+      kernel_name: impl::HiFi::quantized_conv2d_nhwc_asym8sxsym8s_asym8s_per_tensor_out

-- func: cadence::quantized_conv_nhwc_asym8uxsym8u_asym8u.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)
+- func: cadence::quantized_conv2d_nhwc_asym8uxsym8u_asym8u.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)
   kernels:
     - arg_meta: null
-      kernel_name: impl::HiFi::quantized_conv_nhwc_asym8uxsym8u_asym8u_per_tensor_out
+      kernel_name: impl::HiFi::quantized_conv2d_nhwc_asym8uxsym8u_asym8u_per_tensor_out

-- func: cadence::quantized_conv_nchw_dilated_asym8sxsym8s_asym8s.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)
+- func: cadence::quantized_conv2d_nchw_dilated_asym8sxsym8s_asym8s.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)
   kernels:
     - arg_meta: null
-      kernel_name: impl::HiFi::quantized_conv_nchw_dilated_asym8sxsym8s_asym8s_per_tensor_out
+      kernel_name: impl::HiFi::quantized_conv2d_nchw_dilated_asym8sxsym8s_asym8s_per_tensor_out

-- func: cadence::quantized_conv_nchw_dilated_asym8uxsym8u_asym8u.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)
+- func: cadence::quantized_conv2d_nchw_dilated_asym8uxsym8u_asym8u.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)
   kernels:
     - arg_meta: null
-      kernel_name: impl::HiFi::quantized_conv_nchw_dilated_asym8uxsym8u_asym8u_per_tensor_out
+      kernel_name: impl::HiFi::quantized_conv2d_nchw_dilated_asym8uxsym8u_asym8u_per_tensor_out

-- func: cadence::quantized_conv_nhwc_dilated_asym8sxsym8s_asym8s.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)
+- func: cadence::quantized_conv2d_nhwc_dilated_asym8sxsym8s_asym8s.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)
   kernels:
     - arg_meta: null
-      kernel_name: impl::HiFi::quantized_conv_nhwc_dilated_asym8sxsym8s_asym8s_per_tensor_out
+      kernel_name: impl::HiFi::quantized_conv2d_nhwc_dilated_asym8sxsym8s_asym8s_per_tensor_out

-- func: cadence::quantized_conv_nhwc_dilated_asym8uxsym8u_asym8u.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)
+- func: cadence::quantized_conv2d_nhwc_dilated_asym8uxsym8u_asym8u.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)
   kernels:
     - arg_meta: null
-      kernel_name: impl::HiFi::quantized_conv_nhwc_dilated_asym8uxsym8u_asym8u_per_tensor_out
+      kernel_name: impl::HiFi::quantized_conv2d_nhwc_dilated_asym8uxsym8u_asym8u_per_tensor_out

-- func: cadence::quantized_conv_nchw_depthwise_asym8sxsym8s_asym8s.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)
+- func: cadence::quantized_conv2d_nchw_depthwise_asym8sxsym8s_asym8s.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)
   kernels:
     - arg_meta: null
-      kernel_name: impl::HiFi::quantized_conv_nchw_depthwise_asym8sxsym8s_asym8s_per_tensor_out
+      kernel_name: impl::HiFi::quantized_conv2d_nchw_depthwise_asym8sxsym8s_asym8s_per_tensor_out

-- func: cadence::quantized_conv_nchw_depthwise_asym8uxsym8u_asym8u.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)
+- func: cadence::quantized_conv2d_nchw_depthwise_asym8uxsym8u_asym8u.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)
   kernels:
     - arg_meta: null
-      kernel_name: impl::HiFi::quantized_conv_nchw_depthwise_asym8uxsym8u_asym8u_per_tensor_out
+      kernel_name: impl::HiFi::quantized_conv2d_nchw_depthwise_asym8uxsym8u_asym8u_per_tensor_out

-- func: cadence::quantized_conv_nhwc_depthwise_asym8sxsym8s_asym8s.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)
+- func: cadence::quantized_conv2d_nhwc_depthwise_asym8sxsym8s_asym8s.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)
   kernels:
     - arg_meta: null
-      kernel_name: impl::HiFi::quantized_conv_nhwc_depthwise_asym8sxsym8s_asym8s_per_tensor_out
+      kernel_name: impl::HiFi::quantized_conv2d_nhwc_depthwise_asym8sxsym8s_asym8s_per_tensor_out

-- func: cadence::quantized_conv_nhwc_depthwise_asym8uxsym8u_asym8u.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)
+- func: cadence::quantized_conv2d_nhwc_depthwise_asym8uxsym8u_asym8u.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)
   kernels:
     - arg_meta: null
-      kernel_name: impl::HiFi::quantized_conv_nhwc_depthwise_asym8uxsym8u_asym8u_per_tensor_out
+      kernel_name: impl::HiFi::quantized_conv2d_nhwc_depthwise_asym8uxsym8u_asym8u_per_tensor_out

-- func: cadence::quantized_conv1d_nchw_asym8sxsym8s_asym8s.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)
+- func: cadence::quantized_conv1d_ncl_asym8sxsym8s_asym8s.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)
   kernels:
     - arg_meta: null
-      kernel_name: impl::HiFi::quantized_conv1d_nchw_asym8sxsym8s_asym8s_per_tensor_out
+      kernel_name: impl::HiFi::quantized_conv1d_ncl_asym8sxsym8s_asym8s_per_tensor_out

-- func: cadence::quantized_conv1d_nchw_asym8uxsym8u_asym8u.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)
+- func: cadence::quantized_conv1d_ncl_asym8uxsym8u_asym8u.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)
kernels: - arg_meta: null - kernel_name: impl::HiFi::quantized_conv1d_nchw_asym8uxsym8u_asym8u_per_tensor_out + kernel_name: impl::HiFi::quantized_conv1d_ncl_asym8uxsym8u_asym8u_per_tensor_out -- func: cadence::quantized_conv1d_nhwc_asym8sxsym8s_asym8s.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!) +- func: cadence::quantized_conv1d_nlc_asym8sxsym8s_asym8s.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!) kernels: - arg_meta: null - kernel_name: impl::HiFi::quantized_conv1d_nhwc_asym8sxsym8s_asym8s_per_tensor_out + kernel_name: impl::HiFi::quantized_conv1d_nlc_asym8sxsym8s_asym8s_per_tensor_out -- func: cadence::quantized_conv1d_nhwc_asym8uxsym8u_asym8u.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!) +- func: cadence::quantized_conv1d_nlc_asym8uxsym8u_asym8u.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!) 
   kernels:
     - arg_meta: null
-      kernel_name: impl::HiFi::quantized_conv1d_nhwc_asym8uxsym8u_asym8u_per_tensor_out
+      kernel_name: impl::HiFi::quantized_conv1d_nlc_asym8uxsym8u_asym8u_per_tensor_out

 - func: cadence::quantized_layer_norm.out(Tensor input, Tensor in_scale, Tensor in_zero_point, int[] normalized_shape, Tensor weight, Tensor bias, float eps, float output_scale, int output_zero_point, *, Tensor(a!) out) -> Tensor(a!)
   kernels:
diff --git a/backends/cadence/aot/ops_registrations.py b/backends/cadence/aot/ops_registrations.py
index efb22a9e7d6..e483bea79d1 100644
--- a/backends/cadence/aot/ops_registrations.py
+++ b/backends/cadence/aot/ops_registrations.py
@@ -86,28 +86,28 @@
 )
 lib.define(
-    "quantized_conv_nhwc(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, Tensor weight_zero_point, Tensor bias_scale, float out_scale, int out_zero_point, Tensor out_multiplier, Tensor out_shift) -> (Tensor Z)"
+    "quantized_conv2d_nhwc(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, Tensor weight_zero_point, Tensor bias_scale, float out_scale, int out_zero_point, Tensor out_multiplier, Tensor out_shift) -> (Tensor Z)"
 )
 lib.define(
-    "quantized_conv_nhwc.out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, Tensor weight_zero_point, Tensor bias_scale, float out_scale, int out_zero_point, Tensor out_multiplier, Tensor out_shift, *, Tensor(a!) out) -> Tensor(a!)"
+    "quantized_conv2d_nhwc.out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, Tensor weight_zero_point, Tensor bias_scale, float out_scale, int out_zero_point, Tensor out_multiplier, Tensor out_shift, *, Tensor(a!) out) -> Tensor(a!)"
 )
 lib.define(
-    "quantized_conv_nhwc.per_tensor(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift) -> (Tensor Z)"
+    "quantized_conv2d_nhwc.per_tensor(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift) -> (Tensor Z)"
 )
 lib.define(
-    "quantized_conv_nhwc.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)"
+    "quantized_conv2d_nhwc.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)"
 )
 lib.define(
-    "quantized_conv_nchw(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, Tensor weight_zero_point, Tensor bias_scale, float out_scale, int out_zero_point, Tensor out_multiplier, Tensor out_shift) -> (Tensor Z)"
+    "quantized_conv2d_nchw(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, Tensor weight_zero_point, Tensor bias_scale, float out_scale, int out_zero_point, Tensor out_multiplier, Tensor out_shift) -> (Tensor Z)"
 )
 lib.define(
-    "quantized_conv_nchw.out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, Tensor weight_zero_point, Tensor bias_scale, float out_scale, int out_zero_point, Tensor out_multiplier, Tensor out_shift, *, Tensor(a!) out) -> Tensor(a!)"
+    "quantized_conv2d_nchw.out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, Tensor weight_zero_point, Tensor bias_scale, float out_scale, int out_zero_point, Tensor out_multiplier, Tensor out_shift, *, Tensor(a!) out) -> Tensor(a!)"
 )
 lib.define(
-    "quantized_conv_nchw.per_tensor(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift) -> (Tensor Z)"
+    "quantized_conv2d_nchw.per_tensor(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift) -> (Tensor Z)"
 )
 lib.define(
-    "quantized_conv_nchw.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)"
+    "quantized_conv2d_nchw.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)"
 )
 lib.define(
     "quantized_matmul(Tensor X, int X_zero_point, Tensor Y, int Y_zero_point, Tensor? bias, int out_multiplier, int out_shift, int out_zero_point, bool transposed=False) -> (Tensor Z)"
@@ -122,100 +122,100 @@
     "quantized_matmul_asym8sxasym8s_asym8s.out(Tensor X, int X_zero_point, Tensor Y, int Y_zero_point, Tensor? bias, int out_multiplier, int out_shift, int out_zero_point, bool transposed=False, *, Tensor(a!) out) -> Tensor(a!)"
 )
 lib.define(
-    "quantized_conv_nchw_asym8sxsym8s_asym8s.per_tensor(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift) -> (Tensor Z)"
+    "quantized_conv2d_nchw_asym8sxsym8s_asym8s.per_tensor(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift) -> (Tensor Z)"
 )
 lib.define(
-    "quantized_conv_nchw_asym8sxsym8s_asym8s.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)"
+    "quantized_conv2d_nchw_asym8sxsym8s_asym8s.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)"
 )
 lib.define(
-    "quantized_conv_nchw_asym8uxsym8u_asym8u.per_tensor(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift) -> (Tensor Z)"
+    "quantized_conv2d_nchw_asym8uxsym8u_asym8u.per_tensor(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift) -> (Tensor Z)"
 )
 lib.define(
-    "quantized_conv_nchw_asym8uxsym8u_asym8u.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)"
+    "quantized_conv2d_nchw_asym8uxsym8u_asym8u.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)"
 )
 lib.define(
-    "quantized_conv_nhwc_asym8sxsym8s_asym8s.per_tensor(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift) -> (Tensor Z)"
+    "quantized_conv2d_nhwc_asym8sxsym8s_asym8s.per_tensor(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift) -> (Tensor Z)"
 )
 lib.define(
-    "quantized_conv_nhwc_asym8sxsym8s_asym8s.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)"
+    "quantized_conv2d_nhwc_asym8sxsym8s_asym8s.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)"
 )
 lib.define(
-    "quantized_conv_nhwc_asym8uxsym8u_asym8u.per_tensor(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift) -> (Tensor Z)"
+    "quantized_conv2d_nhwc_asym8uxsym8u_asym8u.per_tensor(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift) -> (Tensor Z)"
 )
 lib.define(
-    "quantized_conv_nhwc_asym8uxsym8u_asym8u.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)"
+    "quantized_conv2d_nhwc_asym8uxsym8u_asym8u.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)"
 )
 lib.define(
-    "quantized_conv_nchw_dilated_asym8sxsym8s_asym8s.per_tensor(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift) -> (Tensor Z)"
+    "quantized_conv2d_nchw_dilated_asym8sxsym8s_asym8s.per_tensor(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift) -> (Tensor Z)"
 )
 lib.define(
-    "quantized_conv_nchw_dilated_asym8sxsym8s_asym8s.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)"
+    "quantized_conv2d_nchw_dilated_asym8sxsym8s_asym8s.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)"
 )
 lib.define(
-    "quantized_conv_nchw_dilated_asym8uxsym8u_asym8u.per_tensor(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift) -> (Tensor Z)"
+    "quantized_conv2d_nchw_dilated_asym8uxsym8u_asym8u.per_tensor(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift) -> (Tensor Z)"
 )
 lib.define(
-    "quantized_conv_nchw_dilated_asym8uxsym8u_asym8u.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)"
+    "quantized_conv2d_nchw_dilated_asym8uxsym8u_asym8u.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)"
 )
 lib.define(
-    "quantized_conv_nhwc_dilated_asym8sxsym8s_asym8s.per_tensor(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift) -> (Tensor Z)"
+    "quantized_conv2d_nhwc_dilated_asym8sxsym8s_asym8s.per_tensor(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift) -> (Tensor Z)"
 )
 lib.define(
-    "quantized_conv_nhwc_dilated_asym8sxsym8s_asym8s.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)"
+    "quantized_conv2d_nhwc_dilated_asym8sxsym8s_asym8s.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)"
 )
 lib.define(
-    "quantized_conv_nhwc_dilated_asym8uxsym8u_asym8u.per_tensor(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift) -> (Tensor Z)"
+    "quantized_conv2d_nhwc_dilated_asym8uxsym8u_asym8u.per_tensor(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift) -> (Tensor Z)"
 )
 lib.define(
-    "quantized_conv_nhwc_dilated_asym8uxsym8u_asym8u.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)"
+    "quantized_conv2d_nhwc_dilated_asym8uxsym8u_asym8u.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)"
 )
 lib.define(
-    "quantized_conv1d_nchw_asym8sxsym8s_asym8s.per_tensor(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift) -> (Tensor Z)"
+    "quantized_conv1d_ncl_asym8sxsym8s_asym8s.per_tensor(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift) -> (Tensor Z)"
 )
 lib.define(
-    "quantized_conv1d_nchw_asym8sxsym8s_asym8s.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)"
+    "quantized_conv1d_ncl_asym8sxsym8s_asym8s.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)"
 )
 lib.define(
-    "quantized_conv1d_nchw_asym8uxsym8u_asym8u.per_tensor(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift) -> (Tensor Z)"
+    "quantized_conv1d_ncl_asym8uxsym8u_asym8u.per_tensor(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift) -> (Tensor Z)"
 )
 lib.define(
-    "quantized_conv1d_nchw_asym8uxsym8u_asym8u.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)"
+    "quantized_conv1d_ncl_asym8uxsym8u_asym8u.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)"
 )
 lib.define(
-    "quantized_conv1d_nhwc_asym8sxsym8s_asym8s.per_tensor(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift) -> (Tensor Z)"
+    "quantized_conv1d_nlc_asym8sxsym8s_asym8s.per_tensor(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift) -> (Tensor Z)"
 )
 lib.define(
-    "quantized_conv1d_nhwc_asym8sxsym8s_asym8s.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)"
+    "quantized_conv1d_nlc_asym8sxsym8s_asym8s.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)"
 )
 lib.define(
-    "quantized_conv1d_nhwc_asym8uxsym8u_asym8u.per_tensor(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift) -> (Tensor Z)"
+    "quantized_conv1d_nlc_asym8uxsym8u_asym8u.per_tensor(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift) -> (Tensor Z)"
 )
 lib.define(
-    "quantized_conv1d_nhwc_asym8uxsym8u_asym8u.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)"
+    "quantized_conv1d_nlc_asym8uxsym8u_asym8u.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)"
 )
 lib.define(
-    "quantized_conv_nchw_depthwise_asym8sxsym8s_asym8s.per_tensor(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift) -> (Tensor Z)"
+    "quantized_conv2d_nchw_depthwise_asym8sxsym8s_asym8s.per_tensor(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift) -> (Tensor Z)"
 )
 lib.define(
-    "quantized_conv_nchw_depthwise_asym8sxsym8s_asym8s.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)"
+    "quantized_conv2d_nchw_depthwise_asym8sxsym8s_asym8s.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)"
 )
 lib.define(
-    "quantized_conv_nchw_depthwise_asym8uxsym8u_asym8u.per_tensor(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift) -> (Tensor Z)"
+    "quantized_conv2d_nchw_depthwise_asym8uxsym8u_asym8u.per_tensor(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift) -> (Tensor Z)"
 )
 lib.define(
-    "quantized_conv_nchw_depthwise_asym8uxsym8u_asym8u.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)"
+    "quantized_conv2d_nchw_depthwise_asym8uxsym8u_asym8u.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)"
 )
 lib.define(
-    "quantized_conv_nhwc_depthwise_asym8sxsym8s_asym8s.per_tensor(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift) -> (Tensor Z)"
+    "quantized_conv2d_nhwc_depthwise_asym8sxsym8s_asym8s.per_tensor(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift) -> (Tensor Z)"
 )
 lib.define(
-    "quantized_conv_nhwc_depthwise_asym8sxsym8s_asym8s.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)"
+    "quantized_conv2d_nhwc_depthwise_asym8sxsym8s_asym8s.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)"
 )
 lib.define(
-    "quantized_conv_nhwc_depthwise_asym8uxsym8u_asym8u.per_tensor(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift) -> (Tensor Z)"
+    "quantized_conv2d_nhwc_depthwise_asym8uxsym8u_asym8u.per_tensor(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift) -> (Tensor Z)"
 )
 lib.define(
-    "quantized_conv_nhwc_depthwise_asym8uxsym8u_asym8u.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)"
+    "quantized_conv2d_nhwc_depthwise_asym8uxsym8u_asym8u.per_tensor_out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, int weight_zero_point, float bias_scale, float out_scale, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)"
 )
 lib.define(
     "quantized_matmul_asym8uxasym8u_asym8u(Tensor X, int X_zero_point, Tensor Y, int Y_zero_point, Tensor? bias, int out_multiplier, int out_shift, int out_zero_point, bool transposed=False) -> (Tensor Z)"
@@ -704,8 +704,8 @@ def quantized_linear_asym8uxasym8u_asym8u_per_tensor_meta(
     return src.new_empty(out_size, dtype=src.dtype)


-@register_fake("cadence::quantized_conv_nhwc")
-def quantized_conv_nhwc_meta(
+@register_fake("cadence::quantized_conv2d_nhwc")
+def quantized_conv2d_nhwc_meta(
     input: torch.Tensor,
     weight: torch.Tensor,
     bias: torch.Tensor,
@@ -748,8 +748,8 @@ def quantized_conv_nhwc_meta(
     return input.new_empty(output_size, dtype=input.dtype)


-@register_fake("cadence::quantized_conv_nchw")
-def quantized_conv_nchw_meta(
+@register_fake("cadence::quantized_conv2d_nchw")
+def quantized_conv2d_nchw_meta(
     input: torch.Tensor,
     weight: torch.Tensor,
     bias: torch.Tensor,
@@ -792,8 +792,8 @@ def quantized_conv_nchw_meta(
     return input.new_empty(output_size, dtype=input.dtype)


-@register_fake("cadence::quantized_conv_nchw.per_tensor")
-def quantized_conv_nchw_per_tensor_meta(
+@register_fake("cadence::quantized_conv2d_nchw.per_tensor")
+def quantized_conv2d_nchw_per_tensor_meta(
     input: torch.Tensor,
     weight: torch.Tensor,
     bias: torch.Tensor,
@@ -836,8 +836,8 @@ def quantized_conv_nchw_per_tensor_meta(
     return input.new_empty(output_size, dtype=input.dtype)


-@register_fake("cadence::quantized_conv_nhwc.per_tensor")
-def quantized_conv_nhwc_per_tensor_meta(
+@register_fake("cadence::quantized_conv2d_nhwc.per_tensor")
+def quantized_conv2d_nhwc_per_tensor_meta(
     input: torch.Tensor,
     weight: torch.Tensor,
     bias: torch.Tensor,
@@ -880,8 +880,8 @@ def quantized_conv_nhwc_per_tensor_meta(
     return input.new_empty(output_size, dtype=input.dtype)


-@register_fake("cadence::quantized_conv_nchw_asym8sxsym8s_asym8s.per_tensor")
-def quantized_conv_nchw_asym8sxsym8s_asym8s_per_tensor_meta(
+@register_fake("cadence::quantized_conv2d_nchw_asym8sxsym8s_asym8s.per_tensor")
+def quantized_conv2d_nchw_asym8sxsym8s_asym8s_per_tensor_meta(
     input: torch.Tensor,
     weight: torch.Tensor,
     bias: torch.Tensor,
@@ -929,8 +929,8 @@ def quantized_conv_nchw_asym8sxsym8s_asym8s_per_tensor_meta(
     return input.new_empty(output_size, dtype=input.dtype)


-@register_fake("cadence::quantized_conv_nchw_asym8uxsym8u_asym8u.per_tensor")
-def quantized_conv_nchw_asym8uxsym8u_asym8u_per_tensor_meta(
+@register_fake("cadence::quantized_conv2d_nchw_asym8uxsym8u_asym8u.per_tensor")
+def quantized_conv2d_nchw_asym8uxsym8u_asym8u_per_tensor_meta(
     input: torch.Tensor,
     weight: torch.Tensor,
     bias: torch.Tensor,
@@ -978,8 +978,8 @@ def quantized_conv_nchw_asym8uxsym8u_asym8u_per_tensor_meta(
     return input.new_empty(output_size, dtype=input.dtype)


-@register_fake("cadence::quantized_conv_nhwc_asym8sxsym8s_asym8s.per_tensor")
-def quantized_conv_nhwc_asym8sxsym8s_asym8s_per_tensor_meta(
+@register_fake("cadence::quantized_conv2d_nhwc_asym8sxsym8s_asym8s.per_tensor")
+def quantized_conv2d_nhwc_asym8sxsym8s_asym8s_per_tensor_meta(
     input: torch.Tensor,
     weight: torch.Tensor,
     bias: torch.Tensor,
@@ -1027,8 +1027,8 @@ def quantized_conv_nhwc_asym8sxsym8s_asym8s_per_tensor_meta(
     return input.new_empty(output_size, dtype=input.dtype)


-@register_fake("cadence::quantized_conv_nhwc_asym8uxsym8u_asym8u.per_tensor")
-def quantized_conv_nhwc_asym8uxsym8u_asym8u_per_tensor_meta(
+@register_fake("cadence::quantized_conv2d_nhwc_asym8uxsym8u_asym8u.per_tensor")
+def quantized_conv2d_nhwc_asym8uxsym8u_asym8u_per_tensor_meta(
     input: torch.Tensor,
     weight: torch.Tensor,
     bias: torch.Tensor,
@@ -1076,8 +1076,8 @@ def quantized_conv_nhwc_asym8uxsym8u_asym8u_per_tensor_meta(
     return input.new_empty(output_size, dtype=input.dtype)


-@register_fake("cadence::quantized_conv_nchw_dilated_asym8sxsym8s_asym8s.per_tensor")
-def quantized_conv_nchw_dilated_asym8sxsym8s_asym8s_per_tensor_meta(
+@register_fake("cadence::quantized_conv2d_nchw_dilated_asym8sxsym8s_asym8s.per_tensor")
+def quantized_conv2d_nchw_dilated_asym8sxsym8s_asym8s_per_tensor_meta(
     input: torch.Tensor,
     weight: torch.Tensor,
     bias: torch.Tensor,
@@
-1125,8 +1125,8 @@ def quantized_conv_nchw_dilated_asym8sxsym8s_asym8s_per_tensor_meta( return input.new_empty(output_size, dtype=input.dtype) -@register_fake("cadence::quantized_conv_nchw_dilated_asym8uxsym8u_asym8u.per_tensor") -def quantized_conv_nchw_dilated_asym8uxsym8u_asym8u_per_tensor_meta( +@register_fake("cadence::quantized_conv2d_nchw_dilated_asym8uxsym8u_asym8u.per_tensor") +def quantized_conv2d_nchw_dilated_asym8uxsym8u_asym8u_per_tensor_meta( input: torch.Tensor, weight: torch.Tensor, bias: torch.Tensor, @@ -1174,8 +1174,8 @@ def quantized_conv_nchw_dilated_asym8uxsym8u_asym8u_per_tensor_meta( return input.new_empty(output_size, dtype=input.dtype) -@register_fake("cadence::quantized_conv_nhwc_dilated_asym8sxsym8s_asym8s.per_tensor") -def quantized_conv_nhwc_dilated_asym8sxsym8s_asym8s_per_tensor_meta( +@register_fake("cadence::quantized_conv2d_nhwc_dilated_asym8sxsym8s_asym8s.per_tensor") +def quantized_conv2d_nhwc_dilated_asym8sxsym8s_asym8s_per_tensor_meta( input: torch.Tensor, weight: torch.Tensor, bias: torch.Tensor, @@ -1223,8 +1223,8 @@ def quantized_conv_nhwc_dilated_asym8sxsym8s_asym8s_per_tensor_meta( return input.new_empty(output_size, dtype=input.dtype) -@register_fake("cadence::quantized_conv_nhwc_dilated_asym8uxsym8u_asym8u.per_tensor") -def quantized_conv_nhwc_dilated_asym8uxsym8u_asym8u_per_tensor_meta( +@register_fake("cadence::quantized_conv2d_nhwc_dilated_asym8uxsym8u_asym8u.per_tensor") +def quantized_conv2d_nhwc_dilated_asym8uxsym8u_asym8u_per_tensor_meta( input: torch.Tensor, weight: torch.Tensor, bias: torch.Tensor, @@ -1272,8 +1272,10 @@ def quantized_conv_nhwc_dilated_asym8uxsym8u_asym8u_per_tensor_meta( return input.new_empty(output_size, dtype=input.dtype) -@register_fake("cadence::quantized_conv_nchw_depthwise_asym8sxsym8s_asym8s.per_tensor") -def quantized_conv_nchw_depthwise_asym8sxsym8s_asym8s_per_tensor_meta( +@register_fake( + "cadence::quantized_conv2d_nchw_depthwise_asym8sxsym8s_asym8s.per_tensor" +) +def 
quantized_conv2d_nchw_depthwise_asym8sxsym8s_asym8s_per_tensor_meta( input: torch.Tensor, weight: torch.Tensor, bias: torch.Tensor, @@ -1321,8 +1323,10 @@ def quantized_conv_nchw_depthwise_asym8sxsym8s_asym8s_per_tensor_meta( return input.new_empty(output_size, dtype=input.dtype) -@register_fake("cadence::quantized_conv_nchw_depthwise_asym8uxsym8u_asym8u.per_tensor") -def quantized_conv_nchw_depthwise_asym8uxsym8u_asym8u_per_tensor_meta( +@register_fake( + "cadence::quantized_conv2d_nchw_depthwise_asym8uxsym8u_asym8u.per_tensor" +) +def quantized_conv2d_nchw_depthwise_asym8uxsym8u_asym8u_per_tensor_meta( input: torch.Tensor, weight: torch.Tensor, bias: torch.Tensor, @@ -1370,8 +1374,10 @@ def quantized_conv_nchw_depthwise_asym8uxsym8u_asym8u_per_tensor_meta( return input.new_empty(output_size, dtype=input.dtype) -@register_fake("cadence::quantized_conv_nhwc_depthwise_asym8sxsym8s_asym8s.per_tensor") -def quantized_conv_nhwc_depthwise_asym8sxsym8s_asym8s_per_tensor_meta( +@register_fake( + "cadence::quantized_conv2d_nhwc_depthwise_asym8sxsym8s_asym8s.per_tensor" +) +def quantized_conv2d_nhwc_depthwise_asym8sxsym8s_asym8s_per_tensor_meta( input: torch.Tensor, weight: torch.Tensor, bias: torch.Tensor, @@ -1419,8 +1425,10 @@ def quantized_conv_nhwc_depthwise_asym8sxsym8s_asym8s_per_tensor_meta( return input.new_empty(output_size, dtype=input.dtype) -@register_fake("cadence::quantized_conv_nhwc_depthwise_asym8uxsym8u_asym8u.per_tensor") -def quantized_conv_nhwc_depthwise_asym8uxsym8u_asym8u_per_tensor_meta( +@register_fake( + "cadence::quantized_conv2d_nhwc_depthwise_asym8uxsym8u_asym8u.per_tensor" +) +def quantized_conv2d_nhwc_depthwise_asym8uxsym8u_asym8u_per_tensor_meta( input: torch.Tensor, weight: torch.Tensor, bias: torch.Tensor, @@ -2177,8 +2185,8 @@ def roi_align_box_processor_meta( return rois.new_empty((rois.shape[0], 80), dtype=torch.uint8) -@register_fake("cadence::quantized_conv1d_nchw_asym8sxsym8s_asym8s.per_tensor") -def 
quantized_conv1d_nchw_asym8sxsym8s_asym8s_per_tensor_meta( +@register_fake("cadence::quantized_conv1d_ncl_asym8sxsym8s_asym8s.per_tensor") +def quantized_conv1d_ncl_asym8sxsym8s_asym8s_per_tensor_meta( input: torch.Tensor, weight: torch.Tensor, bias: torch.Tensor, @@ -2213,8 +2221,8 @@ def quantized_conv1d_nchw_asym8sxsym8s_asym8s_per_tensor_meta( return input.new_empty(output_size, dtype=input.dtype) -@register_fake("cadence::quantized_conv1d_nchw_asym8uxsym8u_asym8u.per_tensor") -def quantized_conv1d_nchw_asym8uxsym8u_asym8u_per_tensor_meta( +@register_fake("cadence::quantized_conv1d_ncl_asym8uxsym8u_asym8u.per_tensor") +def quantized_conv1d_ncl_asym8uxsym8u_asym8u_per_tensor_meta( input: torch.Tensor, weight: torch.Tensor, bias: torch.Tensor, @@ -2249,8 +2257,8 @@ def quantized_conv1d_nchw_asym8uxsym8u_asym8u_per_tensor_meta( return input.new_empty(output_size, dtype=input.dtype) -@register_fake("cadence::quantized_conv1d_nhwc_asym8sxsym8s_asym8s.per_tensor") -def quantized_conv1d_nhwc_asym8sxsym8s_asym8s_per_tensor_meta( +@register_fake("cadence::quantized_conv1d_nlc_asym8sxsym8s_asym8s.per_tensor") +def quantized_conv1d_nlc_asym8sxsym8s_asym8s_per_tensor_meta( input: torch.Tensor, weight: torch.Tensor, bias: torch.Tensor, @@ -2285,8 +2293,8 @@ def quantized_conv1d_nhwc_asym8sxsym8s_asym8s_per_tensor_meta( return input.new_empty(output_size, dtype=input.dtype) -@register_fake("cadence::quantized_conv1d_nhwc_asym8uxsym8u_asym8u.per_tensor") -def quantized_conv1d_nhwc_asym8uxsym8u_asym8u_per_tensor_meta( +@register_fake("cadence::quantized_conv1d_nlc_asym8uxsym8u_asym8u.per_tensor") +def quantized_conv1d_nlc_asym8uxsym8u_asym8u_per_tensor_meta( input: torch.Tensor, weight: torch.Tensor, bias: torch.Tensor, diff --git a/backends/cadence/aot/quantizer/patterns.py b/backends/cadence/aot/quantizer/patterns.py index b653be27e8f..9f67204fcf9 100644 --- a/backends/cadence/aot/quantizer/patterns.py +++ b/backends/cadence/aot/quantizer/patterns.py @@ -247,7 +247,7 @@ def 
get_anchors( ) def replacement_op(self) -> OpOverload: - return torch.ops.cadence.quantized_conv_nchw.default + return torch.ops.cadence.quantized_conv2d_nchw.default class Conv2dPattern(QuantizationPattern): @@ -286,7 +286,7 @@ def get_anchors( ) def replacement_op(self) -> OpOverload: - return torch.ops.cadence.quantized_conv_nchw.default + return torch.ops.cadence.quantized_conv2d_nchw.default class LayerNormPattern(QuantizationPattern): @@ -460,7 +460,7 @@ def get_anchors( ) def replacement_op(self) -> OpOverload: - return torch.ops.cadence.quantized_conv_nchw.default + return torch.ops.cadence.quantized_conv2d_nchw.default # Conv1d + regular relu op fusion diff --git a/backends/cadence/aot/ref_implementations.py b/backends/cadence/aot/ref_implementations.py index 2a53c2dde7a..5530b7c8117 100644 --- a/backends/cadence/aot/ref_implementations.py +++ b/backends/cadence/aot/ref_implementations.py @@ -623,8 +623,8 @@ def quantized_conv_per_tensor( ) -@impl(m, "quantized_conv_nchw.per_tensor") -def quantized_conv_nchw_per_tensor( +@impl(m, "quantized_conv2d_nchw.per_tensor") +def quantized_conv2d_nchw_per_tensor( input_tensor: torch.Tensor, weight: torch.Tensor, bias: torch.Tensor, @@ -679,8 +679,8 @@ def quantized_conv_nchw_per_tensor( ) -@impl(m, "quantized_conv_nhwc.per_tensor") -def quantized_conv_nhwc_per_tensor( +@impl(m, "quantized_conv2d_nhwc.per_tensor") +def quantized_conv2d_nhwc_per_tensor( input_tensor: torch.Tensor, weight: torch.Tensor, bias: torch.Tensor, @@ -800,7 +800,7 @@ def variant( # Call the appropriate base function match layout: case "nchw": - return quantized_conv_nchw_per_tensor( + return quantized_conv2d_nchw_per_tensor( input_tensor, weight, bias, @@ -817,7 +817,7 @@ def variant( out_shift, ) case "nhwc": - return quantized_conv_nhwc_per_tensor( + return quantized_conv2d_nhwc_per_tensor( input_tensor, weight, bias, @@ -841,84 +841,92 @@ def variant( return decorator -@impl(m, "quantized_conv_nchw_asym8sxsym8s_asym8s.per_tensor") +@impl(m, 
"quantized_conv2d_nchw_asym8sxsym8s_asym8s.per_tensor") @quantized_conv_variant("nchw", torch.int8, torch.int8) -def quantized_conv_nchw_asym8sxsym8s_asym8s_per_tensor() -> torch.Tensor: ... +def quantized_conv2d_nchw_asym8sxsym8s_asym8s_per_tensor() -> torch.Tensor: ... -@impl(m, "quantized_conv_nchw_asym8uxsym8u_asym8u.per_tensor") +@impl(m, "quantized_conv2d_nchw_asym8uxsym8u_asym8u.per_tensor") @quantized_conv_variant("nchw", torch.uint8, torch.uint8) -def quantized_conv_nchw_asym8uxsym8u_asym8u_per_tensor() -> torch.Tensor: ... +def quantized_conv2d_nchw_asym8uxsym8u_asym8u_per_tensor() -> torch.Tensor: ... -@impl(m, "quantized_conv_nhwc_asym8sxsym8s_asym8s.per_tensor") +@impl(m, "quantized_conv2d_nhwc_asym8sxsym8s_asym8s.per_tensor") @quantized_conv_variant("nhwc", torch.int8, torch.int8) -def quantized_conv_nhwc_asym8sxsym8s_asym8s_per_tensor() -> torch.Tensor: ... +def quantized_conv2d_nhwc_asym8sxsym8s_asym8s_per_tensor() -> torch.Tensor: ... -@impl(m, "quantized_conv_nhwc_asym8uxsym8u_asym8u.per_tensor") +@impl(m, "quantized_conv2d_nhwc_asym8uxsym8u_asym8u.per_tensor") @quantized_conv_variant("nhwc", torch.uint8, torch.uint8) -def quantized_conv_nhwc_asym8uxsym8u_asym8u_per_tensor() -> torch.Tensor: ... +def quantized_conv2d_nhwc_asym8uxsym8u_asym8u_per_tensor() -> torch.Tensor: ... -@impl(m, "quantized_conv_nchw_dilated_asym8sxsym8s_asym8s.per_tensor") +@impl(m, "quantized_conv2d_nchw_dilated_asym8sxsym8s_asym8s.per_tensor") @quantized_conv_variant("nchw", torch.int8, torch.int8) -def quantized_conv_nchw_dilated_asym8sxsym8s_asym8s_per_tensor() -> torch.Tensor: ... +def quantized_conv2d_nchw_dilated_asym8sxsym8s_asym8s_per_tensor() -> torch.Tensor: ... -@impl(m, "quantized_conv_nchw_dilated_asym8uxsym8u_asym8u.per_tensor") +@impl(m, "quantized_conv2d_nchw_dilated_asym8uxsym8u_asym8u.per_tensor") @quantized_conv_variant("nchw", torch.uint8, torch.uint8) -def quantized_conv_nchw_dilated_asym8uxsym8u_asym8u_per_tensor() -> torch.Tensor: ... 
+def quantized_conv2d_nchw_dilated_asym8uxsym8u_asym8u_per_tensor() -> torch.Tensor: ... -@impl(m, "quantized_conv_nhwc_dilated_asym8sxsym8s_asym8s.per_tensor") +@impl(m, "quantized_conv2d_nhwc_dilated_asym8sxsym8s_asym8s.per_tensor") @quantized_conv_variant("nhwc", torch.int8, torch.int8) -def quantized_conv_nhwc_dilated_asym8sxsym8s_asym8s_per_tensor() -> torch.Tensor: ... +def quantized_conv2d_nhwc_dilated_asym8sxsym8s_asym8s_per_tensor() -> torch.Tensor: ... -@impl(m, "quantized_conv_nhwc_dilated_asym8uxsym8u_asym8u.per_tensor") +@impl(m, "quantized_conv2d_nhwc_dilated_asym8uxsym8u_asym8u.per_tensor") @quantized_conv_variant("nhwc", torch.uint8, torch.uint8) -def quantized_conv_nhwc_dilated_asym8uxsym8u_asym8u_per_tensor() -> torch.Tensor: ... +def quantized_conv2d_nhwc_dilated_asym8uxsym8u_asym8u_per_tensor() -> torch.Tensor: ... -@impl(m, "quantized_conv_nchw_depthwise_asym8sxsym8s_asym8s.per_tensor") +@impl(m, "quantized_conv2d_nchw_depthwise_asym8sxsym8s_asym8s.per_tensor") @quantized_conv_variant("nchw", torch.int8, torch.int8) -def quantized_conv_nchw_depthwise_asym8sxsym8s_asym8s_per_tensor() -> torch.Tensor: ... +def quantized_conv2d_nchw_depthwise_asym8sxsym8s_asym8s_per_tensor() -> ( + torch.Tensor +): ... -@impl(m, "quantized_conv_nchw_depthwise_asym8uxsym8u_asym8u.per_tensor") +@impl(m, "quantized_conv2d_nchw_depthwise_asym8uxsym8u_asym8u.per_tensor") @quantized_conv_variant("nchw", torch.uint8, torch.uint8) -def quantized_conv_nchw_depthwise_asym8uxsym8u_asym8u_per_tensor() -> torch.Tensor: ... +def quantized_conv2d_nchw_depthwise_asym8uxsym8u_asym8u_per_tensor() -> ( + torch.Tensor +): ... -@impl(m, "quantized_conv_nhwc_depthwise_asym8sxsym8s_asym8s.per_tensor") +@impl(m, "quantized_conv2d_nhwc_depthwise_asym8sxsym8s_asym8s.per_tensor") @quantized_conv_variant("nhwc", torch.int8, torch.int8) -def quantized_conv_nhwc_depthwise_asym8sxsym8s_asym8s_per_tensor() -> torch.Tensor: ... 
+def quantized_conv2d_nhwc_depthwise_asym8sxsym8s_asym8s_per_tensor() -> ( + torch.Tensor +): ... -@impl(m, "quantized_conv_nhwc_depthwise_asym8uxsym8u_asym8u.per_tensor") +@impl(m, "quantized_conv2d_nhwc_depthwise_asym8uxsym8u_asym8u.per_tensor") @quantized_conv_variant("nhwc", torch.uint8, torch.uint8) -def quantized_conv_nhwc_depthwise_asym8uxsym8u_asym8u_per_tensor() -> torch.Tensor: ... +def quantized_conv2d_nhwc_depthwise_asym8uxsym8u_asym8u_per_tensor() -> ( + torch.Tensor +): ... -@impl(m, "quantized_conv1d_nchw_asym8sxsym8s_asym8s.per_tensor") +@impl(m, "quantized_conv1d_ncl_asym8sxsym8s_asym8s.per_tensor") @quantized_conv_variant("nchw", torch.int8, torch.int8, is_1d=True) -def quantized_conv1d_nchw_asym8sxsym8s_asym8s_per_tensor() -> torch.Tensor: ... +def quantized_conv1d_ncl_asym8sxsym8s_asym8s_per_tensor() -> torch.Tensor: ... -@impl(m, "quantized_conv1d_nchw_asym8uxsym8u_asym8u.per_tensor") +@impl(m, "quantized_conv1d_ncl_asym8uxsym8u_asym8u.per_tensor") @quantized_conv_variant("nchw", torch.uint8, torch.uint8, is_1d=True) -def quantized_conv1d_nchw_asym8uxsym8u_asym8u_per_tensor() -> torch.Tensor: ... +def quantized_conv1d_ncl_asym8uxsym8u_asym8u_per_tensor() -> torch.Tensor: ... -@impl(m, "quantized_conv1d_nhwc_asym8sxsym8s_asym8s.per_tensor") +@impl(m, "quantized_conv1d_nlc_asym8sxsym8s_asym8s.per_tensor") @quantized_conv_variant("nhwc", torch.int8, torch.int8, is_1d=True) -def quantized_conv1d_nhwc_asym8sxsym8s_asym8s_per_tensor() -> torch.Tensor: ... +def quantized_conv1d_nlc_asym8sxsym8s_asym8s_per_tensor() -> torch.Tensor: ... -@impl(m, "quantized_conv1d_nhwc_asym8uxsym8u_asym8u.per_tensor") +@impl(m, "quantized_conv1d_nlc_asym8uxsym8u_asym8u.per_tensor") @quantized_conv_variant("nhwc", torch.uint8, torch.uint8, is_1d=True) -def quantized_conv1d_nhwc_asym8uxsym8u_asym8u_per_tensor() -> torch.Tensor: ... +def quantized_conv1d_nlc_asym8uxsym8u_asym8u_per_tensor() -> torch.Tensor: ... 
def quantized_relu_common( diff --git a/backends/cadence/aot/replace_ops.py b/backends/cadence/aot/replace_ops.py index c575be6e7fc..3d5bd493cfe 100644 --- a/backends/cadence/aot/replace_ops.py +++ b/backends/cadence/aot/replace_ops.py @@ -787,8 +787,8 @@ class ReplaceTrivialConvWithLinear(ExportPass): trivial_conv_op_to_linear_op: Dict[EdgeOpOverload, EdgeOpOverload] = { exir_ops.edge.cadence.convolution.default: exir_ops.edge.aten.linear.default, - exir_ops.edge.cadence.quantized_conv_nchw.default: exir_ops.edge.cadence.quantized_linear.default, - exir_ops.edge.cadence.quantized_conv_nhwc.default: exir_ops.edge.cadence.quantized_linear.default, + exir_ops.edge.cadence.quantized_conv2d_nchw.default: exir_ops.edge.cadence.quantized_linear.default, + exir_ops.edge.cadence.quantized_conv2d_nhwc.default: exir_ops.edge.cadence.quantized_linear.default, } def call_operator(self, op, args, kwargs, meta): @@ -800,8 +800,8 @@ def call_operator(self, op, args, kwargs, meta): # extra args holding at least the zero point and scale of input, weight, bias, # and output tensor. quantized_op = ( - op == exir_ops.edge.cadence.quantized_conv_nchw.default - or op == exir_ops.edge.cadence.quantized_conv_nhwc.default + op == exir_ops.edge.cadence.quantized_conv2d_nchw.default + or op == exir_ops.edge.cadence.quantized_conv2d_nhwc.default ) assert (len(args) == 8 and not quantized_op) or ( len(args) >= 12 and quantized_op @@ -979,18 +979,18 @@ def call_operator( ) -> ProxyValue: if op not in { exir_ops.edge.cadence.convolution.default, - exir_ops.edge.cadence.quantized_conv_nchw.default, + exir_ops.edge.cadence.quantized_conv2d_nchw.default, }: return super().call_operator(op, args, kwargs, meta) - quantized_op = op == exir_ops.edge.cadence.quantized_conv_nchw.default + quantized_op = op == exir_ops.edge.cadence.quantized_conv2d_nchw.default if not quantized_op and len(args) == 8 and args[-1] is True: # Already in NHWC layout. 
return super().call_operator(op, args, kwargs, meta) new_op = ( - exir_ops.edge.cadence.quantized_conv_nhwc.default + exir_ops.edge.cadence.quantized_conv2d_nhwc.default if quantized_op else exir_ops.edge.cadence.convolution.default ) @@ -1067,8 +1067,8 @@ class ReplaceConvWithIm2RowAndLinear(ExportPass): # decompose to. conv_op_to_linear_op: Dict[EdgeOpOverload, EdgeOpOverload] = { exir_ops.edge.cadence.convolution.default: exir_ops.edge.aten.linear.default, - exir_ops.edge.cadence.quantized_conv_nchw.default: exir_ops.edge.cadence.quantized_linear.default, - exir_ops.edge.cadence.quantized_conv_nhwc.default: exir_ops.edge.cadence.quantized_linear.default, + exir_ops.edge.cadence.quantized_conv2d_nchw.default: exir_ops.edge.cadence.quantized_linear.default, + exir_ops.edge.cadence.quantized_conv2d_nhwc.default: exir_ops.edge.cadence.quantized_linear.default, } def call_operator(self, op, args, kwargs, meta): @@ -1077,8 +1077,8 @@ def call_operator(self, op, args, kwargs, meta): # Get the relevant args from convolution node. quantized_op = ( - op == exir_ops.edge.cadence.quantized_conv_nchw.default - or op == exir_ops.edge.cadence.quantized_conv_nhwc.default + op == exir_ops.edge.cadence.quantized_conv2d_nchw.default + or op == exir_ops.edge.cadence.quantized_conv2d_nhwc.default ) assert (len(args) == 8 and not quantized_op) or ( len(args) >= 12 and quantized_op @@ -1110,7 +1110,7 @@ def call_operator(self, op, args, kwargs, meta): # channel_last layout is specified by the channel_last arg of conv # op, which is either the last argument (15th) or implicitely False # if the op is quantized, or the last argument if not. - channel_last = op == exir_ops.edge.cadence.quantized_conv_nhwc.default + channel_last = op == exir_ops.edge.cadence.quantized_conv2d_nhwc.default # The weight tensor is [out_channels, in_channels, X] for NCHW layout, # and [out_channels, X, in_channels] for NHWC layout. 
Here, X is the # kernel_width for conv1d, and X = kernel_height * kernel_width for @@ -1622,12 +1622,12 @@ class ReplaceSingleElementTensorArgumentsFromFullOpWithScalarPass(ExportPass): exir_ops.edge.cadence.quantized_add.per_tensor, [1, 2, 4, 5], ), - exir_ops.edge.cadence.quantized_conv_nchw: ( - exir_ops.edge.cadence.quantized_conv_nchw.per_tensor, + exir_ops.edge.cadence.quantized_conv2d_nchw: ( + exir_ops.edge.cadence.quantized_conv2d_nchw.per_tensor, [8, 9, 12, 13], ), - exir_ops.edge.cadence.quantized_conv_nhwc: ( - exir_ops.edge.cadence.quantized_conv_nhwc.per_tensor, + exir_ops.edge.cadence.quantized_conv2d_nhwc: ( + exir_ops.edge.cadence.quantized_conv2d_nhwc.per_tensor, [8, 9, 12, 13], ), exir_ops.edge.cadence.quantized_fully_connected: ( diff --git a/backends/cadence/aot/tests/test_ref_implementations.py b/backends/cadence/aot/tests/test_ref_implementations.py index 30b30e085dc..2589bd88601 100644 --- a/backends/cadence/aot/tests/test_ref_implementations.py +++ b/backends/cadence/aot/tests/test_ref_implementations.py @@ -906,9 +906,9 @@ def test_quantized_conv_per_tensor( convs = [ ( - torch.ops.cadence.quantized_conv_nchw.per_tensor + torch.ops.cadence.quantized_conv2d_nchw.per_tensor if memory_format == torch.contiguous_format - else torch.ops.cadence.quantized_conv_nhwc.per_tensor + else torch.ops.cadence.quantized_conv2d_nhwc.per_tensor ) ] @@ -916,30 +916,30 @@ def test_quantized_conv_per_tensor( if input_tensor.dtype == torch.int8 and weight.dtype == torch.int8: if memory_format == torch.contiguous_format: optimized_convs = [ - torch.ops.cadence.quantized_conv_nchw_asym8sxsym8s_asym8s.per_tensor, - torch.ops.cadence.quantized_conv_nchw_dilated_asym8sxsym8s_asym8s.per_tensor, - torch.ops.cadence.quantized_conv_nchw_depthwise_asym8sxsym8s_asym8s.per_tensor, + torch.ops.cadence.quantized_conv2d_nchw_asym8sxsym8s_asym8s.per_tensor, + torch.ops.cadence.quantized_conv2d_nchw_dilated_asym8sxsym8s_asym8s.per_tensor, + 
torch.ops.cadence.quantized_conv2d_nchw_depthwise_asym8sxsym8s_asym8s.per_tensor, ] else: optimized_convs = [ - torch.ops.cadence.quantized_conv_nhwc_asym8sxsym8s_asym8s.per_tensor, - torch.ops.cadence.quantized_conv_nhwc_dilated_asym8sxsym8s_asym8s.per_tensor, - torch.ops.cadence.quantized_conv_nhwc_depthwise_asym8sxsym8s_asym8s.per_tensor, + torch.ops.cadence.quantized_conv2d_nhwc_asym8sxsym8s_asym8s.per_tensor, + torch.ops.cadence.quantized_conv2d_nhwc_dilated_asym8sxsym8s_asym8s.per_tensor, + torch.ops.cadence.quantized_conv2d_nhwc_depthwise_asym8sxsym8s_asym8s.per_tensor, ] elif input_tensor.dtype == torch.uint8 and weight.dtype == torch.uint8: if memory_format == torch.contiguous_format: optimized_convs = [ - torch.ops.cadence.quantized_conv_nchw_asym8uxsym8u_asym8u.per_tensor, - torch.ops.cadence.quantized_conv_nchw_dilated_asym8uxsym8u_asym8u.per_tensor, - torch.ops.cadence.quantized_conv_nchw_depthwise_asym8uxsym8u_asym8u.per_tensor, + torch.ops.cadence.quantized_conv2d_nchw_asym8uxsym8u_asym8u.per_tensor, + torch.ops.cadence.quantized_conv2d_nchw_dilated_asym8uxsym8u_asym8u.per_tensor, + torch.ops.cadence.quantized_conv2d_nchw_depthwise_asym8uxsym8u_asym8u.per_tensor, ] else: optimized_convs = [ - torch.ops.cadence.quantized_conv_nhwc_asym8uxsym8u_asym8u.per_tensor, - torch.ops.cadence.quantized_conv_nhwc_dilated_asym8uxsym8u_asym8u.per_tensor, - torch.ops.cadence.quantized_conv_nhwc_depthwise_asym8uxsym8u_asym8u.per_tensor, + torch.ops.cadence.quantized_conv2d_nhwc_asym8uxsym8u_asym8u.per_tensor, + torch.ops.cadence.quantized_conv2d_nhwc_dilated_asym8uxsym8u_asym8u.per_tensor, + torch.ops.cadence.quantized_conv2d_nhwc_depthwise_asym8uxsym8u_asym8u.per_tensor, ] convs.extend(optimized_convs) diff --git a/backends/cadence/aot/tests/test_replace_ops_passes.py b/backends/cadence/aot/tests/test_replace_ops_passes.py index ca5168db2be..8f1f2e86deb 100644 --- a/backends/cadence/aot/tests/test_replace_ops_passes.py +++ 
b/backends/cadence/aot/tests/test_replace_ops_passes.py @@ -1666,7 +1666,7 @@ def create_quantized_convolution_graph_module( out_multiplier, out_shift, ), - op=exir_ops.edge.cadence.quantized_conv_nhwc.default, + op=exir_ops.edge.cadence.quantized_conv2d_nhwc.default, args=args, ) else: @@ -1680,7 +1680,7 @@ def create_quantized_convolution_graph_module( out_multiplier, out_shift, ), - op=exir_ops.edge.cadence.quantized_conv_nchw.default, + op=exir_ops.edge.cadence.quantized_conv2d_nchw.default, args=args, ) @@ -1688,7 +1688,7 @@ def test_quantized_convolution_default_channel_last(self) -> None: # Create a graph with a single convolution node. gm = self.create_quantized_convolution_graph_module() self.assertEqual( - count_node(gm, exir_ops.edge.cadence.quantized_conv_nchw.default), 1 + count_node(gm, exir_ops.edge.cadence.quantized_conv2d_nchw.default), 1 ) self.assertEqual(count_node(gm, exir_ops.edge.aten.permute_copy.default), 0) @@ -1698,7 +1698,8 @@ def test_quantized_convolution_default_channel_last(self) -> None: # Check that no replacement was made. self.assertEqual( count_node( - gm_after_replacement, exir_ops.edge.cadence.quantized_conv_nhwc.default + gm_after_replacement, + exir_ops.edge.cadence.quantized_conv2d_nhwc.default, ), 1, ) @@ -1714,7 +1715,7 @@ def test_no_transpose_if_already_quantized_conv_channel_last(self) -> None: # Check if graph module is valid by running exportpass on it. gm = ExportPass().call(gm).graph_module self.assertEqual( - count_node(gm, exir_ops.edge.cadence.quantized_conv_nhwc.default), 1 + count_node(gm, exir_ops.edge.cadence.quantized_conv2d_nhwc.default), 1 ) # Apply replacement pass. @@ -1723,7 +1724,8 @@ def test_no_transpose_if_already_quantized_conv_channel_last(self) -> None: # Check that no replacement was made. 
self.assertEqual( count_node( - gm_after_replacement, exir_ops.edge.cadence.quantized_conv_nhwc.default + gm_after_replacement, + exir_ops.edge.cadence.quantized_conv2d_nhwc.default, ), 1, ) diff --git a/backends/cadence/aot/tests/test_type_dispatch_passes.py b/backends/cadence/aot/tests/test_type_dispatch_passes.py index 4ae10ea83dd..870735aad1a 100644 --- a/backends/cadence/aot/tests/test_type_dispatch_passes.py +++ b/backends/cadence/aot/tests/test_type_dispatch_passes.py @@ -199,29 +199,29 @@ def test_dispatch_quantized_matmul( "int8_nchw", torch.int8, (1, 3, 8, 8), # x_shape - exir_ops.edge.cadence.quantized_conv_nchw.per_tensor, - exir_ops.edge.cadence.quantized_conv_nchw_asym8sxsym8s_asym8s.per_tensor, + exir_ops.edge.cadence.quantized_conv2d_nchw.per_tensor, + exir_ops.edge.cadence.quantized_conv2d_nchw_asym8sxsym8s_asym8s.per_tensor, ), ( "uint8_nchw", torch.uint8, (1, 3, 8, 8), # x_shape - exir_ops.edge.cadence.quantized_conv_nchw.per_tensor, - exir_ops.edge.cadence.quantized_conv_nchw_asym8uxsym8u_asym8u.per_tensor, + exir_ops.edge.cadence.quantized_conv2d_nchw.per_tensor, + exir_ops.edge.cadence.quantized_conv2d_nchw_asym8uxsym8u_asym8u.per_tensor, ), ( "int8_nhwc", torch.int8, (1, 8, 8, 3), # x_shape - exir_ops.edge.cadence.quantized_conv_nhwc.per_tensor, - exir_ops.edge.cadence.quantized_conv_nhwc_asym8sxsym8s_asym8s.per_tensor, + exir_ops.edge.cadence.quantized_conv2d_nhwc.per_tensor, + exir_ops.edge.cadence.quantized_conv2d_nhwc_asym8sxsym8s_asym8s.per_tensor, ), ( "uint8_nhwc", torch.uint8, (1, 8, 8, 3), # x_shape - exir_ops.edge.cadence.quantized_conv_nhwc.per_tensor, - exir_ops.edge.cadence.quantized_conv_nhwc_asym8uxsym8u_asym8u.per_tensor, + exir_ops.edge.cadence.quantized_conv2d_nhwc.per_tensor, + exir_ops.edge.cadence.quantized_conv2d_nhwc_asym8uxsym8u_asym8u.per_tensor, ), ] ) @@ -256,29 +256,29 @@ def test_dispatch_quantized_conv_2d( "int8_nchw_dilated", torch.int8, (1, 3, 8, 8), # x_shape - 
exir_ops.edge.cadence.quantized_conv_nchw.per_tensor, - exir_ops.edge.cadence.quantized_conv_nchw_dilated_asym8sxsym8s_asym8s.per_tensor, + exir_ops.edge.cadence.quantized_conv2d_nchw.per_tensor, + exir_ops.edge.cadence.quantized_conv2d_nchw_dilated_asym8sxsym8s_asym8s.per_tensor, ), ( "uint8_nchw_dilated", torch.uint8, (1, 3, 8, 8), # x_shape - exir_ops.edge.cadence.quantized_conv_nchw.per_tensor, - exir_ops.edge.cadence.quantized_conv_nchw_dilated_asym8uxsym8u_asym8u.per_tensor, + exir_ops.edge.cadence.quantized_conv2d_nchw.per_tensor, + exir_ops.edge.cadence.quantized_conv2d_nchw_dilated_asym8uxsym8u_asym8u.per_tensor, ), ( "int8_nhwc_dilated", torch.int8, (1, 8, 8, 3), # x_shape - exir_ops.edge.cadence.quantized_conv_nhwc.per_tensor, - exir_ops.edge.cadence.quantized_conv_nhwc_dilated_asym8sxsym8s_asym8s.per_tensor, + exir_ops.edge.cadence.quantized_conv2d_nhwc.per_tensor, + exir_ops.edge.cadence.quantized_conv2d_nhwc_dilated_asym8sxsym8s_asym8s.per_tensor, ), ( "uint8_nhwc_dilated", torch.uint8, (1, 8, 8, 3), # x_shape - exir_ops.edge.cadence.quantized_conv_nhwc.per_tensor, - exir_ops.edge.cadence.quantized_conv_nhwc_dilated_asym8uxsym8u_asym8u.per_tensor, + exir_ops.edge.cadence.quantized_conv2d_nhwc.per_tensor, + exir_ops.edge.cadence.quantized_conv2d_nhwc_dilated_asym8uxsym8u_asym8u.per_tensor, ), ] ) @@ -313,29 +313,29 @@ def test_dispatch_quantized_conv_2d_dilated( "int8_nchw_1d", torch.int8, (1, 3, 8), # x_shape - exir_ops.edge.cadence.quantized_conv_nchw.per_tensor, - exir_ops.edge.cadence.quantized_conv1d_nchw_asym8sxsym8s_asym8s.per_tensor, + exir_ops.edge.cadence.quantized_conv2d_nchw.per_tensor, + exir_ops.edge.cadence.quantized_conv1d_ncl_asym8sxsym8s_asym8s.per_tensor, ), ( "uint8_nchw_1d", torch.uint8, (1, 3, 8), # x_shape - exir_ops.edge.cadence.quantized_conv_nchw.per_tensor, - exir_ops.edge.cadence.quantized_conv1d_nchw_asym8uxsym8u_asym8u.per_tensor, + exir_ops.edge.cadence.quantized_conv2d_nchw.per_tensor, + 
exir_ops.edge.cadence.quantized_conv1d_ncl_asym8uxsym8u_asym8u.per_tensor, ), ( "int8_nhwc_1d", torch.int8, (1, 8, 3), # x_shape - exir_ops.edge.cadence.quantized_conv_nhwc.per_tensor, - exir_ops.edge.cadence.quantized_conv1d_nhwc_asym8sxsym8s_asym8s.per_tensor, + exir_ops.edge.cadence.quantized_conv2d_nhwc.per_tensor, + exir_ops.edge.cadence.quantized_conv1d_nlc_asym8sxsym8s_asym8s.per_tensor, ), ( "uint8_nhwc_1d", torch.uint8, (1, 8, 3), # x_shape - exir_ops.edge.cadence.quantized_conv_nhwc.per_tensor, - exir_ops.edge.cadence.quantized_conv1d_nhwc_asym8uxsym8u_asym8u.per_tensor, + exir_ops.edge.cadence.quantized_conv2d_nhwc.per_tensor, + exir_ops.edge.cadence.quantized_conv1d_nlc_asym8uxsym8u_asym8u.per_tensor, ), ] ) @@ -410,32 +410,32 @@ def test_dispatch_quantized_add( torch.int8, (1, 3, 8, 8), # x_shape (3, 1, 3, 3), # w_shape (groups=3, input_channels=3) - exir_ops.edge.cadence.quantized_conv_nchw.per_tensor, - exir_ops.edge.cadence.quantized_conv_nchw_depthwise_asym8sxsym8s_asym8s.per_tensor, + exir_ops.edge.cadence.quantized_conv2d_nchw.per_tensor, + exir_ops.edge.cadence.quantized_conv2d_nchw_depthwise_asym8sxsym8s_asym8s.per_tensor, ), ( "uint8_nchw_depthwise", torch.uint8, (1, 3, 8, 8), # x_shape (3, 1, 3, 3), # w_shape (groups=3, input_channels=3) - exir_ops.edge.cadence.quantized_conv_nchw.per_tensor, - exir_ops.edge.cadence.quantized_conv_nchw_depthwise_asym8uxsym8u_asym8u.per_tensor, + exir_ops.edge.cadence.quantized_conv2d_nchw.per_tensor, + exir_ops.edge.cadence.quantized_conv2d_nchw_depthwise_asym8uxsym8u_asym8u.per_tensor, ), ( "int8_nhwc_depthwise", torch.int8, (1, 8, 8, 3), # x_shape (3, 3, 3, 1), # w_shape (groups=3, input_channels=3) - exir_ops.edge.cadence.quantized_conv_nhwc.per_tensor, - exir_ops.edge.cadence.quantized_conv_nhwc_depthwise_asym8sxsym8s_asym8s.per_tensor, + exir_ops.edge.cadence.quantized_conv2d_nhwc.per_tensor, + exir_ops.edge.cadence.quantized_conv2d_nhwc_depthwise_asym8sxsym8s_asym8s.per_tensor, ), ( 
             "uint8_nhwc_depthwise",
             torch.uint8,
             (1, 8, 8, 3),  # x_shape
             (3, 3, 3, 1),  # w_shape (groups=3, input_channels=3)
-            exir_ops.edge.cadence.quantized_conv_nhwc.per_tensor,
-            exir_ops.edge.cadence.quantized_conv_nhwc_depthwise_asym8uxsym8u_asym8u.per_tensor,
+            exir_ops.edge.cadence.quantized_conv2d_nhwc.per_tensor,
+            exir_ops.edge.cadence.quantized_conv2d_nhwc_depthwise_asym8uxsym8u_asym8u.per_tensor,
         ),
     ]
 )
diff --git a/backends/cadence/aot/type_dispatch.py b/backends/cadence/aot/type_dispatch.py
index 958a78a4808..3bf86ad2e50 100644
--- a/backends/cadence/aot/type_dispatch.py
+++ b/backends/cadence/aot/type_dispatch.py
@@ -62,16 +62,16 @@ class CompileTimeTypeDispatchPass(ExportPass):
             weight_arg_idx=2,
             variant="default",
         ),
-        exir_ops.edge.cadence.quantized_conv_nchw.per_tensor: OpConfig(
-            "quantized_conv_nchw",
+        exir_ops.edge.cadence.quantized_conv2d_nchw.per_tensor: OpConfig(
+            "quantized_conv2d_nchw",
             type_dispatch_suffixes={
                 (torch.int8, torch.int8): "asym8sxsym8s_asym8s",
                 (torch.uint8, torch.uint8): "asym8uxsym8u_asym8u",
             },
             weight_arg_idx=1,
         ),
-        exir_ops.edge.cadence.quantized_conv_nhwc.per_tensor: OpConfig(
-            "quantized_conv_nhwc",
+        exir_ops.edge.cadence.quantized_conv2d_nhwc.per_tensor: OpConfig(
+            "quantized_conv2d_nhwc",
             type_dispatch_suffixes={
                 (torch.int8, torch.int8): "asym8sxsym8s_asym8s",
                 (torch.uint8, torch.uint8): "asym8uxsym8u_asym8u",
@@ -132,13 +132,13 @@ def call_operator(
         typed_op_name = f"{base_name}_{type_suffix}"
         if op in [
-            exir_ops.edge.cadence.quantized_conv_nchw.per_tensor,
-            exir_ops.edge.cadence.quantized_conv_nhwc.per_tensor,
+            exir_ops.edge.cadence.quantized_conv2d_nchw.per_tensor,
+            exir_ops.edge.cadence.quantized_conv2d_nhwc.per_tensor,
         ]:
             groups = args[6]
             input_channels = (
                 args[0].to_tensor().shape[1]
-                if op == exir_ops.edge.cadence.quantized_conv_nchw.per_tensor
+                if op == exir_ops.edge.cadence.quantized_conv2d_nchw.per_tensor
                 else args[0].to_tensor().shape[-1]
             )
             is_depthwise = groups == input_channels
@@ -151,9 +151,11 @@ def call_operator(
         elif is_dilated:
             typed_op_name = f"{base_name}_dilated_{type_suffix}"
         elif is_1d and groups == 1:
-            typed_op_name = (
-                f"quantized_conv1d_{base_name.split('_')[-1]}_{type_suffix}"
-            )
+            if "nchw" in base_name:
+                layout_suffix = "ncl"
+            else:
+                layout_suffix = "nlc"
+            typed_op_name = f"quantized_conv1d_{layout_suffix}_{type_suffix}"
         typed_op = getattr(
             getattr(exir_ops.edge.cadence, typed_op_name), config.variant
diff --git a/backends/cadence/generic/operators/CMakeLists.txt b/backends/cadence/generic/operators/CMakeLists.txt
index ea5b699f441..d88701007f9 100644
--- a/backends/cadence/generic/operators/CMakeLists.txt
+++ b/backends/cadence/generic/operators/CMakeLists.txt
@@ -80,8 +80,8 @@ target_include_directories(
 add_library(
   custom_ops
   "quantized_linear_out.cpp"
-  "quantized_conv_nchw_out.cpp"
-  "quantized_conv_nhwc_out.cpp"
+  "quantized_conv2d_nchw_out.cpp"
+  "quantized_conv2d_nhwc_out.cpp"
   "quantized_relu_out.cpp"
   "quantized_layer_norm.cpp"
   "quantize_per_tensor.cpp"
diff --git a/backends/cadence/generic/operators/quantized_conv_nchw_out.cpp b/backends/cadence/generic/operators/quantized_conv2d_nchw_out.cpp
similarity index 94%
rename from backends/cadence/generic/operators/quantized_conv_nchw_out.cpp
rename to backends/cadence/generic/operators/quantized_conv2d_nchw_out.cpp
index 6eeabcf1d52..fbb01c82e65 100644
--- a/backends/cadence/generic/operators/quantized_conv_nchw_out.cpp
+++ b/backends/cadence/generic/operators/quantized_conv2d_nchw_out.cpp
@@ -157,7 +157,7 @@ __attribute__((noinline)) void conv2d_nchw_core_generic(
 // bias_scale, since it is a product of the two. The kernel will branch to
 // quantized::conv1d or quantized::conv2d based on the dimensionality of
 // activation tensor.
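The `type_dispatch.py` hunk above selects a specialized kernel name per op: depthwise when `groups == input_channels`, dilated, or a 1D variant that maps the 2D layout names NCHW/NHWC onto the 1D layouts NCL/NLC. A minimal standalone sketch of that selection logic (the function name and flattened boolean arguments are hypothetical; the real pass derives them from the op's args inside `CompileTimeTypeDispatchPass`):

```python
def select_typed_op_name(
    base_name: str,       # e.g. "quantized_conv2d_nchw" or "quantized_conv2d_nhwc"
    type_suffix: str,     # e.g. "asym8sxsym8s_asym8s" for int8 x int8 -> int8
    groups: int,
    input_channels: int,
    is_1d: bool,
    is_dilated: bool,
) -> str:
    """Mirror the typed-op name selection from CompileTimeTypeDispatchPass."""
    if groups == input_channels:
        # Depthwise: one filter group per input channel.
        return f"{base_name}_depthwise_{type_suffix}"
    if is_dilated:
        return f"{base_name}_dilated_{type_suffix}"
    if is_1d and groups == 1:
        # 1D convs use NCL/NLC layout names rather than NCHW/NHWC.
        layout_suffix = "ncl" if "nchw" in base_name else "nlc"
        return f"quantized_conv1d_{layout_suffix}_{type_suffix}"
    # Default 2D variant.
    return f"{base_name}_{type_suffix}"
```

For example, an int8 1D conv on an NCHW-named base resolves to `quantized_conv1d_ncl_asym8sxsym8s_asym8s`, matching the renamed kernels in this patch.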
-void quantized_conv_nchw( +void quantized_conv2d_nchw( const Tensor& input, const Tensor& weight, const Tensor& bias, @@ -228,7 +228,7 @@ void quantized_conv_nchw( #undef typed_quantized_conv2d_nchw } -void quantized_conv_nchw_out( +void quantized_conv2d_nchw_out( __ET_UNUSED KernelRuntimeContext& ctx, const Tensor& input, const Tensor& weight, @@ -248,7 +248,7 @@ void quantized_conv_nchw_out( const float bias_scale_float = bias_scale.const_data_ptr()[0]; const int32_t weight_zero_point_int = weight_zero_point.const_data_ptr()[0]; - quantized_conv_nchw( + quantized_conv2d_nchw( input, weight, bias, @@ -264,7 +264,7 @@ void quantized_conv_nchw_out( out); } -void quantized_conv_nchw_per_tensor_out( +void quantized_conv2d_nchw_per_tensor_out( __ET_UNUSED KernelRuntimeContext& ctx, const Tensor& input, const Tensor& weight, @@ -282,7 +282,7 @@ void quantized_conv_nchw_per_tensor_out( __ET_UNUSED int64_t out_shift, bool channel_last, Tensor& out) { - quantized_conv_nchw( + quantized_conv2d_nchw( input, weight, bias, @@ -298,7 +298,7 @@ void quantized_conv_nchw_per_tensor_out( out); } -void quantized_conv_nchw_asym8sxsym8s_asym8s_per_tensor_out( +void quantized_conv2d_nchw_asym8sxsym8s_asym8s_per_tensor_out( __ET_UNUSED KernelRuntimeContext& ctx, const Tensor& input, const Tensor& weight, @@ -315,7 +315,7 @@ void quantized_conv_nchw_asym8sxsym8s_asym8s_per_tensor_out( __ET_UNUSED int64_t out_multiplier, __ET_UNUSED int64_t out_shift, Tensor& out) { - quantized_conv_nchw( + quantized_conv2d_nchw( input, weight, bias, @@ -331,7 +331,7 @@ void quantized_conv_nchw_asym8sxsym8s_asym8s_per_tensor_out( out); } -void quantized_conv_nchw_asym8uxsym8u_asym8u_per_tensor_out( +void quantized_conv2d_nchw_asym8uxsym8u_asym8u_per_tensor_out( __ET_UNUSED KernelRuntimeContext& ctx, const Tensor& input, const Tensor& weight, @@ -348,7 +348,7 @@ void quantized_conv_nchw_asym8uxsym8u_asym8u_per_tensor_out( __ET_UNUSED int64_t out_multiplier, __ET_UNUSED int64_t out_shift, Tensor& out) { - 
quantized_conv_nchw( + quantized_conv2d_nchw( input, weight, bias, @@ -364,7 +364,7 @@ void quantized_conv_nchw_asym8uxsym8u_asym8u_per_tensor_out( out); } -void quantized_conv_nchw_dilated_asym8sxsym8s_asym8s_per_tensor_out( +void quantized_conv2d_nchw_dilated_asym8sxsym8s_asym8s_per_tensor_out( __ET_UNUSED KernelRuntimeContext& ctx, const Tensor& input, const Tensor& weight, @@ -381,7 +381,7 @@ void quantized_conv_nchw_dilated_asym8sxsym8s_asym8s_per_tensor_out( __ET_UNUSED int64_t out_multiplier, __ET_UNUSED int64_t out_shift, Tensor& out) { - quantized_conv_nchw( + quantized_conv2d_nchw( input, weight, bias, @@ -397,7 +397,7 @@ void quantized_conv_nchw_dilated_asym8sxsym8s_asym8s_per_tensor_out( out); } -void quantized_conv_nchw_dilated_asym8uxsym8u_asym8u_per_tensor_out( +void quantized_conv2d_nchw_dilated_asym8uxsym8u_asym8u_per_tensor_out( __ET_UNUSED KernelRuntimeContext& ctx, const Tensor& input, const Tensor& weight, @@ -414,7 +414,7 @@ void quantized_conv_nchw_dilated_asym8uxsym8u_asym8u_per_tensor_out( __ET_UNUSED int64_t out_multiplier, __ET_UNUSED int64_t out_shift, Tensor& out) { - quantized_conv_nchw( + quantized_conv2d_nchw( input, weight, bias, @@ -430,7 +430,7 @@ void quantized_conv_nchw_dilated_asym8uxsym8u_asym8u_per_tensor_out( out); } -void quantized_conv_nchw_depthwise_asym8sxsym8s_asym8s_per_tensor_out( +void quantized_conv2d_nchw_depthwise_asym8sxsym8s_asym8s_per_tensor_out( __ET_UNUSED KernelRuntimeContext& ctx, const Tensor& input, const Tensor& weight, @@ -447,7 +447,7 @@ void quantized_conv_nchw_depthwise_asym8sxsym8s_asym8s_per_tensor_out( __ET_UNUSED int64_t out_multiplier, __ET_UNUSED int64_t out_shift, Tensor& out) { - quantized_conv_nchw( + quantized_conv2d_nchw( input, weight, bias, @@ -463,7 +463,7 @@ void quantized_conv_nchw_depthwise_asym8sxsym8s_asym8s_per_tensor_out( out); } -void quantized_conv_nchw_depthwise_asym8uxsym8u_asym8u_per_tensor_out( +void quantized_conv2d_nchw_depthwise_asym8uxsym8u_asym8u_per_tensor_out( 
__ET_UNUSED KernelRuntimeContext& ctx, const Tensor& input, const Tensor& weight, @@ -480,7 +480,7 @@ void quantized_conv_nchw_depthwise_asym8uxsym8u_asym8u_per_tensor_out( __ET_UNUSED int64_t out_multiplier, __ET_UNUSED int64_t out_shift, Tensor& out) { - quantized_conv_nchw( + quantized_conv2d_nchw( input, weight, bias, @@ -496,7 +496,7 @@ void quantized_conv_nchw_depthwise_asym8uxsym8u_asym8u_per_tensor_out( out); } -void quantized_conv1d_nchw_asym8sxsym8s_asym8s_per_tensor_out( +void quantized_conv1d_ncl_asym8sxsym8s_asym8s_per_tensor_out( __ET_UNUSED KernelRuntimeContext& ctx, const Tensor& input, const Tensor& weight, @@ -513,7 +513,7 @@ void quantized_conv1d_nchw_asym8sxsym8s_asym8s_per_tensor_out( __ET_UNUSED int64_t out_multiplier, __ET_UNUSED int64_t out_shift, Tensor& out) { - quantized_conv_nchw( + quantized_conv2d_nchw( input, weight, bias, @@ -529,7 +529,7 @@ void quantized_conv1d_nchw_asym8sxsym8s_asym8s_per_tensor_out( out); } -void quantized_conv1d_nchw_asym8uxsym8u_asym8u_per_tensor_out( +void quantized_conv1d_ncl_asym8uxsym8u_asym8u_per_tensor_out( __ET_UNUSED KernelRuntimeContext& ctx, const Tensor& input, const Tensor& weight, @@ -546,7 +546,7 @@ void quantized_conv1d_nchw_asym8uxsym8u_asym8u_per_tensor_out( __ET_UNUSED int64_t out_multiplier, __ET_UNUSED int64_t out_shift, Tensor& out) { - quantized_conv_nchw( + quantized_conv2d_nchw( input, weight, bias, diff --git a/backends/cadence/generic/operators/quantized_conv_nhwc_out.cpp b/backends/cadence/generic/operators/quantized_conv2d_nhwc_out.cpp similarity index 94% rename from backends/cadence/generic/operators/quantized_conv_nhwc_out.cpp rename to backends/cadence/generic/operators/quantized_conv2d_nhwc_out.cpp index d377048b142..eca836dcc94 100644 --- a/backends/cadence/generic/operators/quantized_conv_nhwc_out.cpp +++ b/backends/cadence/generic/operators/quantized_conv2d_nhwc_out.cpp @@ -144,7 +144,7 @@ __attribute__((noinline)) void conv2d_nhwc_core_generic( } } -void quantized_conv_nhwc( 
+void quantized_conv2d_nhwc( const Tensor& input, const Tensor& weight, const Tensor& bias, @@ -215,7 +215,7 @@ void quantized_conv_nhwc( #undef typed_quantized_conv2d_nhwc } -void quantized_conv_nhwc_out( +void quantized_conv2d_nhwc_out( __ET_UNUSED KernelRuntimeContext& ctx, const Tensor& input, const Tensor& weight, @@ -235,7 +235,7 @@ void quantized_conv_nhwc_out( const float bias_scale_float = bias_scale.const_data_ptr()[0]; const int32_t weight_zero_point_int = weight_zero_point.const_data_ptr()[0]; - quantized_conv_nhwc( + quantized_conv2d_nhwc( input, weight, bias, @@ -251,7 +251,7 @@ void quantized_conv_nhwc_out( out); } -void quantized_conv_nhwc_per_tensor_out( +void quantized_conv2d_nhwc_per_tensor_out( __ET_UNUSED KernelRuntimeContext& ctx, const Tensor& input, const Tensor& weight, @@ -269,7 +269,7 @@ void quantized_conv_nhwc_per_tensor_out( __ET_UNUSED int64_t out_shift, bool channel_last, Tensor& out) { - quantized_conv_nhwc( + quantized_conv2d_nhwc( input, weight, bias, @@ -285,7 +285,7 @@ void quantized_conv_nhwc_per_tensor_out( out); } -void quantized_conv_nhwc_asym8sxsym8s_asym8s_per_tensor_out( +void quantized_conv2d_nhwc_asym8sxsym8s_asym8s_per_tensor_out( __ET_UNUSED KernelRuntimeContext& ctx, const Tensor& input, const Tensor& weight, @@ -302,7 +302,7 @@ void quantized_conv_nhwc_asym8sxsym8s_asym8s_per_tensor_out( __ET_UNUSED int64_t out_multiplier, __ET_UNUSED int64_t out_shift, Tensor& out) { - quantized_conv_nhwc( + quantized_conv2d_nhwc( input, weight, bias, @@ -318,7 +318,7 @@ void quantized_conv_nhwc_asym8sxsym8s_asym8s_per_tensor_out( out); } -void quantized_conv_nhwc_asym8uxsym8u_asym8u_per_tensor_out( +void quantized_conv2d_nhwc_asym8uxsym8u_asym8u_per_tensor_out( __ET_UNUSED KernelRuntimeContext& ctx, const Tensor& input, const Tensor& weight, @@ -335,7 +335,7 @@ void quantized_conv_nhwc_asym8uxsym8u_asym8u_per_tensor_out( __ET_UNUSED int64_t out_multiplier, __ET_UNUSED int64_t out_shift, Tensor& out) { - quantized_conv_nhwc( + 
quantized_conv2d_nhwc( input, weight, bias, @@ -351,7 +351,7 @@ void quantized_conv_nhwc_asym8uxsym8u_asym8u_per_tensor_out( out); } -void quantized_conv_nhwc_dilated_asym8sxsym8s_asym8s_per_tensor_out( +void quantized_conv2d_nhwc_dilated_asym8sxsym8s_asym8s_per_tensor_out( __ET_UNUSED KernelRuntimeContext& ctx, const Tensor& input, const Tensor& weight, @@ -368,7 +368,7 @@ void quantized_conv_nhwc_dilated_asym8sxsym8s_asym8s_per_tensor_out( __ET_UNUSED int64_t out_multiplier, __ET_UNUSED int64_t out_shift, Tensor& out) { - quantized_conv_nhwc( + quantized_conv2d_nhwc( input, weight, bias, @@ -384,7 +384,7 @@ void quantized_conv_nhwc_dilated_asym8sxsym8s_asym8s_per_tensor_out( out); } -void quantized_conv_nhwc_dilated_asym8uxsym8u_asym8u_per_tensor_out( +void quantized_conv2d_nhwc_dilated_asym8uxsym8u_asym8u_per_tensor_out( __ET_UNUSED KernelRuntimeContext& ctx, const Tensor& input, const Tensor& weight, @@ -401,7 +401,7 @@ void quantized_conv_nhwc_dilated_asym8uxsym8u_asym8u_per_tensor_out( __ET_UNUSED int64_t out_multiplier, __ET_UNUSED int64_t out_shift, Tensor& out) { - quantized_conv_nhwc( + quantized_conv2d_nhwc( input, weight, bias, @@ -417,7 +417,7 @@ void quantized_conv_nhwc_dilated_asym8uxsym8u_asym8u_per_tensor_out( out); } -void quantized_conv1d_nhwc_asym8sxsym8s_asym8s_per_tensor_out( +void quantized_conv1d_nlc_asym8sxsym8s_asym8s_per_tensor_out( __ET_UNUSED KernelRuntimeContext& ctx, const Tensor& input, const Tensor& weight, @@ -434,7 +434,7 @@ void quantized_conv1d_nhwc_asym8sxsym8s_asym8s_per_tensor_out( __ET_UNUSED int64_t out_multiplier, __ET_UNUSED int64_t out_shift, Tensor& out) { - quantized_conv_nhwc( + quantized_conv2d_nhwc( input, weight, bias, @@ -450,7 +450,7 @@ void quantized_conv1d_nhwc_asym8sxsym8s_asym8s_per_tensor_out( out); } -void quantized_conv1d_nhwc_asym8uxsym8u_asym8u_per_tensor_out( +void quantized_conv1d_nlc_asym8uxsym8u_asym8u_per_tensor_out( __ET_UNUSED KernelRuntimeContext& ctx, const Tensor& input, const Tensor& weight, 
@@ -467,7 +467,7 @@ void quantized_conv1d_nhwc_asym8uxsym8u_asym8u_per_tensor_out( __ET_UNUSED int64_t out_multiplier, __ET_UNUSED int64_t out_shift, Tensor& out) { - quantized_conv_nhwc( + quantized_conv2d_nhwc( input, weight, bias, @@ -483,7 +483,7 @@ void quantized_conv1d_nhwc_asym8uxsym8u_asym8u_per_tensor_out( out); } -void quantized_conv_nhwc_depthwise_asym8sxsym8s_asym8s_per_tensor_out( +void quantized_conv2d_nhwc_depthwise_asym8sxsym8s_asym8s_per_tensor_out( __ET_UNUSED KernelRuntimeContext& ctx, const Tensor& input, const Tensor& weight, @@ -500,7 +500,7 @@ void quantized_conv_nhwc_depthwise_asym8sxsym8s_asym8s_per_tensor_out( __ET_UNUSED int64_t out_multiplier, __ET_UNUSED int64_t out_shift, Tensor& out) { - quantized_conv_nhwc( + quantized_conv2d_nhwc( input, weight, bias, @@ -516,7 +516,7 @@ void quantized_conv_nhwc_depthwise_asym8sxsym8s_asym8s_per_tensor_out( out); } -void quantized_conv_nhwc_depthwise_asym8uxsym8u_asym8u_per_tensor_out( +void quantized_conv2d_nhwc_depthwise_asym8uxsym8u_asym8u_per_tensor_out( __ET_UNUSED KernelRuntimeContext& ctx, const Tensor& input, const Tensor& weight, @@ -533,7 +533,7 @@ void quantized_conv_nhwc_depthwise_asym8uxsym8u_asym8u_per_tensor_out( __ET_UNUSED int64_t out_multiplier, __ET_UNUSED int64_t out_shift, Tensor& out) { - quantized_conv_nhwc( + quantized_conv2d_nhwc( input, weight, bias, diff --git a/backends/cadence/generic/operators/targets.bzl b/backends/cadence/generic/operators/targets.bzl index 4ff821158bc..b3c305c9c02 100644 --- a/backends/cadence/generic/operators/targets.bzl +++ b/backends/cadence/generic/operators/targets.bzl @@ -136,8 +136,8 @@ def define_common_targets(): ) runtime.cxx_library( - name = "quantized_conv_nchw_out", - srcs = ["quantized_conv_nchw_out.cpp"], + name = "quantized_conv2d_nchw_out", + srcs = ["quantized_conv2d_nchw_out.cpp"], exported_headers = ["operators.h", "quantized_ops.h"], platforms = CXX, deps = [ @@ -151,8 +151,8 @@ def define_common_targets(): ) 
runtime.cxx_library( - name = "quantized_conv_nhwc_out", - srcs = ["quantized_conv_nhwc_out.cpp"], + name = "quantized_conv2d_nhwc_out", + srcs = ["quantized_conv2d_nhwc_out.cpp"], exported_headers = ["operators.h", "quantized_ops.h"], platforms = CXX, deps = [ diff --git a/backends/cadence/hifi/operators/CMakeLists.txt b/backends/cadence/hifi/operators/CMakeLists.txt index 6bd63c6d9f6..26555da9760 100644 --- a/backends/cadence/hifi/operators/CMakeLists.txt +++ b/backends/cadence/hifi/operators/CMakeLists.txt @@ -96,8 +96,8 @@ add_library( "op_quantize_per_tensor.cpp" "op_quantized_relu_out.cpp" "op_dequantize_per_tensor.cpp" - "op_quantized_conv_nchw_out.cpp" - "op_quantized_conv_nhwc_out.cpp" + "op_quantized_conv2d_nchw_out.cpp" + "op_quantized_conv2d_nhwc_out.cpp" "op_quantized_fully_connected_out" ) target_include_directories( diff --git a/backends/cadence/hifi/operators/op_quantized_conv1d_nchw_asym8sxsym8s_asym8s_per_tensor_out.cpp b/backends/cadence/hifi/operators/op_quantized_conv1d_ncl_asym8sxsym8s_asym8s_per_tensor_out.cpp similarity index 96% rename from backends/cadence/hifi/operators/op_quantized_conv1d_nchw_asym8sxsym8s_asym8s_per_tensor_out.cpp rename to backends/cadence/hifi/operators/op_quantized_conv1d_ncl_asym8sxsym8s_asym8s_per_tensor_out.cpp index 566325e0f10..b5ab0cdbaa2 100644 --- a/backends/cadence/hifi/operators/op_quantized_conv1d_nchw_asym8sxsym8s_asym8s_per_tensor_out.cpp +++ b/backends/cadence/hifi/operators/op_quantized_conv1d_ncl_asym8sxsym8s_asym8s_per_tensor_out.cpp @@ -22,7 +22,7 @@ namespace HiFi { namespace native { // Optimized NCHW 1D convolution for int8 x int8 -> int8 -void xa_opt_quantized_conv1d_nchw_asym8sxsym8s_asym8s( +void xa_opt_quantized_conv1d_ncl_asym8sxsym8s_asym8s( KernelRuntimeContext& ctx, const Tensor& input, const Tensor& weight, @@ -144,7 +144,7 @@ void xa_opt_quantized_conv1d_nchw_asym8sxsym8s_asym8s( } } -void quantized_conv1d_nchw_asym8sxsym8s_asym8s_per_tensor_out( +void 
quantized_conv1d_ncl_asym8sxsym8s_asym8s_per_tensor_out( KernelRuntimeContext& ctx, const Tensor& input, const Tensor& weight, @@ -161,7 +161,7 @@ void quantized_conv1d_nchw_asym8sxsym8s_asym8s_per_tensor_out( __ET_UNUSED int64_t out_multiplier, __ET_UNUSED int64_t out_shift, Tensor& out) { - xa_opt_quantized_conv1d_nchw_asym8sxsym8s_asym8s( + xa_opt_quantized_conv1d_ncl_asym8sxsym8s_asym8s( ctx, input, weight, diff --git a/backends/cadence/hifi/operators/op_quantized_conv1d_nchw_asym8uxsym8u_asym8u_per_tensor_out.cpp b/backends/cadence/hifi/operators/op_quantized_conv1d_ncl_asym8uxsym8u_asym8u_per_tensor_out.cpp similarity index 96% rename from backends/cadence/hifi/operators/op_quantized_conv1d_nchw_asym8uxsym8u_asym8u_per_tensor_out.cpp rename to backends/cadence/hifi/operators/op_quantized_conv1d_ncl_asym8uxsym8u_asym8u_per_tensor_out.cpp index de5f76b0fff..60e700f563b 100644 --- a/backends/cadence/hifi/operators/op_quantized_conv1d_nchw_asym8uxsym8u_asym8u_per_tensor_out.cpp +++ b/backends/cadence/hifi/operators/op_quantized_conv1d_ncl_asym8uxsym8u_asym8u_per_tensor_out.cpp @@ -22,7 +22,7 @@ namespace HiFi { namespace native { // Optimized NCHW 1D convolution for uint8 x uint8 -> uint8 -void xa_opt_quantized_conv1d_nchw_asym8uxsym8u_asym8u( +void xa_opt_quantized_conv1d_ncl_asym8uxsym8u_asym8u( KernelRuntimeContext& ctx, const Tensor& input, const Tensor& weight, @@ -144,7 +144,7 @@ void xa_opt_quantized_conv1d_nchw_asym8uxsym8u_asym8u( } } -void quantized_conv1d_nchw_asym8uxsym8u_asym8u_per_tensor_out( +void quantized_conv1d_ncl_asym8uxsym8u_asym8u_per_tensor_out( KernelRuntimeContext& ctx, const Tensor& input, const Tensor& weight, @@ -161,7 +161,7 @@ void quantized_conv1d_nchw_asym8uxsym8u_asym8u_per_tensor_out( __ET_UNUSED int64_t out_multiplier, __ET_UNUSED int64_t out_shift, Tensor& out) { - xa_opt_quantized_conv1d_nchw_asym8uxsym8u_asym8u( + xa_opt_quantized_conv1d_ncl_asym8uxsym8u_asym8u( ctx, input, weight, diff --git 
a/backends/cadence/hifi/operators/op_quantized_conv1d_nhwc_asym8sxsym8s_asym8s_per_tensor_out.cpp b/backends/cadence/hifi/operators/op_quantized_conv1d_nlc_asym8sxsym8s_asym8s_per_tensor_out.cpp similarity index 95% rename from backends/cadence/hifi/operators/op_quantized_conv1d_nhwc_asym8sxsym8s_asym8s_per_tensor_out.cpp rename to backends/cadence/hifi/operators/op_quantized_conv1d_nlc_asym8sxsym8s_asym8s_per_tensor_out.cpp index b549ad13307..c9a3d2b58de 100644 --- a/backends/cadence/hifi/operators/op_quantized_conv1d_nhwc_asym8sxsym8s_asym8s_per_tensor_out.cpp +++ b/backends/cadence/hifi/operators/op_quantized_conv1d_nlc_asym8sxsym8s_asym8s_per_tensor_out.cpp @@ -22,7 +22,7 @@ namespace HiFi { namespace native { // Optimized NHWC 1D convolution for int8 x int8 -> int8 -void xa_opt_quantized_conv1d_nhwc_asym8sxsym8s_asym8s( +void xa_opt_quantized_conv1d_nlc_asym8sxsym8s_asym8s( KernelRuntimeContext& ctx, const Tensor& input, const Tensor& weight, @@ -93,7 +93,7 @@ void xa_opt_quantized_conv1d_nhwc_asym8sxsym8s_asym8s( } } -void quantized_conv1d_nhwc_asym8sxsym8s_asym8s_per_tensor_out( +void quantized_conv1d_nlc_asym8sxsym8s_asym8s_per_tensor_out( KernelRuntimeContext& ctx, const Tensor& input, const Tensor& weight, @@ -110,7 +110,7 @@ void quantized_conv1d_nhwc_asym8sxsym8s_asym8s_per_tensor_out( __ET_UNUSED int64_t out_multiplier, __ET_UNUSED int64_t out_shift, Tensor& out) { - xa_opt_quantized_conv1d_nhwc_asym8sxsym8s_asym8s( + xa_opt_quantized_conv1d_nlc_asym8sxsym8s_asym8s( ctx, input, weight, diff --git a/backends/cadence/hifi/operators/op_quantized_conv1d_nhwc_asym8uxsym8u_asym8u_per_tensor_out.cpp b/backends/cadence/hifi/operators/op_quantized_conv1d_nlc_asym8uxsym8u_asym8u_per_tensor_out.cpp similarity index 95% rename from backends/cadence/hifi/operators/op_quantized_conv1d_nhwc_asym8uxsym8u_asym8u_per_tensor_out.cpp rename to backends/cadence/hifi/operators/op_quantized_conv1d_nlc_asym8uxsym8u_asym8u_per_tensor_out.cpp index f5dbb083522..2d7a4cba509 
100644 --- a/backends/cadence/hifi/operators/op_quantized_conv1d_nhwc_asym8uxsym8u_asym8u_per_tensor_out.cpp +++ b/backends/cadence/hifi/operators/op_quantized_conv1d_nlc_asym8uxsym8u_asym8u_per_tensor_out.cpp @@ -22,7 +22,7 @@ namespace HiFi { namespace native { // Optimized NHWC 1D convolution for uint8 x uint8 -> uint8 -void xa_opt_quantized_conv1d_nhwc_asym8uxsym8u_asym8u( +void xa_opt_quantized_conv1d_nlc_asym8uxsym8u_asym8u( KernelRuntimeContext& ctx, const Tensor& input, const Tensor& weight, @@ -93,7 +93,7 @@ void xa_opt_quantized_conv1d_nhwc_asym8uxsym8u_asym8u( } } -void quantized_conv1d_nhwc_asym8uxsym8u_asym8u_per_tensor_out( +void quantized_conv1d_nlc_asym8uxsym8u_asym8u_per_tensor_out( KernelRuntimeContext& ctx, const Tensor& input, const Tensor& weight, @@ -110,7 +110,7 @@ void quantized_conv1d_nhwc_asym8uxsym8u_asym8u_per_tensor_out( __ET_UNUSED int64_t out_multiplier, __ET_UNUSED int64_t out_shift, Tensor& out) { - xa_opt_quantized_conv1d_nhwc_asym8uxsym8u_asym8u( + xa_opt_quantized_conv1d_nlc_asym8uxsym8u_asym8u( ctx, input, weight, diff --git a/backends/cadence/hifi/operators/op_quantized_conv_nchw_asym8sxsym8s_asym8s_per_tensor_out.cpp b/backends/cadence/hifi/operators/op_quantized_conv2d_nchw_asym8sxsym8s_asym8s_per_tensor_out.cpp similarity index 97% rename from backends/cadence/hifi/operators/op_quantized_conv_nchw_asym8sxsym8s_asym8s_per_tensor_out.cpp rename to backends/cadence/hifi/operators/op_quantized_conv2d_nchw_asym8sxsym8s_asym8s_per_tensor_out.cpp index e4074829cf0..e2584485686 100644 --- a/backends/cadence/hifi/operators/op_quantized_conv_nchw_asym8sxsym8s_asym8s_per_tensor_out.cpp +++ b/backends/cadence/hifi/operators/op_quantized_conv2d_nchw_asym8sxsym8s_asym8s_per_tensor_out.cpp @@ -22,7 +22,7 @@ namespace HiFi { namespace native { // Optimized NCHW convolution for int8 x int8 -> int8 -void xa_opt_quantized_conv_nchw_asym8sxsym8s_asym8s( +void xa_opt_quantized_conv2d_nchw_asym8sxsym8s_asym8s( KernelRuntimeContext& ctx, const 
Tensor& input, const Tensor& weight, @@ -207,7 +207,7 @@ void xa_opt_quantized_conv_nchw_asym8sxsym8s_asym8s( } } -void quantized_conv_nchw_asym8sxsym8s_asym8s_per_tensor_out( +void quantized_conv2d_nchw_asym8sxsym8s_asym8s_per_tensor_out( __ET_UNUSED KernelRuntimeContext& ctx, const Tensor& input, const Tensor& weight, @@ -224,7 +224,7 @@ void quantized_conv_nchw_asym8sxsym8s_asym8s_per_tensor_out( __ET_UNUSED int64_t out_multiplier, __ET_UNUSED int64_t out_shift, Tensor& out) { - xa_opt_quantized_conv_nchw_asym8sxsym8s_asym8s( + xa_opt_quantized_conv2d_nchw_asym8sxsym8s_asym8s( ctx, input, weight, diff --git a/backends/cadence/hifi/operators/op_quantized_conv_nchw_asym8uxsym8u_asym8u_per_tensor_out.cpp b/backends/cadence/hifi/operators/op_quantized_conv2d_nchw_asym8uxsym8u_asym8u_per_tensor_out.cpp similarity index 97% rename from backends/cadence/hifi/operators/op_quantized_conv_nchw_asym8uxsym8u_asym8u_per_tensor_out.cpp rename to backends/cadence/hifi/operators/op_quantized_conv2d_nchw_asym8uxsym8u_asym8u_per_tensor_out.cpp index 201b5d7da16..8444fef6bd1 100644 --- a/backends/cadence/hifi/operators/op_quantized_conv_nchw_asym8uxsym8u_asym8u_per_tensor_out.cpp +++ b/backends/cadence/hifi/operators/op_quantized_conv2d_nchw_asym8uxsym8u_asym8u_per_tensor_out.cpp @@ -22,7 +22,7 @@ namespace HiFi { namespace native { // Optimized NCHW convolution for uint8 x uint8 -> uint8 -void xa_opt_quantized_conv_nchw_asym8uxsym8u_asym8u( +void xa_opt_quantized_conv2d_nchw_asym8uxsym8u_asym8u( KernelRuntimeContext& ctx, const Tensor& input, const Tensor& weight, @@ -207,7 +207,7 @@ void xa_opt_quantized_conv_nchw_asym8uxsym8u_asym8u( } } -void quantized_conv_nchw_asym8uxsym8u_asym8u_per_tensor_out( +void quantized_conv2d_nchw_asym8uxsym8u_asym8u_per_tensor_out( __ET_UNUSED KernelRuntimeContext& ctx, const Tensor& input, const Tensor& weight, @@ -224,7 +224,7 @@ void quantized_conv_nchw_asym8uxsym8u_asym8u_per_tensor_out( __ET_UNUSED int64_t out_multiplier, __ET_UNUSED int64_t 
out_shift, Tensor& out) { - xa_opt_quantized_conv_nchw_asym8uxsym8u_asym8u( + xa_opt_quantized_conv2d_nchw_asym8uxsym8u_asym8u( ctx, input, weight, diff --git a/backends/cadence/hifi/operators/op_quantized_conv_nchw_depthwise_asym8sxsym8s_asym8s_per_tensor_out.cpp b/backends/cadence/hifi/operators/op_quantized_conv2d_nchw_depthwise_asym8sxsym8s_asym8s_per_tensor_out.cpp similarity index 96% rename from backends/cadence/hifi/operators/op_quantized_conv_nchw_depthwise_asym8sxsym8s_asym8s_per_tensor_out.cpp rename to backends/cadence/hifi/operators/op_quantized_conv2d_nchw_depthwise_asym8sxsym8s_asym8s_per_tensor_out.cpp index a0e47104e18..787984e52db 100644 --- a/backends/cadence/hifi/operators/op_quantized_conv_nchw_depthwise_asym8sxsym8s_asym8s_per_tensor_out.cpp +++ b/backends/cadence/hifi/operators/op_quantized_conv2d_nchw_depthwise_asym8sxsym8s_asym8s_per_tensor_out.cpp @@ -22,7 +22,7 @@ namespace HiFi { namespace native { // Specialized depthwise NCHW convolution for int8 x int8 -> int8 -void xa_opt_quantized_conv_nchw_depthwise_asym8sxsym8s_asym8s( +void xa_opt_quantized_conv2d_nchw_depthwise_asym8sxsym8s_asym8s( KernelRuntimeContext& ctx, const Tensor& input, const Tensor& weight, @@ -162,7 +162,7 @@ void xa_opt_quantized_conv_nchw_depthwise_asym8sxsym8s_asym8s( kNnlibMaxDim); } -void quantized_conv_nchw_depthwise_asym8sxsym8s_asym8s_per_tensor_out( +void quantized_conv2d_nchw_depthwise_asym8sxsym8s_asym8s_per_tensor_out( __ET_UNUSED KernelRuntimeContext& ctx, const Tensor& input, const Tensor& weight, @@ -179,7 +179,7 @@ void quantized_conv_nchw_depthwise_asym8sxsym8s_asym8s_per_tensor_out( __ET_UNUSED int64_t out_multiplier, __ET_UNUSED int64_t out_shift, Tensor& out) { - xa_opt_quantized_conv_nchw_depthwise_asym8sxsym8s_asym8s( + xa_opt_quantized_conv2d_nchw_depthwise_asym8sxsym8s_asym8s( ctx, input, weight, diff --git a/backends/cadence/hifi/operators/op_quantized_conv_nchw_depthwise_asym8uxsym8u_asym8u_per_tensor_out.cpp 
b/backends/cadence/hifi/operators/op_quantized_conv2d_nchw_depthwise_asym8uxsym8u_asym8u_per_tensor_out.cpp similarity index 96% rename from backends/cadence/hifi/operators/op_quantized_conv_nchw_depthwise_asym8uxsym8u_asym8u_per_tensor_out.cpp rename to backends/cadence/hifi/operators/op_quantized_conv2d_nchw_depthwise_asym8uxsym8u_asym8u_per_tensor_out.cpp index 03274413f65..219eaf44ad7 100644 --- a/backends/cadence/hifi/operators/op_quantized_conv_nchw_depthwise_asym8uxsym8u_asym8u_per_tensor_out.cpp +++ b/backends/cadence/hifi/operators/op_quantized_conv2d_nchw_depthwise_asym8uxsym8u_asym8u_per_tensor_out.cpp @@ -22,7 +22,7 @@ namespace HiFi { namespace native { // Specialized depthwise NCHW convolution for uint8 x uint8 -> uint8 -void xa_opt_quantized_conv_nchw_depthwise_asym8uxsym8u_asym8u( +void xa_opt_quantized_conv2d_nchw_depthwise_asym8uxsym8u_asym8u( KernelRuntimeContext& ctx, const Tensor& input, const Tensor& weight, @@ -162,7 +162,7 @@ void xa_opt_quantized_conv_nchw_depthwise_asym8uxsym8u_asym8u( kNnlibMaxDim); } -void quantized_conv_nchw_depthwise_asym8uxsym8u_asym8u_per_tensor_out( +void quantized_conv2d_nchw_depthwise_asym8uxsym8u_asym8u_per_tensor_out( __ET_UNUSED KernelRuntimeContext& ctx, const Tensor& input, const Tensor& weight, @@ -179,7 +179,7 @@ void quantized_conv_nchw_depthwise_asym8uxsym8u_asym8u_per_tensor_out( __ET_UNUSED int64_t out_multiplier, __ET_UNUSED int64_t out_shift, Tensor& out) { - xa_opt_quantized_conv_nchw_depthwise_asym8uxsym8u_asym8u( + xa_opt_quantized_conv2d_nchw_depthwise_asym8uxsym8u_asym8u( ctx, input, weight, diff --git a/backends/cadence/hifi/operators/op_quantized_conv_nchw_dilated_asym8sxsym8s_asym8s_per_tensor_out.cpp b/backends/cadence/hifi/operators/op_quantized_conv2d_nchw_dilated_asym8sxsym8s_asym8s_per_tensor_out.cpp similarity index 98% rename from backends/cadence/hifi/operators/op_quantized_conv_nchw_dilated_asym8sxsym8s_asym8s_per_tensor_out.cpp rename to 
backends/cadence/hifi/operators/op_quantized_conv2d_nchw_dilated_asym8sxsym8s_asym8s_per_tensor_out.cpp
index 34c861faed5..fc279f2bbdf 100644
--- a/backends/cadence/hifi/operators/op_quantized_conv_nchw_dilated_asym8sxsym8s_asym8s_per_tensor_out.cpp
+++ b/backends/cadence/hifi/operators/op_quantized_conv2d_nchw_dilated_asym8sxsym8s_asym8s_per_tensor_out.cpp
@@ -122,7 +122,7 @@ __attribute__((noinline)) void conv2d_nchw_dilated_asym8sxsym8s_asym8s_core(
   }
 }
 
-void quantized_conv_nchw_dilated_asym8sxsym8s_asym8s_per_tensor_out(
+void quantized_conv2d_nchw_dilated_asym8sxsym8s_asym8s_per_tensor_out(
     __ET_UNUSED KernelRuntimeContext& ctx,
     const Tensor& input,
     const Tensor& weight,
diff --git a/backends/cadence/hifi/operators/op_quantized_conv_nchw_dilated_asym8uxsym8u_asym8u_per_tensor_out.cpp b/backends/cadence/hifi/operators/op_quantized_conv2d_nchw_dilated_asym8uxsym8u_asym8u_per_tensor_out.cpp
similarity index 98%
rename from backends/cadence/hifi/operators/op_quantized_conv_nchw_dilated_asym8uxsym8u_asym8u_per_tensor_out.cpp
rename to backends/cadence/hifi/operators/op_quantized_conv2d_nchw_dilated_asym8uxsym8u_asym8u_per_tensor_out.cpp
index 6393554e18f..08ca4657c75 100644
--- a/backends/cadence/hifi/operators/op_quantized_conv_nchw_dilated_asym8uxsym8u_asym8u_per_tensor_out.cpp
+++ b/backends/cadence/hifi/operators/op_quantized_conv2d_nchw_dilated_asym8uxsym8u_asym8u_per_tensor_out.cpp
@@ -123,7 +123,7 @@ __attribute__((noinline)) void conv2d_nchw_dilated_asym8uxsym8u_asym8u_core(
   }
 }
 
-void quantized_conv_nchw_dilated_asym8uxsym8u_asym8u_per_tensor_out(
+void quantized_conv2d_nchw_dilated_asym8uxsym8u_asym8u_per_tensor_out(
     __ET_UNUSED KernelRuntimeContext& ctx,
     const Tensor& input,
    const Tensor& weight,
diff --git a/backends/cadence/hifi/operators/op_quantized_conv_nchw_out.cpp b/backends/cadence/hifi/operators/op_quantized_conv2d_nchw_out.cpp
similarity index 98%
rename from backends/cadence/hifi/operators/op_quantized_conv_nchw_out.cpp
rename to backends/cadence/hifi/operators/op_quantized_conv2d_nchw_out.cpp
index 604f881ab96..984747d9316 100644
--- a/backends/cadence/hifi/operators/op_quantized_conv_nchw_out.cpp
+++ b/backends/cadence/hifi/operators/op_quantized_conv2d_nchw_out.cpp
@@ -156,7 +156,7 @@ __attribute__((noinline)) void conv2d_nchw_core_generic(
   }
 }
 
-void xa_opt_quantized_conv_nchw(
+void xa_opt_quantized_conv2d_nchw(
     KernelRuntimeContext& ctx,
     const Tensor& input,
     const Tensor& weight,
@@ -444,7 +444,7 @@ void xa_opt_quantized_conv_nchw(
 // bias_scale, since it is a product of the two. The kernel will branch to
 // quantized::conv1d or quantized::conv2d based on the dimensionality of
 // activation tensor.
-void quantized_conv_nchw(
+void quantized_conv2d_nchw(
     const Tensor& input,
     const Tensor& weight,
     const Tensor& bias,
@@ -515,7 +515,7 @@
 #undef typed_quantized_conv2d_nchw
 }
 
-void quantized_conv_nchw_out(
+void quantized_conv2d_nchw_out(
     __ET_UNUSED KernelRuntimeContext& ctx,
     const Tensor& input,
     const Tensor& weight,
@@ -546,7 +546,7 @@
     optimized = 0;
 
   if (optimized) {
-    xa_opt_quantized_conv_nchw(
+    xa_opt_quantized_conv2d_nchw(
         ctx,
         input,
         weight,
@@ -562,7 +562,7 @@
         output_zero_point,
         out);
   } else {
-    quantized_conv_nchw(
+    quantized_conv2d_nchw(
         input,
         weight,
         bias,
@@ -579,7 +579,7 @@
   }
 }
 
-void quantized_conv_nchw_per_tensor_out(
+void quantized_conv2d_nchw_per_tensor_out(
     __ET_UNUSED KernelRuntimeContext& ctx,
     const Tensor& input,
     const Tensor& weight,
@@ -606,7 +606,7 @@ void quantized_conv_nchw_per_tensor_out(
     optimized = 0;
 
   if (optimized) {
-    xa_opt_quantized_conv_nchw(
+    xa_opt_quantized_conv2d_nchw(
         ctx,
         input,
         weight,
@@ -622,7 +622,7 @@
         output_zero_point,
         out);
   } else {
-    quantized_conv_nchw(
+    quantized_conv2d_nchw(
         input,
         weight,
         bias,
diff --git a/backends/cadence/hifi/operators/op_quantized_conv_nhwc_asym8sxsym8s_asym8s_per_tensor_out.cpp b/backends/cadence/hifi/operators/op_quantized_conv2d_nhwc_asym8sxsym8s_asym8s_per_tensor_out.cpp
similarity index 96%
rename from backends/cadence/hifi/operators/op_quantized_conv_nhwc_asym8sxsym8s_asym8s_per_tensor_out.cpp
rename to backends/cadence/hifi/operators/op_quantized_conv2d_nhwc_asym8sxsym8s_asym8s_per_tensor_out.cpp
index 3f62c82bfcd..9bd7e641144 100644
--- a/backends/cadence/hifi/operators/op_quantized_conv_nhwc_asym8sxsym8s_asym8s_per_tensor_out.cpp
+++ b/backends/cadence/hifi/operators/op_quantized_conv2d_nhwc_asym8sxsym8s_asym8s_per_tensor_out.cpp
@@ -22,7 +22,7 @@ namespace HiFi {
 namespace native {
 
 // Optimized NHWC convolution for int8 x int8 -> int8
-void xa_opt_quantized_conv_nhwc_asym8sxsym8s_asym8s(
+void xa_opt_quantized_conv2d_nhwc_asym8sxsym8s_asym8s(
     KernelRuntimeContext& ctx,
     const Tensor& input,
     const Tensor& weight,
@@ -150,7 +150,7 @@ void xa_opt_quantized_conv_nhwc_asym8sxsym8s_asym8s(
   }
 }
 
-void quantized_conv_nhwc_asym8sxsym8s_asym8s_per_tensor_out(
+void quantized_conv2d_nhwc_asym8sxsym8s_asym8s_per_tensor_out(
     __ET_UNUSED KernelRuntimeContext& ctx,
     const Tensor& input,
     const Tensor& weight,
@@ -167,7 +167,7 @@ void quantized_conv_nhwc_asym8sxsym8s_asym8s_per_tensor_out(
     __ET_UNUSED int64_t out_multiplier,
     __ET_UNUSED int64_t out_shift,
     Tensor& out) {
-  xa_opt_quantized_conv_nhwc_asym8sxsym8s_asym8s(
+  xa_opt_quantized_conv2d_nhwc_asym8sxsym8s_asym8s(
       ctx,
       input,
       weight,
diff --git a/backends/cadence/hifi/operators/op_quantized_conv_nhwc_asym8uxsym8u_asym8u_per_tensor_out.cpp b/backends/cadence/hifi/operators/op_quantized_conv2d_nhwc_asym8uxsym8u_asym8u_per_tensor_out.cpp
similarity index 96%
rename from backends/cadence/hifi/operators/op_quantized_conv_nhwc_asym8uxsym8u_asym8u_per_tensor_out.cpp
rename to backends/cadence/hifi/operators/op_quantized_conv2d_nhwc_asym8uxsym8u_asym8u_per_tensor_out.cpp
index 32267591cf3..433cbf76fce 100644
--- a/backends/cadence/hifi/operators/op_quantized_conv_nhwc_asym8uxsym8u_asym8u_per_tensor_out.cpp
+++ b/backends/cadence/hifi/operators/op_quantized_conv2d_nhwc_asym8uxsym8u_asym8u_per_tensor_out.cpp
@@ -22,7 +22,7 @@ namespace HiFi {
 namespace native {
 
 // Optimized NHWC convolution for uint8 x uint8 -> uint8
-void xa_opt_quantized_conv_nhwc_asym8uxsym8u_asym8u(
+void xa_opt_quantized_conv2d_nhwc_asym8uxsym8u_asym8u(
     KernelRuntimeContext& ctx,
     const Tensor& input,
     const Tensor& weight,
@@ -150,7 +150,7 @@ void xa_opt_quantized_conv_nhwc_asym8uxsym8u_asym8u(
   }
 }
 
-void quantized_conv_nhwc_asym8uxsym8u_asym8u_per_tensor_out(
+void quantized_conv2d_nhwc_asym8uxsym8u_asym8u_per_tensor_out(
     __ET_UNUSED KernelRuntimeContext& ctx,
     const Tensor& input,
     const Tensor& weight,
@@ -167,7 +167,7 @@ void quantized_conv_nhwc_asym8uxsym8u_asym8u_per_tensor_out(
     __ET_UNUSED int64_t out_multiplier,
     __ET_UNUSED int64_t out_shift,
     Tensor& out) {
-  xa_opt_quantized_conv_nhwc_asym8uxsym8u_asym8u(
+  xa_opt_quantized_conv2d_nhwc_asym8uxsym8u_asym8u(
      ctx,
       input,
       weight,
diff --git a/backends/cadence/hifi/operators/op_quantized_conv_nhwc_depthwise_asym8sxsym8s_asym8s_per_tensor_out.cpp b/backends/cadence/hifi/operators/op_quantized_conv2d_nhwc_depthwise_asym8sxsym8s_asym8s_per_tensor_out.cpp
similarity index 95%
rename from backends/cadence/hifi/operators/op_quantized_conv_nhwc_depthwise_asym8sxsym8s_asym8s_per_tensor_out.cpp
rename to backends/cadence/hifi/operators/op_quantized_conv2d_nhwc_depthwise_asym8sxsym8s_asym8s_per_tensor_out.cpp
index c232f7e5ef2..384ebbb4f48 100644
--- a/backends/cadence/hifi/operators/op_quantized_conv_nhwc_depthwise_asym8sxsym8s_asym8s_per_tensor_out.cpp
+++ b/backends/cadence/hifi/operators/op_quantized_conv2d_nhwc_depthwise_asym8sxsym8s_asym8s_per_tensor_out.cpp
@@ -22,7 +22,7 @@ namespace HiFi {
 namespace native {
 
 // Specialized depthwise NHWC convolution for int8 x int8 -> int8
-void xa_opt_quantized_conv_nhwc_depthwise_asym8sxsym8s_asym8s(
+void xa_opt_quantized_conv2d_nhwc_depthwise_asym8sxsym8s_asym8s(
     KernelRuntimeContext& ctx,
     const Tensor& input,
     const Tensor& weight,
@@ -132,7 +132,7 @@ void xa_opt_quantized_conv_nhwc_depthwise_asym8sxsym8s_asym8s(
   }
 }
 
-void quantized_conv_nhwc_depthwise_asym8sxsym8s_asym8s_per_tensor_out(
+void quantized_conv2d_nhwc_depthwise_asym8sxsym8s_asym8s_per_tensor_out(
     __ET_UNUSED KernelRuntimeContext& ctx,
     const Tensor& input,
     const Tensor& weight,
@@ -149,7 +149,7 @@ void quantized_conv_nhwc_depthwise_asym8sxsym8s_asym8s_per_tensor_out(
     __ET_UNUSED int64_t out_multiplier,
     __ET_UNUSED int64_t out_shift,
     Tensor& out) {
-  xa_opt_quantized_conv_nhwc_depthwise_asym8sxsym8s_asym8s(
+  xa_opt_quantized_conv2d_nhwc_depthwise_asym8sxsym8s_asym8s(
       ctx,
       input,
       weight,
diff --git a/backends/cadence/hifi/operators/op_quantized_conv_nhwc_depthwise_asym8uxsym8u_asym8u_per_tensor_out.cpp b/backends/cadence/hifi/operators/op_quantized_conv2d_nhwc_depthwise_asym8uxsym8u_asym8u_per_tensor_out.cpp
similarity index 95%
rename from backends/cadence/hifi/operators/op_quantized_conv_nhwc_depthwise_asym8uxsym8u_asym8u_per_tensor_out.cpp
rename to backends/cadence/hifi/operators/op_quantized_conv2d_nhwc_depthwise_asym8uxsym8u_asym8u_per_tensor_out.cpp
index 5ef102c31d1..07df1a416d7 100644
--- a/backends/cadence/hifi/operators/op_quantized_conv_nhwc_depthwise_asym8uxsym8u_asym8u_per_tensor_out.cpp
+++ b/backends/cadence/hifi/operators/op_quantized_conv2d_nhwc_depthwise_asym8uxsym8u_asym8u_per_tensor_out.cpp
@@ -22,7 +22,7 @@ namespace HiFi {
 namespace native {
 
 // Specialized depthwise NHWC convolution for uint8 x uint8 -> uint8
-void xa_opt_quantized_conv_nhwc_depthwise_asym8uxsym8u_asym8u(
+void xa_opt_quantized_conv2d_nhwc_depthwise_asym8uxsym8u_asym8u(
     KernelRuntimeContext& ctx,
     const Tensor& input,
     const Tensor& weight,
@@ -132,7 +132,7 @@ void xa_opt_quantized_conv_nhwc_depthwise_asym8uxsym8u_asym8u(
   }
 }
 
-void quantized_conv_nhwc_depthwise_asym8uxsym8u_asym8u_per_tensor_out(
+void quantized_conv2d_nhwc_depthwise_asym8uxsym8u_asym8u_per_tensor_out(
     __ET_UNUSED KernelRuntimeContext& ctx,
     const Tensor& input,
     const Tensor& weight,
@@ -149,7 +149,7 @@ void quantized_conv_nhwc_depthwise_asym8uxsym8u_asym8u_per_tensor_out(
     __ET_UNUSED int64_t out_multiplier,
     __ET_UNUSED int64_t out_shift,
     Tensor& out) {
-  xa_opt_quantized_conv_nhwc_depthwise_asym8uxsym8u_asym8u(
+  xa_opt_quantized_conv2d_nhwc_depthwise_asym8uxsym8u_asym8u(
       ctx,
       input,
       weight,
diff --git a/backends/cadence/hifi/operators/op_quantized_conv_nhwc_dilated_asym8sxsym8s_asym8s_per_tensor_out.cpp b/backends/cadence/hifi/operators/op_quantized_conv2d_nhwc_dilated_asym8sxsym8s_asym8s_per_tensor_out.cpp
similarity index 98%
rename from backends/cadence/hifi/operators/op_quantized_conv_nhwc_dilated_asym8sxsym8s_asym8s_per_tensor_out.cpp
rename to backends/cadence/hifi/operators/op_quantized_conv2d_nhwc_dilated_asym8sxsym8s_asym8s_per_tensor_out.cpp
index 35a1cbda0f9..91965594a5d 100644
--- a/backends/cadence/hifi/operators/op_quantized_conv_nhwc_dilated_asym8sxsym8s_asym8s_per_tensor_out.cpp
+++ b/backends/cadence/hifi/operators/op_quantized_conv2d_nhwc_dilated_asym8sxsym8s_asym8s_per_tensor_out.cpp
@@ -122,7 +122,7 @@ __attribute__((noinline)) void conv2d_nhwc_dilated_asym8sxsym8s_asym8s_core(
   }
 }
 
-void quantized_conv_nhwc_dilated_asym8sxsym8s_asym8s_per_tensor_out(
+void quantized_conv2d_nhwc_dilated_asym8sxsym8s_asym8s_per_tensor_out(
     __ET_UNUSED KernelRuntimeContext& ctx,
     const Tensor& input,
     const Tensor& weight,
diff --git a/backends/cadence/hifi/operators/op_quantized_conv_nhwc_dilated_asym8uxsym8u_asym8u_per_tensor_out.cpp b/backends/cadence/hifi/operators/op_quantized_conv2d_nhwc_dilated_asym8uxsym8u_asym8u_per_tensor_out.cpp
similarity index 98%
rename from backends/cadence/hifi/operators/op_quantized_conv_nhwc_dilated_asym8uxsym8u_asym8u_per_tensor_out.cpp
rename to backends/cadence/hifi/operators/op_quantized_conv2d_nhwc_dilated_asym8uxsym8u_asym8u_per_tensor_out.cpp
index 62b5008ab7e..14dc31a719f 100644
--- a/backends/cadence/hifi/operators/op_quantized_conv_nhwc_dilated_asym8uxsym8u_asym8u_per_tensor_out.cpp
+++ b/backends/cadence/hifi/operators/op_quantized_conv2d_nhwc_dilated_asym8uxsym8u_asym8u_per_tensor_out.cpp
@@ -122,7 +122,7 @@ __attribute__((noinline)) void conv2d_nhwc_dilated_asym8uxsym8u_asym8u_core(
   }
 }
 
-void quantized_conv_nhwc_dilated_asym8uxsym8u_asym8u_per_tensor_out(
+void quantized_conv2d_nhwc_dilated_asym8uxsym8u_asym8u_per_tensor_out(
     __ET_UNUSED KernelRuntimeContext& ctx,
     const Tensor& input,
     const Tensor& weight,
diff --git a/backends/cadence/hifi/operators/op_quantized_conv_nhwc_out.cpp b/backends/cadence/hifi/operators/op_quantized_conv2d_nhwc_out.cpp
similarity index 98%
rename from backends/cadence/hifi/operators/op_quantized_conv_nhwc_out.cpp
rename to backends/cadence/hifi/operators/op_quantized_conv2d_nhwc_out.cpp
index 5aa087c4b75..a5d503853c4 100644
--- a/backends/cadence/hifi/operators/op_quantized_conv_nhwc_out.cpp
+++ b/backends/cadence/hifi/operators/op_quantized_conv2d_nhwc_out.cpp
@@ -147,7 +147,7 @@ __attribute__((noinline)) void conv2d_nhwc_core_generic(
   }
 }
 
-void xa_opt_quantized_conv_nhwc(
+void xa_opt_quantized_conv2d_nhwc(
     KernelRuntimeContext& ctx,
     const Tensor& input,
     const Tensor& weight,
@@ -350,7 +350,7 @@ void xa_opt_quantized_conv_nhwc(
   }
 }
 
-void quantized_conv_nhwc(
+void quantized_conv2d_nhwc(
     const Tensor& input,
     const Tensor& weight,
     const Tensor& bias,
@@ -421,7 +421,7 @@
 #undef typed_quantized_conv2d_nhwc
 }
 
-void quantized_conv_nhwc_out(
+void quantized_conv2d_nhwc_out(
     __ET_UNUSED KernelRuntimeContext& ctx,
     const Tensor& input,
     const Tensor& weight,
@@ -452,7 +452,7 @@ void quantized_conv_nhwc_out(
     optimized = 0;
 
   if (optimized) {
-    xa_opt_quantized_conv_nhwc(
+    xa_opt_quantized_conv2d_nhwc(
         ctx,
         input,
         weight,
@@ -468,7 +468,7 @@
         output_zero_point,
         out);
   } else {
-    quantized_conv_nhwc(
+    quantized_conv2d_nhwc(
         input,
         weight,
         bias,
@@ -485,7 +485,7 @@
   }
 }
 
-void quantized_conv_nhwc_per_tensor_out(
+void quantized_conv2d_nhwc_per_tensor_out(
     __ET_UNUSED KernelRuntimeContext& ctx,
     const Tensor& input,
     const Tensor& weight,
@@ -512,7 +512,7 @@ void quantized_conv_nhwc_per_tensor_out(
     optimized = 0;
 
   if (optimized) {
-    xa_opt_quantized_conv_nhwc(
+    xa_opt_quantized_conv2d_nhwc(
         ctx,
         input,
         weight,
@@ -528,7 +528,7 @@
         output_zero_point,
         out);
   } else {
-    quantized_conv_nhwc(
+    quantized_conv2d_nhwc(
         input,
         weight,
         bias,
diff --git a/backends/cadence/hifi/operators/operators.h b/backends/cadence/hifi/operators/operators.h
index 11b93f4a89c..f7f5194d91a 100644
--- a/backends/cadence/hifi/operators/operators.h
+++ b/backends/cadence/hifi/operators/operators.h
@@ -83,7 +83,7 @@ void quantized_linear_per_tensor_out(
     const ::executorch::aten::optional<::executorch::aten::Tensor>& offset,
     ::executorch::aten::Tensor& out);
 
-void quantized_conv_nhwc_out(
+void quantized_conv2d_nhwc_out(
     ::executorch::runtime::KernelRuntimeContext& ctx,
     const ::executorch::aten::Tensor& input,
     const ::executorch::aten::Tensor& weight,
@@ -101,7 +101,7 @@ void quantized_conv_nhwc_out(
     const ::executorch::aten::Tensor& out_shift,
     ::executorch::aten::Tensor& out);
 
-void quantized_conv_nchw_out(
+void quantized_conv2d_nchw_out(
     ::executorch::runtime::KernelRuntimeContext& ctx,
     const ::executorch::aten::Tensor& input,
     const ::executorch::aten::Tensor& weight,
@@ -119,7 +119,7 @@ void quantized_conv_nchw_out(
     const ::executorch::aten::Tensor& out_shift,
     ::executorch::aten::Tensor& out);
 
-void quantized_conv_nchw_per_tensor_out(
+void quantized_conv2d_nchw_per_tensor_out(
     ::executorch::runtime::KernelRuntimeContext& ctx,
     const ::executorch::aten::Tensor& input,
     const ::executorch::aten::Tensor& weight,
@@ -137,7 +137,7 @@ void quantized_conv_nchw_per_tensor_out(
     int64_t out_shift,
     ::executorch::aten::Tensor& out);
 
-void quantized_conv_nhwc_per_tensor_out(
+void quantized_conv2d_nhwc_per_tensor_out(
     ::executorch::runtime::KernelRuntimeContext& ctx,
     const ::executorch::aten::Tensor& input,
     const ::executorch::aten::Tensor& weight,
diff --git a/backends/cadence/hifi/operators/targets.bzl b/backends/cadence/hifi/operators/targets.bzl
index fa263d4017c..ca474e8183b 100644
--- a/backends/cadence/hifi/operators/targets.bzl
+++ b/backends/cadence/hifi/operators/targets.bzl
@@ -63,24 +63,24 @@ OPERATORS = [
     "ne",
     "permute_copy",
     "pow",
-    "quantized_conv_nchw_out",
-    "quantized_conv_nchw_asym8sxsym8s_asym8s_per_tensor_out",
-    "quantized_conv_nchw_asym8uxsym8u_asym8u_per_tensor_out",
-    "quantized_conv1d_nchw_asym8sxsym8s_asym8s_per_tensor_out",
-    "quantized_conv1d_nchw_asym8uxsym8u_asym8u_per_tensor_out",
-    "quantized_conv_nchw_depthwise_asym8sxsym8s_asym8s_per_tensor_out",
-    "quantized_conv_nchw_depthwise_asym8uxsym8u_asym8u_per_tensor_out",
-    "quantized_conv_nchw_dilated_asym8sxsym8s_asym8s_per_tensor_out",
-    "quantized_conv_nchw_dilated_asym8uxsym8u_asym8u_per_tensor_out",
-    "quantized_conv_nhwc_out",
-    "quantized_conv_nhwc_asym8sxsym8s_asym8s_per_tensor_out",
-    "quantized_conv_nhwc_asym8uxsym8u_asym8u_per_tensor_out",
-    "quantized_conv1d_nhwc_asym8sxsym8s_asym8s_per_tensor_out",
-    "quantized_conv1d_nhwc_asym8uxsym8u_asym8u_per_tensor_out",
-    "quantized_conv_nhwc_depthwise_asym8sxsym8s_asym8s_per_tensor_out",
-    "quantized_conv_nhwc_depthwise_asym8uxsym8u_asym8u_per_tensor_out",
-    "quantized_conv_nhwc_dilated_asym8sxsym8s_asym8s_per_tensor_out",
-    "quantized_conv_nhwc_dilated_asym8uxsym8u_asym8u_per_tensor_out",
+    "quantized_conv2d_nchw_out",
+    "quantized_conv2d_nchw_asym8sxsym8s_asym8s_per_tensor_out",
+    "quantized_conv2d_nchw_asym8uxsym8u_asym8u_per_tensor_out",
+    "quantized_conv1d_ncl_asym8sxsym8s_asym8s_per_tensor_out",
+    "quantized_conv1d_ncl_asym8uxsym8u_asym8u_per_tensor_out",
+    "quantized_conv2d_nchw_depthwise_asym8sxsym8s_asym8s_per_tensor_out",
+    "quantized_conv2d_nchw_depthwise_asym8uxsym8u_asym8u_per_tensor_out",
+    "quantized_conv2d_nchw_dilated_asym8sxsym8s_asym8s_per_tensor_out",
+    "quantized_conv2d_nchw_dilated_asym8uxsym8u_asym8u_per_tensor_out",
+    "quantized_conv2d_nhwc_out",
+    "quantized_conv2d_nhwc_asym8sxsym8s_asym8s_per_tensor_out",
+    "quantized_conv2d_nhwc_asym8uxsym8u_asym8u_per_tensor_out",
+    "quantized_conv1d_nlc_asym8sxsym8s_asym8s_per_tensor_out",
+    "quantized_conv1d_nlc_asym8uxsym8u_asym8u_per_tensor_out",
+    "quantized_conv2d_nhwc_depthwise_asym8sxsym8s_asym8s_per_tensor_out",
+    "quantized_conv2d_nhwc_depthwise_asym8uxsym8u_asym8u_per_tensor_out",
+    "quantized_conv2d_nhwc_dilated_asym8sxsym8s_asym8s_per_tensor_out",
+    "quantized_conv2d_nhwc_dilated_asym8uxsym8u_asym8u_per_tensor_out",
     "quantized_fully_connected_out",
     "quantized_fully_connected_asym8sxasym8s_asym8s_per_tensor_out",
     "quantized_fully_connected_asym8uxasym8u_asym8u_per_tensor_out",

From 5348ea9503326987ccd06245be992a51420f6722 Mon Sep 17 00:00:00 2001
From: Adrian Lundell <36153706+AdrianLundell@users.noreply.github.com>
Date: Wed, 17 Sep 2025 22:50:31 +0200
Subject: [PATCH 015/395] Arm backend: Support channels-last input and output

Differential Revision: D82449155

Pull Request resolved: https://github.com/pytorch/executorch/pull/14259
---
 .../arm/_passes/to_tosa_memory_format_pass.py | 111 +++++++---------
 backends/arm/constants.py                     |  12 ++
 .../to_dim_order_copy_support.py              |   1 +
 backends/arm/process_node.py                  |   7 -
 backends/arm/runtime/EthosUBackend.cpp        |   9 --
 backends/arm/test/misc/test_dim_order.py      | 123 ++++++++++++++++++
 .../arm/test/misc/test_dim_order_guards.py    |  67 ----------
 .../arm/test/models/test_mobilenet_v2_arm.py  |  17 +++
 .../arm/test/models/test_torch_functions.py   |   1 -
 .../test/passes/test_to_tosa_memory_format.py |  10 +-
 backends/arm/test/runner_utils.py             | 108 ++++++++++-----
 backends/arm/test/targets.bzl                 |   2 +-
 docs/source/backends-arm-ethos-u.md           |   9 ++
 13 files changed, 296 insertions(+), 181 deletions(-)
 create mode 100644 backends/arm/test/misc/test_dim_order.py
 delete mode 100644 backends/arm/test/misc/test_dim_order_guards.py

diff --git a/backends/arm/_passes/to_tosa_memory_format_pass.py b/backends/arm/_passes/to_tosa_memory_format_pass.py
index e4436d638f4..ac16cbaf8cb 100644
--- a/backends/arm/_passes/to_tosa_memory_format_pass.py
+++ b/backends/arm/_passes/to_tosa_memory_format_pass.py
@@ -9,13 +9,23 @@
 import logging
 
 import torch
-from executorch.backends.arm._passes import AnnotateOutputDimOrderPass
+from executorch.backends.arm._passes.annotate_decomposed_matmul import (
+    AnnotateDecomposedMatmulPass,
+)
 from executorch.backends.arm._passes.arm_pass_utils import (
     create_node,
     get_first_fake_tensor,
-    get_output_dim_orders,
     is_param_node,
 )
+from executorch.backends.arm.constants import (
+    HWCM_ORDER,
+    NCHW_ORDER,
+    NHWC_INVERSE_ORDER,
+    NHWC_ORDER,
+    NNCHW_ORDER,
+    NNHWC_INVERSE_ORDER,
+    NNHWC_ORDER,
+)
 from executorch.exir import ExportedProgram
 from executorch.exir.dialects._ops import ops as exir_ops
 from executorch.exir.pass_base import ExportPass, PassResult
@@ -38,12 +48,6 @@ class ToTosaMemoryFormatPass(ExportPass):
     The annotated tosa_dim_order is used to permute the node's shape such that it
     gives a TOSA-compliant shape.
     """
-    NHWC_order = (0, 2, 3, 1)
-    NHWC_inverse_order = (0, 3, 1, 2)
-    HWCM_order = (2, 3, 0, 1)
-    NNHWC_order = (0, 1, 3, 4, 2)
-    NNHWC_inverse_order = (0, 1, 4, 2, 3)
-
     def __init__(self, exported_program: ExportedProgram) -> None:
         self.exported_program = exported_program
         super().__init__()
@@ -135,9 +139,9 @@ def insert_input_transpose(node, input_node, graph_module):
             args=(
                 input_node,
                 list(
-                    ToTosaMemoryFormatPass.NNHWC_inverse_order
+                    NNHWC_INVERSE_ORDER
                     if len(get_first_fake_tensor(input_node).size()) == 5
-                    else ToTosaMemoryFormatPass.NHWC_inverse_order
+                    else NHWC_INVERSE_ORDER
                 ),
             ),
             from_node=node,
@@ -157,18 +161,18 @@ def insert_output_transpose(node, graph_module):
             args=(
                 node,
                 list(
-                    ToTosaMemoryFormatPass.NNHWC_order
+                    NNHWC_ORDER
                     if len(get_first_fake_tensor(node).size()) == 5
-                    else ToTosaMemoryFormatPass.NHWC_order
+                    else NHWC_ORDER
                 ),
             ),
             from_node=node,
         )
 
         permute_node.meta["tosa_dim_order"] = (
-            ToTosaMemoryFormatPass.NNHWC_order
+            NNHWC_ORDER
             if len(get_first_fake_tensor(node).size()) == 5
-            else ToTosaMemoryFormatPass.NHWC_order
+            else NHWC_ORDER
         )
         node.meta["tosa_dim_order"] = tuple(
             range(len(get_first_fake_tensor(node).size()))
@@ -218,7 +222,7 @@ def insert_tosa_transposes(self, graph_module: torch.fx.GraphModule):
         for node in graph_module.graph.nodes:
             # call_function and placeholder allowed due to
             # index.Tensor being able to come in as both
-            if node.op not in ["call_function", "placeholder", "output"]:
+            if node.op != "call_function":
                 continue
 
             # Transpose views
@@ -240,21 +244,33 @@ def insert_tosa_transposes(self, graph_module: torch.fx.GraphModule):
                     graph_module,
                 )
 
-            # Transpose inputs
-            elif _is_input(node, self.exported_program):
-                input_shape = get_first_fake_tensor(node).size()
-                if len(input_shape) in (4, 5):
-                    ToTosaMemoryFormatPass.insert_output_transpose(node, graph_module)
+        output_node = graph_module.graph.output_node()
 
-            # Transpose outputs
-            elif node.op == "output":
-                output_shape = get_first_fake_tensor(node).size()
+        # Transpose inputs if they are in (N)NCHW format
+        inputs = [
+            n for n in graph_module.graph.nodes if _is_input(n, self.exported_program)
+        ]
+        for input_node in inputs:
+            input_dim_order = get_first_fake_tensor(input_node).dim_order()
+            if input_dim_order in (NCHW_ORDER, NNCHW_ORDER):
+                self.insert_output_transpose(input_node, graph_module)
+
+        # Transpose outputs if they are in (N)NCHW format
+        outputs = output_node.args[0]
+        output_dim_orders = output_node.meta.get("original_dim_orders")
+        if output_dim_orders is None:
+            raise RuntimeError(
+                f"{AnnotateDecomposedMatmulPass.__name__} is required to run at the beginning of the pass pipeline when using {ToTosaMemoryFormatPass.__name__}."
+            )
 
-                if len(output_shape) in (4, 5):
-                    for input_node in node.all_input_nodes:
-                        ToTosaMemoryFormatPass.insert_input_transpose(
-                            node, input_node, graph_module
-                        )
+        for output_node_input, output_dim_order in zip(outputs, output_dim_orders):  # type: ignore[arg-type]
+            if output_dim_order in (
+                NCHW_ORDER,
+                NNCHW_ORDER,
+            ):
+                self.insert_input_transpose(
+                    output_node, output_node_input, graph_module
+                )
 
     def remove_dim_order_kwargs(
         self, graph_module: torch.fx.GraphModule, node: torch.fx.Node
@@ -277,17 +293,17 @@ def call(self, graph_module: torch.fx.GraphModule):
                 node_data = get_first_fake_tensor(node).data
 
                 self.remove_dim_order_kwargs(graph_module, node)
-                # Inputs and outputs are always in (N)NCHW format
+                # Inputs and outputs may vary in dim_order
                 if _is_input(node, self.exported_program) or node.op == "output":
-                    dim_order = tuple(range(node_data.dim()))
+                    dim_order = node_data.dim_order()
                 elif node_data.dim() == 4:
-                    dim_order = self.NHWC_order
+                    dim_order = NHWC_ORDER
                     if self.is_weight_node_for_depthwise_conv2d(node):
                         # The weights of TOSA DEPTHWISE_CONV2D have shape (H, W, C, M) which corresponds to
                         # dim_order = (2, 3, 0, 1) (https://www.mlplatform.org/tosa/tosa_spec.html#_depthwise_conv2d).
-                        dim_order = self.HWCM_order
+                        dim_order = HWCM_ORDER
                 elif node_data.dim() == 5:
-                    dim_order = self.NNHWC_order
+                    dim_order = NNHWC_ORDER
                 else:
                     dim_order = tuple(range(node_data.dim()))  # type: ignore[assignment]
@@ -300,32 +316,3 @@ def call(self, graph_module: torch.fx.GraphModule):
 
         graph_module = super().call(graph_module).graph_module
 
         return PassResult(graph_module, True)
-
-    def requires(self, graph_module) -> None:
-        """
-        This is the only pass which handles dim_orders, so verify that the output dim_orders has not changed since the beginning of the lowering pipeline.
-        """
-
-        dim_orders = get_output_dim_orders(graph_module)
-        original_dim_orders = graph_module.graph.output_node().meta.get(
-            "original_dim_orders"
-        )
-        output_node = graph_module.graph.output_node()
-
-        if original_dim_orders is None:
-            raise RuntimeError(
-                f"{AnnotateOutputDimOrderPass.__name__} must be run in the beginning of the pass pipeline to verify that the dim order has not changed unexpectedly during its run."
-            )
-
-        if len(dim_orders) != len(original_dim_orders):
-            raise RuntimeError(
-                f"The number of outputs has changed since {AnnotateOutputDimOrderPass.__name__} was run."
-            )
-
-        for node, dim_order, original_dim_order in zip(
-            output_node.args[0], dim_orders, original_dim_orders
-        ):
-            if dim_order != original_dim_order:
-                raise RuntimeError(
-                    f"The dim order of output {node.name} has changed from {original_dim_order} to {dim_order} since {AnnotateOutputDimOrderPass.__name__} was run."
-                )
diff --git a/backends/arm/constants.py b/backends/arm/constants.py
index fd8710d3ead..b9995410b23 100644
--- a/backends/arm/constants.py
+++ b/backends/arm/constants.py
@@ -29,3 +29,15 @@
     DEQUANT_PER_TENSOR_OP_T,
 )
 PER_CHANNEL_QDQ_OPS: Final = (QUANT_PER_CHANNEL_OP, DEQUANT_PER_CHANNEL_OP)
+
+NHWC_ORDER: Final = (0, 2, 3, 1)
+NHWC_INVERSE_ORDER: Final = (0, 3, 1, 2)
+NNHWC_ORDER: Final = (0, 1, 3, 4, 2)
+NNHWC_INVERSE_ORDER: Final = (0, 1, 4, 2, 3)
+
+NCHW_ORDER: Final = (0, 1, 2, 3)
+NCHW_INVERSE_ORDER: Final = (0, 2, 3, 1)
+NNCHW_ORDER: Final = (0, 1, 2, 3, 4)
+NNCHW_INVERSE_ORDER: Final = (0, 1, 3, 4, 2)
+
+HWCM_ORDER: Final = (2, 3, 0, 1)
diff --git a/backends/arm/operator_support/to_dim_order_copy_support.py b/backends/arm/operator_support/to_dim_order_copy_support.py
index e21f8a68ad6..ced9b7c5afc 100644
--- a/backends/arm/operator_support/to_dim_order_copy_support.py
+++ b/backends/arm/operator_support/to_dim_order_copy_support.py
@@ -89,6 +89,7 @@ def _merge_supported_types(
         torch.int32,
         torch.bfloat16,
         torch.float16,
+        torch.float32,
     ],
 }
 ALL_SUPPORTED_TYPES = _merge_supported_types(
diff --git a/backends/arm/process_node.py b/backends/arm/process_node.py
index 9ca435c60c5..5093ea32d4c 100644
--- a/backends/arm/process_node.py
+++ b/backends/arm/process_node.py
@@ -70,13 +70,6 @@ def process_inputs(
     tosa_spec: TosaSpecification,
 ):
     """Serialize an input node"""
-    # inputs need to be in default dim_order (contiguous memory format)
-    meta = node.meta["val"]
-    if meta.dim_order() != tuple(range(meta.dim())):
-        raise RuntimeError(
-            f"Arm backend only supports contiguous memory format for inputs. "
-            f"Expected dim_order: {tuple(range(meta.dim()))}, but got: {meta.dim_order()} for node {node.name}"
-        )
     try:
         tosa_arg = TosaArg(node, tosa_spec)
     except ValueError as e:
diff --git a/backends/arm/runtime/EthosUBackend.cpp b/backends/arm/runtime/EthosUBackend.cpp
index 8f63569eece..08589c34c69 100644
--- a/backends/arm/runtime/EthosUBackend.cpp
+++ b/backends/arm/runtime/EthosUBackend.cpp
@@ -249,15 +249,6 @@ class EthosUBackend final : public ::executorch::runtime::BackendInterface {
             handles.inputs->io[i].elem_size);
         return Error::InvalidProgram;
       }
-      supported = executorch::runtime::is_contiguous_dim_order(
-          tensor_in.dim_order().data(), tensor_in.dim());
-      if (!supported) {
-        ET_LOG(
-            Error,
-            "Input %d expected contiguous dim_order, but got non-contiguous dim_order",
-            i);
-        return Error::InvalidProgram;
-      }
 
       // Select a compatible copy routine including checking for input layouts
       // which require permutation.
diff --git a/backends/arm/test/misc/test_dim_order.py b/backends/arm/test/misc/test_dim_order.py
new file mode 100644
index 00000000000..6b0b79add99
--- /dev/null
+++ b/backends/arm/test/misc/test_dim_order.py
@@ -0,0 +1,123 @@
+# Copyright 2024-2025 Arm Limited and/or its affiliates.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+
+
+from typing import Tuple
+
+import torch
+from executorch.backends.arm.test import common
+
+from executorch.backends.arm.test.tester.test_pipeline import (
+    EthosU55PipelineINT,
+    EthosU85PipelineINT,
+    TosaPipelineFP,
+    TosaPipelineINT,
+)
+
+
+input_t1 = Tuple[torch.Tensor]  # Input x
+
+
+class ChannelsLastInput(torch.nn.Module):
+    """
+    Test a complex case with (channels last, channels first) input,
+    and (channels first, channels last) output.
+    """
+
+    inputs: input_t1 = (
+        torch.arange(1, 25, dtype=torch.float32)
+        .reshape((1, 2, 3, 4))
+        .to(memory_format=torch.channels_last),
+        torch.arange(1, 25, dtype=torch.float32).reshape((1, 2, 3, 4)),
+    )
+
+    def forward(self, x, y):
+        x = x * x
+        return y, x
+
+
+class ChannelsFirstOutput(torch.nn.Module):
+    """
+    Test converting to channels_first inside the delegate.
+    """
+
+    inputs: input_t1 = (
+        torch.arange(1, 25, dtype=torch.float32)
+        .reshape((1, 2, 3, 4))
+        .to(memory_format=torch.channels_last),
+    )
+
+    def forward(self, x):
+        x = x.clone(memory_format=torch.contiguous_format) * x
+        return x
+
+
+class ChannelsLastOutput(torch.nn.Module):
+    """
+    Test changing of dim_order inside the delegate.
+    """
+
+    inputs: input_t1 = (torch.arange(1, 9, dtype=torch.float32).reshape((1, 2, 2, 2)),)
+
+    def forward(self, x):
+        x = x * x
+        x = x.clone(memory_format=torch.channels_last)
+        return x
+
+
+class ChannelsLastInsidePartition(torch.nn.Module):
+    """
+    Test dim_order changes inside the partition, but no dim_order changes at input/output.
+    """
+
+    inputs: input_t1 = (torch.randn((1, 2, 3, 3)),)
+
+    def __init__(self):
+        super().__init__()
+        self.conv2d = torch.nn.Conv2d(in_channels=2, out_channels=2, kernel_size=(3, 3))
+
+    def forward(self, x):
+        return (
+            self.conv2d(x.clone(memory_format=torch.channels_last)).clone(
+                memory_format=torch.contiguous_format
+            )
+            * 1
+        )
+
+
+test_modules = {
+    "channels_last_input": ChannelsLastInput,
+    "channels_first_output": ChannelsFirstOutput,
+    "channels_last_output": ChannelsLastOutput,
+    "channels_last_inside_partition": ChannelsLastInsidePartition,
+}
+
+
+@common.parametrize("module", test_modules)
+def test_dim_order_tosa_FP(module):
+    pipeline = TosaPipelineFP[input_t1](module(), module.inputs, [])
+    pipeline.run()
+
+
+@common.parametrize("module", test_modules)
+def test_dim_order_tosa_INT(module):
+    pipeline = TosaPipelineINT[input_t1](
+        module(), module.inputs, [], symmetric_io_quantization=True
+    )
+    pipeline.run()
+
+
+@common.XfailIfNoCorstone300
+@common.parametrize("module", test_modules)
+def test_dim_order_u55_INT(module):
+    pipeline = EthosU55PipelineINT[input_t1](module(), module.inputs, [])
+    pipeline.run()
+
+
+@common.XfailIfNoCorstone320
+@common.parametrize("module", test_modules)
+def test_dim_order_u85_INT(module):
+    pipeline = EthosU85PipelineINT[input_t1](module(), module.inputs, [])
+    pipeline.run()
diff --git a/backends/arm/test/misc/test_dim_order_guards.py b/backends/arm/test/misc/test_dim_order_guards.py
deleted file mode 100644
index 80a3c014abc..00000000000
--- a/backends/arm/test/misc/test_dim_order_guards.py
+++ /dev/null
@@ -1,67 +0,0 @@
-# Copyright 2024-2025 Arm Limited and/or its affiliates.
-#
-# This source code is licensed under the BSD-style license found in the
-# LICENSE file in the root directory of this source tree.
-
-
-from typing import Tuple
-
-import pytest
-
-import torch
-from executorch.backends.arm.test import common
-
-from executorch.backends.arm.test.tester.test_pipeline import (
-    TosaPipelineFP,
-    TosaPipelineINT,
-)
-
-
-input_t1 = Tuple[torch.Tensor]  # Input x
-
-
-class Conv2D(torch.nn.Module):
-    inputs: dict[str, input_t1] = {
-        "randn": (torch.randn(1, 2, 20, 20).to(memory_format=torch.channels_last),),
-    }
-
-    def __init__(self):
-        super().__init__()
-        self.conv2d = torch.nn.Conv2d(in_channels=2, out_channels=3, kernel_size=(3, 3))
-
-    def forward(self, x):
-        return self.conv2d(x)
-
-
-@common.parametrize("test_data", Conv2D.inputs)
-def test_tosa_FP_pipeline(test_data: input_t1):
-    module = Conv2D()
-    pipeline = TosaPipelineFP[input_t1](
-        module,
-        test_data,
-        [],
-        [],
-        use_to_edge_transform_and_lower=False,
-    )
-    pos = pipeline.find_pos("partition")
-    pipeline._stages = pipeline._stages[:pos]
-    pipeline.run()
-    with pytest.raises(RuntimeError):
-        pipeline.tester.partition()
-
-
-@common.parametrize("test_data", Conv2D.inputs)
-def test_tosa_INT_pipeline(test_data: input_t1):
-    module = Conv2D()
-    pipeline = TosaPipelineINT[input_t1](
-        module,
-        test_data,
-        [],
-        [],
-        use_to_edge_transform_and_lower=False,
-    )
-    pos = pipeline.find_pos("partition")
-    pipeline._stages = pipeline._stages[:pos]
-    pipeline.run()
-    with pytest.raises(RuntimeError):
-        pipeline.tester.partition()
diff --git a/backends/arm/test/models/test_mobilenet_v2_arm.py b/backends/arm/test/models/test_mobilenet_v2_arm.py
index d4e3bbc8e28..84de432155e 100644
--- a/backends/arm/test/models/test_mobilenet_v2_arm.py
+++ b/backends/arm/test/models/test_mobilenet_v2_arm.py
@@ -46,6 +46,23 @@ def test_mv2_tosa_FP():
     pipeline.run()
 
 
+def test_mv2_tosa_FP_channels_last():
+    input_tensor = model_inputs[0].to(memory_format=torch.channels_last)
+    pipeline = TosaPipelineFP[input_t](
+        mv2,
+        (input_tensor,),
+        aten_op=[],
+        exir_op=[],
+        use_to_edge_transform_and_lower=True,
+    )
+    # Changing memory format leads to an unsupported as_strided_copy op being inserted into the graph,
+    # leading to a graph break.
+    pipeline.change_args(
+        "check_count.exir", {"torch.ops.higher_order.executorch_call_delegate": 2}
+    )
+    pipeline.run()
+
+
 @common.parametrize("per_channel_quantization", quant_test_data)
 def test_mv2_tosa_INT(per_channel_quantization):
     pipeline = TosaPipelineINT[input_t](
diff --git a/backends/arm/test/models/test_torch_functions.py b/backends/arm/test/models/test_torch_functions.py
index 580438f6da8..de45dbe0356 100644
--- a/backends/arm/test/models/test_torch_functions.py
+++ b/backends/arm/test/models/test_torch_functions.py
@@ -101,7 +101,6 @@ def forward(self, *args):
         "Requires dynamic output shape.",
         "topk": "NotImplementedError: No registered serialization name for found",
         "sort": "NotImplementedError: No registered serialization name for found",
-        "norm": "An error occurred when running the 'KeepDimsFalseToSqueezePass' pass after the following passes:",
     },
 )
 def test_torch_fns_FP(test_data):
diff --git a/backends/arm/test/passes/test_to_tosa_memory_format.py b/backends/arm/test/passes/test_to_tosa_memory_format.py
index 1e9b8ffc63d..643a3bf5733 100644
--- a/backends/arm/test/passes/test_to_tosa_memory_format.py
+++ b/backends/arm/test/passes/test_to_tosa_memory_format.py
@@ -6,7 +6,10 @@
 from typing import Tuple
 
 import torch
-from executorch.backends.arm._passes import ToTosaMemoryFormatPass
+from executorch.backends.arm._passes import (
+    AnnotateOutputDimOrderPass,
+    ToTosaMemoryFormatPass,
+)
 from executorch.backends.arm.test import common
 
 from executorch.backends.arm.test.tester.test_pipeline import (
@@ -177,7 +180,10 @@ def test_to_tosa_memory_format_tosa_INT(module):
         ops_after_pass=module.ops_after_pass,
         ops_not_after_pass=module.ops_not_after_pass,
         pass_list=[RemoveGetItemPass],
-        passes_with_exported_program=[ToTosaMemoryFormatPass],
+        passes_with_exported_program=[
+            AnnotateOutputDimOrderPass,
+            ToTosaMemoryFormatPass,
+        ],
     )
     pipeline.pop_stage(
         "run_method_and_compare_outputs"
diff --git a/backends/arm/test/runner_utils.py b/backends/arm/test/runner_utils.py
index 1b59b186a2e..3d002eff25e 100644
--- a/backends/arm/test/runner_utils.py
+++ b/backends/arm/test/runner_utils.py
@@ -13,11 +13,19 @@
 
 from pathlib import Path
+from types import NoneType
 from typing import Any, cast, Dict, List, Literal, Optional, Tuple
 
 import numpy as np
 import torch
+from executorch.backends.arm._passes.arm_pass_utils import get_first_fake_tensor
 from executorch.backends.arm.common.arm_compile_spec import ArmCompileSpec
+from executorch.backends.arm.constants import (
+    NHWC_INVERSE_ORDER,
+    NHWC_ORDER,
+    NNHWC_INVERSE_ORDER,
+    NNHWC_ORDER,
+)
 from executorch.backends.arm.ethosu import EthosUCompileSpec
 from executorch.backends.arm.test.conftest import is_option_enabled
@@ -157,6 +165,36 @@ def get_output_quantization_params(
     return quant_params
 
 
+def torch_tensor_to_numpy(tensor: torch.Tensor) -> np.ndarray:
+    dtype = _torch_to_numpy_dtype_dict[tensor.dtype]
+    array = tensor.detach().numpy().astype(dtype)
+    dim_order = tensor.dim_order()
+    if dim_order == NHWC_ORDER:
+        a = array.transpose(NHWC_ORDER)
+        return a
+    elif dim_order == NNHWC_ORDER:
+        return array.transpose(NNHWC_ORDER)
+    else:
+        return array
+
+
+def numpy_to_torch_tensor(array: np.ndarray, output_node: Node) -> torch.Tensor:
+    output_tensor = get_first_fake_tensor(output_node)
+    shape = output_tensor.shape
+    dim_order = output_tensor.dim_order()
+    if dim_order == NHWC_ORDER:
+        shape_with_dim_order = [shape[i] for i in NHWC_ORDER]
+        tensor = torch.from_numpy(array).reshape(shape_with_dim_order)
+        return tensor.permute(NHWC_INVERSE_ORDER).to(memory_format=torch.channels_last)
+    elif dim_order == NNHWC_ORDER:
+        shape_with_dim_order = [shape[i] for i in NNHWC_ORDER]
+        tensor = torch.from_numpy(array).reshape(shape_with_dim_order)
+        return tensor.permute(NNHWC_INVERSE_ORDER).to(memory_format=torch.channels_last)
+    else:
+        tensor = torch.from_numpy(array).reshape(shape)
+        return tensor
+
+
 class TosaReferenceModelDispatch(TorchFunctionMode):
     """A context manager for executing call_delegate nodes using the reference model"""
@@ -168,7 +206,8 @@ def _tosa_dispatch(self, lowered_backend_module: LoweredBackendModule, inputs):
         tosa_buffer = lowered_backend_module.processed_bytes
         compile_spec = TosaCompileSpec.from_list(lowered_backend_module.compile_specs)
 
-        return run_tosa_graph(tosa_buffer, compile_spec.tosa_spec, inputs)
+        output_node = lowered_backend_module.original_module.graph.output_node()
+        return run_tosa_graph(tosa_buffer, compile_spec.tosa_spec, inputs, output_node)
 
     def __exit__(self, exc_type, exc_val, exc_tb):
         super().__exit__(exc_type, exc_val, exc_tb)
@@ -190,6 +229,22 @@ def __torch_function__(self, func, types, args=..., kwargs=None):
         )
 
         kwargs = kwargs or {}
+
+        # This is a hack since Q/DQ ops do not handle channels last input correctly: the simplest and most robust
+        # workaround is to simply run them in channels first format and then convert back to channels last.
+ if func in ( + torch.ops.quantized_decomposed.quantize_per_tensor.out, + torch.ops.quantized_decomposed.dequantize_per_tensor.out, + torch.ops.quantized_decomposed.quantize_per_channel.out, + torch.ops.quantized_decomposed.dequantize_per_channel.out, + ): + + input_dim_order = args[0].dim_order() + if input_dim_order in (NHWC_ORDER, NNHWC_ORDER): + args = [args[0].to(memory_format=torch.contiguous_format), *args[1:]] + res = func(*args, **kwargs) + return res.to(memory_format=torch.channels_last) + return func(*args, **kwargs) @@ -244,14 +299,13 @@ def get_output_from_file( output_np = [] output_node = exported_program.graph_module.graph.output_node() for i, node in enumerate(output_node.args[0]): - output_shape = node.meta["val"].shape output_dtype = node.meta["val"].dtype tosa_ref_output = np.fromfile( os.path.join(intermediate_path, f"{output_base_name}-{i}.bin"), _torch_to_numpy_dtype_dict[output_dtype], ) - output_np.append(torch.from_numpy(tosa_ref_output).reshape(output_shape)) + output_np.append(numpy_to_torch_tensor(tosa_ref_output, node)) return tuple(output_np) @@ -437,11 +491,14 @@ def prep_data_for_save( quant_param: Optional[QuantizationParams] = None, ): if isinstance(data, torch.Tensor): - data_np = np.array(data.detach(), order="C").astype( - _torch_to_numpy_dtype_dict[data.dtype] - ) + data_np = torch_tensor_to_numpy(data) + elif isinstance(data, (int, float, bool, NoneType)): + return np.array(data) else: - data_np = np.array(data) + raise RuntimeError( + f"Input dtype {type(data)} could not be converted to numpy array." 
+ ) + if quant_param is not None: assert quant_param.node_name in input_name, ( f"The quantization params name '{quant_param.node_name}' does not " @@ -455,30 +512,8 @@ def prep_data_for_save( f"{quant_param.dtype}".replace("torch.", "") ) # Use string format of dtype to convert to numpy dtype ) - return data_np - - -def save_npy( - path: str, - data, - input_name: str, - quant_param: Optional[QuantizationParams] = None, -) -> str: - """Serializes and saves 'data' as a .npy file, possibly quantizing it before. - - Parameters: - path: the directory where to save the data. - data: the data to save. - input_name: the name of the file, without file-ending. - quant_param: the parameters to use for quantization. - Returns: - the full file path of the output. - """ - data_np = prep_data_for_save(data, input_name, quant_param) - file_path = os.path.join(path, input_name + ".npy") - np.save(file_path, data_np, allow_pickle=False) - return file_path + return data_np def save_bytes( @@ -691,9 +726,12 @@ def run_tosa_graph( graph: Any, tosa_version: TosaSpecification, inputs: list[torch.Tensor], + output_node: Node, ) -> list[torch.Tensor]: """Runs the TOSA reference model with inputs and returns the result.""" - inputs_np = [input.numpy() for input in inputs] + + # Convert tensors to numpy arrays with correct dim_order + inputs_np = [torch_tensor_to_numpy(input_tensor) for input_tensor in inputs] if isinstance(tosa_version, Tosa_1_00): import tosa_reference_model as reference_model @@ -715,7 +753,13 @@ def run_tosa_graph( status == reference_model.GraphStatus.TOSA_VALID ), "Non-valid TOSA given to reference model." 
- return [torch.from_numpy(output) for output in outputs_np] + # Convert output numpy arrays to tensors with same dim_order as the output nodes + result = [ + numpy_to_torch_tensor(output_array, node) + for output_array, node in zip(outputs_np, output_node.args[0]) + ] + + return result def get_target_board(compile_spec: ArmCompileSpec) -> str | None: diff --git a/backends/arm/test/targets.bzl b/backends/arm/test/targets.bzl index f240855cdf4..7634eed7a53 100644 --- a/backends/arm/test/targets.bzl +++ b/backends/arm/test/targets.bzl @@ -39,7 +39,7 @@ def define_arm_tests(): "misc/test_bn_relu_folding_qat.py", "misc/test_custom_partition.py", "misc/test_debug_hook.py", - "misc/test_dim_order_guards.py", + "misc/test_dim_order.py", "misc/test_outputs_order.py", ] diff --git a/docs/source/backends-arm-ethos-u.md b/docs/source/backends-arm-ethos-u.md index 9b3d02b21c1..0a5d1dded74 100644 --- a/docs/source/backends-arm-ethos-u.md +++ b/docs/source/backends-arm-ethos-u.md @@ -273,5 +273,14 @@ non delegated Aten ops manually by setting `EXECUTORCH_SELECT_OPS_LIST`. To enab when building the executor_runner. +## Memory formats + +Tensors of rank 4 and higher have two differing [memory format](https://pytorch.org/blog/tensor-memory-format-matters/) standards in use. +PyTorch defaults to the contiguous/channels-first/NCHW memory format, while TOSA only supports the channels-last/NHWC memory format. +To support this, the backend inserts a transpose at the beginning if the incoming memory format is contiguous, and correspondingly a +transpose at the end if the outgoing memory format is contiguous. Note that this means you can avoid transposing the data unnecessarily if the runtime integration and +full network are converted to use channels last.
A word of caution is warranted here, however: changing the memory format has been observed to have side effects, such as +unsupported ops being inserted into the graph, and it is currently not widely tested, so the feature should for now be viewed as experimental. + ## See Also - [Arm Ethos-U Backend Tutorial](tutorial-arm.md) \ No newline at end of file From ed179c0acceb27e37b869025bb9359fd2ebfbfac Mon Sep 17 00:00:00 2001 From: Andrew Grebenisan <33402477+DrJessop@users.noreply.github.com> Date: Wed, 17 Sep 2025 14:16:28 -0700 Subject: [PATCH 016/395] Ref implementations interface fixes Differential Revision: D82566217 Pull Request resolved: https://github.com/pytorch/executorch/pull/14357 --- backends/cadence/aot/TARGETS | 1 + backends/cadence/aot/ref_implementations.py | 88 +++++++++++-------- .../aot/tests/test_ref_implementations.py | 65 ++++++++++---- 3 files changed, 102 insertions(+), 52 deletions(-) diff --git a/backends/cadence/aot/TARGETS b/backends/cadence/aot/TARGETS index b54f1ac3ba6..16d88512b96 100644 --- a/backends/cadence/aot/TARGETS +++ b/backends/cadence/aot/TARGETS @@ -130,6 +130,7 @@ runtime.python_library( deps = [ "fbcode//caffe2:torch", "fbcode//executorch/exir:scalar_type", + "fbcode//executorch/kernels/quantized:custom_ops_generated_lib", ], ) diff --git a/backends/cadence/aot/ref_implementations.py b/backends/cadence/aot/ref_implementations.py index 5530b7c8117..fe012837870 100644 --- a/backends/cadence/aot/ref_implementations.py +++ b/backends/cadence/aot/ref_implementations.py @@ -6,16 +6,17 @@ # pyre-strict - from typing import Callable import torch +import torch.nn as nn +import torch.nn.functional as F from executorch.exir.scalar_type import ScalarType from torch.library import impl, Library - m = Library("cadence", "IMPL", "CompositeExplicitAutograd") +torch.ops.load_library("//executorch/kernels/quantized:custom_ops_generated_lib") qdtype_map: dict[ScalarType, torch.dtype] = { ScalarType.QINT8: torch.qint8, @@ -38,7 +39,7 @@ def
quantize_per_tensor( Args: - input_tensor (Tensor): input tensor - - scale (float): Inverse of quantization scale. Derived from the ratio + - scale (float): Quantization scale. Derived from the ratio between the min/max of the floating-point tensor and the min/max of the quantized range, and then inverted. - zero_point (int): The point which represents 0 in the quantized @@ -64,10 +65,13 @@ def quantize_per_tensor( f"Unsupported dtype to quantize to. Supported dtypes must be one of {supported_quant_types}" ) - quantized = torch.round(input_tensor * scale + zero_point).to(dtype) - return torch.max( - torch.min(quantized, torch.tensor(quant_max)), - torch.tensor(quant_min), + return torch.ops.quantized_decomposed.quantize_per_tensor( + input_tensor, + scale, + zero_point, + quant_min, + quant_max, + dtype, ) @@ -97,7 +101,7 @@ def dequantize_per_tensor( is already provided. - quant_max (int): The largest value in the quantized domain. Unused since scale is already provided. - - dtype (torch.dtype): The type of the output tensor. Must be a floating point type. + - dtype (torch.dtype): The type of the input tensor. """ supported_quant_types = [ torch.int8, @@ -108,23 +112,15 @@ def dequantize_per_tensor( ] if input_tensor.dtype not in supported_quant_types: raise ValueError(f"Input dtype must be one of {supported_quant_types}") - supported_dequant_types = [ - torch.float, - torch.float32, - torch.float16, - torch.bfloat16, - ] - if dtype not in supported_dequant_types: - raise ValueError( - f"Unsupported dtype to dequantize to. Supported dtypes must be one of {supported_dequant_types}" - ) - - # Needed to prevent underflow in cases where the zero_point is larger than - # the quantized value. 
- if not input_tensor.dtype.is_signed: - input_tensor = input_tensor.to(torch.int32) - - return (input_tensor - zero_point).to(dtype) * scale + if input_tensor.dtype != dtype: + raise ValueError("Input dtype must match dtype") + + # Use the reference implementation from torch quantized_decomposed library + # Unlike quantize_per_tensor, dequantize_per_tensor doesn't have a behavior + # difference, since there's no rounding algorithm (just arithmetic). + return torch.ops.quantized_decomposed.dequantize_per_tensor( + input_tensor, scale, zero_point, quant_min, quant_max, dtype + ) @impl(m, "quantized_add.per_tensor") @@ -180,12 +176,10 @@ def quantized_add_per_tensor( dequant_X = X_scale * (X - X_zero_point) dequant_Y = Y_scale * (Y - Y_zero_point) - out_scale_inv = 1 / out_scale - # q_min/q_max are unused args return quantize_per_tensor( dequant_X + dequant_Y, - out_scale_inv, + out_scale, out_zero_point, torch.iinfo(dtype).min, torch.iinfo(dtype).max, @@ -259,8 +253,7 @@ def quantized_linear_common( - out_zero_point (int): The quantized mapping of zero for the output - offset (Tensor): Unused """ - out_scale = -out_multiplier * (1 / (1 << 31)) * (2**out_shift) - out_scale_inv = 1 / out_scale + out_scale = 1.0 / (-out_multiplier * (1 / (1 << 31)) * (2**out_shift)) N, K = weight.shape @@ -281,7 +274,7 @@ def quantized_linear_common( ) return quantize_per_tensor( out, - out_scale_inv, + out_scale, out_zero_point, torch.iinfo(dtype).min, torch.iinfo(dtype).max, @@ -399,6 +392,17 @@ def quantized_fully_connected_asym8sxasym8s_asym8s_per_tensor() -> torch.Tensor: def quantized_fully_connected_asym8uxasym8u_asym8u_per_tensor() -> torch.Tensor: ... 
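The interface change above passes the true quantization scale rather than its inverse (the old code multiplied by the stored inverse scale). The per-tensor affine mapping being delegated to can be sketched as follows; this is a simplified reference for illustration, not the Cadence implementation, and it ignores dtype corner cases:

```python
import torch

def quantize_per_tensor(x, scale, zero_point, qmin, qmax, dtype):
    # q = clamp(round(x / scale) + zero_point, qmin, qmax)
    q = torch.round(x / scale) + zero_point
    return torch.clamp(q, qmin, qmax).to(dtype)

def dequantize_per_tensor(q, scale, zero_point):
    # x_hat = (q - zero_point) * scale
    return (q.to(torch.float32) - zero_point) * scale

x = torch.tensor([-1.0, 0.0, 0.5, 1.0])
q = quantize_per_tensor(x, scale=1 / 128, zero_point=0,
                        qmin=-128, qmax=127, dtype=torch.int8)
print(q.tolist())  # [-128, 0, 64, 127]  (1.0 saturates at qmax)
x_hat = dequantize_per_tensor(q, 1 / 128, 0)
```

Dividing by the true scale here is the same arithmetic the patch switches to; the previous convention of pre-inverting the scale is why several call sites above drop their `1 / out_scale` computations.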
+@impl(m, "fully_connected") +def fully_connected( + input_tensor: torch.Tensor, + weight: torch.Tensor, + bias: torch.Tensor, +) -> torch.Tensor: + if input_tensor.shape[0] != 1: + raise ValueError("Fully connected linear only supports batch size of 1") + return F.linear(input_tensor, weight, bias) + + @impl(m, "quantized_matmul") def quantized_matmul( X: torch.Tensor, @@ -538,7 +542,7 @@ def quantized_layer_norm_per_tensor( ) float_input_tensor = dequantize_per_tensor( - input_tensor, X_scale, X_zero_point, -128, 127, torch.float32 + input_tensor, X_scale, X_zero_point, -128, 127, input_tensor.dtype ) out = torch.nn.functional.layer_norm( float_input_tensor, normalized_shape, weight, bias, eps=eps @@ -546,7 +550,7 @@ def quantized_layer_norm_per_tensor( return quantize_per_tensor( out, - 1 / output_scale, + output_scale, output_zero_point, torch.iinfo(input_tensor.dtype).min, torch.iinfo(input_tensor.dtype).max, @@ -615,7 +619,7 @@ def quantized_conv_per_tensor( return quantize_per_tensor( float_out, - 1.0 / output_scale, + output_scale, output_zero_point, torch.iinfo(input_tensor.dtype).min, torch.iinfo(input_tensor.dtype).max, @@ -950,8 +954,10 @@ def quantized_relu_common( if X.dtype not in supported_dtypes: raise ValueError(f"X dtype must be one of {supported_dtypes}. 
Got {X.dtype}") - out_scale = -out_multiplier * (1 / (1 << 31)) * (2**out_shift) - dequantized_X = torch.where(X > X_zero_point, X - X_zero_point, torch.zeros_like(X)) + out_scale = 1.0 / (-out_multiplier * (1 / (1 << 31)) * (2**out_shift)) + dequantized_X = torch.where( + X > X_zero_point, X - X_zero_point, torch.zeros_like(X) + ).to(torch.float32) return quantize_per_tensor( dequantized_X, out_scale, @@ -1076,3 +1082,13 @@ def requantize( out_quant_max, dtype, ) + + +@impl(m, "rms_norm") +def rms_norm( + X: torch.Tensor, + normalized_shape: tuple[int], + W: torch.Tensor, + eps: float, +) -> torch.Tensor: + return W * nn.RMSNorm(list(normalized_shape), eps=eps, dtype=X.dtype)(X) diff --git a/backends/cadence/aot/tests/test_ref_implementations.py b/backends/cadence/aot/tests/test_ref_implementations.py index 2589bd88601..bc025f4c894 100644 --- a/backends/cadence/aot/tests/test_ref_implementations.py +++ b/backends/cadence/aot/tests/test_ref_implementations.py @@ -36,12 +36,11 @@ def test_quantize_per_tensor( ) -> None: input_tensor = torch.tensor([input_value]) scale = (f_max - f_min) / (q_max - q_min) - inv_scale = 1.0 / scale - zero_point = round(-f_min * inv_scale) + q_min + zero_point = round(-f_min * 1 / scale) + q_min expected_output = torch.tensor([expected_value], dtype=target_dtype) output = torch.ops.cadence.quantize_per_tensor( - input_tensor, inv_scale, zero_point, q_min, q_max, target_dtype + input_tensor, scale, zero_point, q_min, q_max, target_dtype ) self.assertEqual( @@ -85,7 +84,7 @@ def test_dequantize_per_tensor( expected_output = torch.tensor([expected_value], dtype=torch.float32) output = torch.ops.cadence.dequantize_per_tensor( - input_tensor, scale, zero_point, q_min, q_max, torch.float32 + input_tensor, scale, zero_point, q_min, q_max, input_tensor.dtype ) self.assertEqual( @@ -175,7 +174,7 @@ def test_quantized_add( ), # out_multiplier (0.5 * 2^31) torch.tensor([0], dtype=torch.int64), # out_shift 0, # out_zero_point - torch.tensor([[-2]], 
dtype=dtype), # expected_output + torch.tensor([[0]], dtype=dtype), # expected_output per_tensor, False, False, @@ -200,7 +199,7 @@ def test_quantized_add( ), # out_multiplier (0.5 * 2^31) torch.tensor([0], dtype=torch.int64), # out_shift 0, # out_zero_point - torch.tensor([[-10, -30]], dtype=dtype), # expected_output + torch.tensor([[-2, -8]], dtype=dtype), # expected_output per_tensor, False, False, @@ -208,6 +207,28 @@ def test_quantized_add( for (per_tensor, dtype) in ( (False, torch.int8), (True, torch.int8), + ) + ], + *[ + ( + torch.Size([1, 3]), # src_shape: 1 sample, 3 input features + torch.Size( + [2, 3] + ), # weight_shape: 2 output features, 3 input features + 0, # in_zero_point + torch.tensor([0, 0, 0], dtype=dtype), # weight_zero_point + torch.tensor( + [1073741824], dtype=torch.int32 + ), # out_multiplier (0.5 * 2^31) + torch.tensor([0], dtype=torch.int64), # out_shift + 0, # out_zero_point + torch.tensor([[0, 0]], dtype=dtype), # expected_output + per_tensor, + False, + False, + ) + for (per_tensor, dtype) in ( + (False, torch.uint8), (True, torch.uint8), ) ], @@ -226,7 +247,7 @@ def test_quantized_add( torch.tensor([0], dtype=torch.int64), # out_shift 0, # out_zero_point torch.tensor( - [[[-2, -8, -14], [-6, -28, -50]]], dtype=dtype + [[[0, -2, -4], [-2, -7, -12]]], dtype=dtype ), # expected_output per_tensor, False, @@ -235,7 +256,6 @@ def test_quantized_add( for (per_tensor, dtype) in ( (False, torch.int8), (True, torch.int8), - (True, torch.uint8), ) ], # Test case 4: Non-zero zero points @@ -252,7 +272,7 @@ def test_quantized_add( ), # out_multiplier (1.0 * 2^31) torch.tensor([0], dtype=torch.int64), # out_shift 1, # out_zero_point - torch.tensor([[-15, 25]], dtype=dtype), # expected_output + torch.tensor([[1, 1]], dtype=dtype), # expected_output per_tensor, False, False, @@ -260,7 +280,7 @@ def test_quantized_add( for (per_tensor, dtype) in ( (False, torch.int8), (True, torch.int8), - (True, torch.uint8), + # (True, torch.uint8), ) ], # Test 
case 5: Non-uniform weight zero points @@ -277,12 +297,12 @@ def test_quantized_add( ), # out_multiplier (1.0 * 2^31) torch.tensor([0], dtype=torch.int64), # out_shift 1, # out_zero_point - torch.tensor([[-23, 17]], dtype=dtype), # expected_output + torch.tensor([[1, 1]], dtype=dtype), # expected_output False, False, False, ) - for dtype in (torch.int8, torch.uint8) + for dtype in (torch.int8,) ], # Test case 6: Non-zero out_shift (shift=1) *[ @@ -300,7 +320,7 @@ def test_quantized_add( [1], dtype=torch.int64 ), # out_shift (shift=1, doubles the scale) 1, # out_zero_point - torch.tensor([[-7, 13]], dtype=dtype), # expected_output + torch.tensor([[1, 2]], dtype=dtype), # expected_output per_tensor, False, False, @@ -322,13 +342,13 @@ def test_quantized_add( [1], dtype=torch.int64 ), # out_shift (shift=1, doubles the scale) 1, # out_zero_point - torch.tensor([[-7, 17]], dtype=dtype), # expected_output + torch.tensor([[1, 2]], dtype=dtype), # expected_output per_tensor, matmul, transposed_matmul, ) for (matmul, transposed_matmul) in ((True, False), (True, True)) - for (per_tensor, dtype) in ((True, torch.int8), (True, torch.uint8)) + for (per_tensor, dtype) in ((True, torch.int8),) ], ] ) @@ -1045,7 +1065,20 @@ def test_quantized_conv_per_tensor( [4, 2, 0, -2], dtype=dtype ), # expected: relu(1,3,5,7) = (1,3,5,7) * (-1.0) + 5 = (4,2,0,-2) ) - for dtype in [torch.int8, torch.uint8] + for dtype in [torch.int8] + ], + *[ + ( + "positive_with_shift_unsigned", + torch.tensor([2, 4, 6, 8], dtype=dtype), # input + 1, # X_zero_point + 5, # out_zero_point + 1073741824, # out_multiplier (0.5 * 2^31) + 1, # out_shift (multiply by 2^1 = 2) + dtype, # dtype + torch.tensor([4, 2, 0, 0], dtype=dtype), + ) + for dtype in [torch.uint8] ], # Test case 4: Non-per-tensor *[ From e1ea74fdb38ba251c03ab307d925a258f28c1dcd Mon Sep 17 00:00:00 2001 From: Ethan Ng Date: Wed, 17 Sep 2025 14:28:38 -0700 Subject: [PATCH 017/395] Enforce tensor a dtype == tensor b dtype for where.out in facto 
Differential Revision: D82577515 Pull Request resolved: https://github.com/pytorch/executorch/pull/14352 --- backends/cadence/utils/facto_util.py | 21 ++++++++++++++++++++- 1 file changed, 20 insertions(+), 1 deletion(-) diff --git a/backends/cadence/utils/facto_util.py b/backends/cadence/utils/facto_util.py index 173f543a46e..a09f3578391 100644 --- a/backends/cadence/utils/facto_util.py +++ b/backends/cadence/utils/facto_util.py @@ -167,7 +167,25 @@ def random_size_constraint(deps: object, r: int, d: int) -> int: cp.Size.Ge(lambda deps, r, d: 1), max_size_constraint, ] - else: + elif index == 1: # input tensor(a) + tensor_constraints = [ + cp.Dtype.In( + lambda deps: [ + torch.int8, + torch.int16, + torch.uint8, + torch.uint16, + torch.int32, + torch.float32, + ] + ), + cp.Value.Ge(lambda deps, dtype, struct: -(2**4)), + cp.Value.Le(lambda deps, dtype, struct: 2**4), + cp.Rank.Ge(lambda deps: 1), + cp.Size.Ge(lambda deps, r, d: 1), + max_size_constraint, + ] + else: # input tensor(b) tensor_constraints = [ cp.Dtype.In( lambda deps: [ @@ -179,6 +197,7 @@ def random_size_constraint(deps: object, r: int, d: int) -> int: torch.float32, ] ), + cp.Dtype.Eq(lambda deps: deps[1].dtype), cp.Value.Ge(lambda deps, dtype, struct: -(2**4)), cp.Value.Le(lambda deps, dtype, struct: 2**4), cp.Rank.Ge(lambda deps: 1), From 2b54a19ec5d35e1981848fa86ad423c2b37d49f4 Mon Sep 17 00:00:00 2001 From: haowhsu-quic <111341466+haowhsu-quic@users.noreply.github.com> Date: Thu, 18 Sep 2025 05:52:54 +0800 Subject: [PATCH 018/395] Qualcomm AI Engine Direct - issue fix #2 (#14378) ### Summary - #14048 > add quantized test case with GLU decomposition - #14049 > add e2e example where constant expansion is applied - #14050 > add e2e example and source transform for 6D operation - #14051 > add e2e example and complement missed annotation - #14052 > add e2e example and dedicated passe for 6D partition Fixes #14048 Fixes #14049 Fixes #14050 Fixes #14051 Fixes #14052 ### Test plan MATRIX = 
{convnext_small, maxvit_t, swin_v2_t, vit_b_16} ```bash python backends/qualcomm/tests/test_qnn_delegate.py TestExampleOssScript.test_${MATRIX} -b build-android/ -m SM8750 -s $SN -a /path/to/test_artifacts/ -i /path/to/imagenet_1k/imagenet-mini/val -r . ``` ```bash python backends/qualcomm/tests/test_qnn_delegate.py TestQuantizedModel.test_qnn_backend_conformer -b build-android/ -m SM8750 -s $SN -a /path/to/test_artifacts/ ``` --- backends/qualcomm/_passes/__init__.py | 2 + .../qualcomm/_passes/annotate_quant_attrs.py | 8 + backends/qualcomm/_passes/decompose_any.py | 28 +- backends/qualcomm/_passes/decompose_cdist.py | 28 +- backends/qualcomm/_passes/decompose_einsum.py | 33 +-- backends/qualcomm/_passes/decompose_glu.py | 55 ++++ .../_passes/decompose_linalg_vector_norm.py | 29 +-- backends/qualcomm/_passes/decompose_roll.py | 29 +-- .../_passes/decompose_wrap_with_autocast.py | 27 +- .../qualcomm/_passes/fixed_linear_keep_dim.py | 23 +- backends/qualcomm/_passes/qnn_pass_manager.py | 2 + backends/qualcomm/_passes/utils.py | 39 +++ backends/qualcomm/quantizer/annotators.py | 4 +- backends/qualcomm/tests/models.py | 20 ++ backends/qualcomm/tests/test_qnn_delegate.py | 230 +++++++++++++++++ examples/qualcomm/oss_scripts/README.md | 6 +- .../qualcomm/oss_scripts/convnext_small.py | 145 +++++++++++ examples/qualcomm/oss_scripts/maxvit_t.py | 244 ++++++++++++++++++ examples/qualcomm/oss_scripts/swin_v2_t.py | 185 +++++++++++++ examples/qualcomm/oss_scripts/vit_b_16.py | 135 ++++++++++ examples/qualcomm/utils.py | 3 + 21 files changed, 1140 insertions(+), 135 deletions(-) create mode 100644 backends/qualcomm/_passes/decompose_glu.py create mode 100755 examples/qualcomm/oss_scripts/convnext_small.py create mode 100755 examples/qualcomm/oss_scripts/maxvit_t.py create mode 100755 examples/qualcomm/oss_scripts/swin_v2_t.py create mode 100755 examples/qualcomm/oss_scripts/vit_b_16.py diff --git a/backends/qualcomm/_passes/__init__.py b/backends/qualcomm/_passes/__init__.py 
index 15fce79ea12..f7b7ff62c42 100644 --- a/backends/qualcomm/_passes/__init__.py +++ b/backends/qualcomm/_passes/__init__.py @@ -17,6 +17,7 @@ from .decompose_col_im import DecomposeColIm from .decompose_einsum import DecomposeEinsum from .decompose_expm1 import DecomposeExpM1 +from .decompose_glu import DecomposeGlu from .decompose_linalg_vector_norm import DecomposeLinalgVectorNorm from .decompose_minmaxdim import DecomposeMinMaxDim from .decompose_roll import DecomposeRoll @@ -57,6 +58,7 @@ DecomposeColIm, DecomposeEinsum, DecomposeExpM1, + DecomposeGlu, DecomposeLinalgVectorNorm, DecomposeMinMaxDim, DecomposeRoll, diff --git a/backends/qualcomm/_passes/annotate_quant_attrs.py b/backends/qualcomm/_passes/annotate_quant_attrs.py index 610e88e6d3b..6077d51b099 100644 --- a/backends/qualcomm/_passes/annotate_quant_attrs.py +++ b/backends/qualcomm/_passes/annotate_quant_attrs.py @@ -19,6 +19,7 @@ QCOM_SCALE, QCOM_ZERO_POINT, ) +from executorch.exir.dialects._ops import ops as exir_ops from executorch.exir.pass_base import ExportPass, PassResult from .utils import get_quant_attrs @@ -38,6 +39,9 @@ def __init__( super(AnnotateQuantAttrs, self).__init__() self.edge_program = edge_program self.skip_advanced_requant = skip_advanced_requant + self.skip_requant_allowlist = { + exir_ops.edge.aten.sigmoid.default, + } def _annotate_source_nodes( self, quant_node: torch.fx.Node, quant_attrs: Dict[str, Any] @@ -80,6 +84,10 @@ def _annotate_requant(self, n): # node1 -> q_ui8 (n) -> dq_ui8 -> q_int32 -> dq_int32 -> node2 -> .... 
# We store {node2: quant_attr in dq_int32} in node1.meta if n.target in q_ops and n.args[0].target not in dq_ops: + # for some fixed scale op, there is no need to requantize it + if n.args[0].target in self.skip_requant_allowlist: + return + dq_nodes = self._find_last_dq_nodes(n) q_attrs = get_quant_attrs(self.edge_program, n) for dq_node in dq_nodes: diff --git a/backends/qualcomm/_passes/decompose_any.py b/backends/qualcomm/_passes/decompose_any.py index e92bf11dd18..0cb959ff77f 100644 --- a/backends/qualcomm/_passes/decompose_any.py +++ b/backends/qualcomm/_passes/decompose_any.py @@ -8,6 +8,8 @@ from executorch.exir import to_edge from executorch.exir.pass_base import ExportPass, PassResult +from .utils import merge_decomposed_graph + class Any(torch.nn.Module): def __init__(self, dim, keepdim): @@ -49,26 +51,12 @@ def call(self, graph_module: torch.fx.GraphModule) -> PassResult: # remap is used to map original node values to new node values, # which ensures that reference to nodes are correctly updated in the new graph remap = {"x": node.args[0]} - - for decomposed_node in decomposed_module.graph.nodes: - # no need to copy existent 'output' - if decomposed_node.op == "output": - for user in node.users.copy(): - # remap - user.replace_input_with( - node, - remap[decomposed_node.args[0][0]], - ) - # no need to copy existent placeholders - elif decomposed_node.op == "placeholder": - # replace node map from string to graph node - remap[decomposed_node] = remap.pop(decomposed_node.name) - else: - remap[decomposed_node] = graph.node_copy( - decomposed_node, - arg_transform=lambda x, remap=remap: remap[x], - ) - + merge_decomposed_graph( + remap=remap, + target_node=node, + target_graph=graph, + decomposed_graph_module=decomposed_module, + ) graph.erase_node(node) graph.eliminate_dead_code() diff --git a/backends/qualcomm/_passes/decompose_cdist.py b/backends/qualcomm/_passes/decompose_cdist.py index d18a0295ffb..a3c812bdc37 100644 --- 
a/backends/qualcomm/_passes/decompose_cdist.py +++ b/backends/qualcomm/_passes/decompose_cdist.py @@ -7,6 +7,8 @@ import torch from executorch.exir.pass_base import ExportPass, PassResult +from .utils import merge_decomposed_graph + class CDist(torch.nn.Module): def __init__(self): @@ -54,26 +56,12 @@ def call(self, graph_module: torch.fx.GraphModule) -> PassResult: # remap is used to map original node values to new node values, # which ensures that reference to nodes are correctly updated in the new graph remap = {"x": node.args[0], "y": node.args[1]} - - for decomposed_node in decomposed_module.graph.nodes: - # no need to copy existent 'output' - if decomposed_node.op == "output": - for user in node.users.copy(): - # remap - user.replace_input_with( - node, - remap[decomposed_node.args[0][0]], - ) - # no need to copy existent placeholders - elif decomposed_node.op == "placeholder": - # replace node map from string to graph node - remap[decomposed_node] = remap.pop(decomposed_node.name) - else: - remap[decomposed_node] = graph.node_copy( - decomposed_node, - arg_transform=lambda x, remap=remap: remap[x], - ) - + merge_decomposed_graph( + remap=remap, + target_node=node, + target_graph=graph, + decomposed_graph_module=decomposed_module, + ) graph.erase_node(node) graph.eliminate_dead_code() diff --git a/backends/qualcomm/_passes/decompose_einsum.py b/backends/qualcomm/_passes/decompose_einsum.py index 046c1598311..464d989333f 100644 --- a/backends/qualcomm/_passes/decompose_einsum.py +++ b/backends/qualcomm/_passes/decompose_einsum.py @@ -8,7 +8,7 @@ from executorch.exir.pass_base import ExportPass, PassResult from torch.fx.experimental.proxy_tensor import make_fx -from .utils import copy_nn_module_stack +from .utils import merge_decomposed_graph class DecomposeEinsum(ExportPass): @@ -37,30 +37,13 @@ def call(self, graph_module: torch.fx.GraphModule) -> PassResult: for i, arg in enumerate(node.args[1]): remap[f"arg1_{i+1}"] = arg - for decomposed_node in 
decomposed_module.graph.nodes: - copy_nn_module_stack(node, decomposed_node) - # This is the arg[0] equation string, which is not required anymore after decomposition - if "arg0" in decomposed_node.name: - continue - - # no need to copy existent 'output' - if decomposed_node.op == "output": - for user in node.users.copy(): - # remap - user.replace_input_with( - node, - remap[decomposed_node.args[0][0]], - ) - # no need to copy existent placeholders - elif decomposed_node.op == "placeholder": - # replace node map from string to graph node - remap[decomposed_node] = remap.pop(decomposed_node.name) - else: - remap[decomposed_node] = graph.node_copy( - decomposed_node, - arg_transform=lambda x, remap=remap: remap[x], - ) - + merge_decomposed_graph( + remap=remap, + target_node=node, + target_graph=graph, + decomposed_graph_module=decomposed_module, + predicate=lambda decomp_node: "arg0" not in decomp_node.name, + ) graph.erase_node(node) graph.eliminate_dead_code() diff --git a/backends/qualcomm/_passes/decompose_glu.py b/backends/qualcomm/_passes/decompose_glu.py new file mode 100644 index 00000000000..de363468799 --- /dev/null +++ b/backends/qualcomm/_passes/decompose_glu.py @@ -0,0 +1,55 @@ +# Copyright (c) Qualcomm Innovation Center, Inc. +# All rights reserved +# +# This source code is licensed under the BSD-style license found in the +# LICENSE file in the root directory of this source tree. + +import torch +from executorch.exir.pass_base import ExportPass, PassResult + +from .utils import merge_decomposed_graph + + +# this wrapper is required for IO name mapping with decomposed graph +class Glu(torch.nn.Module): + def __init__(self, dim=-1): + super().__init__() + self.glu = torch.nn.GLU(dim=dim) + + def forward(self, x): + return self.glu(x) + + +class DecomposeGlu(ExportPass): + """ + Decompose glu for quantization annotation to work properly. 
+ """ + + def __init__(self) -> None: + super().__init__() + + def call(self, graph_module: torch.fx.GraphModule) -> PassResult: + graph = graph_module.graph + for node in graph.nodes: + if node.target == torch.ops.aten.glu.default: + ep = torch.export.export( + Glu(dim=-1 if len(node.args) < 2 else node.args[1]), + (node.args[0].meta["val"],), + ) + decomposed_module = ep.run_decompositions().graph_module + + with graph.inserting_before(node): + # remap is used to map original node values to new node values, + # which ensures that reference to nodes are correctly updated in the new graph + remap = {"x": node.args[0]} + merge_decomposed_graph( + remap=remap, + target_node=node, + target_graph=graph, + decomposed_graph_module=decomposed_module, + ) + graph.erase_node(node) + + graph.eliminate_dead_code() + graph_module.recompile() + return PassResult(graph_module, True) diff --git a/backends/qualcomm/_passes/decompose_linalg_vector_norm.py b/backends/qualcomm/_passes/decompose_linalg_vector_norm.py index 993f088da12..94a5b10ba3f 100644 --- a/backends/qualcomm/_passes/decompose_linalg_vector_norm.py +++ b/backends/qualcomm/_passes/decompose_linalg_vector_norm.py @@ -8,7 +8,7 @@ from executorch.exir import to_edge from executorch.exir.pass_base import ExportPass, PassResult -from .utils import copy_nn_module_stack +from .utils import merge_decomposed_graph class LinalgVectorNorm(torch.nn.Module): @@ -62,27 +62,12 @@ def call(self, graph_module: torch.fx.GraphModule) -> PassResult: # remap is used to map original node values to new node values, # which ensures that reference to nodes are correctly updated in the new graph remap = {"x": node.args[0]} - - for decomposed_node in decomposed_module.graph.nodes: - copy_nn_module_stack(node, decomposed_node) - # no need to copy existent 'output' - if decomposed_node.op == "output": - for user in node.users.copy(): - # remap - user.replace_input_with( - node, - remap[decomposed_node.args[0][0]], - ) - # no need to copy 
existent placeholders - elif decomposed_node.op == "placeholder": - # replace node map from string to graph node - remap[decomposed_node] = remap.pop(decomposed_node.name) - else: - remap[decomposed_node] = graph.node_copy( - decomposed_node, - arg_transform=lambda x, remap=remap: remap[x], - ) - + merge_decomposed_graph( + remap=remap, + target_node=node, + target_graph=graph, + decomposed_graph_module=decomposed_module, + ) graph.erase_node(node) graph.eliminate_dead_code() diff --git a/backends/qualcomm/_passes/decompose_roll.py b/backends/qualcomm/_passes/decompose_roll.py index e13433508f5..e6f60d55464 100644 --- a/backends/qualcomm/_passes/decompose_roll.py +++ b/backends/qualcomm/_passes/decompose_roll.py @@ -7,7 +7,7 @@ from executorch.exir.pass_base import ExportPass, PassResult -from .utils import copy_nn_module_stack +from .utils import merge_decomposed_graph class SliceCopy(torch.nn.Module): @@ -65,27 +65,12 @@ def call(self, graph_module: torch.fx.GraphModule) -> PassResult: # remap is used to map original node values to new node values, # which ensures that reference to nodes are correctly updated in the new graph remap = {"x": input_node} - - for decomposed_node in decomposed_module.graph.nodes: - copy_nn_module_stack(node, decomposed_node) - # no need to copy existent 'output' - if decomposed_node.op == "output": - for user in node.users.copy(): - # remap - user.replace_input_with( - node, - remap[decomposed_node.args[0][0]], - ) - # no need to copy existent placeholders - elif decomposed_node.op == "placeholder": - # replace node map from string to graph node - remap[decomposed_node] = remap.pop(decomposed_node.name) - else: - remap[decomposed_node] = graph.node_copy( - decomposed_node, - arg_transform=lambda x, remap=remap: remap[x], - ) - + merge_decomposed_graph( + remap=remap, + target_node=node, + target_graph=graph, + decomposed_graph_module=decomposed_module, + ) graph.erase_node(node) graph.eliminate_dead_code() diff --git 
a/backends/qualcomm/_passes/decompose_wrap_with_autocast.py b/backends/qualcomm/_passes/decompose_wrap_with_autocast.py index 6c073bd309c..1b60b740ed3 100644 --- a/backends/qualcomm/_passes/decompose_wrap_with_autocast.py +++ b/backends/qualcomm/_passes/decompose_wrap_with_autocast.py @@ -10,7 +10,7 @@ import torch from executorch.exir.pass_base import ExportPass, PassResult -from .utils import copy_nn_module_stack +from .utils import merge_decomposed_graph class DecomposeWrapWithAutocast(ExportPass): @@ -52,7 +52,7 @@ def _replace(self, gm: torch.fx.GraphModule) -> None: graph = gm.graph for node in graph.nodes: if isinstance(node.target, torch._higher_order_ops.wrap.WrapWithAutocast): - submod, submod_name = self._get_submod(gm, node) + submod, _ = self._get_submod(gm, node) n_args = node.args input_submod = n_args[4] decomposed_module = submod @@ -61,22 +61,13 @@ def _replace(self, gm: torch.fx.GraphModule) -> None: # which ensures that reference to nodes are correctly updated in the new graph # remap = {"expand_1": node.args[5], "to_4": node.args[6]} remap = {n_args[i].name: n_args[i] for i in range(5, len(n_args))} - - for decomposed_node in decomposed_module.graph.nodes: - copy_nn_module_stack(node, decomposed_node) - # no need to copy existent 'output' - if decomposed_node.op == "output": - self._replace_output(node, decomposed_node, remap) - # no need to copy existent placeholders - elif decomposed_node.op == "placeholder": - # replace node map from string to graph node - remap[decomposed_node] = remap.pop(decomposed_node.name) - else: - remap[decomposed_node] = graph.node_copy( - decomposed_node, - arg_transform=lambda x, remap=remap: remap[x], - ) - + merge_decomposed_graph( + remap=remap, + target_node=node, + target_graph=graph, + decomposed_graph_module=decomposed_module, + output_processor=self._replace_output, + ) graph.erase_node(node) graph.erase_node(input_submod) diff --git a/backends/qualcomm/_passes/fixed_linear_keep_dim.py 
b/backends/qualcomm/_passes/fixed_linear_keep_dim.py index 19f5c631921..04c0f92cebf 100644 --- a/backends/qualcomm/_passes/fixed_linear_keep_dim.py +++ b/backends/qualcomm/_passes/fixed_linear_keep_dim.py @@ -5,10 +5,14 @@ # LICENSE file in the root directory of this source tree. import torch +from executorch.backends.qualcomm.builders.node_visitor import dq_ops +from executorch.backends.qualcomm.utils.constants import QCOM_QUANT_ATTRS from executorch.exir.dialects._ops import ops as exir_ops from executorch.exir.pass_base import ExportPass, PassResult from executorch.exir.passes import dead_code_elimination_pass +from .utils import copy_meta, get_quant_attrs + class FixedLinearKeepDim(ExportPass): """ @@ -18,8 +22,12 @@ class FixedLinearKeepDim(ExportPass): view_copy = exir_ops.edge.aten.view_copy.default linear = exir_ops.edge.aten.linear.default - def __init__(self): + def __init__( + self, + edge_program: torch.export.ExportedProgram, + ): super(FixedLinearKeepDim, self).__init__() + self.edge_program = edge_program def _fixed_keep_dim(self, graph_module: torch.fx.GraphModule): for node in graph_module.graph.nodes: @@ -46,9 +54,15 @@ def _fixed_keep_dim(self, graph_module: torch.fx.GraphModule): ) # meta needs to be copied elementwisely for fake-tensor # to be updated correctly and not affect meta of input_node - for k, v in input_node.meta.items(): - squeeze_node.meta[k] = v + squeeze_node.meta = copy_meta(input_node.meta) squeeze_node.meta["val"] = input_tensor.reshape(squeeze_dim) + # if input_node is dequantize, we need to fetch encodings manually + # TODO: remove this when constant fold mechanism is introduced + if input_node.target in dq_ops: + squeeze_node.meta[QCOM_QUANT_ATTRS] = get_quant_attrs( + self.edge_program, input_node + ) + for user in input_users: if user == linear_node: user.replace_input_with(input_node, squeeze_node) @@ -66,8 +80,7 @@ def _fixed_keep_dim(self, graph_module: torch.fx.GraphModule): ) # meta needs to be copied elementwisely 
for fake-tensor
             # to be updated correctly and not affect meta of unsqueeze_node
-            for k, v in linear_node.meta.items():
-                unsqueeze_node.meta[k] = v
+            unsqueeze_node.meta = copy_meta(linear_node.meta)
             # update linear node's shape
             linear_node.meta["val"] = linear_output.reshape(
                 (squeeze_node.meta["val"].shape[0], linear_output.shape[-1])
diff --git a/backends/qualcomm/_passes/qnn_pass_manager.py b/backends/qualcomm/_passes/qnn_pass_manager.py
index ffb9f3221df..650a98bf8ce 100644
--- a/backends/qualcomm/_passes/qnn_pass_manager.py
+++ b/backends/qualcomm/_passes/qnn_pass_manager.py
@@ -22,6 +22,7 @@
     DecomposeColIm,
     DecomposeEinsum,
     DecomposeExpM1,
+    DecomposeGlu,
     DecomposeLinalgVectorNorm,
     DecomposeMinMaxDim,
     DecomposeRoll,
@@ -200,6 +201,7 @@ def transform_for_annotation_pipeline(self, graph_module: GraphModule):
         self.add_pass(DecomposeWrapWithAutocast())
         self.add_pass(DecomposeEinsum())
         self.add_pass(DecomposeExpM1())
+        self.add_pass(DecomposeGlu())
         self.add_pass(DecomposeLinalgVectorNorm(quantization_capture=True))
         self.add_pass(ReplaceInfValues())
         self.add_pass(LiftConstantScalarOperands())
diff --git a/backends/qualcomm/_passes/utils.py b/backends/qualcomm/_passes/utils.py
index 6d908707892..eebfa4d9eb4 100755
--- a/backends/qualcomm/_passes/utils.py
+++ b/backends/qualcomm/_passes/utils.py
@@ -117,6 +117,45 @@ def copy_nn_module_stack(src, target):
         target.meta["nn_module_stack"] = value


+def merge_decomposed_graph(
+    remap: Dict[str, torch.fx.Node],
+    target_node: torch.fx.Node,
+    target_graph: torch.fx.Graph,
+    decomposed_graph_module: torch.fx.GraphModule,
+    predicate: Callable[[torch.fx.Node], bool] = None,
+    # target_node, decomposed_output_node, remap
+    output_processor: Callable[
+        [torch.fx.Node, torch.fx.Node, Dict[str, torch.fx.Node]], None
+    ] = None,
+) -> None:
+    def default_output_process(node):
+        for user in node.users.copy():
+            # remap
+            user.replace_input_with(
+                node,
+                remap[decomposed_node.args[0][0]],
+            )
+
+    for decomposed_node in decomposed_graph_module.graph.nodes:
+        copy_nn_module_stack(target_node, decomposed_node)
+        if predicate is None or predicate(decomposed_node):
+            # no need to copy existent 'output'
+            if decomposed_node.op == "output":
+                if output_processor is None:
+                    default_output_process(target_node)
+                else:
+                    output_processor(target_node, decomposed_node, remap)
+            # no need to copy existent placeholders
+            elif decomposed_node.op == "placeholder":
+                # replace node map from string to graph node
+                remap[decomposed_node] = remap.pop(decomposed_node.name)
+            else:
+                remap[decomposed_node] = target_graph.node_copy(
+                    decomposed_node,
+                    arg_transform=lambda x, remap=remap: remap[x],
+                )
+
+
 def is_float_tensor(node: torch.fx.Node) -> bool:
     if "val" not in node.meta or not isinstance(node.meta["val"], FakeTensor):
         return False
diff --git a/backends/qualcomm/quantizer/annotators.py b/backends/qualcomm/quantizer/annotators.py
index 88109b51697..d584cd128ec 100644
--- a/backends/qualcomm/quantizer/annotators.py
+++ b/backends/qualcomm/quantizer/annotators.py
@@ -674,7 +674,7 @@ def annotate_pad(node: Node, quantization_config: QuantizationConfig) -> None:
     annotate_single_in_single_out(node, quantization_config)


-@register_annotator([torch.ops.aten.reshape.default])
+@register_annotator([torch.ops.aten.reshape.default, torch.ops.aten.unflatten.int])
 def annotate_reshape(node: Node, quantization_config: QuantizationConfig) -> None:
     annotate_single_in_single_out(node, quantization_config)
@@ -879,7 +879,7 @@ def annotate_unsqueeze_copy(
     annotate_single_in_share_out(node, quantization_config)


-@register_annotator([torch.ops.aten.transpose.int])
+@register_annotator([torch.ops.aten.transpose.int, torch.ops.aten.swapaxes.default])
 def annotate_transpose(node: Node, quantization_config: QuantizationConfig) -> None:
     annotate_in_out_obs_sharing_op(node, quantization_config)
     if not _is_annotated([node]):
diff --git a/backends/qualcomm/tests/models.py b/backends/qualcomm/tests/models.py
index
2de2cd098aa..97fe848c556 100644 --- a/backends/qualcomm/tests/models.py +++ b/backends/qualcomm/tests/models.py @@ -1899,6 +1899,16 @@ def forward(self, x): return torch.sum(x, dim=(2, 3), keepdim=True) +class SwapAxes(torch.nn.Module): + def __init__(self, axis0, axis1): + super().__init__() + self.axis0 = axis0 + self.axis1 = axis1 + + def forward(self, x): + return torch.swapaxes(x, axis0=self.axis0, axis1=self.axis1) + + class Tanh(torch.nn.Module): def __init__(self): super().__init__() @@ -1925,6 +1935,16 @@ def forward(self, x): return torch.unbind(x) +class Unflatten(torch.nn.Module): + def __init__(self, dim, sizes): + super().__init__() + self.dim = dim + self.sizes = sizes + + def forward(self, x): + return torch.unflatten(x, dim=self.dim, sizes=self.sizes) + + class Unfold(torch.nn.Module): def __init__(self): super().__init__() diff --git a/backends/qualcomm/tests/test_qnn_delegate.py b/backends/qualcomm/tests/test_qnn_delegate.py index 0e75cf2844a..0e4d6dfd538 100644 --- a/backends/qualcomm/tests/test_qnn_delegate.py +++ b/backends/qualcomm/tests/test_qnn_delegate.py @@ -631,6 +631,13 @@ def test_qnn_backend_gelu(self): sample_input = (torch.randn(2, 5, 1, 3),) self.lower_module_and_test_output(module, sample_input) + def test_qnn_backend_glu(self): + modules = [torch.nn.GLU(), torch.nn.GLU(dim=0)] + sample_input = (torch.randn(2, 5, 1, 4),) + for i, module in enumerate(modules): + with self.subTest(i=i): + self.lower_module_and_test_output(module, sample_input) + def test_qnn_backend_greater_equal(self): test_comb = [ { @@ -1202,11 +1209,21 @@ def test_qnn_backend_sum_int_list(self): sample_input = (torch.randn([1, 4, 8, 8]),) self.lower_module_and_test_output(module, sample_input) + def test_qnn_backend_swapaxes(self): + module = SwapAxes(0, 1) # noqa: F405 + sample_input = (torch.randn([1, 2, 3, 4]),) + self.lower_module_and_test_output(module, sample_input) + def test_qnn_backend_tanh(self): module = Tanh() # noqa: F405 sample_input = 
(torch.randn(2, 5, 1, 3),) self.lower_module_and_test_output(module, sample_input) + def test_qnn_backend_unflatten(self): + module = Unflatten(dim=1, sizes=(2, 3, 4)) # noqa: F405 + sample_input = (torch.randn([1, 24]),) + self.lower_module_and_test_output(module, sample_input) + def test_qnn_backend_unbind(self): module = Unbind() # noqa: F405 sample_input = (torch.randn([3, 3]),) @@ -2146,6 +2163,14 @@ def test_qnn_backend_gelu(self): module = self.get_qdq_module(module, sample_input) self.lower_module_and_test_output(module, sample_input) + def test_qnn_backend_glu(self): + modules = [torch.nn.GLU(), torch.nn.GLU(dim=0)] + sample_input = (torch.randn(2, 5, 1, 4),) + for i, module in enumerate(modules): + with self.subTest(i=i): + module = self.get_qdq_module(module, sample_input) + self.lower_module_and_test_output(module, sample_input) + def test_qnn_backend_greater_equal(self): test_comb = [ { @@ -2814,12 +2839,24 @@ def test_qnn_backend_sum_int_list(self): module = self.get_qdq_module(module, sample_input) self.lower_module_and_test_output(module, sample_input) + def test_qnn_backend_swapaxes(self): + module = SwapAxes(0, 1) # noqa: F405 + sample_input = (torch.randn([1, 2, 3, 4]),) + module = self.get_qdq_module(module, sample_input) + self.lower_module_and_test_output(module, sample_input) + def test_qnn_backend_tanh(self): module = Tanh() # noqa: F405 sample_input = (torch.randn(2, 5, 1, 3),) module = self.get_qdq_module(module, sample_input) self.lower_module_and_test_output(module, sample_input) + def test_qnn_backend_unflatten(self): + module = Unflatten(dim=1, sizes=(2, 3, 4)) # noqa: F405 + sample_input = (torch.randn([1, 24]),) + module = self.get_qdq_module(module, sample_input) + self.lower_module_and_test_output(module, sample_input) + def test_qnn_backend_unbind(self): module = Unbind() # noqa: F405 sample_input = (torch.randn([3, 3]),) @@ -2943,6 +2980,51 @@ def test_qnn_backend_chunk_add(self): module = self.get_qdq_module(module, 
sample_input) self.lower_module_and_test_output(module, sample_input) + def test_qnn_backend_conformer(self): + from typing import Tuple + + import torchaudio + + class PatchedConformer(torch.nn.Module): + """ + A lightly modified version of the top-level Conformer module, such that it can be exported. + Instead of taking lengths and computing the padding mask, it takes the padding mask directly. + See https://github.com/pytorch/audio/blob/main/src/torchaudio/models/conformer.py#L215 + """ + + def __init__(self, conformer): + super().__init__() + self.conformer = conformer + + def forward( + self, input: torch.Tensor, encoder_padding_mask: torch.Tensor + ) -> Tuple[torch.Tensor, torch.Tensor]: + x = input.transpose(0, 1) + for layer in self.conformer.conformer_layers: + x = layer(x, encoder_padding_mask) + return x.transpose(0, 1) + + inner_model = torchaudio.models.Conformer( + input_dim=80, + num_heads=4, + ffn_dim=128, + num_layers=4, + depthwise_conv_kernel_size=31, + ) + lengths = torch.randint(1, 400, (10,)) + encoder_padding_mask = torchaudio.models.conformer._lengths_to_padding_mask( + lengths + ) + sample_input = ( + torch.rand(10, int(lengths.max()), 80), + encoder_padding_mask.to(torch.float32), + ) + module = PatchedConformer(inner_model).eval() + module = self.get_qdq_module( + module, sample_input, quant_dtype=QuantDtype.use_16a8w + ) + self.lower_module_and_test_output(module, sample_input) + def test_qnn_backend_conv1d_relu_log_softmax(self): modules = [ Conv1dReluLogSoftmax(dim=1), # noqa: F405 @@ -5438,6 +5520,43 @@ def test_conv_former(self): self.assertGreaterEqual(msg["top_1"], 70) self.assertGreaterEqual(msg["top_5"], 92) + def test_convnext_small(self): + if not self.required_envs([self.image_dataset]): + self.skipTest("missing required envs") + cmds = [ + "python", + f"{self.executorch_root}/examples/qualcomm/oss_scripts/convnext_small.py", + "--dataset", + self.image_dataset, + "--artifact", + self.artifact_dir, + "--build_folder", + 
self.build_folder, + "--device", + self.device, + "--model", + self.model, + "--ip", + self.ip, + "--port", + str(self.port), + "--seed", + str(1126), + ] + if self.host: + cmds.extend(["--host", self.host]) + + p = subprocess.Popen(cmds, stdout=subprocess.DEVNULL) + with Listener((self.ip, self.port)) as listener: + conn = listener.accept() + p.communicate() + msg = json.loads(conn.recv()) + if "Error" in msg: + self.fail(msg["Error"]) + else: + self.assertGreaterEqual(msg["top_1"], 76) + self.assertGreaterEqual(msg["top_5"], 97) + def test_cvt(self): if not self.required_envs([self.image_dataset]): self.skipTest("missing required envs") @@ -5936,6 +6055,43 @@ def test_gMLP(self): self.assertGreaterEqual(msg["top_1"], 70) self.assertGreaterEqual(msg["top_5"], 88) + def test_maxvit_t(self): + if not self.required_envs([self.image_dataset]): + self.skipTest("missing required envs") + cmds = [ + "python", + f"{self.executorch_root}/examples/qualcomm/oss_scripts/maxvit_t.py", + "--dataset", + self.image_dataset, + "--artifact", + self.artifact_dir, + "--build_folder", + self.build_folder, + "--device", + self.device, + "--model", + self.model, + "--ip", + self.ip, + "--port", + str(self.port), + "--seed", + str(1126), + ] + if self.host: + cmds.extend(["--host", self.host]) + + p = subprocess.Popen(cmds, stdout=subprocess.DEVNULL) + with Listener((self.ip, self.port)) as listener: + conn = listener.accept() + p.communicate() + msg = json.loads(conn.recv()) + if "Error" in msg: + self.fail(msg["Error"]) + else: + self.assertGreaterEqual(msg["top_1"], 72) + self.assertGreaterEqual(msg["top_5"], 91) + @unittest.skip("Only outputs good accuracy in QNN 2.29") def test_mobilevit_v2(self): if not self.required_envs([self.image_dataset]): @@ -6282,6 +6438,43 @@ def test_swin_transformer(self): self.assertGreaterEqual(msg["top_1"], 71) self.assertGreaterEqual(msg["top_5"], 90) + def test_swin_v2_t(self): + if not self.required_envs([self.image_dataset]): + 
self.skipTest("missing required envs") + cmds = [ + "python", + f"{self.executorch_root}/examples/qualcomm/oss_scripts/swin_v2_t.py", + "--dataset", + self.image_dataset, + "--artifact", + self.artifact_dir, + "--build_folder", + self.build_folder, + "--device", + self.device, + "--model", + self.model, + "--ip", + self.ip, + "--port", + str(self.port), + "--seed", + str(1126), + ] + if self.host: + cmds.extend(["--host", self.host]) + + p = subprocess.Popen(cmds, stdout=subprocess.DEVNULL) + with Listener((self.ip, self.port)) as listener: + conn = listener.accept() + p.communicate() + msg = json.loads(conn.recv()) + if "Error" in msg: + self.fail(msg["Error"]) + else: + self.assertGreaterEqual(msg["top_1"], 63) + self.assertGreaterEqual(msg["top_5"], 92) + def test_t5(self): if not self.required_envs([self.qa_dataset]): self.skipTest("missing required envs") @@ -6318,6 +6511,43 @@ def test_t5(self): else: self.assertGreaterEqual(msg["f1"], 0.72) + def test_vit_b_16(self): + if not self.required_envs([self.image_dataset]): + self.skipTest("missing required envs") + cmds = [ + "python", + f"{self.executorch_root}/examples/qualcomm/oss_scripts/vit_b_16.py", + "--dataset", + self.image_dataset, + "--artifact", + self.artifact_dir, + "--build_folder", + self.build_folder, + "--device", + self.device, + "--model", + self.model, + "--ip", + self.ip, + "--port", + str(self.port), + "--seed", + str(1126), + ] + if self.host: + cmds.extend(["--host", self.host]) + + p = subprocess.Popen(cmds, stdout=subprocess.DEVNULL) + with Listener((self.ip, self.port)) as listener: + conn = listener.accept() + p.communicate() + msg = json.loads(conn.recv()) + if "Error" in msg: + self.fail(msg["Error"]) + else: + self.assertGreaterEqual(msg["top_1"], 72) + self.assertGreaterEqual(msg["top_5"], 96) + def test_whisper(self): if not self.required_envs(): self.skipTest("missing required envs") diff --git a/examples/qualcomm/oss_scripts/README.md b/examples/qualcomm/oss_scripts/README.md 
index b68024d5fbf..7971cc4a1de 100644 --- a/examples/qualcomm/oss_scripts/README.md +++ b/examples/qualcomm/oss_scripts/README.md @@ -15,6 +15,7 @@ The following models can be categorized based on their primary use cases. 2. Vision Model: - conv_former + - convnext_small - cvt - deit - dino_v2 @@ -26,6 +27,7 @@ The following models can be categorized based on their primary use cases. - fbnet - focalnet - gMLP_image_classification + - maxvit_t - mobilevit1 - mobilevit_v2 - pvt @@ -34,6 +36,8 @@ The following models can be categorized based on their primary use cases. - squeezenet - ssd300_vgg16 - swin_transformer + - swin_v2_t + - vit_b_16 ## Prerequisite Please follow another [README](../README.md) first to set up environment. @@ -51,7 +55,7 @@ If you want to export the model without running it, please add `--compile_only` ```bash python albert.py -m ${SOC_MODEL} -b path/to/build-android/ -s ${DEVICE_SERIAL} -d path/to/wikisent2 -2. `conv_former`,`cvt`,`deit`,`dino_v2`,`efficientnet`,`fbnet`, `focalnet`, `gMLP_image_classification`, `mobilevit1`,`mobilevit_v2`, `pvt`, `squeezenet`, `swin_transformer` : +2. `conv_former`, `convnext_small`, `cvt`, `deit`, `dino_v2`, `efficientnet`, `fbnet`, `focalnet`, `gMLP_image_classification`, `maxvit_t`, `mobilevit1`, `mobilevit_v2`, `pvt`, `squeezenet`, `swin_transformer`, `swin_v2_t`, `vit_b_16` : - Required Dataset : ImageNet Download [dataset](https://www.kaggle.com/datasets/ifigotin/imagenetmini-1000) first, and place it in a valid folder. diff --git a/examples/qualcomm/oss_scripts/convnext_small.py b/examples/qualcomm/oss_scripts/convnext_small.py new file mode 100755 index 00000000000..491ffb0b7c3 --- /dev/null +++ b/examples/qualcomm/oss_scripts/convnext_small.py @@ -0,0 +1,145 @@ +# Copyright (c) Qualcomm Innovation Center, Inc. +# All rights reserved +# +# This source code is licensed under the BSD-style license found in the +# LICENSE file in the root directory of this source tree. 
+
+import json
+import logging
+import os
+
+from multiprocessing.connection import Client
+
+import numpy as np
+
+import torch
+import torchvision
+
+from executorch.backends.qualcomm._passes.expand_broadcast_tensor_shape import (
+    ExpandBroadcastTensorShape,
+)
+from executorch.backends.qualcomm._passes.qnn_pass_manager import (
+    get_capture_program_passes,
+)
+from executorch.backends.qualcomm.quantizer.quantizer import QuantDtype
+from executorch.backends.qualcomm.utils.constants import QCOM_PASS_ACTIVATE_KEY
+from executorch.examples.qualcomm.utils import (
+    build_executorch_binary,
+    get_imagenet_dataset,
+    make_output_dir,
+    make_quantizer,
+    setup_common_args_and_variables,
+    SimpleADB,
+    topk_accuracy,
+)
+
+
+def main(args):
+    # ensure the working directory exists
+    os.makedirs(args.artifact, exist_ok=True)
+
+    data_num = 100
+    if args.ci:
+        inputs = [(torch.rand(1, 3, 224, 224),)]
+        logging.warning(
+            "This option is for CI to verify the export flow. It uses random input and will result in poor accuracy."
+ ) + else: + inputs, targets = get_imagenet_dataset( + dataset_path=f"{args.dataset}", + data_size=data_num, + image_shape=(256, 256), + crop_size=224, + ) + + pte_filename = "convnext_small_qnn_q8" + instance = torchvision.models.convnext_small(weights="IMAGENET1K_V1").eval() + passes_job = get_capture_program_passes() + passes_job[ExpandBroadcastTensorShape][QCOM_PASS_ACTIVATE_KEY] = True + build_executorch_binary( + instance, + inputs[0], + args.model, + f"{args.artifact}/{pte_filename}", + inputs, + custom_quantizer=make_quantizer( + quant_dtype=QuantDtype.use_8a8w, + per_channel_linear=True, + ), + passes_job=passes_job, + shared_buffer=args.shared_buffer, + ) + + if args.compile_only: + return + + adb = SimpleADB( + qnn_sdk=os.getenv("QNN_SDK_ROOT"), + build_path=f"{args.build_folder}", + pte_path=f"{args.artifact}/{pte_filename}.pte", + workspace=f"/data/local/tmp/executorch/{pte_filename}", + device_id=args.device, + host_id=args.host, + soc_model=args.model, + shared_buffer=args.shared_buffer, + ) + adb.push(inputs=inputs) + adb.execute() + + # collect output data + output_data_folder = f"{args.artifact}/outputs" + make_output_dir(output_data_folder) + + adb.pull(output_path=args.artifact) + + # top-k analysis + predictions = [] + for i in range(data_num): + predictions.append( + np.fromfile( + os.path.join(output_data_folder, f"output_{i}_0.raw"), dtype=np.float32 + ) + ) + + k_val = [1, 5] + topk = [topk_accuracy(predictions, targets, k).item() for k in k_val] + if args.ip and args.port != -1: + with Client((args.ip, args.port)) as conn: + conn.send(json.dumps({f"top_{k}": topk[i] for i, k in enumerate(k_val)})) + else: + for i, k in enumerate(k_val): + print(f"top_{k}->{topk[i]}%") + + +if __name__ == "__main__": + parser = setup_common_args_and_variables() + parser.add_argument( + "-d", + "--dataset", + help=( + "path to the validation folder of ImageNet dataset. " + "e.g. 
--dataset imagenet-mini/val " + "for https://www.kaggle.com/datasets/ifigotin/imagenetmini-1000)" + ), + type=str, + required=False, + ) + parser.add_argument( + "-a", + "--artifact", + help="path for storing generated artifacts by this example. " + "Default ./convnext_small", + default="./convnext_small", + type=str, + ) + + args = parser.parse_args() + args.validate(args) + try: + main(args) + except Exception as e: + if args.ip and args.port != -1: + with Client((args.ip, args.port)) as conn: + conn.send(json.dumps({"Error": str(e)})) + else: + raise Exception(e) diff --git a/examples/qualcomm/oss_scripts/maxvit_t.py b/examples/qualcomm/oss_scripts/maxvit_t.py new file mode 100755 index 00000000000..7a53edd715b --- /dev/null +++ b/examples/qualcomm/oss_scripts/maxvit_t.py @@ -0,0 +1,244 @@ +# Copyright (c) Qualcomm Innovation Center, Inc. +# All rights reserved +# +# This source code is licensed under the BSD-style license found in the +# LICENSE file in the root directory of this source tree. + +import functools +import json +import logging +import os + +from multiprocessing.connection import Client + +import numpy as np + +import torch +import torch.nn.functional as F +import torchvision + +from executorch.backends.qualcomm.quantizer.quantizer import QuantDtype +from executorch.examples.qualcomm.utils import ( + build_executorch_binary, + get_imagenet_dataset, + make_output_dir, + make_quantizer, + setup_common_args_and_variables, + SimpleADB, + topk_accuracy, +) +from torchvision.models.maxvit import ( + PartitionAttentionLayer, + RelativePositionalMultiHeadAttention, +) + + +class WindowPartition(torch.nn.Module): + """ + Partition the input tensor into non-overlapping windows. + """ + + def __init__(self) -> None: + super().__init__() + + def forward(self, x: torch.Tensor, p: int) -> torch.Tensor: + """ + Args: + x (Tensor): Input tensor with expected layout of [B, C, H, W]. + p (int): Number of partitions. 
+        Returns:
+            Tensor: Output tensor with expected layout of [B, H/P, W/P, P*P, C].
+        """
+        B, C, H, W = x.shape
+        P = p
+        # chunk up H and W dimensions
+        x = x.reshape(B * C, H // P, P, W // P, P)
+        x = x.permute(0, 1, 3, 2, 4)
+        # collapse P * P dimension
+        x = x.reshape(B, C, (H // P) * (W // P), P * P)
+        return x.permute(0, 2, 3, 1)
+
+
+class WindowDepartition(torch.nn.Module):
+    """
+    Departition the input tensor of non-overlapping windows into a feature volume of layout [B, C, H, W].
+    """
+
+    def __init__(self) -> None:
+        super().__init__()
+
+    def forward(
+        self, x: torch.Tensor, p: int, h_partitions: int, w_partitions: int
+    ) -> torch.Tensor:
+        """
+        Args:
+            x (Tensor): Input tensor with expected layout of [B, (H/P * W/P), P*P, C].
+            p (int): Number of partitions.
+            h_partitions (int): Number of vertical partitions.
+            w_partitions (int): Number of horizontal partitions.
+        Returns:
+            Tensor: Output tensor with expected layout of [B, C, H, W].
+        """
+        B, G, PP, C = x.shape
+        P = p
+        HP, WP = h_partitions, w_partitions
+        x = x.permute(0, 3, 1, 2)
+        # split P * P dimension into 2 P tile dimensions
+        x = x.reshape(B * C, HP, WP, P, P)
+        # permute into B * C, HP, P, WP, P
+        x = x.permute(0, 1, 3, 2, 4)
+        # reshape into B, C, H, W
+        x = x.reshape(B, C, HP * P, WP * P)
+        return x
+
+
+def forward(self, x: torch.Tensor) -> torch.Tensor:
+    """
+    Args:
+        x (Tensor): Input tensor with expected layout of [B, G, P, D].
+    Returns:
+        Tensor: Output tensor with expected layout of [B, G, P, D].
+    """
+    B, G, P, D = x.shape
+    H, DH = self.n_heads, self.head_dim
+
+    qkv = self.to_qkv(x)
+    q, k, v = torch.chunk(qkv, 3, dim=-1)
+
+    q = q.reshape(B * G, P, H, DH).permute(0, 2, 1, 3)
+    k = k.reshape(B * G, P, H, DH).permute(0, 2, 1, 3)
+    v = v.reshape(B * G, P, H, DH).permute(0, 2, 1, 3)
+
+    k = k * self.scale_factor
+    dot_prod = torch.einsum("B H I D, B H J D -> B H I J", q, k)
+    pos_bias = self.get_relative_positional_bias()
+
+    dot_prod = F.softmax(dot_prod + pos_bias, dim=-1)
+
+    out = torch.einsum("B H I J, B H J D -> B H I D", dot_prod, v)
+    out = out.permute(0, 2, 1, 3).reshape(B, G, P, D)
+
+    out = self.merge(out)
+    return out
+
+
+def main(args):
+    # ensure the working directory exists
+    os.makedirs(args.artifact, exist_ok=True)
+
+    data_num = 100
+    if args.ci:
+        inputs = [(torch.rand(1, 3, 224, 224),)]
+        logging.warning(
+            "This option is for CI to verify the export flow. It uses random input and will result in poor accuracy."
+        )
+    else:
+        inputs, targets = get_imagenet_dataset(
+            dataset_path=f"{args.dataset}",
+            data_size=data_num,
+            image_shape=(256, 256),
+            crop_size=224,
+        )
+
+    pte_filename = "maxvit_t_qnn_q8"
+    instance = torchvision.models.maxvit_t(weights="IMAGENET1K_V1").eval()
+    for block in instance.blocks:
+        for layer in block.layers:
+            for sub_layer in layer.layers:
+                if isinstance(sub_layer, PartitionAttentionLayer):
+                    sub_layer.partition_op = WindowPartition()
+                    sub_layer.departition_op = WindowDepartition()
+                    for attn_sub_layer in sub_layer.attn_layer:
+                        if isinstance(
+                            attn_sub_layer, RelativePositionalMultiHeadAttention
+                        ):
+                            attn_sub_layer.forward = functools.partial(
+                                forward, attn_sub_layer
+                            )
+
+    build_executorch_binary(
+        instance,
+        inputs[0],
+        args.model,
+        f"{args.artifact}/{pte_filename}",
+        inputs,
+        custom_quantizer=make_quantizer(
+            quant_dtype=QuantDtype.use_8a8w,
+            per_channel_linear=True,
+        ),
+        shared_buffer=args.shared_buffer,
+    )
+
+    if args.compile_only:
+        return
+
+    adb = SimpleADB(
qnn_sdk=os.getenv("QNN_SDK_ROOT"), + build_path=f"{args.build_folder}", + pte_path=f"{args.artifact}/{pte_filename}.pte", + workspace=f"/data/local/tmp/executorch/{pte_filename}", + device_id=args.device, + host_id=args.host, + soc_model=args.model, + shared_buffer=args.shared_buffer, + ) + adb.push(inputs=inputs) + adb.execute() + + # collect output data + output_data_folder = f"{args.artifact}/outputs" + make_output_dir(output_data_folder) + + adb.pull(output_path=args.artifact) + + # top-k analysis + predictions = [] + for i in range(data_num): + predictions.append( + np.fromfile( + os.path.join(output_data_folder, f"output_{i}_0.raw"), dtype=np.float32 + ) + ) + + k_val = [1, 5] + topk = [topk_accuracy(predictions, targets, k).item() for k in k_val] + if args.ip and args.port != -1: + with Client((args.ip, args.port)) as conn: + conn.send(json.dumps({f"top_{k}": topk[i] for i, k in enumerate(k_val)})) + else: + for i, k in enumerate(k_val): + print(f"top_{k}->{topk[i]}%") + + +if __name__ == "__main__": + parser = setup_common_args_and_variables() + parser.add_argument( + "-d", + "--dataset", + help=( + "path to the validation folder of ImageNet dataset. " + "e.g. --dataset imagenet-mini/val " + "for https://www.kaggle.com/datasets/ifigotin/imagenetmini-1000)" + ), + type=str, + required=False, + ) + parser.add_argument( + "-a", + "--artifact", + help="path for storing generated artifacts by this example. 
" + "Default ./maxvit_t", + default="./maxvit_t", + type=str, + ) + + args = parser.parse_args() + args.validate(args) + try: + main(args) + except Exception as e: + if args.ip and args.port != -1: + with Client((args.ip, args.port)) as conn: + conn.send(json.dumps({"Error": str(e)})) + else: + raise Exception(e) diff --git a/examples/qualcomm/oss_scripts/swin_v2_t.py b/examples/qualcomm/oss_scripts/swin_v2_t.py new file mode 100755 index 00000000000..954c27f428f --- /dev/null +++ b/examples/qualcomm/oss_scripts/swin_v2_t.py @@ -0,0 +1,185 @@ +# Copyright (c) Qualcomm Innovation Center, Inc. +# All rights reserved +# +# This source code is licensed under the BSD-style license found in the +# LICENSE file in the root directory of this source tree. + +import json +import logging +import os + +from multiprocessing.connection import Client + +import numpy as np + +import torch +import torchvision +from executorch.backends.qualcomm._passes.qnn_pass_manager import ( + FoldQDQ, + get_capture_program_passes, + get_passes_dependency_for_capture_program, + QCOM_PASS_ACTIVATE_KEY, + QCOM_PASS_ARGS_KWARGS_DEFAULTS_KEY, +) + +from executorch.backends.qualcomm.quantizer.quantizer import QuantDtype +from executorch.examples.qualcomm.utils import ( + build_executorch_binary, + get_imagenet_dataset, + make_output_dir, + make_quantizer, + setup_common_args_and_variables, + SimpleADB, + topk_accuracy, +) +from executorch.exir.dialects._ops import ops as exir_ops +from executorch.exir.pass_base import ExportPass, PassResult + + +class RewritePartition(ExportPass): + """ + Rewrite 6D window partition pattern to 5D one. 
+ """ + + def __init__(self): + super(RewritePartition, self).__init__() + + def call(self, graph_module: torch.fx.GraphModule): + graph = graph_module.graph + # math equivalent implementation + for node in graph.nodes: + if ( + node.op == "call_function" + and node.target == exir_ops.edge.aten.permute_copy.default + and node.args[1] == [0, 1, 3, 2, 4, 5] + ): + # adjust original view node to take 5D tensor + view_node = node.args[0] + b, n_window_h, window_h, n_window_w, window_w, c = view_node.args[1] + shape = [b, n_window_h, window_h, n_window_w, window_w * c] + view_node.args = (view_node.args[0], shape) + view_node.meta["val"] = view_node.meta["val"].reshape(shape) + # change current permute node accordingly + axis_order = [0, 1, 3, 2, 4] + node.args = (view_node, axis_order) + node.meta["val"] = view_node.meta["val"].permute(axis_order) + + graph_module.recompile() + return PassResult(graph_module, True) + + +def main(args): + # ensure the working directory exist. + os.makedirs(args.artifact, exist_ok=True) + + data_num = 100 + if args.ci: + inputs = [(torch.rand(1, 3, 224, 224),)] + logging.warning( + "This option is for CI to verify the export flow. It uses random input and will result in poor accuracy." 
+ ) + else: + inputs, targets = get_imagenet_dataset( + dataset_path=f"{args.dataset}", + data_size=data_num, + image_shape=(256, 256), + crop_size=224, + ) + + pte_filename = "swin_v2_t_qnn_q8" + instance = torchvision.models.swin_v2_t(weights="IMAGENET1K_V1").eval() + passes_job = get_capture_program_passes() + passes_job[RewritePartition] = { + QCOM_PASS_ACTIVATE_KEY: True, + QCOM_PASS_ARGS_KWARGS_DEFAULTS_KEY: {}, + } + passes_dep = get_passes_dependency_for_capture_program() + passes_dep[RewritePartition] = [FoldQDQ] + build_executorch_binary( + instance, + inputs[0], + args.model, + f"{args.artifact}/{pte_filename}", + inputs, + custom_quantizer=make_quantizer( + quant_dtype=QuantDtype.use_8a8w, + per_channel_linear=True, + ), + shared_buffer=args.shared_buffer, + passes_job=passes_job, + passes_dependency=passes_dep, + ) + + if args.compile_only: + return + + adb = SimpleADB( + qnn_sdk=os.getenv("QNN_SDK_ROOT"), + build_path=f"{args.build_folder}", + pte_path=f"{args.artifact}/{pte_filename}.pte", + workspace=f"/data/local/tmp/executorch/{pte_filename}", + device_id=args.device, + host_id=args.host, + soc_model=args.model, + shared_buffer=args.shared_buffer, + ) + adb.push(inputs=inputs) + adb.execute() + + # collect output data + output_data_folder = f"{args.artifact}/outputs" + make_output_dir(output_data_folder) + + adb.pull(output_path=args.artifact) + + # top-k analysis + predictions = [] + for i in range(data_num): + predictions.append( + np.fromfile( + os.path.join(output_data_folder, f"output_{i}_0.raw"), dtype=np.float32 + ) + ) + + k_val = [1, 5] + topk = [topk_accuracy(predictions, targets, k).item() for k in k_val] + if args.ip and args.port != -1: + with Client((args.ip, args.port)) as conn: + conn.send(json.dumps({f"top_{k}": topk[i] for i, k in enumerate(k_val)})) + else: + for i, k in enumerate(k_val): + print(f"top_{k}->{topk[i]}%") + + +if __name__ == "__main__": + parser = setup_common_args_and_variables() + parser.add_argument( + "-d", + 
"--dataset", + help=( + "path to the validation folder of ImageNet dataset. " + "e.g. --dataset imagenet-mini/val " + "for https://www.kaggle.com/datasets/ifigotin/imagenetmini-1000)" + ), + type=str, + required=False, + ) + parser.add_argument( + "-a", + "--artifact", + help="path for storing generated artifacts by this example. " + "Default ./swin_v2_t", + default="./swin_v2_t", + type=str, + ) + + args = parser.parse_args() + args.validate(args) + try: + main(args) + except Exception as e: + if args.ip and args.port != -1: + with Client((args.ip, args.port)) as conn: + conn.send(json.dumps({"Error": str(e)})) + else: + raise Exception(e) diff --git a/examples/qualcomm/oss_scripts/vit_b_16.py b/examples/qualcomm/oss_scripts/vit_b_16.py new file mode 100755 index 00000000000..6b79ecc7cda --- /dev/null +++ b/examples/qualcomm/oss_scripts/vit_b_16.py @@ -0,0 +1,135 @@ +# Copyright (c) Qualcomm Innovation Center, Inc. +# All rights reserved +# +# This source code is licensed under the BSD-style license found in the +# LICENSE file in the root directory of this source tree. + +import json +import logging +import os + +from multiprocessing.connection import Client + +import numpy as np + +import torch +import torchvision + +from executorch.backends.qualcomm.quantizer.quantizer import QuantDtype +from executorch.examples.qualcomm.utils import ( + build_executorch_binary, + get_imagenet_dataset, + make_output_dir, + make_quantizer, + setup_common_args_and_variables, + SimpleADB, + topk_accuracy, +) + + +def main(args): + # ensure the working directory exist. + os.makedirs(args.artifact, exist_ok=True) + + data_num = 100 + if args.ci: + inputs = [(torch.rand(1, 3, 224, 224),)] + logging.warning( + "This option is for CI to verify the export flow. It uses random input and will result in poor accuracy." 
+ ) + else: + inputs, targets = get_imagenet_dataset( + dataset_path=f"{args.dataset}", + data_size=data_num, + image_shape=(256, 256), + crop_size=224, + ) + + pte_filename = "vit_b_16_qnn_q8" + instance = torchvision.models.vit_b_16(weights="IMAGENET1K_V1").eval() + build_executorch_binary( + instance, + inputs[0], + args.model, + f"{args.artifact}/{pte_filename}", + inputs, + custom_quantizer=make_quantizer( + quant_dtype=QuantDtype.use_8a8w, + per_channel_linear=True, + ), + shared_buffer=args.shared_buffer, + ) + + if args.compile_only: + return + + adb = SimpleADB( + qnn_sdk=os.getenv("QNN_SDK_ROOT"), + build_path=f"{args.build_folder}", + pte_path=f"{args.artifact}/{pte_filename}.pte", + workspace=f"/data/local/tmp/executorch/{pte_filename}", + device_id=args.device, + host_id=args.host, + soc_model=args.model, + shared_buffer=args.shared_buffer, + ) + adb.push(inputs=inputs) + adb.execute() + + # collect output data + output_data_folder = f"{args.artifact}/outputs" + make_output_dir(output_data_folder) + + adb.pull(output_path=args.artifact) + + # top-k analysis + predictions = [] + for i in range(data_num): + predictions.append( + np.fromfile( + os.path.join(output_data_folder, f"output_{i}_0.raw"), dtype=np.float32 + ) + ) + + k_val = [1, 5] + topk = [topk_accuracy(predictions, targets, k).item() for k in k_val] + if args.ip and args.port != -1: + with Client((args.ip, args.port)) as conn: + conn.send(json.dumps({f"top_{k}": topk[i] for i, k in enumerate(k_val)})) + else: + for i, k in enumerate(k_val): + print(f"top_{k}->{topk[i]}%") + + +if __name__ == "__main__": + parser = setup_common_args_and_variables() + parser.add_argument( + "-d", + "--dataset", + help=( + "path to the validation folder of ImageNet dataset. " + "e.g. 
--dataset imagenet-mini/val " + "for https://www.kaggle.com/datasets/ifigotin/imagenetmini-1000)" + ), + type=str, + required=False, + ) + parser.add_argument( + "-a", + "--artifact", + help="path for storing generated artifacts by this example. " + "Default ./vit_b_16", + default="./vit_b_16", + type=str, + ) + + args = parser.parse_args() + args.validate(args) + try: + main(args) + except Exception as e: + if args.ip and args.port != -1: + with Client((args.ip, args.port)) as conn: + conn.send(json.dumps({"Error": str(e)})) + else: + raise Exception(e) diff --git a/examples/qualcomm/utils.py b/examples/qualcomm/utils.py index e43821bda64..11b9ab88bfe 100755 --- a/examples/qualcomm/utils.py +++ b/examples/qualcomm/utils.py @@ -384,6 +384,7 @@ def build_executorch_binary( metadata=None, dump_intermediate_outputs=False, passes_job=None, + passes_dependency=None, qat_training_data=None, online_prepare=False, optrace=False, @@ -406,6 +407,7 @@ def build_executorch_binary( metadata (dict, optional): An optional dictionary that maps each method name to a constant value in eager mode. dump_intermediate_outputs (bool, optional): Enables dumping model intermediate outputs. passes_job (OrderedDict, optional): Custom passes job in capture_program, users can enable/disable specific passes or modify their attributes. + passes_dependency (Dict, optional): A dictionary mapping each pass to its corresponding list of dependencies. qat_training_data (List[torch.Tensor], optional): A dataset for quantization aware training(QAT). Typically is a pair of tensors, such as [features, ground truth]. online_prepare (bool, optional): Compose QNN graph on device if set to True. optrace (bool, optional): Enable optrace mode for performance analysis if set to True. 
@@ -449,6 +451,7 @@ def build_executorch_binary( compile_spec, constant_methods=metadata, passes_job=passes_job, + dep_table=passes_dependency, skip_node_id_set=skip_node_id_set, skip_node_op_set=skip_node_op_set, ) From cb42db2866bd630d63bf21ee56f237c1e13ca3c5 Mon Sep 17 00:00:00 2001 From: Adi Date: Wed, 17 Sep 2025 22:54:18 +0100 Subject: [PATCH 019/395] Fix format string error in Android PAL initialization assert macro Differential Revision: D81949537 Pull Request resolved: https://github.com/pytorch/executorch/pull/14119 --- runtime/platform/default/android.cpp | 1 - 1 file changed, 1 deletion(-) diff --git a/runtime/platform/default/android.cpp b/runtime/platform/default/android.cpp index 5945bf54842..fdaf7db3b1b 100644 --- a/runtime/platform/default/android.cpp +++ b/runtime/platform/default/android.cpp @@ -46,7 +46,6 @@ __android_log_print( \ ANDROID_LOG_FATAL, \ "ExecuTorch", \ - "%s", \ "ExecuTorch PAL must be initialized before call to %s()", \ ET_FUNCTION); \ } \ From 487214161f0b51188224dfe07fcabc6b8f8a01c4 Mon Sep 17 00:00:00 2001 From: pytorchbot Date: Wed, 17 Sep 2025 18:50:26 -0400 Subject: [PATCH 020/395] [ExecuTorch] Arm backend: disable Misc tests for buck testing to unblock oss PRs (#14395) This PR was created by the merge bot to help merge the original PR into the main branch. 
ghstack PR number: https://github.com/pytorch/executorch/pull/14380 by @digantdesai ^ Please use this as the source of truth for the PR details, comments, and reviews ghstack PR base: https://github.com/pytorch/executorch/tree/gh/digantdesai/49/base ghstack PR head: https://github.com/pytorch/executorch/tree/gh/digantdesai/49/head Merge bot PR base: https://github.com/pytorch/executorch/tree/main Merge bot PR head: https://github.com/pytorch/executorch/tree/gh/digantdesai/49/orig @diff-train-skip-merge Co-authored-by: Digant Desai --- backends/arm/test/targets.bzl | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/backends/arm/test/targets.bzl b/backends/arm/test/targets.bzl index 7634eed7a53..00ec87f928e 100644 --- a/backends/arm/test/targets.bzl +++ b/backends/arm/test/targets.bzl @@ -4,7 +4,7 @@ load("@fbcode_macros//build_defs:python_pytest.bzl", "python_pytest") load("@bazel_skylib//lib:paths.bzl", "paths") def define_arm_tests(): - # TODO Add more tests + # TODO [fbonly] Add more tests test_files = [] # Passes @@ -39,7 +39,7 @@ def define_arm_tests(): "misc/test_bn_relu_folding_qat.py", "misc/test_custom_partition.py", "misc/test_debug_hook.py", - "misc/test_dim_order.py", + # "misc/test_dim_order.py", (TODO - T238390249) "misc/test_outputs_order.py", ] From e31cef61ccaba9171fcad17b32d9045218ecabea Mon Sep 17 00:00:00 2001 From: Mengwei Liu Date: Wed, 17 Sep 2025 15:59:22 -0700 Subject: [PATCH 021/395] Rename image_encoder to vision_encoder to match HF naming convention (#14392) Summary: As titled. 
We want to align with `optimum-executorch` naming convention (which comes from HF `transformers`): https://github.com/huggingface/optimum-executorch/blob/main/optimum/exporters/executorch/tasks/multimodal_text_to_text.py#L238

Differential Revision: D82677835
---
 examples/models/llava/export_llava.py | 6 +++---
 examples/models/llava/test/test_llava.py | 2 +-
 examples/models/llava/test/test_pte.py | 2 +-
 extension/llm/runner/constants.h | 2 +-
 extension/llm/runner/multimodal_prefiller.cpp | 14 +++++++-------
 5 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/examples/models/llava/export_llava.py b/examples/models/llava/export_llava.py
index 7e571087c1d..62ddfc5c363 100644
--- a/examples/models/llava/export_llava.py
+++ b/examples/models/llava/export_llava.py
@@ -224,12 +224,12 @@ def export_all(llava_model: LlavaModel):
 lowered_and_edge = to_edge_transform_and_lower(
 {
- "image_encoder": image_encoder_ep,
+ "vision_encoder": image_encoder_ep,
 "token_embedding": token_embedding_ep,
 "text_decoder": text_model_ep,
 },
 partitioner={
- "image_encoder": [XnnpackPartitioner()],
+ "vision_encoder": [XnnpackPartitioner()],
 "text_decoder": [
 # First partition the DQLinear nodes, then partition the rest of the nodes,
 # to avoid multiple DQLinear nodes in the same partition,
@@ -254,7 +254,7 @@ def export_all(llava_model: LlavaModel):
 ],
 memory_planning_pass=MemoryPlanningPass(alloc_graph_input=False),
 sym_shape_eval_pass={
- "image_encoder": ConstraintBasedSymShapeEvalPass(),
+ "vision_encoder": ConstraintBasedSymShapeEvalPass(),
 "text_decoder": ConstraintBasedSymShapeEvalPass(),
 "token_embedding": HintBasedSymShapeEvalPass(),
 },
diff --git a/examples/models/llava/test/test_llava.py b/examples/models/llava/test/test_llava.py
index 7f2b59e0116..1708cdcd516 100644
--- a/examples/models/llava/test/test_llava.py
+++ b/examples/models/llava/test/test_llava.py
@@ -105,7 +105,7 @@ def test_llava_export(self):
 start_pos += pte_embeds_before_img.shape[1]

 # pte prefill image
- pte_embeds_img = llava_module.run_method("image_encoder", (resized,))[0] + pte_embeds_img = llava_module.run_method("vision_encoder", (resized,))[0] llava_module.run_method( "text_decoder", ( diff --git a/examples/models/llava/test/test_pte.py b/examples/models/llava/test/test_pte.py index 1f4aaa9938c..4b924aed680 100644 --- a/examples/models/llava/test/test_pte.py +++ b/examples/models/llava/test/test_pte.py @@ -56,7 +56,7 @@ def main(): # pte prefill image logging.warning("Image encoder started") - pte_embeds_img = llava_module.run_method("image_encoder", (resized,))[0] + pte_embeds_img = llava_module.run_method("vision_encoder", (resized,))[0] logging.warning("Image encoder finished") logging.warning("Image token prefill started") pte_prefill_img = llava_module.run_method( diff --git a/extension/llm/runner/constants.h b/extension/llm/runner/constants.h index 4ba88203c50..d7b36077757 100644 --- a/extension/llm/runner/constants.h +++ b/extension/llm/runner/constants.h @@ -20,7 +20,7 @@ inline constexpr auto kUseKVCache = "use_kv_cache"; inline constexpr auto kUseSDPAWithKVCache = "use_sdpa_with_kv_cache"; // Multimodal method name conventions -inline constexpr auto kImageEncoderMethod = "image_encoder"; +inline constexpr auto kVisionEncoderMethod = "vision_encoder"; inline constexpr auto kAudioEncoderMethod = "audio_encoder"; inline constexpr auto kTokenEmbeddingMethod = "token_embedding"; inline constexpr auto kTextModelMethod = "text_decoder"; diff --git a/extension/llm/runner/multimodal_prefiller.cpp b/extension/llm/runner/multimodal_prefiller.cpp index 3f8777d4acf..f9645667f24 100644 --- a/extension/llm/runner/multimodal_prefiller.cpp +++ b/extension/llm/runner/multimodal_prefiller.cpp @@ -43,9 +43,9 @@ Result MultimodalPrefiller::prefill( Image image = input.get_image(); auto method_meta = ET_UNWRAP( - module_->method_meta(kImageEncoderMethod), + module_->method_meta(kVisionEncoderMethod), "Failed to get method_meta for %s", - kImageEncoderMethod); + 
kVisionEncoderMethod); ET_CHECK_MSG( method_meta.num_inputs() > 0, @@ -80,7 +80,7 @@ Result MultimodalPrefiller::prefill( // Run image encoder auto image_encoder_outputs = - ET_UNWRAP(module_->execute(kImageEncoderMethod, image_tensor)); + ET_UNWRAP(module_->execute(kVisionEncoderMethod, image_tensor)); encoder_output = image_encoder_outputs[0]; } else if (input.is_audio()) { @@ -175,8 +175,8 @@ ::executorch::runtime::Error MultimodalPrefiller::load() { ET_UNWRAP(module_->method_names(), "Failed to get method names"); // Load image_encoder method if exists. - if (methods.find(kImageEncoderMethod) != methods.end()) { - ET_CHECK_OK_OR_RETURN_ERROR(module_->load_method(kImageEncoderMethod)); + if (methods.find(kVisionEncoderMethod) != methods.end()) { + ET_CHECK_OK_OR_RETURN_ERROR(module_->load_method(kVisionEncoderMethod)); } if (methods.find(kAudioEncoderMethod) != methods.end()) { @@ -203,8 +203,8 @@ bool MultimodalPrefiller::is_method_loaded() { ET_CHECK_MSG(false, "Failed to get method names"); } std::unordered_set methods = methods_res.get(); - if (methods.find(kImageEncoderMethod) != methods.end()) { - return module_->is_method_loaded(kImageEncoderMethod); + if (methods.find(kVisionEncoderMethod) != methods.end()) { + return module_->is_method_loaded(kVisionEncoderMethod); } return true; } From 82c1d772f74beca46dd43380126dcb34500902ff Mon Sep 17 00:00:00 2001 From: Scott Roy <161522778+metascroy@users.noreply.github.com> Date: Wed, 17 Sep 2025 17:13:04 -0700 Subject: [PATCH 022/395] Back out "Improve asset management" Differential Revision: D82581443 Pull Request resolved: https://github.com/pytorch/executorch/pull/14383 --- .../runtime/delegate/ETCoreMLAssetManager.h | 17 -- .../runtime/delegate/ETCoreMLAssetManager.mm | 104 ++++----- .../runtime/delegate/ETCoreMLModelLoader.mm | 19 +- .../runtime/delegate/ETCoreMLModelManager.mm | 202 +++++++----------- 4 files changed, 127 insertions(+), 215 deletions(-) diff --git 
a/backends/apple/coreml/runtime/delegate/ETCoreMLAssetManager.h b/backends/apple/coreml/runtime/delegate/ETCoreMLAssetManager.h index a9e06efa90d..11d957044e9 100644 --- a/backends/apple/coreml/runtime/delegate/ETCoreMLAssetManager.h +++ b/backends/apple/coreml/runtime/delegate/ETCoreMLAssetManager.h @@ -99,17 +99,6 @@ NS_ASSUME_NONNULL_BEGIN - (NSUInteger)compact:(NSUInteger)sizeInBytes error:(NSError* __autoreleasing*)error; -/// Executes a block with a unique temporary directory. -/// -/// A new temporary subdirectory URL is created inside the receiver’s designated -/// base directory. The directory is passed to the block, which can use it to -/// perform temporary file operations. After the block finishes executing, -/// the directory and its contents are removed. -/// -/// @param block A block to execute. The block receives a unique URL. -- (void)withTemporaryDirectory:(void (^)(NSURL* directoryURL))block; - - /// Purges the assets storage. The assets are moved to the trash directory and are asynchronously /// deleted. /// @@ -128,12 +117,6 @@ NS_ASSUME_NONNULL_BEGIN /// contents are deleted asynchronously. @property (copy, readonly, nonatomic) NSURL* trashDirectoryURL; - -/// The staging directory URL, used to hold assets that are being prepared or processed -/// before they are moved into their final location. The contents of this directory -/// are temporary and may be cleared when no longer needed. -@property (copy, readonly, nonatomic) NSURL* stagingDirectoryURL; - /// The file manager. 
@property (strong, readonly, nonatomic) NSFileManager* fileManager; diff --git a/backends/apple/coreml/runtime/delegate/ETCoreMLAssetManager.mm b/backends/apple/coreml/runtime/delegate/ETCoreMLAssetManager.mm index 53c3d1cdc69..256026e1f09 100644 --- a/backends/apple/coreml/runtime/delegate/ETCoreMLAssetManager.mm +++ b/backends/apple/coreml/runtime/delegate/ETCoreMLAssetManager.mm @@ -254,29 +254,6 @@ BOOL is_asset_alive(NSMapTable *assets_in_use_map, return assets; } - -NSURL * _Nullable move_to_directory(NSURL *url, - NSURL *directoryURL, - NSFileManager *fileManager, - NSError * __autoreleasing *error) { - if (!url) { - ETCoreMLLogErrorAndSetNSError(error, ETCoreMLErrorInternalError, "Move operation failed: source URL is nil."); - return nil; - } - - if (!directoryURL) { - ETCoreMLLogErrorAndSetNSError(error, ETCoreMLErrorInternalError, "Move operation failed: destination URL is nil."); - return nil; - } - - NSURL *dstURL = [directoryURL URLByAppendingPathComponent:[NSUUID UUID].UUIDString]; - if (![fileManager moveItemAtURL:url toURL:dstURL error:error]) { - return nil; - } - - return dstURL; -} - } //namespace @interface ETCoreMLAssetManager () { @@ -322,17 +299,12 @@ - (nullable instancetype)initWithDatabase:(const std::shared_ptr&)data if (!managedAssetsDirectoryURL) { return nil; } - + NSURL *managedTrashDirectoryURL = ::create_directory_if_needed(trashDirectoryURL, @"models", fileManager, error); if (!managedTrashDirectoryURL) { return nil; } - - NSURL *managedStagingDirectoryURL = ::create_directory_if_needed(assetsDirectoryURL, @"staging", fileManager, error); - if (!managedStagingDirectoryURL) { - return nil; - } - + // If directory is empty then purge the stores if (::is_directory_empty(managedAssetsDirectoryURL, fileManager, nil)) { assetsMetaStore.impl()->purge(ec); @@ -343,7 +315,6 @@ - (nullable instancetype)initWithDatabase:(const std::shared_ptr&)data _assetsStore = std::move(assetsStore); _assetsMetaStore = std::move(assetsMetaStore); 
_assetsDirectoryURL = managedAssetsDirectoryURL; - _stagingDirectoryURL = managedStagingDirectoryURL; _trashDirectoryURL = managedTrashDirectoryURL; _estimatedSizeInBytes = sizeInBytes.value(); _maxAssetsSizeInBytes = maxAssetsSizeInBytes; @@ -375,15 +346,15 @@ - (nullable instancetype)initWithDatabaseURL:(NSURL *)databaseURL error:error]; } -- (void)withTemporaryDirectory:(void (^)(NSURL *directoryURL))block { - NSURL *dstURL = [self.stagingDirectoryURL URLByAppendingPathComponent:[NSUUID UUID].UUIDString]; - block(dstURL); - if (![self.fileManager fileExistsAtPath:dstURL.path]) { - return; +- (nullable NSURL *)moveURL:(NSURL *)url + toUniqueURLInDirectory:(NSURL *)directoryURL + error:(NSError * __autoreleasing *)error { + NSURL *dstURL = [directoryURL URLByAppendingPathComponent:[NSUUID UUID].UUIDString]; + if (![self.fileManager moveItemAtURL:url toURL:dstURL error:error]) { + return nil; } - - move_to_directory(dstURL, self.trashDirectoryURL, self.fileManager, nil); - [self cleanupTrashDirectory]; + + return dstURL; } - (void)cleanupAssetIfNeeded:(ETCoreMLAsset *)asset { @@ -436,8 +407,9 @@ - (nullable ETCoreMLAsset *)_storeAssetAtURL:(NSURL *)srcURL return false; } - // If a file already exists at `dstURL`, move it to the trash for removal. - move_to_directory(dstURL, self.trashDirectoryURL, self.fileManager, nil); + // If an asset exists move it + [self moveURL:dstURL toUniqueURLInDirectory:self.trashDirectoryURL error:nil]; + // Move the asset to assets directory. 
if (![self.fileManager moveItemAtURL:srcURL toURL:dstURL error:error]) { return false; @@ -461,25 +433,16 @@ - (nullable ETCoreMLAsset *)_storeAssetAtURL:(NSURL *)srcURL } - (void)triggerCompaction { - if (self.estimatedSizeInBytes >= self.maxAssetsSizeInBytes) { - __weak __typeof(self) weakSelf = self; - dispatch_async(self.syncQueue, ^{ - NSError *localError = nil; - if (![weakSelf _compact:self.maxAssetsSizeInBytes error:&localError]) { - ETCoreMLLogError(localError, "Failed to compact asset store."); - } - }); + if (self.estimatedSizeInBytes < self.maxAssetsSizeInBytes) { + return; } - - // Always clean the trash directory to ensure a minimal footprint. - // The `trashQueue` is serialized, so only one cleanup will run at a time. - [self cleanupTrashDirectory]; -} - -- (void)cleanupTrashDirectory { + __weak __typeof(self) weakSelf = self; - dispatch_async(self.trashQueue, ^{ - [weakSelf removeFilesInTrashDirectory]; + dispatch_async(self.syncQueue, ^{ + NSError *localError = nil; + if (![weakSelf _compact:self.maxAssetsSizeInBytes error:&localError]) { + ETCoreMLLogError(localError, "Failed to compact asset store."); + } }); } @@ -585,7 +548,7 @@ - (BOOL)_removeAssetWithIdentifier:(NSString *)identifier NSURL *assetURL = ::get_asset_url(assetValue); if ([self.fileManager fileExistsAtPath:assetURL.path] && - !move_to_directory(assetURL, self.trashDirectoryURL, self.fileManager, error)) { + ![self moveURL:assetURL toUniqueURLInDirectory:self.trashDirectoryURL error:error]) { return false; } @@ -686,7 +649,13 @@ - (NSUInteger)_compact:(NSUInteger)sizeInBytes error:(NSError * __autoreleasing identifier); } } - + + // Trigger cleanup. 
+ __weak __typeof(self) weakSelf = self; + dispatch_async(self.trashQueue, ^{ + [weakSelf removeFilesInTrashDirectory]; + }); + return _estimatedSizeInBytes; } @@ -695,10 +664,7 @@ - (NSUInteger)compact:(NSUInteger)sizeInBytes error:(NSError * __autoreleasing * dispatch_sync(self.syncQueue, ^{ result = [self _compact:sizeInBytes error:error]; }); - - // Always clean the trash directory to ensure a minimal footprint. - // The `trashQueue` is serialized, so only one cleanup will run at a time. - [self cleanupTrashDirectory]; + return result; } @@ -742,7 +708,7 @@ - (BOOL)_purge:(NSError * __autoreleasing *)error { } // Move the the whole assets directory to the temp directory. - if (!move_to_directory(self.assetsDirectoryURL, self.trashDirectoryURL, self.fileManager, error)) { + if (![self moveURL:self.assetsDirectoryURL toUniqueURLInDirectory:self.trashDirectoryURL error:error]) { return false; } @@ -758,7 +724,13 @@ - (BOOL)_purge:(NSError * __autoreleasing *)error { ::set_error_from_error_code(ec, error); // Trigger cleanup - [self cleanupTrashDirectory]; + if (status) { + __weak __typeof(self) weakSelf = self; + dispatch_async(self.trashQueue, ^{ + [weakSelf removeFilesInTrashDirectory]; + }); + } + return static_cast(status); } diff --git a/backends/apple/coreml/runtime/delegate/ETCoreMLModelLoader.mm b/backends/apple/coreml/runtime/delegate/ETCoreMLModelLoader.mm index 9e8ae04842e..05aa910d954 100644 --- a/backends/apple/coreml/runtime/delegate/ETCoreMLModelLoader.mm +++ b/backends/apple/coreml/runtime/delegate/ETCoreMLModelLoader.mm @@ -62,12 +62,21 @@ + (nullable ETCoreMLModel *)loadModelWithContentsOfURL:(NSURL *)compiledModelURL if (model) { return model; } - - if (error) { - *error = localError; + + if (localError) { + ETCoreMLLogError(localError, + "Failed to load model from compiled asset with identifier = %@", + identifier); } - - return nil; + + // If store failed then we will load the model from compiledURL. 
+ auto backingAsset = Asset::make(compiledModelURL, identifier, assetManager.fileManager, error); + if (!backingAsset) { + return nil; + } + + asset = [[ETCoreMLAsset alloc] initWithBackingAsset:backingAsset.value()]; + return ::get_model_from_asset(asset, configuration, metadata, error); } @end diff --git a/backends/apple/coreml/runtime/delegate/ETCoreMLModelManager.mm b/backends/apple/coreml/runtime/delegate/ETCoreMLModelManager.mm index c27b42566dc..f4cfd2146ac 100644 --- a/backends/apple/coreml/runtime/delegate/ETCoreMLModelManager.mm +++ b/backends/apple/coreml/runtime/delegate/ETCoreMLModelManager.mm @@ -345,10 +345,6 @@ void add_compute_unit(std::string& identifier, MLComputeUnits compute_units) { return [ETCoreMLModelDebugInfo modelDebugInfoFromData:file_data error:error]; } -NSString *raw_model_identifier(NSString *identifier) { - return [NSString stringWithFormat:@"raw_%@", identifier]; -} - #endif } //namespace @@ -412,7 +408,7 @@ - (nullable ETCoreMLAsset *)assetWithIdentifier:(NSString *)identifier { return modelAsset; } - __block NSError *localError = nil; + NSError *localError = nil; modelAsset = [self.assetManager assetWithIdentifier:identifier error:&localError]; if (localError) { ETCoreMLLogError(localError, @@ -424,9 +420,8 @@ - (nullable ETCoreMLAsset *)assetWithIdentifier:(NSString *)identifier { } - (nullable NSURL *)compiledModelURLWithIdentifier:(NSString *)identifier - modelURL:(nullable NSURL *)modelURL inMemoryFS:(const inmemoryfs::InMemoryFileSystem*)inMemoryFS - dstURL:(NSURL *)dstURL + assetManager:(ETCoreMLAssetManager *)assetManager error:(NSError * __autoreleasing *)error { auto modelAssetType = get_model_asset_type(inMemoryFS); if (!modelAssetType) { @@ -435,132 +430,78 @@ - (nullable NSURL *)compiledModelURLWithIdentifier:(NSString *)identifier "AOT blob is missing model file."); return nil; } - - // If modelURL is not provided, write model files to the destination directory (dstURL) - // and obtain a URL pointing to them. 
Otherwise, use the provided modelURL. - modelURL = (modelURL == nil) ? ::write_model_files(dstURL, self.fileManager, identifier, modelAssetType.value(), inMemoryFS, error) : modelURL; - if (!modelURL) { - // Failed to generate or locate model files, return nil. - return nil; - } - - // Handle based on the type of the model asset. + + NSURL *dstURL = [self.assetManager.trashDirectoryURL URLByAppendingPathComponent:[NSUUID UUID].UUIDString]; + NSURL *modelURL = ::write_model_files(dstURL, self.fileManager, identifier, modelAssetType.value(), inMemoryFS, error); switch (modelAssetType.value()) { case ModelAssetType::CompiledModel: { - // The model is already compiled; no further action needed. - // Return the existing model URL. + // Model is already compiled. return modelURL; } - + case ModelAssetType::Model: { - // The model is not compiled yet. - // Compile the model at the specified URL with a maximum wait time of 5 minutes. + // Compile the model. NSURL *compiledModelURL = [ETCoreMLModelCompiler compileModelAtURL:modelURL maxWaitTimeInSeconds:(5 * 60) error:error]; - // Return the URL of the compiled model or nil if compilation fails. 
+ return compiledModelURL; } } } -- (nullable ETCoreMLAsset *)compiledModelAssetWithMetadata:(const ModelMetadata&)metadata - modelURL:(nullable NSURL *)modelURL - inMemoryFS:(const inmemoryfs::InMemoryFileSystem*)inMemoryFS - error:(NSError * __autoreleasing *)error { - NSString *identifier = @(metadata.identifier.c_str()); - __block ETCoreMLAsset *compiledModelAsset = [self assetWithIdentifier:identifier]; - if (compiledModelAsset) { - ETCoreMLLogInfo("Cache Hit: Successfully retrieved compiled model with identifier=%@ from the models cache.", identifier); - } else { - ETCoreMLLogInfo("Cache Miss: Compiled Model with identifier=%@ was not found in the models cache.", identifier); - } - - [self.assetManager withTemporaryDirectory:^(NSURL * _Nonnull directoryURL) { - if (compiledModelAsset) { - return; - } - - // The directory specified by `directoryURL` is unique and will be automatically cleaned up - // once the enclosing block completes. - NSURL *compiledModelURL = [self compiledModelURLWithIdentifier:identifier - modelURL:modelURL - inMemoryFS:inMemoryFS - dstURL:directoryURL - error:error]; - if (compiledModelURL) { - // Move the compiled model to the asset manager to transfer ownership. 
- compiledModelAsset = [self.assetManager storeAssetAtURL:compiledModelURL withIdentifier:identifier error:error]; - } - }]; - - return compiledModelAsset; -} - #if ET_EVENT_TRACER_ENABLED -- (nullable ETCoreMLAsset *)modelAssetWithMetadata:(const ModelMetadata&)metadata - inMemoryFS:(const inmemoryfs::InMemoryFileSystem*)inMemoryFS - error:(NSError * __autoreleasing *)error { +- (nullable id)modelExecutorWithMetadata:(const ModelMetadata&)metadata + inMemoryFS:(const inmemoryfs::InMemoryFileSystem*)inMemoryFS + configuration:(MLModelConfiguration *)configuration + error:(NSError * __autoreleasing *)error { NSString *identifier = @(metadata.identifier.c_str()); - NSString *rawIdentifier = raw_model_identifier(identifier); - __block ETCoreMLAsset *modelAsset = [self assetWithIdentifier:rawIdentifier]; - if (modelAsset) { + // Otherwise try to retrieve the compiled asset. + ETCoreMLAsset *compiledModelAsset = [self assetWithIdentifier:identifier]; + if (compiledModelAsset) { ETCoreMLLogInfo("Cache Hit: Successfully retrieved model with identifier=%@ from the models cache.", identifier); } else { ETCoreMLLogInfo("Cache Miss: Model with identifier=%@ was not found in the models cache.", identifier); } - - [self.assetManager withTemporaryDirectory:^(NSURL * _Nonnull directoryURL) { - if (modelAsset) { - return; - } - - auto modelAssetType = get_model_asset_type(inMemoryFS); - if (modelAssetType != ModelAssetType::Model) { - return; - } - - // The directory specified by `directoryURL` is unique and will be automatically cleaned up - // once the enclosing block completes. - NSURL *modelURL = ::write_model_files(directoryURL, - self.fileManager, - identifier, - modelAssetType.value(), - inMemoryFS, - error); + + // Create a unique directory for writing model files. 
+ NSURL *dstURL = [self.assetManager.trashDirectoryURL URLByAppendingPathComponent:[NSUUID UUID].UUIDString]; + auto modelAssetType = get_model_asset_type(inMemoryFS); + ETCoreMLAsset *modelAsset = nil; + // Write the model files. + if (modelAssetType == ModelAssetType::Model) { + NSURL *modelURL = ::write_model_files(dstURL, self.fileManager, identifier, modelAssetType.value(), inMemoryFS, error); if (modelURL) { - // Move the model to the asset manager to transfer ownership. - modelAsset = [self.assetManager storeAssetAtURL:modelURL withIdentifier:rawIdentifier error:error]; + modelAsset = make_asset(modelURL, + identifier, + self.fileManager, + error); } - }]; - - return modelAsset; -} - -- (nullable id)modelExecutorWithMetadata:(const ModelMetadata&)metadata - inMemoryFS:(const inmemoryfs::InMemoryFileSystem*)inMemoryFS - configuration:(MLModelConfiguration *)configuration - error:(NSError * __autoreleasing *)error { - NSError *localError = nil; - ETCoreMLAsset *modelAsset = [self modelAssetWithMetadata:metadata inMemoryFS:inMemoryFS error:&localError]; - if (localError) { - if (error) { - *error = localError; - } - - return nil; } - - ETCoreMLAsset *compiledModelAsset = [self compiledModelAssetWithMetadata:metadata - modelURL:modelAsset.contentURL - inMemoryFS:inMemoryFS - error:error]; + + if (!compiledModelAsset) { + // Compile the model. 
+ NSURL *compiledModelURL = [self compiledModelURLWithIdentifier:identifier + inMemoryFS:inMemoryFS + assetManager:self.assetManager + error:error]; + compiledModelAsset = make_asset(compiledModelURL, + identifier, + self.fileManager, + error); + } + if (!compiledModelAsset) { return nil; } + + NSError *localError = nil; + ETCoreMLModelDebugInfo *debug_info = get_model_debug_info(inMemoryFS, &localError); + if (localError) { + ETCoreMLLogError(localError, "Failed to parse debug info file"); + } + - ETCoreMLModelDebugInfo *debug_info = get_model_debug_info(inMemoryFS, error); - // The analyzer requires both the raw (uncompiled) asset and the compiled model asset to perform analysis. return [[ETCoreMLModelAnalyzer alloc] initWithCompiledModelAsset:compiledModelAsset modelAsset:modelAsset modelDebugInfo:debug_info @@ -569,33 +510,41 @@ - (nullable ETCoreMLAsset *)modelAssetWithMetadata:(const ModelMetadata&)metadat assetManager:self.assetManager error:error]; } + #else - (nullable id)modelExecutorWithMetadata:(const ModelMetadata&)metadata inMemoryFS:(const inmemoryfs::InMemoryFileSystem*)inMemoryFS configuration:(MLModelConfiguration *)configuration error:(NSError * __autoreleasing *)error { - ETCoreMLAsset *compiledModelAsset = [self compiledModelAssetWithMetadata:metadata - modelURL:nil - inMemoryFS:inMemoryFS - error:error]; - if (!compiledModelAsset) { - return nil; + NSString *identifier = @(metadata.identifier.c_str()); + // Otherwise try to retrieve the compiled asset. + ETCoreMLAsset *asset = [self assetWithIdentifier:identifier]; + ETCoreMLModel *model = asset ? 
get_model_from_asset(asset, configuration, metadata, error) : nil; + if (model) { + ETCoreMLLogInfo("Cache Hit: Successfully retrieved model with identifier=%@ from the models cache.", identifier); + return [[ETCoreMLDefaultModelExecutor alloc] initWithModel:model]; } - - ETCoreMLModel *model = [ETCoreMLModelLoader loadModelWithContentsOfURL:compiledModelAsset.contentURL - configuration:configuration - metadata:metadata - assetManager:self.assetManager - error:error]; - if (!model) { + + ETCoreMLLogInfo("Cache Miss: Model with identifier=%@ was not found in the models cache.", identifier); + // Compile the model. + NSURL *compiledModelURL = [self compiledModelURLWithIdentifier:identifier + inMemoryFS:inMemoryFS + assetManager:self.assetManager + error:error]; + if (!compiledModelURL) { return nil; } - + + model = [ETCoreMLModelLoader loadModelWithContentsOfURL:compiledModelURL + configuration:configuration + metadata:metadata + assetManager:self.assetManager + error:error]; + return [[ETCoreMLDefaultModelExecutor alloc] initWithModel:model]; } #endif - - (nullable id)_modelExecutorWithAOTData:(NSData *)data configuration:(MLModelConfiguration *)configuration error:(NSError * __autoreleasing *)error { @@ -780,7 +729,6 @@ - (BOOL)executeModelWithHandle:(ModelHandle *)handle args.count); return result; } - NSError *localError = nil; @autoreleasepool { NSArray *inputs = [args subarrayWithRange:NSMakeRange(0, model.orderedInputNames.count)]; @@ -800,11 +748,11 @@ - (BOOL)executeModelWithHandle:(ModelHandle *)handle result = YES; } } - - if (localError && error) { - *error = localError; + if (!result) { + if (error) { + *error = localError; + } } - return result; } From 907fde59e50ef166edd9129d8e3fbbc0a6fbbe52 Mon Sep 17 00:00:00 2001 From: Scott Roy <161522778+metascroy@users.noreply.github.com> Date: Wed, 17 Sep 2025 17:13:11 -0700 Subject: [PATCH 023/395] Fix logging crash Differential Revision: D82666928 Pull Request resolved: 
https://github.com/pytorch/executorch/pull/14388
---
 extension/apple/ExecuTorch/Exported/ExecuTorchLog.mm | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/extension/apple/ExecuTorch/Exported/ExecuTorchLog.mm b/extension/apple/ExecuTorch/Exported/ExecuTorchLog.mm
index 904647fee81..443a218134c 100644
--- a/extension/apple/ExecuTorch/Exported/ExecuTorchLog.mm
+++ b/extension/apple/ExecuTorch/Exported/ExecuTorchLog.mm
@@ -90,9 +90,9 @@ - (void)logWithLevel:(ExecuTorchLogLevel)level
 [self->_buffer addObject:@{
 @"level" : @(level),
 @"timestamp" : @(timestamp),
- @"filename" : filename,
+ @"filename" : filename ?: @"(null)",
 @"line" : @(line),
- @"message" : message
+ @"message" : message ?: @"(null)"
 }];
 });
 for (id sink in sinks) {
From 90ee3474f0070443d4c4c4ca0d88fb0503a98c3f Mon Sep 17 00:00:00 2001
From: Kimish Patel
Date: Wed, 17 Sep 2025 17:51:07 -0700
Subject: [PATCH 024/395] Add selective_build.bzl file in preparation for next PR
Differential Revision: D82664665
Pull Request resolved: https://github.com/pytorch/executorch/pull/14387
---
 kernels/prim_ops/selective_build.bzl | 59 ++++++++++++++++++++++++++++
 1 file changed, 59 insertions(+)
 create mode 100644 kernels/prim_ops/selective_build.bzl
diff --git a/kernels/prim_ops/selective_build.bzl b/kernels/prim_ops/selective_build.bzl
new file mode 100644
index 00000000000..a5c89147801
--- /dev/null
+++ b/kernels/prim_ops/selective_build.bzl
@@ -0,0 +1,59 @@
+load("@fbsource//xplat/executorch/build:runtime_wrapper.bzl", "runtime")
+
+def prim_ops_registry_selective(name, selected_prim_ops_header_target, aten_suffix="", **kwargs):
+ """
+ Create a selective prim ops registry target.
+
+ Args:
+ name: Name of the target to create
+ selected_prim_ops_header_target: Target that generates selected_prim_ops.h
+ aten_suffix: Suffix for aten mode (e.g.
"_aten") + **kwargs: Additional arguments passed to runtime.cxx_library + """ + + target = "//executorch/kernels/prim_ops:prim_ops_sources" + header_target = "//executorch/kernels/prim_ops:selective_build_prim_ops.h" + source_name = "register_prim_ops.cpp" + header_name = "selective_build_prim_ops.h" + genrule_dep_name = name + "_register_prim_ops_srcs_copy" + runtime.genrule( + name = genrule_dep_name, + cmd = "cp -f $(location {})/{} $OUT/{} && cp -f $(location {})/{} $OUT/{} && cp -f $(location {selected_prim_ops_header_target})/selected_prim_ops.h $OUT/selected_prim_ops.h".format( + target, source_name, source_name, + header_target, header_name, header_name, + selected_prim_ops_header_target=selected_prim_ops_header_target + ), + outs = { + source_name: [source_name], + header_name: [header_name], + "selected_prim_ops.h": ["selected_prim_ops.h"] + }, + default_outs = ["."], + ) + runtime.cxx_library( + name = name, + srcs = [":" + genrule_dep_name + "[register_prim_ops.cpp]"], + exported_headers = { + "selective_build_prim_ops.h": ":" + genrule_dep_name + "[selective_build_prim_ops.h]", + "selected_prim_ops.h": ":" + genrule_dep_name + "[selected_prim_ops.h]" + }, + visibility = [ + "//executorch/...", + "@EXECUTORCH_CLIENTS", + ], + # @lint-ignore BUCKLINT link_whole, need this to register prim ops. 
+ link_whole = True, + # prim ops are registered through a global table so the ctor needs to be allowed + compiler_flags = select({ + "DEFAULT": ["-Wno-global-constructors"], + "ovr_config//os:windows": [], + }) + ["-DET_PRIM_OPS_SELECTIVE_BUILD"], + deps = [ + "//executorch/kernels/prim_ops:et_copy_index" + aten_suffix, + "//executorch/kernels/prim_ops:et_view" + aten_suffix, + "//executorch/runtime/core:evalue" + aten_suffix, + "//executorch/runtime/kernel:operator_registry" + aten_suffix, + "//executorch/runtime/kernel:kernel_includes" + aten_suffix, + ], + **kwargs + ) From 5b99d4d1080145ab625d6ed903aff5aff29f6feb Mon Sep 17 00:00:00 2001 From: mcremon-meta <134334895+mcremon-meta@users.noreply.github.com> Date: Wed, 17 Sep 2025 21:04:21 -0700 Subject: [PATCH 025/395] Remove generic versions of aten ops and use portable instead Differential Revision: D82667318 Pull Request resolved: https://github.com/pytorch/executorch/pull/14389 --- backends/cadence/aot/TARGETS | 4 -- .../cadence/generic/operators/CMakeLists.txt | 8 +-- backends/cadence/generic/operators/op_add.cpp | 61 ------------------- .../generic/operators/op_embedding.cpp | 41 ------------- .../cadence/generic/operators/op_full.cpp | 50 --------------- .../generic/operators/op_view_copy.cpp | 29 --------- .../cadence/generic/operators/targets.bzl | 58 ------------------ 7 files changed, 4 insertions(+), 247 deletions(-) delete mode 100644 backends/cadence/generic/operators/op_add.cpp delete mode 100644 backends/cadence/generic/operators/op_embedding.cpp delete mode 100644 backends/cadence/generic/operators/op_full.cpp delete mode 100644 backends/cadence/generic/operators/op_view_copy.cpp diff --git a/backends/cadence/aot/TARGETS b/backends/cadence/aot/TARGETS index 16d88512b96..9b2bd087d8e 100644 --- a/backends/cadence/aot/TARGETS +++ b/backends/cadence/aot/TARGETS @@ -145,11 +145,7 @@ executorch_generated_lib( deps = [ "//executorch/backends/cadence/generic/kernels:cadence_kernels", # Individual 
operator targets instead of combined cadence_generic_ops - "//executorch/backends/cadence/generic/operators:op_add", - "//executorch/backends/cadence/generic/operators:op_embedding", - "//executorch/backends/cadence/generic/operators:op_full", "//executorch/backends/cadence/generic/operators:op_requantize_out", - "//executorch/backends/cadence/generic/operators:op_view_copy", "//executorch/backends/cadence/generic/operators:im2row_out", "//executorch/backends/cadence/generic/operators:dequantize_per_tensor", "//executorch/backends/cadence/generic/operators:quantize_per_tensor", diff --git a/backends/cadence/generic/operators/CMakeLists.txt b/backends/cadence/generic/operators/CMakeLists.txt index d88701007f9..b74ead7eddc 100644 --- a/backends/cadence/generic/operators/CMakeLists.txt +++ b/backends/cadence/generic/operators/CMakeLists.txt @@ -16,10 +16,6 @@ include(${EXECUTORCH_ROOT}/tools/cmake/Codegen.cmake) # ATen compliant ops that are needed to run this model. set(_aten_ops__srcs - "${CMAKE_CURRENT_SOURCE_DIR}/op_add.cpp" - "${CMAKE_CURRENT_SOURCE_DIR}/op_embedding.cpp" - "${CMAKE_CURRENT_SOURCE_DIR}/op_full.cpp" - "${CMAKE_CURRENT_SOURCE_DIR}/op_view_copy.cpp" "${EXECUTORCH_ROOT}/kernels/portable/cpu/util/activation_ops_util.cpp" "${EXECUTORCH_ROOT}/kernels/portable/cpu/util/copy_ops_util.cpp" "${EXECUTORCH_ROOT}/kernels/portable/cpu/util/broadcast_util.cpp" @@ -31,10 +27,13 @@ set(_aten_ops__srcs "${EXECUTORCH_ROOT}/kernels/portable/cpu/util/repeat_util.cpp" "${EXECUTORCH_ROOT}/kernels/portable/cpu/util/slice_util.cpp" "${EXECUTORCH_ROOT}/kernels/portable/cpu/pattern/unary_ufunc_realhbbf16_to_floathbf16.cpp" + "${EXECUTORCH_ROOT}/kernels/portable/cpu/op_add.cpp" "${EXECUTORCH_ROOT}/kernels/portable/cpu/op_bmm.cpp" "${EXECUTORCH_ROOT}/kernels/portable/cpu/op_cat.cpp" "${EXECUTORCH_ROOT}/kernels/portable/cpu/op_clone.cpp" "${EXECUTORCH_ROOT}/kernels/portable/cpu/op_div.cpp" + "${EXECUTORCH_ROOT}/kernels/portable/cpu/op_embedding.cpp" + 
"${EXECUTORCH_ROOT}/kernels/portable/cpu/op_full.cpp" "${EXECUTORCH_ROOT}/kernels/portable/cpu/op_hardtanh.cpp" "${EXECUTORCH_ROOT}/kernels/portable/cpu/op_max_pool2d_with_indices.cpp" "${EXECUTORCH_ROOT}/kernels/portable/cpu/op_mean.cpp" @@ -58,6 +57,7 @@ set(_aten_ops__srcs "${EXECUTORCH_ROOT}/kernels/portable/cpu/op_native_group_norm.cpp" "${EXECUTORCH_ROOT}/kernels/portable/cpu/op_sum.cpp" "${EXECUTORCH_ROOT}/kernels/portable/cpu/op_select_copy.cpp" + "${EXECUTORCH_ROOT}/kernels/portable/cpu/op_view_copy.cpp" "${EXECUTORCH_ROOT}/kernels/portable/cpu/util/dtype_util.cpp" "${EXECUTORCH_ROOT}/kernels/portable/cpu/util/normalization_ops_util.cpp" "${EXECUTORCH_ROOT}/kernels/portable/cpu/util/select_copy_util.cpp" diff --git a/backends/cadence/generic/operators/op_add.cpp b/backends/cadence/generic/operators/op_add.cpp deleted file mode 100644 index 89b67467605..00000000000 --- a/backends/cadence/generic/operators/op_add.cpp +++ /dev/null @@ -1,61 +0,0 @@ -/* - * Copyright (c) Meta Platforms, Inc. and affiliates. - * All rights reserved. - * - * This source code is licensed under the BSD-style license found in the - * LICENSE file in the root directory of this source tree. 
- */ - -#include -#include -#include -#include - -namespace torch { -namespace executor { -namespace native { - -Tensor& add_out( - KernelRuntimeContext& ctx, - const Tensor& a, - const Tensor& b, - const Scalar& alpha, - Tensor& out) { - (void)ctx; - - ScalarType a_type = a.scalar_type(); - ScalarType b_type = b.scalar_type(); - ScalarType common_type = promoteTypes(a_type, b_type); - ScalarType out_type = out.scalar_type(); - - ET_CHECK_MSG(a_type == ScalarType::Float, "Input tensor not a float.\n"); - ET_CHECK_MSG(b_type == ScalarType::Float, "Input tensor not a float.\n"); - ET_CHECK_MSG(out_type == ScalarType::Float, "Output tensor not a float.\n"); - - ET_CHECK(canCast(common_type, out_type)); - - using CTYPE_A = float; - using CTYPE_B = float; - using CTYPE_IN = float; - using CTYPE_OUT = float; - CTYPE_IN alpha_val; - ET_EXTRACT_SCALAR(alpha, alpha_val); - - apply_binary_elementwise_fn( - [alpha_val](const CTYPE_A val_a, const CTYPE_B val_b) { - CTYPE_IN a_casted = static_cast(val_a); - CTYPE_IN b_casted = static_cast(val_b); - CTYPE_IN value = a_casted + alpha_val * b_casted; - - return static_cast(value); - }, - a, - b, - out); - - return out; -} - -} // namespace native -} // namespace executor -} // namespace torch diff --git a/backends/cadence/generic/operators/op_embedding.cpp b/backends/cadence/generic/operators/op_embedding.cpp deleted file mode 100644 index ce28789a156..00000000000 --- a/backends/cadence/generic/operators/op_embedding.cpp +++ /dev/null @@ -1,41 +0,0 @@ -/* - * Copyright (c) Meta Platforms, Inc. and affiliates. - * All rights reserved. - * - * This source code is licensed under the BSD-style license found in the - * LICENSE file in the root directory of this source tree. 
- */ - -#include - -namespace torch { -namespace executor { -namespace native { - -using executorch::aten::Tensor; -using executorch::runtime::KernelRuntimeContext; - -void embedding_out( - KernelRuntimeContext& ctx, - const Tensor& weight, - const Tensor& indices, - int64_t padding_idx, - bool scale_grad_by_freq, - bool sparse, - Tensor& out) { - int64_t nbytes_per_entry = weight.size(1) * weight.element_size(); - const char* w_data = weight.const_data_ptr(); - char* out_data = out.mutable_data_ptr(); - const int64_t* indices_ptr = indices.const_data_ptr(); - - for (int i = 0, e = indices.numel(); i < e; i++) { - // memcpy(dest, src, nbytes); - memcpy( - out_data, w_data + nbytes_per_entry * indices_ptr[i], nbytes_per_entry); - out_data += nbytes_per_entry; - } -} - -} // namespace native -} // namespace executor -} // namespace torch diff --git a/backends/cadence/generic/operators/op_full.cpp b/backends/cadence/generic/operators/op_full.cpp deleted file mode 100644 index 21d5fc56299..00000000000 --- a/backends/cadence/generic/operators/op_full.cpp +++ /dev/null @@ -1,50 +0,0 @@ -/* - * Copyright (c) Meta Platforms, Inc. and affiliates. - * All rights reserved. - * - * This source code is licensed under the BSD-style license found in the - * LICENSE file in the root directory of this source tree. 
- */ - -#include -#include - -namespace torch { -namespace executor { -namespace native { - -using executorch::aten::ScalarType; -using executorch::aten::Tensor; - -Tensor& full_out( - KernelRuntimeContext& ctx, - const IntArrayRef sizes, - const Scalar& fill_value, - Tensor& out) { - (void)ctx; - - ScalarType val_type = utils::get_scalar_dtype(fill_value); - ScalarType out_type = out.scalar_type(); - - Error err = resize_tensor(out, sizes); - ET_CHECK_MSG(err == Error::Ok, "Could not resize out"); - - ET_SWITCH_REAL_TYPES_AND(Bool, val_type, ctx, "full", CTYPE_VAL, [&] { - CTYPE_VAL val; - ET_EXTRACT_SCALAR(fill_value, val); - - ET_SWITCH_REAL_TYPES_AND(Bool, out_type, ctx, "full", CTYPE_OUT, [&] { - CTYPE_OUT val_casted = static_cast(val); - auto data_out = out.mutable_data_ptr(); - for (size_t i = 0; i < out.numel(); ++i) { - data_out[i] = val_casted; - } - }); - }); - - return out; -} - -} // namespace native -} // namespace executor -} // namespace torch diff --git a/backends/cadence/generic/operators/op_view_copy.cpp b/backends/cadence/generic/operators/op_view_copy.cpp deleted file mode 100644 index 162e9ee201b..00000000000 --- a/backends/cadence/generic/operators/op_view_copy.cpp +++ /dev/null @@ -1,29 +0,0 @@ -/* - * Copyright (c) Meta Platforms, Inc. and affiliates. - * All rights reserved. - * - * This source code is licensed under the BSD-style license found in the - * LICENSE file in the root directory of this source tree. 
- */ - -#include - -namespace torch { -namespace executor { -namespace native { - -using executorch::aten::Tensor; -using executorch::runtime::KernelRuntimeContext; - -Tensor& view_copy_out( - KernelRuntimeContext& ctx, - const Tensor& input, - const IntArrayRef size, - Tensor& out) { - memcpy(out.mutable_data_ptr(), input.const_data_ptr(), input.nbytes()); - return out; -} - -} // namespace native -} // namespace executor -} // namespace torch diff --git a/backends/cadence/generic/operators/targets.bzl b/backends/cadence/generic/operators/targets.bzl index b3c305c9c02..193b43c2b6d 100644 --- a/backends/cadence/generic/operators/targets.bzl +++ b/backends/cadence/generic/operators/targets.bzl @@ -4,64 +4,6 @@ load("@fbsource//xplat/executorch/build:runtime_wrapper.bzl", "runtime") def define_common_targets(): # Individual operator targets with optimized dependencies - # Basic operators (need broadcast_util and scalar_utils) - runtime.cxx_library( - name = "op_add", - srcs = ["op_add.cpp"], - platforms = CXX, - deps = [ - "//executorch/kernels/portable/cpu/util:broadcast_util", - "//executorch/runtime/kernel:kernel_includes", - "//executorch/kernels/portable/cpu:scalar_utils", - ], - visibility = [ - "//executorch/backends/cadence/...", - "@EXECUTORCH_CLIENTS", - ], - ) - - runtime.cxx_library( - name = "op_full", - srcs = ["op_full.cpp"], - platforms = CXX, - deps = [ - "//executorch/runtime/kernel:kernel_includes", - "//executorch/kernels/portable/cpu:scalar_utils", - ], - visibility = [ - "//executorch/backends/cadence/...", - "@EXECUTORCH_CLIENTS", - ], - ) - - # Simple operators (only need kernel_includes) - runtime.cxx_library( - name = "op_embedding", - srcs = ["op_embedding.cpp"], - platforms = CXX, - deps = [ - "//executorch/runtime/kernel:kernel_includes", - ], - visibility = [ - "//executorch/backends/cadence/...", - "@EXECUTORCH_CLIENTS", - ], - ) - - runtime.cxx_library( - name = "op_view_copy", - srcs = ["op_view_copy.cpp"], - platforms = CXX, - deps = 
[ - "//executorch/runtime/kernel:kernel_includes", - ], - visibility = [ - "//executorch/backends/cadence/...", - "@EXECUTORCH_CLIENTS", - ], - ) - - # Operators that need the operators.h header and basic runtime runtime.cxx_library( name = "im2row_out", srcs = ["im2row_out.cpp"], From d43cde5a49d4fe0e06f09d702f42e2945f507468 Mon Sep 17 00:00:00 2001 From: Jacob Szwejbka Date: Wed, 17 Sep 2025 21:16:52 -0700 Subject: [PATCH 026/395] Add option in memory planning to put shared state on same location across entry points Differential Revision: D82250153 Pull Request resolved: https://github.com/pytorch/executorch/pull/14230 --- exir/emit/_emitter.py | 27 ++++- exir/memory_planning.py | 3 + exir/passes/memory_planning_pass.py | 155 +++++++++++++++++++++++++++- exir/program/_program.py | 17 ++- exir/tests/test_memory_planning.py | 52 ++++++++++ 5 files changed, 244 insertions(+), 10 deletions(-) diff --git a/exir/emit/_emitter.py b/exir/emit/_emitter.py index 6995f9f73a9..7701ca7b8ff 100644 --- a/exir/emit/_emitter.py +++ b/exir/emit/_emitter.py @@ -93,7 +93,8 @@ from executorch.exir.types import LeafValueSpec, ValueSpec from torch._subclasses.fake_tensor import FakeTensor -from torch.export.exported_program import ExportedProgram +from torch.export.exported_program import ExportedProgram, ExportGraphSignature +from torch.fx.node import Node from torch.utils import _pytree as pytree from typing_extensions import TypeAlias @@ -209,11 +210,11 @@ class _AbstractValue: ] -# pyre-ignore[13]: Attribute `node` is never initialized. class _Emitter(torch.fx.Interpreter): """An abstract interpreter (https://wiki.mozilla.org/Abstract_Interpretation) used to emit the given traced torch.fx.GraphModule to the flatbuffer schema.""" + # pyre-ignore[13]: Attribute `node` is never initialized. 
node: torch.fx.Node

 def __init__(
@@ -1633,6 +1634,28 @@ def placeholder( # noqa: C901
 if isinstance(target, str) and isinstance(spec, TensorSpec):
 fqn, is_mutable_buffer = self._find_fqn_for_placeholder(target, spec)

+ def _is_buffer(node: Node, graph_signature: ExportGraphSignature) -> bool:
+ """
+ Check if the node is a buffer according to the provided graph signature.
+ Returns True if it is, False otherwise.
+ """
+ if node.op == "placeholder":
+ if isinstance(node.target, str):
+ if node.target in graph_signature.inputs_to_buffers:
+ return True
+ return False
+
+ # If the spec does not appear in the mutable section of the graph signature it still might
+ # overall be considered a mutable buffer if it has already been memory planned. This would
+ # suggest that the same abstract buffer is mutable in another entry point so we should
+ # compel it to be considered mutable in all entry points at emission just as the user did with
+ # memory planning.
+ is_mutable_buffer |= (
+ _is_buffer(self.node, self.exported_program.graph_signature)
+ and spec.mem_id is not None
+ and spec.mem_offset is not None
+ )
+
 # If the placeholder has a constant_tag, it is external to the PTE file
 # and requires a fqn and location=TensorDataLocation.EXTERNAL
 if constant_tag is not None:
diff --git a/exir/memory_planning.py b/exir/memory_planning.py
index e08d3e55772..0394ed9c529 100644
--- a/exir/memory_planning.py
+++ b/exir/memory_planning.py
@@ -245,6 +245,8 @@ def verify_graph_input_output(self) -> None:
 assert len(specs) > 0, "Expect tensor specs"
 specs = list(filter(lambda spec: not spec.const, specs))
 if len(specs) == 0:
+ # all outputs are const so no need to allocate memory, just say we succeeded
+ graph_output_allocated = self.alloc_graph_output
 continue
 allocated = any(
 spec is None or spec.mem_offset is not None for spec in specs
@@ -408,6 +410,7 @@ def collect_specs_from_nodes( # noqa: C901
 ignore_graph_input: bool = False,
 ignore_graph_output: bool = False,
ignore_mutable_buffers: bool = False, + share_mutable_buffers: bool = False, ignore_const: bool = True, ignore_out_var_node: bool = True, dedup: bool = True, diff --git a/exir/passes/memory_planning_pass.py b/exir/passes/memory_planning_pass.py index 9bd4ab20bf5..2636b61780c 100644 --- a/exir/passes/memory_planning_pass.py +++ b/exir/passes/memory_planning_pass.py @@ -4,10 +4,12 @@ # This source code is licensed under the BSD-style license found in the # LICENSE file in the root directory of this source tree. +import itertools import logging import warnings +from dataclasses import dataclass, field from functools import partial -from typing import Any, Callable, List, Optional +from typing import Any, Callable, Dict, List, Optional, Set, Tuple import torch from executorch.exir._warnings import deprecated @@ -16,14 +18,18 @@ from executorch.exir.memory_planning import ( _is_out_var_node, apply_algo, + collect_specs_from_nodes, + filter_nodes, get_node_tensor_specs, MemoryPlanningAlgorithmSuite, Verifier, ) from executorch.exir.operator.convert import get_out_args_from_opoverload from executorch.exir.pass_base import PassBase, PassResult -from executorch.exir.tensor import ALIGNMENT +from executorch.exir.tensor import ALIGNMENT, TensorSpec +from torch import fx from torch.export.exported_program import ExportGraphSignature +from torch.fx import Node # copied from https://stackoverflow.com/questions/75582932/python-how-can-i-print-the-function-name-of-a-partial-function @@ -37,6 +43,106 @@ def _callable_name(any_callable: Callable[..., Any]) -> str: return str(any_callable) +def _is_buffer( + node: Node, graph_signature: ExportGraphSignature +) -> Tuple[bool, Optional[str]]: + """ + Check if the node is buffer according to the provided graph signature. 
+ If it is one, return its fqn as well.
+ """
+ if node.op == "placeholder":
+ if isinstance(node.target, str):
+ if node.target in graph_signature.inputs_to_buffers:
+ fqn = graph_signature.inputs_to_buffers[node.target]
+ return (True, fqn)
+ return (False, None)
+
+
+def _is_mutable_buffer(
+ node: Node, graph_signature: ExportGraphSignature
+) -> Tuple[bool, Optional[str]]:
+ """
+ Check if the node is a mutable buffer according to the provided graph signature.
+ If it is one, return its fqn as well.
+ """
+ if node.op == "placeholder":
+ if isinstance(node.target, str):
+ if node.target in graph_signature.inputs_to_buffers:
+ fqn = graph_signature.inputs_to_buffers[node.target]
+ # if the buffer is mutated then record that
+ if fqn in graph_signature.buffers_to_mutate.values():
+ return True, fqn
+ return False, None
+
+
+def _get_spec_from_node(node: fx.Node) -> TensorSpec:
+ specs = get_node_tensor_specs(node)
+ return specs[0]
+
+
+def _insert_mutable_buffer_specs(
+ state: "_MemoryPlanningState", gm: torch.fx.GraphModule, gs: ExportGraphSignature
+):
+ for node in gm.graph.nodes:
+ is_mutable, fqn = _is_mutable_buffer(node, gs)
+ if is_mutable:
+ assert fqn
+ spec = _get_spec_from_node(node)
+ if (
+ getattr(spec, "mem_id", None) is not None
+ or getattr(spec, "mem_offset", None) is not None
+ ):
+ raise ValueError(
+ "Cannot share mutable buffers if they already have a mem_id or mem_offset assigned"
+ )
+ if fqn not in state.mutable_buffers.keys():
+ state.mutable_buffers[fqn] = set()
+ state.mutable_buffers[fqn].add(spec)
+ continue
+ is_buffer, fqn = _is_buffer(node, gs)
+ # If it is not a mutable buffer it might just appear to be a buffer in this entry point.
Think model.get_state() + # So cache it and later double check that this buffer never appears mutable + if is_buffer: + assert fqn + spec = _get_spec_from_node(node) + if ( + getattr(spec, "mem_id", None) is not None + or getattr(spec, "mem_offset", None) is not None + ): + raise ValueError( + "Cannot share mutable buffers if they already have a mem_id or mem_offset assigned" + ) + if fqn not in state.maybe_mutable_buffers.keys(): + state.maybe_mutable_buffers[fqn] = set() + state.maybe_mutable_buffers[fqn].add(spec) + + +def _check_default_mem_ids(gm: torch.fx.GraphModule): + for node in gm.graph.nodes: + for spec in collect_specs_from_nodes( + filter_nodes(itertools.chain([node], node.args, node.kwargs.values())), + None, + ignore_graph_input=False, + ignore_const=False, + ignore_out_var_node=False, + dedup=False, + do_assertion=False, + ignore_dynamic_unbound_tensor=False, + ): + mem_id = getattr(spec, "mem_id", None) + if mem_id is not None and mem_id != 1: + raise ValueError( + "Cannot share mutable buffers if all other tensors are not on the default mem_id of 1" + ) + + +@dataclass +class _MemoryPlanningState: + mutable_buffers: Dict[str, Set[TensorSpec]] = field(default_factory=dict) + maybe_mutable_buffers: Dict[str, Set[TensorSpec]] = field(default_factory=dict) + graph_modules: List[torch.fx.GraphModule] = field(default_factory=list) + + class MemoryPlanningPass(PassBase): def __init__( self, @@ -45,6 +151,7 @@ def __init__( alloc_graph_input: bool = True, alloc_graph_output: bool = True, alloc_mutable_buffers: bool = True, + share_mutable_buffers: bool = False, alignment: int = ALIGNMENT, ) -> None: r""" @@ -55,12 +162,18 @@ def __init__( """ if memory_planning_algo is None: memory_planning_algo = MemoryPlanningAlgorithmSuite() + if share_mutable_buffers and not alloc_mutable_buffers: + raise ValueError( + "share_mutable_buffers is only meaningful when alloc_mutable_buffers is True" + ) self.memory_planning_algo: Callable[..., List[int]] = 
memory_planning_algo self.allow_lifetime_and_storage_overlap = allow_lifetime_and_storage_overlap self.alloc_graph_input = alloc_graph_input self.alloc_graph_output = alloc_graph_output self.alloc_mutable_buffers = alloc_mutable_buffers + self.share_mutable_buffers = share_mutable_buffers self.alignment = alignment + self.state = _MemoryPlanningState() def _set_alloc_node_spec(self, graph_module: torch.fx.GraphModule) -> None: """ @@ -134,9 +247,17 @@ def run( graph_signature, self.alloc_graph_input, self.alloc_graph_output, - self.alloc_mutable_buffers, + # If we are sharing the mutable buffers then do not allocate them in + # memory planning algo, instead collect all of the specs over all the entry + # points and then allocate them directly in the run_multimethod name call + self.alloc_mutable_buffers and not self.share_mutable_buffers, ) + if self.share_mutable_buffers and graph_signature is not None: + self.state.graph_modules.append(graph_module) + _check_default_mem_ids(graph_module) + _insert_mutable_buffer_specs(self.state, graph_module, graph_signature) + # TODO: make the verifier do the work recursively to handle # control flow verifier = Verifier( @@ -164,3 +285,31 @@ def run( # I dont know if that is a valid thing but if it is we should adjust verify_storage_reuse function verifier.verify_storage_reuse() return PassResult(graph_module, True) + + def run_multimethod(self): + "Resolve any memory planning done across entry points" + if self.share_mutable_buffers: + arena: int = 0 + + # Every spec that shares an fqn is the same tensor! So we give it the same id and offset + # anywhere it appears. + for fqn, specs_set in self.state.mutable_buffers.items(): + specs = list(specs_set) + # If the same buffer appears in mutable and maybe mutable then we know it is in fact mutable. 
+ if fqn in self.state.maybe_mutable_buffers.keys(): + specs.extend(self.state.maybe_mutable_buffers[fqn]) + for spec in specs: + # Assume a default memory planning placed all activations on 1, place shared state on 2. + spec.mem_id = 2 + spec.realign(self.alignment) + # State is persistent, so the memory never overlaps. + spec.mem_offset = arena + # They should all be the same size since they are the same tensor, so just bump off the first. + arena += specs[0].allocated_memory + + for graph_module in self.state.graph_modules: + if len(graph_module.meta["non_const_buffer_sizes"]) != 2: + raise ValueError( + "Cannot share mutable state if not using default memory ids" + ) + graph_module.meta["non_const_buffer_sizes"].append(arena) diff --git a/exir/program/_program.py b/exir/program/_program.py index f3d9eef9221..a33d715ca3b 100644 --- a/exir/program/_program.py +++ b/exir/program/_program.py @@ -1681,7 +1681,7 @@ def to_backend( return epm @et_logger("to_executorch") - def to_executorch( + def to_executorch( # noqa (FLAKE8) C901 self, config: Optional[ExecutorchBackendConfig] = None, ) -> "ExecutorchProgramManager": @@ -1745,11 +1745,9 @@ def to_executorch( memory_planning_pass = config.memory_planning_pass # TODO(jakeszwe): Follow up with compiler on if the deepcopy is necessary and if so how to make it work if hasattr(memory_planning_pass, "run"): - new_gm_res = memory_planning_pass.run( # pyre-ignore[16] - new_gm, new_signature - ) + new_gm_res = memory_planning_pass.run(new_gm, new_signature) else: - new_gm_res = memory_planning_pass(new_gm) # pyre-ignore[29] + new_gm_res = memory_planning_pass(new_gm) # WARNING: DO NOT ADD ANY MORE PASSES AFTER MEMORY PLANNING PASS. # THERE ARE A LOT OF ASSUMPTIONS IN THE STACK THAT MEMORY PLANNING IS THE LAST PASS BEFORE THE EMITTER. 
@@ -1758,6 +1756,15 @@ def to_executorch( _copy_module(program.graph_module, new_gm) execution_programs[name] = program + # After running memory planning on all entry points we can run the cross entry point memory planning + if isinstance(config.memory_planning_pass, dict): + for memory_planning_pass in config.memory_planning_pass.values(): + if hasattr(memory_planning_pass, "run_multimethod"): + memory_planning_pass.run_multimethod() + else: + memory_planning_pass = config.memory_planning_pass + if hasattr(memory_planning_pass, "run_multimethod"): + memory_planning_pass.run_multimethod() et_pm = ExecutorchProgramManager( execution_programs, diff --git a/exir/tests/test_memory_planning.py b/exir/tests/test_memory_planning.py index 426cc54dc66..ce20de8f820 100644 --- a/exir/tests/test_memory_planning.py +++ b/exir/tests/test_memory_planning.py @@ -14,6 +14,7 @@ import torch from executorch.exir import ExecutorchBackendConfig, to_edge +from executorch.exir.capture._capture import patch_forward from executorch.exir.dialects._ops import ops as exir_ops from executorch.exir.memory_planning import ( _do_user_inputs_exist, @@ -93,6 +94,24 @@ def get_random_inputs(self) -> Tuple[torch.Tensor, ...]: return (torch.randn(10), torch.randn(10)) +class MultiEntryPointStatefulModel(torch.nn.Module): + def __init__(self) -> None: + super().__init__() + self.register_buffer("state", torch.zeros(2, 2)) + + def forward(self, x: torch.Tensor) -> torch.Tensor: + return self.state.add_(x).view(-1) * 2 + + def set_state(self, state: torch.Tensor) -> None: + self.state.copy_(state) + + def get_state(self) -> torch.Tensor: + return self.state + + def get_example_inputs(self) -> Tuple[torch.Tensor, ...]: + return (torch.ones(1),) + + class ModelWithDifferentTensorSizes(torch.nn.Module): def __init__(self) -> None: super(ModelWithDifferentTensorSizes, self).__init__() @@ -1081,3 +1100,36 @@ def test_multi_map(self) -> None: verifier.storage_overlap(outer_spec, inner_spec), f"Outer spec 
{outer_spec.shape=} {outer_spec.dtype=} {outer_spec.lifetime=} and inner spec {inner_spec} have storage overlap", ) + + def test_multi_state_plan(self) -> None: + eager_module = MultiEntryPointStatefulModel().eval() + forward = export(eager_module, eager_module.get_example_inputs()) + with patch_forward(eager_module, eager_module.get_state): + get_state = export(eager_module, ()) + with patch_forward(eager_module, eager_module.set_state): + set_state = export(eager_module, (torch.zeros(1),)) + edge = to_edge( + {"forward": forward, "set_state": set_state, "get_state": get_state} + ) + et = edge.to_executorch( + ExecutorchBackendConfig( + memory_planning_pass=MemoryPlanningPass(share_mutable_buffers=True), + emit_mutable_buffer_names=True, + ) + ) + et_prog = et.executorch_program + count = 0 + for plan in et_prog.execution_plan: + for value in plan.values: + if ( + hasattr(value.val, "allocation_info") + and value.val.allocation_info is not None + and value.val.allocation_info.memory_id == 2 + ): + count += 1 + self.assertEqual(value.val.allocation_info.memory_offset_low, 0) + self.assertTrue(value.val.extra_tensor_info is not None) + self.assertEqual( + value.val.extra_tensor_info.fully_qualified_name, "state" + ) + self.assertEqual(count, 3) From d5dff72aea986836b62b9588eecd1ba45427e766 Mon Sep 17 00:00:00 2001 From: Sebastian Larsson <38941629+Sebastian-Larsson@users.noreply.github.com> Date: Thu, 18 Sep 2025 15:51:23 +0200 Subject: [PATCH 027/395] Arm backend: Add docstrings to tosa/mapping.py (#14374) Signed-off-by: Sebastian Larsson --- backends/arm/tosa/mapping.py | 88 +++++++++++++++++++++++++++++++++--- 1 file changed, 82 insertions(+), 6 deletions(-) diff --git a/backends/arm/tosa/mapping.py b/backends/arm/tosa/mapping.py index 60ef98a37c0..a36b4cf3ebc 100644 --- a/backends/arm/tosa/mapping.py +++ b/backends/arm/tosa/mapping.py @@ -4,12 +4,12 @@ # LICENSE file in the root directory of this source tree. 
# pyre-unsafe +"""Provide PyTorch-to-TOSA mapping helpers. -# -# PyTorch to Tosa mapping - simple mapping functions and multi-type extraction -# of key information. These are used by the initial compile stage which captures -# the standardised TOSA representation. -# +Use these utilities to translate PyTorch dtypes and FX node metadata into +the TOSA serializer types and shapes used during initial compilation. + +""" from typing import Any, Optional, Sequence @@ -32,6 +32,19 @@ def map_dtype(data_type: torch.dtype, tosa_spec: TosaSpecification) -> Any: + """Map a ``torch.dtype`` to a ``ts.DType``. + + Args: + data_type (torch.dtype): PyTorch dtype to convert. + tosa_spec (TosaSpecification): Active spec (reserved for future checks). + + Returns: + Any: Matching ``ts.DType`` enum value. + + Raises: + ValueError: If the dtype is unsupported or unknown. + + """ if data_type in UNSUPPORTED_DTYPES: raise ValueError(f"Unsupported type: {data_type}") @@ -57,6 +70,20 @@ def map_dtype(data_type: torch.dtype, tosa_spec: TosaSpecification) -> Any: # TODO: other types, can be # SymInt, FakeTensor, a List[Union[FakeTensor, SymInt]], or None def extract_tensor_meta(meta, tosa_spec: TosaSpecification): + """Extract dtype, shape, and dimension order from FX metadata. + + Args: + meta (dict): FX node ``meta`` containing a ``val`` FakeTensor (or tuple). + tosa_spec (TosaSpecification): Active TOSA spec for dtype mapping. + + Returns: + tuple: ``(dtype, shape, dim_order)`` where ``dtype`` is ``ts.DType``, + ``shape`` is ``Tuple[int, ...]``, and ``dim_order`` is ``Tuple[int, ...]``. + + Raises: + ValueError: If ``meta['val']`` is not a ``FakeTensor``. 
+ + """ assert meta.get("val") is not None val = meta["val"] if type(val) is tuple: @@ -77,23 +104,66 @@ def extract_tensor_meta(meta, tosa_spec: TosaSpecification): return (dtype, shape, dim_order) -# Class to capture arguments and turn into tensor references for TOSA OPs class TosaArg: + """Capture and normalize TOSA operator arguments. + + Use this to convert FX nodes, sequences, and numeric literals into a + consistent structure suitable for TOSA serialization. + + Attributes: + name (str): Node name when argument is a ``torch.fx.Node``; empty otherwise. + dtype (ts.DType | None): Inferred dtype when available. + shape (tuple[int, ...] | None): Inferred shape when available. + dim_order (tuple[int, ...] | None): Dimension order, defaulting to ``range(len(shape))``. + special (list | None): Captured list when the argument is a sequence. + number (float | int | None): Captured numeric value when given. + tosa_spec (TosaSpecification): Active specification used for mapping. + + """ + def __process_node(self, argument: torch.fx.Node): + """Parse a ``torch.fx.Node`` and populate tensor attributes. + + Args: + argument (torch.fx.Node): FX node to inspect. + + """ self.name: str = argument.name self.dtype, self.shape, self.dim_order = extract_tensor_meta( argument.meta, self.tosa_spec ) def __process_list(self, argument): + """Capture a sequence argument as ``special``. + + Args: + argument (Sequence): Sequence to store. + + """ self.special: list = list(argument) def __process_number(self, argument: float | int): + """Capture a numeric argument as ``number``. + + Args: + argument (float | int): Numeric value. + + """ self.number: float | int = argument def __init__( self, argument: Any, tosa_spec: Optional[TosaSpecification] = None ) -> None: + """Initialize the argument wrapper and populate fields. + + Args: + argument (Any): One of ``torch.fx.Node``, ``Sequence``, ``int``, ``float``, ``torch.dtype``, or ``None``. 
+ tosa_spec (Optional[TosaSpecification]): Active specification; required. + + Raises: + RuntimeError: If ``argument`` is of an unsupported type. + + """ if tosa_spec is None: raise ValueError("tosa_spec is None") elif not isinstance(tosa_spec, TosaSpecification): @@ -127,6 +197,12 @@ def __init__( ) def __repr__(self): + """Return a compact representation of populated attributes. + + Returns: + str: Readable list of set attributes. + + """ attrs = [] if hasattr(self, "name"): if self.name is not None: From 62c4c77d494d4806615ea369dddcf09e2911d90d Mon Sep 17 00:00:00 2001 From: Sebastian Larsson <38941629+Sebastian-Larsson@users.noreply.github.com> Date: Thu, 18 Sep 2025 15:52:15 +0200 Subject: [PATCH 028/395] Arm backend: Add docstrings to init and arm_quantizer_utils in quantizer (#14375) Signed-off-by: Sebastian Larsson --- backends/arm/quantizer/__init__.py | 5 +++ backends/arm/quantizer/arm_quantizer_utils.py | 38 +++++++++++++++---- 2 files changed, 36 insertions(+), 7 deletions(-) diff --git a/backends/arm/quantizer/__init__.py b/backends/arm/quantizer/__init__.py index 5cb5c834a98..e36c683416a 100644 --- a/backends/arm/quantizer/__init__.py +++ b/backends/arm/quantizer/__init__.py @@ -2,7 +2,12 @@ # # This source code is licensed under the BSD-style license found in the # LICENSE file in the root directory of this source tree. +"""Expose quantizer APIs and load optional quantized kernels. +Import the public quantizer classes and configuration helpers for Arm +backends. Attempt to load portable and quantized libraries; fall back to a +log message if unavailable. 
+""" from .quantization_config import QuantizationConfig # noqa # usort: skip from .arm_quantizer import ( # noqa diff --git a/backends/arm/quantizer/arm_quantizer_utils.py b/backends/arm/quantizer/arm_quantizer_utils.py index 838dd44733e..90876386aa6 100644 --- a/backends/arm/quantizer/arm_quantizer_utils.py +++ b/backends/arm/quantizer/arm_quantizer_utils.py @@ -6,10 +6,12 @@ # LICENSE file in the root directory of this source tree. # pyre-unsafe +"""Provide utilities for quantization annotations. -# -# Utility functions for TOSAQuantizer -# +Use these helpers to check and mark annotation state when working with +``QuantizationAnnotation`` entries in FX node metadata. + +""" from typing import cast @@ -20,7 +22,15 @@ def is_annotated(node: Node) -> bool: - """Given a node return whether the node is annotated.""" + """Return True if the node is annotated. + + Args: + node (Node): FX node to inspect. + + Returns: + bool: True if ``Q_ANNOTATION_KEY`` exists and ``_annotated`` is set. + + """ return ( Q_ANNOTATION_KEY in node.meta and cast(QuantizationAnnotation, node.meta[Q_ANNOTATION_KEY])._annotated @@ -28,7 +38,15 @@ def is_annotated(node: Node) -> bool: def is_output_annotated(node: Node) -> bool: - """Given a node, return whether the output of the node is annotated.""" + """Return True if the node's output is annotated. + + Args: + node (Node): FX node to inspect. + + Returns: + bool: True if annotated and an output qspec is present. + + """ if Q_ANNOTATION_KEY in node.meta: annotation = cast(QuantizationAnnotation, node.meta[Q_ANNOTATION_KEY]) return annotation._annotated and annotation.output_qspec is not None @@ -37,8 +55,14 @@ def is_output_annotated(node: Node) -> bool: def mark_node_as_annotated(node: Node) -> None: - """Marks node as annotated. If needed, an empty QuantizationAnnotation is added - to the quantization_annotation node meta entry. + """Mark a node as annotated. 
+ + Create an empty ``QuantizationAnnotation`` on the node when missing and set + its ``_annotated`` flag to True. + + Args: + node (Node): FX node to update. + """ if Q_ANNOTATION_KEY not in node.meta: node.meta[Q_ANNOTATION_KEY] = QuantizationAnnotation() From 4e732cb0e3c53d5d1c450e63a5280795436a83a4 Mon Sep 17 00:00:00 2001 From: Sebastian Larsson <38941629+Sebastian-Larsson@users.noreply.github.com> Date: Thu, 18 Sep 2025 15:55:53 +0200 Subject: [PATCH 029/395] Arm backend: Remove sin_cos_support.py (#14370) The operator_support check always approved sin and cos operators, which means they can be moved to the supported operator lists in tosa_profile_supported_op_lists.py. Signed-off-by: Sebastian Larsson --- backends/arm/operator_support/__init__.py | 1 - .../arm/operator_support/sin_cos_support.py | 31 ------------------- .../tosa_profile_supported_op_lists.py | 10 ++++++ 3 files changed, 10 insertions(+), 32 deletions(-) delete mode 100644 backends/arm/operator_support/sin_cos_support.py diff --git a/backends/arm/operator_support/__init__.py b/backends/arm/operator_support/__init__.py index 7b73cddad37..fbc8801161f 100644 --- a/backends/arm/operator_support/__init__.py +++ b/backends/arm/operator_support/__init__.py @@ -16,7 +16,6 @@ pool_2d_support, reduce_sum_support, right_shift_support, - sin_cos_support, slice_copy_support, to_dim_order_copy_support, tosa_supported_operators, diff --git a/backends/arm/operator_support/sin_cos_support.py b/backends/arm/operator_support/sin_cos_support.py deleted file mode 100644 index dcdc20f8e4a..00000000000 --- a/backends/arm/operator_support/sin_cos_support.py +++ /dev/null @@ -1,31 +0,0 @@ -# Copyright 2025 Arm Limited and/or its affiliates. -# -# This source code is licensed under the BSD-style license found in the -# LICENSE file in the root directory of this source tree. 
- -# pyre-unsafe - - -import torch.fx as fx -from executorch.backends.arm.operator_support.tosa_supported_operators import ( - register_tosa_support_check, - SupportedTOSAOperatorCheck, -) -from executorch.backends.arm.tosa import TosaSpecification -from executorch.exir.dialects._ops import ops as exir_ops - - -@register_tosa_support_check -class SinCosSupported(SupportedTOSAOperatorCheck): - targets = [ - exir_ops.edge.aten.cos.default, - exir_ops.edge.aten.sin.default, - ] - - tosa_specs = [ - TosaSpecification.create_from_string("TOSA-1.0+INT"), - TosaSpecification.create_from_string("TOSA-1.0+FP"), - ] - - def is_node_tosa_supported(self, node: fx.Node, tosa_spec: TosaSpecification): - return True diff --git a/backends/arm/operator_support/tosa_profile_supported_op_lists.py b/backends/arm/operator_support/tosa_profile_supported_op_lists.py index d3207c65dff..9820fbd05d5 100644 --- a/backends/arm/operator_support/tosa_profile_supported_op_lists.py +++ b/backends/arm/operator_support/tosa_profile_supported_op_lists.py @@ -2,6 +2,12 @@ # # This source code is licensed under the BSD-style license found in the # LICENSE file in the root directory of this source tree. +"""Define TOSA profile support lists for INT and FP. + +Expose static sets of EXIR operator overloads used by the TOSA partitioner to +seed positive support checks for different profiles. 
+ +""" import operator from typing import Final, Set @@ -24,6 +30,7 @@ exir_ops.edge.aten.bitwise_and.Scalar, exir_ops.edge.aten.bitwise_or.Scalar, exir_ops.edge.aten.bitwise_xor.Scalar, + exir_ops.edge.aten.cos.default, exir_ops.edge.aten.logical_and.default, exir_ops.edge.aten.logical_or.default, exir_ops.edge.aten.logical_xor.default, @@ -113,6 +120,7 @@ torch.ops.aten.scalar_tensor.default, exir_ops.edge.aten.gelu.default, exir_ops.edge.aten.alias_copy.default, + exir_ops.edge.aten.sin.default, exir_ops.edge.aten.sinh.default, exir_ops.edge.aten.atan.default, exir_ops.edge.aten.acosh.default, @@ -147,6 +155,7 @@ exir_ops.edge.aten.cat.default, exir_ops.edge.aten.ceil.default, exir_ops.edge.aten.clamp.default, + exir_ops.edge.aten.cos.default, exir_ops.edge.aten.cumsum.default, exir_ops.edge.aten.bmm.default, exir_ops.edge.aten.permute_copy.default, @@ -223,6 +232,7 @@ torch.ops.aten.scalar_tensor.default, exir_ops.edge.aten.gelu.default, exir_ops.edge.aten.alias_copy.default, + exir_ops.edge.aten.sin.default, exir_ops.edge.aten.sinh.default, exir_ops.edge.aten.atan.default, exir_ops.edge.aten.acosh.default, From 654e722acffb5dc8a4965cb12cc540994f188891 Mon Sep 17 00:00:00 2001 From: Sebastian Larsson <38941629+Sebastian-Larsson@users.noreply.github.com> Date: Thu, 18 Sep 2025 16:02:21 +0200 Subject: [PATCH 030/395] Arm backend: Replace asserts/raises with reporter rejects (#14371) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - embedding_support: replace input-count assert with reporter.report_reject + return False - index_tensor_support: add explicit rejects for None in indices, rank >= 4 indexing tensors, and int32 overflow of value tensor; previously returned False without explanation - minmax_support: add reject when min/max.dim’s argmax output is used - ethos_u55_support: replace IndexError raises in view/select checks (invalid dim/index) with reporter.report_reject + return False - Improves partition diagnostics 
and avoids hard crashes Signed-off-by: Sebastian Larsson --- .../arm/operator_support/embedding_support.py | 15 ++++++++++----- .../arm/operator_support/ethos_u55_support.py | 14 ++++++++------ .../arm/operator_support/index_tensor_support.py | 15 +++++++++++++++ backends/arm/operator_support/minmax_support.py | 7 +++++++ 4 files changed, 40 insertions(+), 11 deletions(-) diff --git a/backends/arm/operator_support/embedding_support.py b/backends/arm/operator_support/embedding_support.py index bf95014e575..24395d56cbf 100644 --- a/backends/arm/operator_support/embedding_support.py +++ b/backends/arm/operator_support/embedding_support.py @@ -27,11 +27,16 @@ class EmbeddingSupported(SupportedTOSAOperatorCheck): def is_node_tosa_supported( self, node: fx.Node, tosa_spec: TosaSpecification ) -> bool: # type: ignore[override, misc] - # Note aten.embedding.default requires int64 indices and TOSA does not support it. - # Int32 indices here for aten.embedding.default is ok since it will be decomposed into ops that can handle it. - assert ( - len(node.all_input_nodes) == 2 - ), "Number of inputs to aten.embedding is not 2" + # Note aten.embedding.default requires int64 indices and TOSA does not + # support it. Int32 indices here for aten.embedding.default is ok since + # it will be decomposed into ops that can handle it. 
+ + if len(node.all_input_nodes) != 2: + self.reporter.report_reject( + node, + (f"Expected exactly two input nodes, got {len(node.all_input_nodes)}"), + ) + return False indices_val = node.all_input_nodes[1].meta["val"] indices_dtype = indices_val.dtype diff --git a/backends/arm/operator_support/ethos_u55_support.py b/backends/arm/operator_support/ethos_u55_support.py index bf9e29d5cb7..2e9bd846045 100644 --- a/backends/arm/operator_support/ethos_u55_support.py +++ b/backends/arm/operator_support/ethos_u55_support.py @@ -236,18 +236,20 @@ def is_node_supported( shape = input_node.meta["val"].shape rank = len(shape) if not -rank <= dim < rank: - raise IndexError( - f"Dim {dim} is outside of the range for tensor '{node.target}' of " - f"rank {rank}" + self.reporter.report_reject( + node, + (f"Dimension {dim} out of range for rank {rank}."), ) + return False dim = dim % rank size = shape[dim] if not -size <= index < size: - raise IndexError( - f"Index {index} is outside of the range for dim {dim} with size " - f"{size} for tensor {node.target}" + self.reporter.report_reject( + node, + (f"Index {index} out of range for dim {dim} with size {size}."), ) + return False index = index % size # Shape after squeeze. This may get converted into a view which may become diff --git a/backends/arm/operator_support/index_tensor_support.py b/backends/arm/operator_support/index_tensor_support.py index 4b226a9c407..25bc79ea938 100644 --- a/backends/arm/operator_support/index_tensor_support.py +++ b/backends/arm/operator_support/index_tensor_support.py @@ -111,16 +111,31 @@ def is_node_tosa_supported( for index in indices: # type: ignore[union-attr] # Usage 2 guard if index is None: + self.reporter.report_reject( + node, + ( + "None (from slice/unsqueeze/ellipsis) before an indexing tensor" + " is not supported." 
+ ), + ) return False # Usage 1 guard fake_tensor = get_first_fake_tensor(index) # type: ignore[arg-type] if len(fake_tensor.size()) > 3: + self.reporter.report_reject( + node, + ("Indexing tensors of rank >= 4 is not supported."), + ) return False # Usage 3 guard total_vals = math.prod(get_first_fake_tensor(node.args[0]).shape) # type: ignore[arg-type] if total_vals > torch.iinfo(torch.int32).max: + self.reporter.report_reject( + node, + ("Value size exceeds int32 range; would overflow flattened indexing."), + ) return False return True diff --git a/backends/arm/operator_support/minmax_support.py b/backends/arm/operator_support/minmax_support.py index edbf7f61818..68433819f4b 100644 --- a/backends/arm/operator_support/minmax_support.py +++ b/backends/arm/operator_support/minmax_support.py @@ -32,6 +32,13 @@ def is_node_tosa_supported(self, node: fx.Node, tosa_spec: TosaSpecification): ) if not (no_argmax or no_argmax_users): + self.reporter.report_reject( + node, + ( + "Using the indices output is not supported; only usage of the " + "values output is supported." + ), + ) return False return True From a1ed4edcd2ad764a3893c0a6003abab3e82c34f6 Mon Sep 17 00:00:00 2001 From: pytorchbot Date: Thu, 18 Sep 2025 13:50:44 -0400 Subject: [PATCH 031/395] Move selective_build.bzl to shim_et (#14406) This PR was created by the merge bot to help merge the original PR into the main branch. 
ghstack PR number: https://github.com/pytorch/executorch/pull/14405 by @kimishpatel ^ Please use this as the source of truth for the PR details, comments, and reviews ghstack PR base: https://github.com/pytorch/executorch/tree/gh/kimishpatel/198/base ghstack PR head: https://github.com/pytorch/executorch/tree/gh/kimishpatel/198/head Merge bot PR base: https://github.com/pytorch/executorch/tree/main Merge bot PR head: https://github.com/pytorch/executorch/tree/gh/kimishpatel/198/orig @diff-train-skip-merge Co-authored-by: Kimish Patel --- .../xplat/executorch/kernels}/prim_ops/selective_build.bzl | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename {kernels => shim_et/xplat/executorch/kernels}/prim_ops/selective_build.bzl (100%) diff --git a/kernels/prim_ops/selective_build.bzl b/shim_et/xplat/executorch/kernels/prim_ops/selective_build.bzl similarity index 100% rename from kernels/prim_ops/selective_build.bzl rename to shim_et/xplat/executorch/kernels/prim_ops/selective_build.bzl From e174887dbfbb773b42240372b389afed22bbb3de Mon Sep 17 00:00:00 2001 From: Anthony Shoumikhin Date: Thu, 18 Sep 2025 12:18:52 -0700 Subject: [PATCH 032/395] Add image to multimodal runner test. 
Differential Revision: D82183713 Pull Request resolved: https://github.com/pytorch/executorch/pull/14194 --- .../__tests__/MultimodalRunnerTest.swift | 48 ++++++++++++++++-- .../__tests__/resources/IMG_0005.jpg | Bin 0 -> 77700 bytes 2 files changed, 44 insertions(+), 4 deletions(-) create mode 100644 extension/llm/apple/ExecuTorchLLM/__tests__/resources/IMG_0005.jpg diff --git a/extension/llm/apple/ExecuTorchLLM/__tests__/MultimodalRunnerTest.swift b/extension/llm/apple/ExecuTorchLLM/__tests__/MultimodalRunnerTest.swift index 55bcbb0f407..e1ee4372187 100644 --- a/extension/llm/apple/ExecuTorchLLM/__tests__/MultimodalRunnerTest.swift +++ b/extension/llm/apple/ExecuTorchLLM/__tests__/MultimodalRunnerTest.swift @@ -9,25 +9,65 @@ import ExecuTorchLLM import XCTest +extension UIImage { + func asImage() -> Image { + let targetWidth = 336 + let scaledHeight = Int((Double(targetWidth) * Double(size.height) / Double(size.width)).rounded()) + let format = UIGraphicsImageRendererFormat.default() + format.scale = 1 + let resizedImage = UIGraphicsImageRenderer(size: CGSize(width: targetWidth, height: scaledHeight), format: format).image { _ in + draw(in: CGRect(origin: .zero, size: CGSize(width: targetWidth, height: scaledHeight))) + } + let resizedCGImage = resizedImage.cgImage! + let imageWidth = resizedCGImage.width + let imageHeight = resizedCGImage.height + let pixelCount = imageWidth * imageHeight + var rgbaBuffer = [UInt8](repeating: 0, count: pixelCount * 4) + let context = CGContext( + data: &rgbaBuffer, + width: imageWidth, + height: imageHeight, + bitsPerComponent: 8, + bytesPerRow: imageWidth * 4, + space: CGColorSpaceCreateDeviceRGB(), + bitmapInfo: CGImageAlphaInfo.premultipliedLast.rawValue | CGBitmapInfo.byteOrder32Big.rawValue + )! 
+    context.draw(resizedCGImage, in: CGRect(x: 0, y: 0, width: imageWidth, height: imageHeight))
+    var planarRGB = [UInt8](repeating: 0, count: pixelCount * 3)
+    for pixelIndex in 0..
[... remainder of the Swift hunk and the GIT binary patch for extension/llm/apple/ExecuTorchLLM/__tests__/resources/IMG_0005.jpg (77700 bytes) omitted: the base85 binary-patch payload is not reproducible as text ...]
zL<1kg9r?adr^R?1sB5CVrP&ajtx!#Edop_Wy2%R<_j@Q;(6a0uVMgx?QZN>RwGO#p zqHiPW4q)2i2Z&LR))ITQOn+Xo>ZMih^9&DTmhJ>D_Cj90HmwQy_U4t2 zS#Ky}n?*+g&!-+aT=|pGC-chVOQ>i8A@#4NGYoyte?NzAi5_6NxhYJSalnDWdSh;= z+=U450KN9d8R_$!uUO|Mo`QsBrGCizA)o4X z?rhH&I7wTs#`2d~Rt`w<)7#czcC!A0cx02%HBO~(EO}98 z6FZX>?2<@h7A-bQFtnZU2zypPIci%{&usJ9q_Sb@W1;SEwhNq+g&3xpWL?kAMzVUS zM*8F4{+`}ccx4e-MZwE+J(sE@yr)FQ&-w>{CH&k=HxRZ1 z0^WQZ6n9_8%J+%e69AIFg!p4d6vFGdeCm(up*gD$gpeefVlX>BfOVMNjCA z3n8Du0pbNYhu_BPYGQSmSidl-c&{lC?-&d9R{XRpTa@I8xW`EFS21DnX*oXDGitPh zbx*ptwsvE)?HUP+`Q>@3BswMkrp50tY(@b4jHxOgcr>PDaTjwc(dNgg#MHxv?%sWv zFr{{)et9abQz)s5P{=w=z7V^DXspj7*q=r~`@&|m#1{P%ypIai^<+HVYAns#IH}V7 zq0;t4(^xU%a0vfYbN-Dw&22InL>vaV%(89hUnc2S?DamMz|gn$0Tza?d4n;bWn6}~ zA%6`waY`@?@A(eC#%oOPeXH~ev|rcr^kK9qsQdJqQ-p|NW}Ee02dy1-^ODF+2%i){ zS>PgsaM7~RFqGFexn#({v(Kw4>~>iK#veYiROE8=BI<)I zetgs=-@O`Af6{NWthrcJN=JwSW8WbiYrIQQ*iIm|CT&3=W1R)4Tn*MhC)s!%vRbt$#M= z*i36-vdTqDh!n4HP!*KgYgvzeDv3zNOozTc1isFsY%(}P16g`&pU zFbR2)^u8yI8XwqBC5hRPvtIx<2eoBs9W{sx858cAN_vN!YtBk_^)#d5U&G)|@(l1T zFuQdgrr9$HYn`%WTf{OFKN}Y~;Q;`?hs=iW zhu>7^bbtKH*d_;UFK`aZ1fw+%O=@js_7%JE+DMBKiGuD0rK^NzchD$Fng_`$JmGWg zW7__}%r|Sl7WXSG&_1aX$K|U=(;SYsN0bBfG{cmDzvmj@GWY zQPO7P46hv6m8?D5?|jcnox~^G)~QMoy7^svA}mrH?Y2ae=)o1(n@%sVVD7UeC^P|M zZqv#`Ni7<1jm4LcrLR@YoI7oUs+Je+Yh*7%dONg^)WGQ#o5R)E1Jzx{_i|-(V&2N4 zEB%_Qw&yX%TDA5|YUBRCv~5`_k_bMUvI}7Qk#`Kn)e>l8t7wt5Mh}{AlWIrp`N~EF zmUi7YcRI63_*v}5%wnJLlN?pU;qOOQzq(8yJ6vDMq=336Z78j_6UpoW2nm4+|9U<8 z=2f_7U(SyEG>gq=6&*vskxx~^VV7oE=Xx|;Grf_UTaWKrJ61<;MP%7S59gi~3O44h zWr8t3`A0*jWSgmF`|6pF?PI`%dh`8WaYuSGl^Hx@J1C{E`KA9KCu^jiX*;;)<7nDeMi5>AuNMkf>BAG8c_=cs`#*ygxdtfU>_B7bWb|1zJK{;W#bmT7OEA1DIf7S-CP!$hVCpP%ur{i7S z_>{Y}AY)3-G|oQ0kgk@N2C4SS0Cfw1mm4R8Lg|#%s8Otg{oo0IjPd&6OoJFTsJa6TWS^)nUaTS(5OI&s~DlnbPJ%awX-)CE0f?sKL}qH(JZS%VRS>4H0D4xXPyQ-_BC zNeQq>$_BFQR5ompXz|}R0bPEm4pGKZ$|H>XqqyBx@;pzfJBd>z6{({AEAk;*x;YSX z<7K-cpghNAV9g9{T_Ho2PU)ZR3Sam_(kkS8w>kNEtS9)|R-$9|ZE=|jl)V|C_9)Mu zM64F|FJ~*U`B|{fjGa`J&gf@!9>6y-zi=klnFTvSn{>{>tN2ine&$jju4ZjFs@InP 
z`L49~p|G2{+lO|klC08KCswGi-QeLJ@T(zKbJey9Y|jKYPIXU?-qI9Tq(V5_#5!J8zpcf>f1Me_=dKZF$2wAE#DwudoM zB3KPMtgm0=96W0(*W(V@M}SuZDsm(*UbvcMaxGHhp1o7LHXJ?83(WK^F_#|yEZN-E zyAl~M;sn4%qkF1_Rx$T2MA-CvX%RAKkV0@pB4Hic%U`-%r0Q&nx+ph}lR_2y#AS0H z=o8K8b}ont*=rH|v2Z#J6qNfdf$vOyA?ep!Tv_I~)@+#U!{NP-Ncl@o@VBLb2nr%Sv7YlA62@*0;m=)m zaO1NjP&;Tj_JicUY&%)$Z>|M6z3MuV?lsCH+fYYSBqYyZt%!<)DYiZPiYk0}m z7CU21(v}B@y5lZ~E8RE(dmpR_*DBedQ9K0KVecp7D9sq{qAe;^w&jO4nxa^q3zy6k z#C#m+6wSX~q0}+$qHUoCoGK2u`rKj$Qz%VI9PQ=9*xn#b+Jh+HbiMbugo z*r&AiPZF0HG1?&ImsF>?M_Ql&sW)o@^Ox9%;McrQx zADNd(q{2UlUF?0qa^Pl2<5u z{1Z^ z&eAua`mMVI>ULWwb8eS!Dm{x~sz(OXM~)J1?&j0@R^$+~k%$y)Mm|gz?1t@xf?{`9 zZm{AoY4oH6lK5x~N}z|)AU-jAhNpVyLy^LaE%R1K=))*0IbY3MlECZVmm2kU zS~t?7P3>o|7VkW&!+MG4wFZ%oKa`K>3zKl!=I=T4E-l8_ly#F8+M8Qucb}~td0eAX zP2{roqkYw$)VYgae#}Q?H$tz$pXPE8%n6_^PI9c+Q0%LkH$;2^Qf1M{8+RLgLqsS( zo^H{{vh++2JEOcZ!gJtdNRwy#WsuO6JxS3KVU^5PqN7@Vzj)Wd-FJrvaXjK~!s49$ z^UZmn>oVZ4MqK)R;fi~TNAyDN$b~^vP9n=5FRYv4aaMK}+k+|EK-<=V_4z6HkD9jd zA#$|#>)XTOsvKJZ;oKho6-|0xq()VPW}O?98HAQ&T~#`oMt-C8EeNm0*Q^QckCVIGTjg$znP9^a91!pxBlp5#^u> z#`2mAy*gqCsV4qUA&kS&3KMhql`C!|D58_Q{&{f(XmY3x&>2}YpzO$Z z@Huh;Y+`--ufXX*&f3zKdqzIh42x%vL|L?U7_joBmZo`V-%C(Wstj=7F+N^yh%`9N_ZXgGq0Jg? 
z1XcE4e-m`lW#WUI1qLn&dK~!?9-UCkY^Yu%X_67}977hf6hNXwbsK-ys5XMMMJwBj z$ESZPM?!{~>SsGwI_?bu-JdSiDcZt+L|@_Wk~<4BnlXPlwTUV}BZ>B)6;wFXY=uzO zq5M?5A}5MwoosN^H*Nnn!!6Q}Gl3{r>@_Kw7`9hD#dqPZ`a&$=5u&uV1_c zVP>xm@wB;Qi`zA%2KGEFR8zdxByn)Sec{tQS5ti!Nj>YHm3HqaeqmTwAsp4q+-R3gQ#%q$gBy{P|dh=$C>Nl~<-jxG9pTfBdrT_%Zby|bO4hsSY zUTcrLl?<)4j8`2sVBeDsp9Cp~Brmb8yO$r_CYSiU%xuU{`sk_+IkX%G%xCQc1xZ4_(J6 z=nY8E9*ru>%(CA`Hijejaf9eb@dA1D6p_hudn9PPt^)HKy9H9-*}xv;*UMwEu6ivI zo6(nL@V=XHzhacU^AZ3tu=$AW1vY&P!g_Q4mQ|7|KJ%#=ZbwiKG3`vaI=`PCv&yR^ zN9GYf+|TfWr;460G6>8vow05p0=dEL0Xb}Yl55XWmX~@3y4+i7G+SGBi%VHP_D`6& za(#N9K9!%SJg>f6Evf*^fJk=(_;b>%uh{LK=Q(v8k)CtxKU(MS_08+%LJ3pAY<#Dl zLwcI(q^|soYE30$2KMge6d}=V~q3q*J@fA`&&lGZ>8!wew%(RV|I>q z3^xbFV7laPe4e@JKRUva{t+zD%wmkIk`YdDeNH$eUHJ7dRh}{{UX1 zTPvxqRV^QTgYxn_bsdNJ*E++cd0e>r=xnz)H&IF+%quyam34C}i4sR}xhkh8Z{f(T`I1_)oLY{Cb;Z2N<@@rvR5f zC2NcN`O`4-YNi65Jj zw_yYW@0!rJ(j}RP+a~f(GmPY5^s948YIMb4V=7?^INa?iZh7ZE*z~JH`egGM_QhH1|&gK3EL649WmPA3S=TfyX^-d2Xzl&J^;~ zA&3E#{_%Z_3=W+2$l|GK+KrM%T;ek7NEsL>wmnT}!6X-V9!{o zoz<*sO*Hg6ttVYbBDoO}EG58c5gC-QJxc;|PqzcwvbD`aONHT;q-D9c0!C8%NnGTC zp1$A8riV?@uI=tL*k@>9137J~6yW42J#*H)`Sj=w>f6P0CZ#(Q<+%H zkwNEa$2`|xG*|k4>coX<&p$USj;9^4e>#FWFYPTP zLn%kP$Ydm)+ll8r{{SwvFtOBGt!9=IV~qrC8=zlagV)}(T)esp%;|14>zM95%{8HD z9x<_w)(lVH1Ti?rVVdV4hVDz0H!7xh!A;8FdTFj{y z5!*G&>bhKdW-cI>IRl)KgyaL?2lcMsPt`Qo{{Z4qHH$oZx3`xb{0_tc>(;Dlnk*7q zE%uilBW55*-~z-BcHo`|aa}TcTuv8PM>m3hKUM4!5 zXoQmYyB*D*yJi|#HCO?(lwpbzvNi`mamGHluST%B)4#FpbCZ;Bjw{ghjCtegN##q z6;I!Gi#z-6TI%Q9nMm_bIXFxXPU1)(h^h4- z5NWo-?Zoig`8=o@Z>9+8>yOU4CyJ|8YDMKGa&2ivh-?IypS(cNQG>>Rm3gi8icJLj z&P*XpYS<<-{5d>&;Afv)){W<=~@RVt64lO*zSz$dS_tvce; zODQp+rh3;`;rlNx#$U0_8Zb7(qm}uN4?PIyoKjwBMJ;5W_V?`Z zDJhKd(;x?Kfx(II)z73ILNk~EpdM&Lv4$Ry{Vra$`i>RNpF zP-*Z_Grsv%OZ?rjkVZQC*NjErje6Ttd+RSMCbuIh&*meJ$AQT?^y0m9M{Q!_#x%c* zDAji<0OCN{!jL%(y!(Gj?x#sdV|pp3&YI@qR*Ze78O(~Smmu;-W9j%*+8(_cOC*uy z+1F!lAOKHO$8Y}tRa3gRmdXUgvP@WG3PuSXn04dOaa`~zp+D=n{shfZX<}Jr+w)Zm_Rz5(%`CGO-ckND= 
zE8f)Sb>nL^W;Mj>{%bq4dp0xmKjBtjm1W_PhEM598cecdV)=Olu5teW>#djw*b1fw z-bo!Z?_Qs^jjm$mtZsO6)+>!(+8}pfv-wxh=^pvUKZI9^cpU_~&U%Q(Yu{{R-MPerab3A@%|hpzXlrQB@ot>6}#TrhMY z0Y6h&N11^B=*@ID++{~@E1w*^g!UAXNQwpBB!}9%SmP1Gs(mXPPmgmqxvqj?wFn)B zQ+ktgoGIp9GketcKf8;Hni%rgr(CEfY3L}p@2Sb^OgxS_&3yOb`S;&AAH&6a_O&2H zIOqYdn>E=?*D^8gO>YID-(k{LK?|<&!2sc#`eLr@62%Nc#@HD$PU5GG z`+DNA8ZmYwjjaW&QDnA3W^y}|j4vI%Ptv{*ACH}RxW)UXQ*Av=D}*x{B}jIS$L3?6 zy-&II#Vx*reW{j)+7>aQjz79G}jinroRE zY}V1Ea=B?8_Z)w^03)A%mA|NJy1nJWpTrCz{{WtGT7^W$bCxVg9A}Eex02Ac#PKOr z0H_%!rx_%YNj{W8O}o_G%FW5HhGd#^q84|=Wp{Eylh-4@ewEBg3S2;*UdbauQ=O`F z>UR#^Gg|tCNp$4%8%gr+IXUzI59L^rh7t$3Vluefi~@UL41bMxQI>}_S#%;>duU{9 zW*An+1AN@}_Qh4ZbdEVKW0jd>IV2H~-kWp4=ZeslYjv^-aTm-9J9uG_b|*DK%o0ZO z&k)?Jw`z=%J9=m9*197mW$I+Sx05Vlt3vBCU@K(s2X6T_Zs~S1%5B7<*DH=d_h1P*_RVVj$s6~7&)-{z-U>vT{c_XF| z^Yp56Y6$E0c{VCWcOoid^*J?zq}x|p8oHedx|o@ZOFW_5Jd@5xs5Khh31&q(ESSOT zlh~6|O>3xY4qC|q1OsqCFY_PDmd4fe{{Wh2^QR*i11g`1$LCwQq^;C)dmBeg*h;U= z&UYQ(FHYc|zvnet=1ah=_Q(u_?*cM@qa0vVT7}ibu^XGB0URh(j4!axNvs_|P}O3J z;6$k^A>0d+r27G$YZVzuU6(n^`Xi%z>7}%5Lg^gfe3C|`8BeFux%Q6k*L1S8PTcNk zNg=+us!y)!^2+TjVFToq*}?pJ(xjS%@tCGNJDdvMoi0^kVL|Iqx0wT7SOUQMUmNUEWW>_jS2@4cMMGh3nrx3|=yuq=wvlgbZr2yLOp`DLFmOrw07gmV{{TGK znvpS9f#VA(WqjlkqqYF9m+klQJPh*easmLOKTf*9uqB;<}gC{&ug-N~;}Nvk7IM2gX7KW;n3ZIIa@g>T{my>^Gia@e$X z-d~;&@q#v#=rTHU>Ds*7CR>TpWG;;u*f*WZK<(T0sb-Q$ofyWltf!S9As)k%no5*! 
z?-|CWv`qF_@fMobcB`;hhmG0DeB8>N<9>Wndjx zPv$Ni3wG=kvQJKc8sfDJJKIN^(8rmw2$5TGLNWYH_q{sxt*c5hRtHkYcXy>n4cNK8 zm`aoMAgVV&4o*VwPp7^r#e->ULn>2aUkt zqa9T6IX|ECJ*zic@V(xQ<5tnpY`H8z6k(V;q6N$T&ItYd!_?)rXYwJ2xRnACD*bRf%mpu#Q|a=#1=CiaUWlqgP7N-2P3KLn#YGr)phH?GFao!@D6Z5 z9dZ5@+v^?(Xb|bMU&H3G#PEgk=E3BEa!%frv!vP$CMYC=-Gp%LP=SD8M{UG+$7=4a zh*0IUZEaxl0`AENYJfrK3;pk}9@UMk*;wi<9yc!W`Dhm;<8F7K?(u=o ze@e4`eW_|!@EgVveUJv*#zDfUUI`ozPTbb}wwjQziF~CS-;z%Q9E^jIc)%IXYk}32 zWQd%l%*?+H+UfouxR+8kQ`}rS`D`L!v4w0EEV#=bM;38_r3?5C$+d^iWPZ z9Gd#fyAZveuVdW6)Vx(RwmPdn5^1)x&2Mj|%C5&JZa6*84?l)$NNNjwe|=zW)Tshl zK?EMWZ6cwt)^z!k?H2nXmNH|NBOI|j^(TS{Clz*0ZGwf4P*HLK*c+cf*e8#nuUFb8 zc3Ry_E3CsBfpK$gbAh}w$MW>)&nC8PHQOtRe2ZI^aHMd7pN~!l^QszTw->-idu+1q z05c!~Kqu6IN$Jz{t;<1cZ+6z8EKC@QRa627a0dgZu9z(dZQRq;UPBym#xQ3)T$Nv& z9S2J0gg5pw$kvBxkc{vT9fz;hyD2O#BwQv+V7cQM=K~r3m4P%jFt9#RkfFin=ErYu zrFBL+S(sFJV^(clf~1YKjNz3zIqjO$d37ma5;_yK50!8Z2XZRAU&kT{2kzxO1w3=^ zYTbyq+*VtKo-lHDjC1(XoKjav@t?lEr*j(8nwCf-ya zD?i=u)~AX|nZP84B;a(-Om5zXBw(JWseC_#{i11HVXt3^~Q6InN{wM|+B&wNQX%V}HKiuv17B)b~-PZrI#(Z_bL3%=!zbjM-H}U4n86>0Ff~aa`lCMnOF>-nm=b zqi+PvRIuBQI*)qnFNX)-xD9N@G!xtY^N*Fbe5V9wAe@T%{Ncq=l96!S@t%9M^U z>++lvxM$Np^Vl2{J8ug;w9D@UQtUEW#}+fL2}we{*-Be{L0f`X_yJOV%i_VtPI$>b!`mH6sIA44t=-*J|MTQi%H#hRE-baa$>0R1!+Y4dQBoSnUF^0Q}9j zBmgt=jxkIsmh6AD6`*0x@~2@K9RbE$=qgbx>u}Pv%=>#P;9zlD&*Hc)tybSvf+>tK z4=A!6hTWW!Ibn`5(zTqOz0_eYMxQo^DDM4tbc##qT3ba|lyR6azbN$Y_}67U z7P}m&*t;7^dd5ih6pevp+9N0%fC%mAFf&e$2%s3aB^`;}1N1$C>Gk8SLH0b93q{hA=3HsZK4EHEVj{HM{TxQwya|+irarP$u$+;v2APi zVCYsk4rEc0(~Nrm06NRRhAACxK4)~u!OnBo0iN{xON)DQ&vKimp8)RpwvX=f)1_zZ zR_xZAQQhiXV=d~cN0o~|DaYOeuo&-De|r;#xnTx5C6@&KGBI6hT>YXX8zYJ(Q<)Ao zs-H}B{&fxBpJ3L);>uOJiGEoFHdveko_d`26_Ts=S~Qd#p5`DaIy^hE9y{#^*CW0v zq!1a;nG}G;r|(L3WOVecU$NTTmX;;LWACve9=vqLBmI)}>~~O51A@D<4{_S5w-H)P zN2&kO{E)i7g6mQ9EvLg1jqSrBUJeI5bKCK&b{V5qH<%6oU*eqT0~4v0>Dsibz#m zkIL%Auy4FE=toZHHD6Nj{Ca_F-#v}Qt-0NpkXVvV{sGDBTG45X6}%Jb60B!Cu;Xb` zGoJm&PM=Dr;w@)O()9HfZye7f4=)7aayciU{K>3f{p66jQd*SL!#;JJ+gmz^J$#}A 
z1?kj}Us~rih??33w7764OaU7F-~|KkW2Sr7m4}Edbjc#QdD?c`L#Q0{$j5AFnswF8 znwksS^CLwM9ROjwf>aE8cdmw-mg06=BZ@(LB!b*P(;eR;Mm|*_XO71m>Yk%#t69o| zX%ZWWiDe)HHlDc0LJf9Fq|bKIwwE|*oZ&_Svi87J)aS4AskILYYFb^G)HOEqWnxuQ zM-3-lPR?*M$3a~Rs@bv~F6W;~Z8T#Ov7T1>w;oCSD?-lkW0F5ME`S%@yTXIqbiw?6 zE4-gi(6s#-E;Ryv*cdXZK0wIm-iMD$<=#spc4W4fY+*~V#=y<#fyd}`P1@RngGDyE zo@peV6qp%`hi=8Y410cc-uQa$Y^^4Rh>?qVsxgB6bGP2S?Pt5wt*$QZ_A$64Ft<=a z1z3&%9{uXxjV0UNLnfyZL?>(^2WVCt=Z=H)u6#upxZdEZD_NeoVk6a+XS8q-k@Cnu zARaS;!N=0BN8#5?!vq%sLmpJCf^vERlbxflIIdGscxIFv0!bI+picj4M-}uLH3**M1UdFJp}-hF}3~02Q&*2e-KG z^{wp|Rq+&WscEpSq@gyY!8k@#Ze!S-w0PlzdteYNj-SU?^V(Z!+O3Kb?lCEUkyjk< z-GDz1)zHPO+D2{kq`63A-{z{OF(SI>JYx%tb6B-DuGXiWUZk8-YhLF?Car6$K`w`@ z>FXV)UOz1xdZ62YV;hjFXQmU5YlHCSkqm13WU-=T?dJ*=mjReC^c{O+HRv&FHkOOz z2?*Kq6~HA}akD*f$YI~Ta5}e&Rtv*=s%(3P}}U4~);ct2iiv5I))jLO(a0R)ZT=NbH~3h>(<_kuN+Rm$i1w(*af z98#f*l6RLtO(o2c=a<@K?KQuh3bRP3e4E&~!RTMB4`M27J0wYq$HQen<8x#VKqnXn z+*J3HT*qqDCA!TzK3SE^Fc>6&xEyD{CaKGLAzjWwJ1$7Z2p*^O9@XwfyP3~h5ajO8 zmrs)ImU!iei*Vf?3EEqpGoHStwy&>l?DPk)wpp3nV4`72W6*8m7#tr;=kyQl3yYZW zWR)%#3DmF3bC&u6j8?Ve7q=RkJS-YG!7SturJI(*ji6v~E1!{jL#Biid)VbHbp38N z@@9)^Jc3C(efiHP1HVkvy0o`SKGSaQV45=*INGH_9Al7kjybOG+94N~%Gg->+q-DM z9{qFvd8)S2+g{59MH!K!XDjmjfM)}ao%2~k5~V(ABcIzl`^PVFZK&I6Hrt{K_R*>H zke1FlIS1?4*18FF>%-wYB5xU5X&IhH+NF0K=bYq$>C&C9X^%J%F#xy;^5u67;EV%- zp7qp2Z=u>;B3v?0FjWzwx0Zod94Q2DJ@9kVy?hN-x!spuO}1<4))p4lkl9-p?hXOv zvmpnousG!P&*Uq3Y4$cKCyMSbGF`(P!*FrYdSo8;&6`_I?LL}PcxFF3N`yNx9FW=V zo-yf9TfII7w}K-aE?6j8aQNq;H~~-NUh2jzEPE1ZXwJ6Y;=&mhdnBujDBh#yZV4ow z$EV9%)^}P}h-6rRl`)T>abxrc9CofI&&s;Ed9UL~VgdwTm<2rNpXpXtPtk5}K}Y)_ zT=f8{AYkLK{{ULMV_xjd*!0^SZdZTY{I0FezzJd3n&`#tw1pCU%2ae1Qc3Bay*cS# zbz;_fYlv--G6B1A9e$*oeig4Cr)UU{1M_FDK^U!mv2BsfQe3L$+iALFuIwZEvPKR& z@;^UXrn+jve8`&&4lzj^p^2Y%9I!b9?;l+IR@@V!06`r09sd9t=~$%t9Ez%9!)~#m znU@8ybJrYIphSwQ2IkIHzt5#MX?(;90{}q;oc=Hm-AR$`o&k-% z%;fS|*Sf0hSeK>+c(;I;Fyq{HuWf!AN1(3B8JzB}*o}oxphe|rjQHk8CaP-(Cur|m 
zc5(SyY)OvQZ(uTR#|EoI7s$L-yMdh4kVnGPTLGaGg^56_lKj$Qubu$+aIH9eY%^(=DaBknRb_J?qQI<7v=?oZtKs44a6Q zmN^+kEOI~q4|85~b8!^X-AO2m?6%CM-Y?#|NVyvV@$+Plr+zCBQr7hidhf~ze3uLf z{s6w`rDw!#G~KaT#}l*e9%dMjaf5<5$sNx;=D#t_Ga9&=K4)do1ZLA`Q>Wi(_xG0< zQ(c=~J|G@nF@cwH;ACTOC!jQH64@m4&8Ez^FfRf?*Z>s`gVB$s0O#pkhJ#^kt!ls7 zcF{^Cn+l>vSk-;q&NGZ>o-4P9N0vJ~SZ%H)WsX9nzWN;GWH9+T>y!EPuO70EIs4N2 z8CIpq9($avy@io>zDNwju0)N!e7{_e!=`H0wEiE3yxY+vaZIhV04$Hr2?MSP#!X(o zhT~0`v+kM**cJ>`nb!vZ44tY^;stOKt&0n}Ayx=mB>+4`$CU*8;KvY zAd$-sagm&H_}4J8g@NL1O3jx*Jyi5K>_=j1oVtPqhE`F!OyqI{5JnD8Pp=d`>ojIb zV-J!}IuEGE2>k1^F*0*(VlmY2KF@aqst?G5F~K9#@a}3muMXSAIgd9Qd#C{w0rMDj z+{9pW_*K|e((+Mkl8bkCJca-V?UKg}&r(fxck=jpIO17u-+9^|cs8=dyCdds#AiLa zo}#too!2hL(2ka8Ar6IW9APh~lTuR6#ock&6hwj&wIA}|kD zJxTTB@UE;$rNL!%sektP?bras1$N^rw1&qE-y9mXqH9)H5=>e!B2^((&hfAhlZ6@F z2>abDZdDywaG@8};q81iJnb}rTHzR!45MnCkC!fZ&wOH{g2v+0Y=U%zE3j<2!()tK zbpxe#mR9=KsT8o1E#Tk1OA=1(;{c!3(~pKN(qBJVx$`lBi~t5dokdJkR+lZ)OGIJ$ zHj*h`B#;BMI7R(BoQ(_6C0p+P{+oBYI?wya0bXp)}eQ zu2xvMT#zz){XzU|o|Yx2dojv4TO6j91GO~QBj`EitLr^1 zXr#~o()@w&{+wgHNT-Z3T>T03;=7NrPilp%12Gr}YW$-opeOnN04iiR=G|7-+X$#J z)4zJ0-Q4ZWXk5s+``GsP{cFXYSvPagbz6EIeyFp7lJJv06?HkrpmC?hCsFcEH9yxvJl5nJ#UVTgVc2{{UH? zyGwQ>t}$BsTdfgsoNczFrSScPHz7n|#24pA(fEJ4P4;<)`n?^(H>%*qR-B{xu!nzz>Wp<4vB+^Bjl)AixE!3M8 z#z8^Du;c5`oWT8Dy0RBugdUiEgkoKiz|8@tALB&o(v511W;~wCinD?e! 
z7Ll53)?!Kd@_773bE;j@lea^xx0=&dwRH0Kk`KyyfzyNE73NYr_Y)G%L5JiRVn7S& zo`c%8Q&iM#1r zNcQY_J*uJ6^pjX7~vjNXKsE{NzZE1jw@GYaNc2FL2TnE>s+$T%3UJ=b+@HeE_LhH}`52398v$?P-Nt!FQYuWV%d zMwt@I`*6p;GoCUyuU;8ceTOsMoiU=h4_k?%5_z%ZlmmhQ-IK@YY9VWD<;>2i3j(24 zI6sfx`t$hKUY&bA{hH5n9v)T6BPDP?_DS_MVk_CyVPo>u7zJhnW)BP3w|{!_(rMWq zOzfSQ?IPv~?k+8mm}Erx8=c!(v4S((IUPl5+ux|XST0uyApmYWSm1hk8p4-Lp58Hg zy;zA#lF7XAyt9BgUVfRSw)<`6#8*;;jKFXJ$lL+oJ=F!1oSa0vM2x6`X=tlZ=dX6%L)L zBI>LZ4UuJc2L*CZP;<_4*0|+O>W+Cw-m-T(txo>t)uD=Tr-mq{csK(%@7U+kxrgx^ zEz9aGjSz*LMp+9IR2%{I>FwUAT40Fi#tv!0|JW<=G)$$av z4o(W7>3u`WQ!JZFLH$6D(ni%$|g%&O8jBy)ne13eBg&Ux!y z)D%?g&r*d5@~&OSTb*w7M?6XtxWD1fBrFC$2{{iEbps zkd};eAy**d(38b<7CDdiTepQB{!)IYrF{h@6{<##lD)=9r-=wNK#_L^+n#!Sx$Es) zcKT>QLV0}k_x)&!3wBYt6f0xykAGU#hwMsD$iO`SKjU3d#6@Ux)~glJSRiJ7fNdim zDCB;1(!l{6M4JN?8)m}IYTu!Sd*xdV8)X*L?;wekqy>M42 zn(DQgcTD8^*CTM;*5G2?j=8RS?`ne0z#P{7%c`p};0&7i{{Y8+GjScorIY3(##iqj zP%&N(d7e|j!CzTDkee#_yT^9cK4cb5yYY?vdiB8euMU(C!jqGqQ^qUu z>h&jGx|QXB>!Hg!aeDJy?jzMlaRZwFwzx zxcx@oY}B_&r(5~4*~C$?xlVfmP65ZSr})-wvOJAF#mr-TU@TksIVYCS{{ULCvgx5v zlIfsnHlNt~*Y>`q>u_yJJ7Wx3j-POXx$Tj=9l5T8{^P`2oJ*@|Go`hl+X_Jv6Ouqa zy)bZdf;-nIEX~}A>~$ZqSbc}e^QM`YJTAG3@Ca7pA2O10U2eOk>lZhfzaCEUyKquNH*@sk88ype zQMR3CvXvy2K3uT*Q$4u{udQ!Rt>dF5rxsd>L2!|x*}Mgp0FVf59)p~psjkyS@a?s& zx=AY#(VT&ur01&S9R3vD5_x7XHmS5Q!+fBHKD}y9OH(&aQX%pVJ%Ij|$x@`H?wwRA zJI6i(nC+aoU*4=5LuFe$IPm9V!VeCRi~dd3^}y z?=P(ogi+SgMQeG4mm`jyc>L<`l+GEK?w}_FsWl8xvR&I;IKoeo0*-`s#coAqG;`sI zD%s96oF8ta{VPdSNL-R|6v*Pkhuf!>7D>azIW&UX>B4(SoXv|I_@+Xk&)o_h&4mVMnK4am{Bu zR?)0dw5;se$r;X1bB=zJIOHN!s^C zs4eZt$5<~X;;cA~<`5XR-|T#mUJ1ob}kbmZRJlg`$t>1NgM z?3CI=8z8rg9mg&dH>n&idgLxIZahfqY%N8>w{<1*7XaX8M^Zg~I@F)q+OOD|C5|Rl zl#>%jvwt4lLBXK4SSR)k6#}*1_Rxq7>%ST}pGGQ?b;vJJ?mu zpBZT4*$XTn6%OQMA-f!&++&ekRF?OTn}krxj6)P7c2|LbJA2}?;MetwRWR7jZDDNg zRaIigCz4w{VHb#4^C@IGd^a`Te}OJ8T{F%Ax7s=KnhP| z$T=OVC~j@-plgYw^6mo$QciaczyJ?=rzWt7Cbzk_X+tOxo!lWE2HgH)xml;aP-O@g zc5p$?J%x12D&)D@=z2AwnpuU$e6e!JCEoxN3FrR+)max071}&fTpi_(aHk3n@n_RK 
zaBGQxUQ4(AHp~oK9xMLt1?qY?6+1@TSIV=;&{$W zblu3~9+GLKc#QhiMsu4PgW zme@Zp3dD2C$9nPW9U=`L;huGQLVz~(01sSxpL*@>JVW7|3q%426ECM2^y$TNRl_Yg zRn3&0z2jObEF+cWc^Tw!la7|*Y_wP_lSK_h5g zD;}{O-Tl3cLjFM;NURYQ4WS1ZAaFDJ)d;RNtxbHfat6_r;3|RG9(w&N%AWQUAXL1P zZR|IY4%JNajA!w$MbR&96sTDhIN^WY;Bdz^=GCC$?_-_8U7anp{INTE#z(qLBZ9@$ zU}GvWJ90ByqW=JLMiqgLRDw?hv)`KLCYBq^xmg`@%B)D^=3Y1-%y;*v8yic>W0E`9 z0ALIQl0e7Z$m7@2u$&W?rjUZY&aUe62C{Js%l3k&?woU+0fEjdjj~Q|n#cgc8Xf81Rv}`AZ(`4@`9WR@7=+Zl_dje^QcM%M7g*qOd^4 zhTs?50D^ew$rWMmFKr=7CsIRa3_v;cBX{Y=UD5)LMWRu#BV!~S=N$e6>zeL}(=K68 zHU)Q4!8{*N&bHBDEblY-Ra1fZ^sUQXZaLR%a4>A;j_~qT7mTk;y!jsQh*MmF0@aC#p#xaH< z^Uoj3wBS|VHYii=UX?p^GP&%20t7cT>k{(WuPN{^-ZkoyaBFF9OEHyq>I!`gY}_2J zWm}BiZmm0m#SBFdK5Dk6t1nuxC};uwDhE7brDM%iknm}sGH;&(xD9%3kUzMt(*65a zn)sSxurRKQXb&IMUvshbu4drow{IAd#=P&vej>KeY~+=SwEJ?1Pp?9Ks~mPZo*tX2 zRoNjoWpkhL#-Ep+T~CQTPkVoK zmp6y^_a(fre&AKW%C2+ykJ7js+xyAomOQf+T=EVwc)<1MzaP#sDPrsUbga613Kx`a zja0k-&9+CG*UxYe0nX#!sP*qvt`N_08|w2s?I}^PbR6fL9@SkmtBc4P-x0`1%*--C z9f0e}{HsCjX14j-K+{A5`i_0NSDfVgnbU+8ol4Fubg2!--Na-dRpc+?GhFtkuC2^1 zJhQ3=IaDJnjC48RoPHHX>RHUcX_iJ*V}MxYLGO(7)7r3a1fhtw+jSo-RTHl~9({h5 zwK!XIMNMzig6mYayPs?khR8oEMOBSNm&&1C`j;HS&y>ClsRLxgYtlO zk}-_(J^S-jJU<1zjptgA-bh$BNKiu#Hj|uS^&oNjR)xifnF7NDp;Vi;jLSiWA%w*#HWu1NgoPU^@`n@dA#PwdMm+E;wcgk{-|GtiC41J|5aJtd9Q z?Id@F69aOE!vIO+r>Cb%)YBnr$m6)TjaU+Q?!ixN1DxmHrqt}FwP@N@Mvg^d0&Z=m zk+hCM81IT9G@Y(*Z5lc{^Hi+(kVQGNC1Jkvf1E& zhIpt=pGhteRpo3feLw>kKE|VDxwX1mn^_fANIRRZ?neQ#xb4WUs*jpBMijR@Sgo(7 zfuWj5OS3A1u0h^M%An(ciel&z>Ow@j5u!p%H1U(T?dkG?-=;IgbDDOou1zpjyk&jB zl5nZ=pO>Dyz3aENycRaQC6%T7S=x4tt`9t540G#VV^t#cdYY?UHm-8B!8L+?j6PiQ z2sr>AK*8rCxeJM8^N#l;I3N9b^n;>iX|D{WGs20|1J<_| z`WuK#5KDoaap~1b{cC`?h6(pPrMH2}#_ap@d8^kT#(+&DI6wx)`AJjTjww!pW0Z8e zI+>(ph)S;-qi2$P4#&SgTEP!%GCM{iSu#o5NfKa&?@negr}(? 
zSQGs!oRhOLMA{if?oL7cK2)&{BK)wvF~k$E|A>Q{lsVh%yC zE}kArM_lSA^fwp8DpTxng5pig&J6dH@D{ z^UZN>t1*%}jAMEo!1U)ngNje>d-A?q4dp4_kiGYEbDY;xqOYPdl~>T~PO{ff7kMN@ zw;m7Hk*Vp1wx=Q_1%wtyyu$ajFOe|$t8UcwMI+G*?zz0(zT2d z*gd1@XicgXW-!LF20wLkf$TC#{c1M9x^i~0PI54MXRxYb7S2iM&{gQ|uoawOA5U)P zovpDqeMp)~rf)JvEXVW|HuKAqwR%>An|M_3A&xo@wUspOFx!kT`RhVJ^b%=<;bV3n zj&sFVR3@|{w^BgHK{clZv`RS}$<9W53T{>~@i!GoIDA2AMJ}a)4B4gV!K> z3e>Z=9$1bb9i(6edWAh}(lb)Yi?DS`b&4hBpu^Di`C>vWmH{t7Antp{n!uij3l9f||9smS-W8SfCJjm`= zE8MI_fR%?V2Yh?f&Z3%W6-rB?+3I5M<|yZkoDt>6$-r*sxzBp!u5_7Vf0e^B6ICqA2$Wtp1&7K4D&YZ+*7MX8%GwtBXnWNqj7z~d*T z2om za!y7MUs{Oa?)i}|taCP;&@$IQxik`DwN_4VSZLT+1VOpQC3 zwH-PiBTRV+LO5jOJ$OBH{VML89sI5@t`%-qAdtYT79Y@6X7LMZiXcgRsE`6s<0PoY zS%>*GX5UZ|2&08ab;nckVD&$ha`H~rwx*H3=Kh^E?XBWH^okcak=ued9zJY!9FBRe ziYYAS)S|qynH@>Qn}$dOCm`q3wR3mxs!WJuStW(IB^WS0G1TV({{Twe(dUW=TUNGo zlRKL%tgIY$8*)83u3Ge!xsjDvCa=`pKiTZ^M!0yVlc-`zgLK{F5)86F(iNnG5jNN z_6O3lr%F-zl}&Q!bT+J}{FZz$&T>vNE2M&OddLV~=ia!O?ro*G{oC$xNI5+ZQY)gj zlqnIecMb^e&3Y86HEj;I*F&YSo6E`Hk`8#}V?C>}hH%>qM=Ut!uh9E{TJk98xJ%b4 z4l)5P&;mH=T~&^x*iif^BRT0`Lx!Z%Ay3ToNH;?#n(^USWbNl3wb5A22v?7c4C1gY zCW_+RAslrCA6}f+t*{8ga9&4GQ}wT2l&`sb=T&Egmwq#nPtvZdGfo}22b@(qMk58d z9eaLRsZv#va>U?vt)UG}DxDfgHiPScD@xz4?wzX!Cf|-RTlU*P9CPnowButIdnbUo z1lOs^Ys-8J+mxP_>9QUwr?~}tVf)rXCTgP%9aG zM=*`&kQ4wL0G@X8c(1MeM?KU!MeMd;>(Uo!{{Y?tiuj5<$!}wMC%Q7Bxx>VxGCZ=3 zXE@+}PCcvOIO7*?N-b%))w-EcOJ#j^By!rUW?6QNiCxeDPSD$O%A*|St>4(_Nh90o z(n)&J#zROOYE|{&(L(K zqw}9^(n&5Em3jaZoF2TK3<2K1JB`~`Ip(=O6EMKc3=0mwN7et)k~r z9M(gw^$1@xHL|OP`AQF$)8z!@lk3v6qn;Qpgo=F0WgAoye~1i%bN+a$vR+RFu}5!@ zCC=l8$8E0oK?yMeepPb9LcMXoBRwmBSy#JqB$6piM6UE8nRhmMC{9l}&tNfJmByo~ zB!>}%4S=k8TzCF|TH1=bx+{H2ZW`H(z`wYePFXTbgNz)W*gcJ4OJdI{c5TVjkaB(a z&ObWbfo@k&o=dH)9GtrYkC(1E{Hd{AqRl8pV9(BRgU=s9^sVD{u@~0EG~Gh)PKsEd zXOS>SP5>nR(Bsg1R?U{Ns2IpF#EwYbMotDe#!f#9!t+tDW!X4KZ-nl#S^C$t41@XoUTc0riY)-nF0>Z-A{b+{{ZU6K1*u}nV}G! 
z>@&$3$6RFQq1!Ug9g+efWehS1+z$tnj)$+kOJ|_oUtCZ2X=9E!6a?Pj?j-Ut+XLGa zoYl2roLz~u=pc?@W<-<9IZ>UM!I-740tS~ft!(;eJV8kB4=clbu zSF_ZvOBDf}<6{scZ2BI9@~&#-p^Ew&bi~UW0m_^(&rA%HS`7`fv;CeQ-OT6YUD#Y_ zkVZ4g^cl~sU$RMPO+{ty?rPit9M=%XW|k=GQ0|a2f(Yfg>5ArWV6?MiG+B4r01|-6 z0qcw$eLX(4x2WkIZHDhJ3PUP86*(ZO$vGYQ;;{EBmuT5!2-pM55rVnyqto85lx?Za zP7j!-o@{W&6?Z9O7=zDI)})!7%_1L~Je+~gOnx=BC5tM-ELa;P=%f|Hb;+sj2Agdo zvd1fuatvS`G5k7zoL2tT9Wt$BkG4~&T={UtBV>YhkUDykSEjX+D8_ROE0L1f!5+gI z{#CW-$kIE*`-K=AV^!WW&&;^*+ksh^u}$`c^JQCsw;y+A!g~?@YE@gi7<*o4VUj44 z0VGlx&qBd4Mmh>+oBa9Zng6sEl#(UCChA7M9Do!^Zne?u_X;9nSw6_j04@Img<+osAM=GO?W3_7< zyxS6{x){#U@DTCzq>MY0gsX`JJIGeZ^);moD68`Z-Ok(&57MFZML zKtFq&b|3z_+k(>i`@<(chjHsoA+rWcS*2~!q1p#; z>FZ8gIg(E;kg3mpKU&b!+U76{uM5e@&V4zql`bsifJHfFE=EVx@z14WDy=&l&TTsx z7Iu#qQ!HDQ{pR#J#(CqdU%N!KD<~<~2P6i`;{;Zwi7%H6SY62Fx3EGKI3Rj~&mQ$- zP}1PuOrf_7agxIw`&L}?a>17Q2-Y{{VRL z^~F@P(v|?T$e~>H#&{j6_cpUUt1i}P&l|Ckn%a_$?san67OdlzD^P8uF>W1mwy#=8 zB%I9QiRxJLYAGTT?2IW?P&1w~YQzxC^WriNzzFAoikT~>XrfxyO@uK;3|mCKNs$51 z`yl#NYbY-~r({$yj=OL+4l3zV_Yy$W zn8>4JE;j^I!3;N5hOr}Pw|%44x>d+GNX7P*B=M1qRn)e2jX;npfziE9b^9w=EK{}B zbqiE6TSfMS$!1-fhDSq_K-6`a6mGMKDw0VEppt!YQoN?rvoq~c)DGv{6xO(xg&UV1 zoPKqa!(HrysKkG@A@d4kAGmmKf5NOm;p2R+@gjV@Duc++e0Qd1N5MSv*Bt#bT3%zq zqG-Nc3=pU0B%e`9&}mt2DKi=TF{{G=0K18h<2(b%J;hwM@cy$HBkbZ;7~_y@ywbHx zwmwS8WkLbWgpv=S9+lTNsPM#0U@q;dMl-t}wdU8vSA*qg7`G+R@>^RyM#Y*(fLemR zWLyFX^}wyvD(({sSpnsKUbWkeM&{@hp4k}*JJEn4v&MhVO2oLZ(XQf>NZ@;E7duKZ z=m$(1BRNZ1NA5PKIVGwq+hnA1hA0$aijt+WbB?vGXLALe!b>R;viyTQx1brWehXMa z6lZs0!vu7|@Aa)4OK9P{N#!1FYH$fZGLm}Z>rK#3>gPrDI(uK(EOkb`ewTAGX6lr&*fb8^sXwz z05aU}uByj{>Bc{mbV3?fwKjJ4_g3)&s!n~m`c&4X-M(z+)YcuCVT_Iqbe4;P3FKF8 zI(8JD&8u)lZ`<}Yjc&XHTQ>cxtt|}OJ)gj%5t{Uw8ob-VTt-cLB#p&oHK>Jz;+-j| zqc{{hb4Uj(Q_Emf0MfH@Kn-@RySXr@nzrv)*OG8*;DG0~2{5GQymwxZwC#glrK!pS zAEkK?tox>8T@lz1Kk*E<5ouRPRFz=0WMp<@frc3CUl2tCTHHvsix`ED5=j9#=Y|0A zPal?sYdq7`uBOfWm!H}FB&#UQcf~@XMwZ=anBj7$F*%d8Ln6d z$v7C?eKIq&Zb55pGRZj9U%Z51gBiHdY^EBdl zwJW>Zp_w)7-p3V`gL1~%05;s7nfXB{kEKA6MBvMEfuaK&Sn?AX&U)nk0Q%~LF(eVq 
zaOLDqa-i)DM?sut1HVc}n9SaLkP(Ik2I9T8j^B-SEeM=u-R>liX>r@6;?!d+TC@Jpd=Bc=fHxuT^8rC32&y_CEgrPsX$Fbrpu(8&wWbK$cvR z6pq*&@kDuwgr=5(|%#Ox-ez~Do0_?J^AF*yWBWo_PJ&rv9$1qgqMYPoZhs$cyw>xQOH^wqBb(HOajT`f?Gj^! zIq%!OZfIZH8eA7QqVs>6K16K3<{0B4PVFE-+1yrXc)3F}m^-Cj8ug!uzx22;Z0vF%mtZ2Y@}aQ6)yn_FT=o6F}J-f%`S&1cJN z$%N8Q?BH)+;KFV$1AOFw$Iu>l$MvgGTgz({PWORThVdIjL+(6x#y+|HD}Pkqt0KxR zl0BV5Zb5(u+s*=m?VOy^X8vW?sX07#<`gUQcx^sbleO4mgsb0nBz8J8I#atS>S zIvi9N(COb~i+!w;2)HT&U=z+s>OTxtj<-f(Ql-|VJC~eE6s5_C?NkT~f-&Ep!l1W` z7#GOAWQ|TpljCnl{< z6D=*2YGh-#bMgi|1>+{LXNT>CZ}eg8qzSo#9B^UkhF5iAOv}WL%EnL zqbT<|IUdH6>RVQQ(Z<0}0-i}Hu_wJc^brh7{@jt5(&b#59H zkQDaXbMIXdirm_rJsUa47g9KA*hUqIJnbVr+t5@~CDLt>Or{WVxq1AJX||gfmPL79 zb;!;J6c6i)uW_fx9F6u;e6k0pU!`^ERIKzr|JMAMw9{93x|ZGN-<^tp{{Sol_}5S2 z?+D9(;%FgdKqDcfS36ko!1=H->t5yI?}vJS_7sxq=2&GQJYpTl@wfL^C+9w-3}U$( z+Z#(=4(n3*g>$hZltXI^TlsfPtAGe37bCeKaqC|jQy!c57xm9NkTx+`xdpDkqJ z_LG9S93J?tXZuS3086-M*Y4foeXkUrS&w#4A%Gv1I{wc~`%>!DUHK8pGSTn(q*Km) zy~TPlZuZ>9N_#U^JVOo0y3+L9V5*~NF#w@^boH!};e9?UCv8sS&3C~&K2wiQPbROM z+)B3kMVPm?Y+_>`cOR+i^{W?G1NTu}NdnA4`=Bv6^#`SEcWWkT4`~&l%iH*N-tfJx zm6&^QqzB}lo%#;FsWp!c*}(Cn5V{wQJi)oy9^2r`J{vVdoIPdL4Hu(%#KX*7CPaeH1 zO?90{*#x)B6q2FW?;*gzJ3;O%tiOu=+It-~Ho0B?WO>3ysKAB*oMhzjj&sd<`O>bE zPqSYm#WsFqcEf)^PLPt`g=fEI%nZ<0iEs z+Z%*X7v%s9bg9g7mK#TY^`taq@~agsBK^sjI?t|sh5K9d27=WrKcILPhOHO|EVaCT(IzbL}w zekQsf5xjC*!y2<3k_v;2f!{UH4Mf?M^0Gh}mtRkWW!st9u(OZ|Zsx^{-Ah*_vFbZ*OJ`Niq{6XXg9CN3J;J zRoE<}TaCaORfo&io;sX#6xhg@aYM#>cC8ywOk|IgvH7#e8UB@=p=Q~cm@Tfg8BBH% zZUcsHh!51(x_D<%0TCog9QADctMtumTgb7;gnyg>6?!dfY0A+jn|L~op60nLRj9RJ zBu(`)EOeRl=}+3*dw_5UBL}!XwJqJ?+m-+h26|TZp?4h6#pQgs>N?{ctDR$NCrMRI z&ZOiLLjD>2Xr!ej%~5JvdTLJ`^SRoNS-RsK;CA-<)}&aN6i4}(bKQPp`R21xUR<;+ z(a3#>1TSpvKlYS{ri%`Ehpj?fx~d3})s)v4TMAamO901+2S{6uw47eR|gAq8tK7(lL=;iEDIA zF4@~PzSYh%*1C%@H(+sDaKXF)exTO%v>{Jin(L<#*xa=lQIUc;t=n(B0j!I3#@us( zTaY0HG4EZJ&|KPs82Z-Dxacb?*?6t1amcOWN$dU&7>c{u8;an3C~*)K+&ght&u}JI z;*px6R^o!y07@E|?NRQiUH$5mKy0sAHzaOv zZ7u$xD2_Gho~(NSJDT-vW^l2wCjbnRc*T4f@!M33#F~bi2ZwI{)Amy$&9IXimLPS; 
z+>CVwyqxngpb|mB>OedWKb>VyYEI?(@_Q08 zea1Kg-nO*~yoni<mlIaf-o=?dV|*| zIjwy+P1Lm*r*#4bhyt(14%5Q?pq&0*{i~v+l3dbfOrY(#ui?u}&k)+(*;(mAdwW6= zoJa$-j5rFrNg&|#{PSF1rv{m%>UWwZsT|Qs0v|8S$wruhcYhE#JaRhm+M~HXA=2cI z>Kmyn6;I0H$PE6c*Es1;(O`XlQUc=a&i9fH^Us9{!0u-Z$A#v%Vc3kqQMr&RsolSSJ8MKb6vKZ zrCLXBmv^`kMe_hS86EICS7|<<;l$FcnrK8<8&}R;kidX9bjUd)k80ukQF#nja>EOd zF@;iceaF3HPP40}nm4DbQ%P-mtvSr(yrFkRY+*aO) z1a}s4-Ay*w(<%>`0m$c-$2sYeYn+oiiStUR`>h{U{{R~Arjm9?OdgFIc2Xtu4sNbQ zKzy#Ot+~5{gdKq7n%-zEte4EUUo0kAE6QDQjt|Myk?276tT-hRT5hzB(rFjVIF|^_ zPB*V*BaR6*zZ)}Ko6y2-z`_}nx)KK}7%4gJfN@#YpI(P0db-%+?=?xHxG^-QWpxc9 zbSQZ06?YNSy=>ZC+4(okP?F+6*Kxu2or-fl=43eO!6%Vf z`fRdWKlYEx+rIj3+@KMTRA6!W(_*oOoti0U^8}o1INYt<2X8s{t1-N`USi(L;el<8 z;4uTB$>;e}ak9{fZW7$;69zU6#S4xfY1lgF+uIeJb!n=3!%z^;z>o{5#~nCqV>PX7 z;(NOW^6stSo)=)FaskKpy>JG2TF zmvfYsTfec!SIdR={{XC%?RG)Xf=?`Z98?i$SLK7SSc?$dat=F;X1a-!PVpSxbcbda z5$-tY*8{Ex z4k_KmXVGcgG(Qd^M=vM|EQy(Ow1H9sFc20msYb zvX7Io{Hw$DYwbtHLfc9=A8uQ6@-))CfQ7JeI(xA2Q{P5jFty_6U zMBK32fRSzHXaG(uL5%v;=s z3R~u89B>a5`)Q-Rfzmd3Vj!1!f|cCeKpjWaR_eR9hYY7H+|&nG(&5wW8by}bc5Y(6 zxcsr6K9!NEXwNnL*9Jnb^0Aj|6%2UBc_W`{?QFEz?(AW)W-^6cfrlIpq?70X=Bz~& zdUKdAuAyE6DigTz*Pc83R&Nodj*7K!vN)8|HOnccl#J~z%+d^uf$Ps1`c|Ho71~-Q zz)(zwDx)d~UYS0(YYIlgV`mv2fOC&? 
zSi)=9q8KbLoT018opheSW!$T|8}(rY%h z$m<|muQPE}3^s-X7$jhWn$+=U!-Rotwa*J#8zsKBj!EqIG^~@k%Xwss`FI)0JanZ` z69qJ<)aa#8nmo;(YjZl!D!QV>>4gi^*BxnJ?B$1?b`CjHnxuZyadIG@H;9aZxOD@r zE2Y!yBE7nX#Rtqu$1nsdPq_gf<>(AzkW62iMZN3ChN*sfAVN z9@W_B{t42az+8*tAUTa>3A8I?soXM0(;X|B)hu4lX0;fVm%8!@^{+1vh>c0Ko~V@< z&8fs{DIB*XFx&!$!1X<S?jmp@?T{hZDP7c6-5Gl=~T`L0Zt(e4x_bi0^f^rC}qTbc`ySC2mjsYCfPD$CKZqc7?o7`b@ zk-_w>$u1fwJhQ`AVVi4=bv0rbrS45 z7in)w*pAtL>JETrhJ$Zk;r8oS$T<{ai0z0u`Eq$azSXA;R+6~iAOL4^C!xo0%AFOz zoPe@|Gtk#bV`UcQ55d9C0LLVIRL-7)xse3OTnkZ@7|wp3bK13`w~b4p{J`|BxpE@% zqB{e63}ku?RaCb50agH?Zbm9;QogMXH)2gb*J1_)sLnvoZ))mv`Q=%RQVpOEq4gV5){Nyy*gm_t9WHAwugSEn>3qMlnvzZk4l*#-}A~zWdj)F z@HMt<;*19St};eF=~@OQ4&0uby=$ut70YCTF6gXcA1F{ut~z_w+jt0cTy@V;R&C+| znV4ks{OeZM)0I1U&399xGR3{KE&&4-xoZH(BvT`^DTZE#w(TvLjGF3&E~7+{LFhWx zv=Mq!p|>Lys|~oV)I>hrIIWv>Gm29w|yes%Xb`e zKiS#8b%_4}g_r}A$Kmy_j^dp5t`)`eE}(RfglG4X zHc4!M4sd>6!vJLQTx8nR(p&waHDPkYDxP3>0E~=#{VPfftup@h71bp^UjuXHf+t=8 zI5_R;_*O9uERxAPsfZ~>ILnB~P#50=n&qV)R?{limoq6X&GqaqXM+gCVmC(J#{(mt zPuHpGRZb6V;`>-M)zE_w2mtXnTqZNAs8W#@dMO-I0l;^h;+C% z4N@mmVo7)EdS}z0#=4a{osO8&(&(B?>$a6GVL+(e8bgqA)bZEb(z%Pj?a2fEgHVzr zgtCt*%B~-Sz(1j`w@tUW)Wf)P$me>ZV6(99MmXc}sux#RSCS%}eVzi_Bf{i1d*cL< zdz|t3R`8XOa@iccg@&c(#U8bIZtav37D4k23=f!$ajk!@0!w)`(W%%Z5D?P203EJ_qJx4n)N&6s z=GMejjYW7}c@-GGr;14}wbWm1l1VbIR#_5bhahEB9-X>-*BPkZNqaNSijkl=Wgjp? 
zcmw9-=N&~Rry*@h;KaC$yAV2olfc36j=WV1B(#=UV!W0>N|s+N3^NbAxRxKLO?!}s zFTBpE!rGNC?ON5ENZ-s^IT#oiBlM{3;cM9>wZf(b?fE-#*n?Zp+Wn;_hEe6l&$G=r z032itlY!4YJ*w5crjV8iG#L_@d=#_`*%nuXG!%z#G4kEsQL0O#w{vw~Wj(uK8?I-d*b7h2Tj+Sfyd;uzb^nqU?^ zt^7n2&VOF@y?xVFZO!EK24D{`-^f2JQ2a^&3Vq4CDZPN)??4Rk-=8lq{hrM zz-+G_YqQn7XQ^F5Fh*QS%av{X$6nkESfiT z`_5x%rd%IPet*uf7H_vlx_eS(m05@?Kp5yj;5J8Uewk4g5JmVSss+&u5v0hm66ksXF4l{wr`8B**GKzW{ z`ag(nt!&|s%Z4+%hjGr{zbG7mTOZo3aHiVk_)6%fC$P^#Gmb#xAlD)nP#`2~zc9-( ze84H^8SRRsH2B*qGq86&eA(dpR&v9^yOSz?ncWgv-3OXv5s-fS7C86LagSWqUZU-B z0k>G1K*K8QhG19SvU#p@#a_l&F%cq@u&WXXKDZq5TNm=Qa*K#Rbi8g~zt4WXs#PmJ z0Vi$DJ7>8pg8tax9yn0nUMp!?Sd5@5a4^U>dgqSx=;DQ&%vlVM-n?KQz|`_fDvXc2 z)2}tFmD$jvrlLE81@6bXGd)rCf%4q ze(}lRdw+#OW0LmiXY&_&fU-H`lfi6suMVuco`mJgrOt|L#*B$~aM{7edw*J3WqZBR z1|eIFb^u`H)2$;5FO_w*%uY$&(2m~6+K0M}%~VF&8!-d`TkiUkisik&6DhU2XC-kZ z)HR?jHjtnfKSu|TNe5>^4+^?J8j^I{Dc5xK7asvA5-aDmaSqK>N46CX}KqZ zwTD6h?t9fksDut?Aj)uY{d*dx%_U`_glXL9A-tPTwuTD_@?F`Toq{pvzD9rg)o90| z%I|-6KHqmHJjMtKJ;nub66#mV=p@>xKw!znG0|CS@=tWvdwkOxn8SH-fDT7oFb9go zLe2S{a!*qrEyR1Z3;P)%jIqK9&J+(*&*&>k7t{3%QqJsx;|shaVN`n_PZfIaR@Ja1 z)8K|c6e!%^bwKE+?&IF6iM7k?c(n^whSpHuGDtE(2k$O3#yxRW%PxrLS6xcX?WOLu zo*?#<0kx!rfW-I14!l<-s%t(Zid0x`8aPo&Kpc>ZoZxes?IDGxW_@!}^OHNwXuD$v z>&gC=akT9rpY6cL=_=t>xlxZ^`NwfYVxsSBn|)<-#9s+(|58?%KmAaP&Np zTl#jW_PLMjsD-0tIC2XNglmiJ1#!2wgt0|0T3;0*iqsWjOZ?HI&< zW^5CXxERZ0wh6A7$}(CS!d%MnJ1rAdxrP*;-HSmfU{E#!?eE7^>7QD-mvhI7CQ`Ap zF-+~;LmsR^>r^z$IJFXOxQQSe_a=6L4>`yjl5x_LThe8*o+gI&JI9ej@OeSlV3Wmd zf}Iyu)8EkVjYk)#@FkwvPnGH*95WWzA^s*)~yK6_&}Eo%g-a$ zpJfoc9#SHzs2lRZLgTM3fnP%|N^-k;6)SFK+n<%0xC5NxwWEx@OK}o0LVorRGx*lk zy{(i{{jO6SaiGEHeos9y!8NgM6`rLdeUdYk0QqQ~@zmsc``4dWJ);>V(28y^L!P&{ zQb{X@>FHYWzz87y(TdvC?PAjw7>p%mmuqbxtE_!Dob&|#eX7h_rj@99&_`<=b0^9~ zWtCNz9PMnLf3I42UL&VcYAQDK2Rl1G40N|L513$|(yc{iSISmg?&h{(v)?3X+o`R| zAb>}mg9^A9>P>s{n!2{6xoUH1ZmI(Tlir~=Z2W3UJyJjw?+q zeXBwk0L5O8+!85)pB=HsO1%}i1RPcP?ds%q{U`%pO8FNCxBSeC!nIwMq_0rG&{yJ- zo~sAqn$l9R5<7~eEST>~@^!5Fr8we%K4<-_br-kzX7<+CQNwXP#9Jd^Nt7weCOz<&PZ_+JBAz^J3N0k^L 
zZlh6eJjAkXEC%6#2*Y*nSlZlaYdl|P8#kyY6zbD_`<^{U#x9!>QD+$YMYY7G?G7L ztZlU~vQF^-0HZs6q=AvhIISU3^K!E3B`0YdQX32Dgtzk?kC-a~v9Z)K1L(s(wbGqY zE%Sz5qn-{&Kc#beF2Ad49w$B zYkzkhl^Ubq7AK(UK=!TOD&~8kGTb)s3gBRO9@OOxE^_i)Q(6{+c(&n`#JgPM_hq1bBv48Az~EUi3goN@yX!@YCurc0(qWpg53ZWvNMe-5?lJ{z7}?K0Bh?Yzk) zT(Rk%aoZK0;)xk;)-jE~U>tyOIOtDG)lJ^TQIm4zX9kz0UF6iG^Cghs8~_UXgZ1^S zNi{f@>EXMG!h^Xv=ie3AtWlWSBDqC!0PT#|JK{S#Xrm1rU=?*y*P-oIO{VoUm9Jwf zSc*rqjp93$o!K1xr=?`s-K~;<>I%r%IqmK%NAJ#6x))_5wgxai3dFtuAje`y>s?S< zaLm&JRAdiAc&|bn?k2ga LE?HP7rilO9RQgmo literal 0 HcmV?d00001 From 4d0961e833cba78205a543b5bcedfa976e2a82f0 Mon Sep 17 00:00:00 2001 From: Jack <32371937+jackzhxng@users.noreply.github.com> Date: Thu, 18 Sep 2025 15:47:55 -0400 Subject: [PATCH 033/395] Update Voxtral README.md (#14414) --- examples/models/voxtral/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/examples/models/voxtral/README.md b/examples/models/voxtral/README.md index 0e7a095af45..a9bd5c9b1af 100644 --- a/examples/models/voxtral/README.md +++ b/examples/models/voxtral/README.md @@ -44,7 +44,7 @@ The Voxtral runner will do the following things: - Feed the formatted inputs to the multimodal modal runner. -# [Option A] Exporting the audio preprocessor +## Exporting the audio preprocessor The exported model takes in a mel spectrogram input tensor as its audio inputs. We provide a simple way to transform raw audio data into a mel spectrogram by exporting a version of Voxtral's audio preprocessor used directly by Transformers. From d40ce3f49af71bfb87786956be2aea5c382af51b Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?M=C3=A5ns=20Nilsson?= Date: Thu, 18 Sep 2025 22:36:05 +0200 Subject: [PATCH 034/395] Add support for non tensor inputs to portable executor runner (#14377) Additionally: * Fix issue in input file handling where vector reallocations could cause input_buffers pointers to point to garbage. * Enable pytest_sum_vgf pytest in Arm backend that need this support. 
cc @digantdesai @freddan80 @per @zingo @oscarandersson8218 Co-authored-by: Zingo Andersen --- backends/arm/test/ops/test_acos.py | 11 ++---- backends/arm/test/ops/test_add.py | 10 ++---- backends/arm/test/ops/test_sum.py | 7 +++- backends/arm/test/tester/test_pipeline.py | 15 +++++++- .../executor_runner/executor_runner.cpp | 22 +++++++++--- extension/runner_util/inputs.cpp | 35 ++++++++++++++++++- 6 files changed, 75 insertions(+), 25 deletions(-) diff --git a/backends/arm/test/ops/test_acos.py b/backends/arm/test/ops/test_acos.py index 28dadcf95be..f078f46f98e 100644 --- a/backends/arm/test/ops/test_acos.py +++ b/backends/arm/test/ops/test_acos.py @@ -4,7 +4,6 @@ # LICENSE file in the root directory of this source tree. from typing import Tuple -import pytest import torch from executorch.backends.arm.test import common @@ -105,10 +104,7 @@ def test_acos_vgf_FP(test_data: Tuple): tosa_version="TOSA-1.0+FP", run_on_vulkan_runtime=True, ) - try: - pipeline.run() - except FileNotFoundError as e: - pytest.skip(f"VKML executor_runner not found - not built - skip {e}") + pipeline.run() @common.parametrize("test_data", test_data_suite) @@ -122,7 +118,4 @@ def test_acos_vgf_INT(test_data: Tuple): tosa_version="TOSA-1.0+INT", run_on_vulkan_runtime=True, ) - try: - pipeline.run() - except FileNotFoundError as e: - pytest.skip(f"VKML executor_runner not found - not built - skip {e}") + pipeline.run() diff --git a/backends/arm/test/ops/test_add.py b/backends/arm/test/ops/test_add.py index 24fdfbb5457..bb690d89f59 100644 --- a/backends/arm/test/ops/test_add.py +++ b/backends/arm/test/ops/test_add.py @@ -211,10 +211,7 @@ def test_add_tensor_vgf_FP(test_data: input_t1): tosa_version="TOSA-1.0+FP", run_on_vulkan_runtime=True, ) - try: - pipeline.run() - except FileNotFoundError as e: - pytest.skip(f"VKML executor_runner not found - not built - skip {e}") + pipeline.run() @common.parametrize("test_data", Add.test_data) @@ -228,10 +225,7 @@ def test_add_tensor_vgf_INT(test_data: 
input_t1): tosa_version="TOSA-1.0+INT", run_on_vulkan_runtime=True, ) - try: - pipeline.run() - except FileNotFoundError as e: - pytest.skip(f"VKML executor_runner not found - not built - skip {e}") + pipeline.run() def get_symmetric_a16w8_add_quantizer(per_channel_quantization=False): diff --git a/backends/arm/test/ops/test_sum.py b/backends/arm/test/ops/test_sum.py index 9308315f76d..45f3a1f2267 100644 --- a/backends/arm/test/ops/test_sum.py +++ b/backends/arm/test/ops/test_sum.py @@ -94,7 +94,11 @@ def test_view_u85_INT_1_0(test_data: Tuple): @common.SkipIfNoModelConverter def test_sum_dim_intlist_vgf_FP(test_data: input_t1): pipeline = VgfPipeline[input_t1]( - Sum(), test_data(), aten_op, tosa_version="TOSA-1.0+FP" + Sum(), + test_data(), + aten_op, + tosa_version="TOSA-1.0+FP", + run_on_vulkan_runtime=True, ) pipeline.run() @@ -107,6 +111,7 @@ def test_sum_dim_intlist_vgf_INT(test_data: input_t1): test_data(), aten_op, tosa_version="TOSA-1.0+INT", + run_on_vulkan_runtime=True, ) pipeline.run() diff --git a/backends/arm/test/tester/test_pipeline.py b/backends/arm/test/tester/test_pipeline.py index 123c1af44c3..b0446f948c0 100644 --- a/backends/arm/test/tester/test_pipeline.py +++ b/backends/arm/test/tester/test_pipeline.py @@ -906,7 +906,7 @@ class VgfPipeline(BasePipelineMaker, Generic[T]): exir_ops: Exir dialect ops expected to be found in the graph after to_edge. if not using use_edge_to_transform_and_lower. - run_on_vulkan_runtime: Set to true to test VGF output on VKML runtime. + run_on_vulkan_runtime: Whether to test VGF output on VKML runtime. vgf_compiler_flags: Optional compiler flags. 
@@ -1018,3 +1018,16 @@ def __init__( qtol=qtol, inputs=self.test_data, ) + self.run_on_vulkan_runtime = run_on_vulkan_runtime + + # TODO: Remove once CI fully working + def run(self): + import pytest + + if self.run_on_vulkan_runtime: + try: + super().run() + except FileNotFoundError as e: + pytest.skip(f"VKML executor_runner not found - not built - skip {e}") + else: + super().run() diff --git a/examples/portable/executor_runner/executor_runner.cpp b/examples/portable/executor_runner/executor_runner.cpp index 4f4208a5b53..5ce872eec8e 100644 --- a/examples/portable/executor_runner/executor_runner.cpp +++ b/examples/portable/executor_runner/executor_runner.cpp @@ -175,21 +175,33 @@ int main(int argc, char** argv) { std::vector> input_buffers; std::stringstream list_of_input_files(FLAGS_inputs); - std::string token; + std::string path; + + // First reserve memory for number of vector elements to avoid vector + // reallocations when emplacing back. + std::vector file_paths; + while (std::getline(list_of_input_files, path, ',')) { + file_paths.push_back(std::move(path)); + } + inputs_storage.reserve(file_paths.size()); + + for (const auto& file_path : file_paths) { + std::ifstream input_file_handle( + file_path, std::ios::binary | std::ios::ate); - while (std::getline(list_of_input_files, token, ',')) { - std::ifstream input_file_handle(token, std::ios::binary | std::ios::ate); if (!input_file_handle) { - ET_LOG(Error, "Failed to open input file: %s\n", token.c_str()); + ET_LOG(Error, "Failed to open input file: %s\n", file_path.c_str()); return 1; } std::streamsize file_size = input_file_handle.tellg(); input_file_handle.seekg(0, std::ios::beg); + // Reserve memory for actual file contents. 
inputs_storage.emplace_back(file_size, '\0'); + if (!input_file_handle.read(&inputs_storage.back()[0], file_size)) { - ET_LOG(Error, "Failed to read input file: %s\n", token.c_str()); + ET_LOG(Error, "Failed to read input file: %s\n", file_path.c_str()); return 1; } diff --git a/extension/runner_util/inputs.cpp b/extension/runner_util/inputs.cpp index eceaf3cfeca..c1112489afb 100644 --- a/extension/runner_util/inputs.cpp +++ b/extension/runner_util/inputs.cpp @@ -78,7 +78,40 @@ Result prepare_input_tensors( continue; } if (tag.get() != Tag::Tensor) { - ET_LOG(Debug, "Skipping non-tensor input %zu", i); + if (!hard_code_inputs_to_ones) { + Error err = Error::Ok; + auto [buffer, buffer_size] = input_buffers.at(i); + + ET_LOG( + Debug, "Verifying and setting input for non-tensor input %zu", i); + + if (tag.get() == Tag::Int) { + int64_t int_input; + std::memcpy(&int_input, buffer, buffer_size); + err = method.set_input(runtime::EValue(int_input), i); + } else if (tag.get() == Tag::Double) { + double double_input; + std::memcpy(&double_input, buffer, buffer_size); + err = method.set_input(runtime::EValue(double_input), i); + } else if (tag.get() == Tag::Bool) { + bool bool_input; + std::memcpy(&bool_input, buffer, buffer_size); + err = method.set_input(runtime::EValue(bool_input), i); + } else { + ET_LOG( + Error, + "Input %zu of type %zu not supported", + i, + static_cast(tag.get())); + err = Error::InvalidArgument; + } + if (err != Error::Ok) { + BufferCleanup cleanup({inputs, num_allocated}); + return err; + } + } else { + ET_LOG(Debug, "Skipping non-tensor input %zu", i); + } continue; } Result tensor_meta = method_meta.input_tensor_meta(i); From c07521aa21fb6c7e7d4d3f3020d2584d70341354 Mon Sep 17 00:00:00 2001 From: Rohan Joshi Date: Thu, 18 Sep 2025 13:46:43 -0700 Subject: [PATCH 035/395] Targets for Qualcomm llm eval Differential Revision: D82655293 Pull Request resolved: https://github.com/pytorch/executorch/pull/14381 --- 
examples/qualcomm/oss_scripts/llama/TARGETS | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/examples/qualcomm/oss_scripts/llama/TARGETS b/examples/qualcomm/oss_scripts/llama/TARGETS
index 10462595c56..51315df3ed2 100644
--- a/examples/qualcomm/oss_scripts/llama/TARGETS
+++ b/examples/qualcomm/oss_scripts/llama/TARGETS
@@ -26,6 +26,16 @@ runtime.python_library(
     ],
 )

+runtime.python_library(
+    name = "masking_utils",
+    srcs = [
+        "masking_utils.py",
+    ],
+    deps = [
+        "//caffe2:torch",
+    ],
+)
+
 runtime.python_library(
     name = "decoder_constants",
     srcs = [
@@ -39,6 +49,7 @@ runtime.python_library(
     deps = [
         ":decoder_constants",
         ":decoder_utils",
+        ":masking_utils",
         "//executorch/examples/models/llama:source_transformation",
         "//caffe2:torch",
         "//executorch/backends/qualcomm/partition:partition",
@@ -90,6 +101,7 @@ python_binary(
         "//executorch/examples/qualcomm/oss_scripts/llama:range_setting_pt2e",
         "fbsource//third-party/pypi/lm-eval:lm-eval",
     ],
+    keep_gpu_sections = True,
 )

 runtime.command_alias(

From c00612fba98a6e491816ea9d36f69a46eaec7c7d Mon Sep 17 00:00:00 2001
From: Mitch Bailey <57704435+jmahbs@users.noreply.github.com>
Date: Thu, 18 Sep 2025 21:53:55 +0100
Subject: [PATCH 036/395] Arm Backend: Expose PMU trace output from FVP run
 (#14401)

Exposes PMU trace output from an FVP. This lays part of the foundation
to enable us to use this output as a data overlay in Model Explorer
visualisations.

The end goal here is to be able to visualise some profiling data in
Model Explorer using our Tosa Flatbuffer adapter. To enable this we need
to implement a few changes:
1. Expose PMU trace output from an FVP. This gives us performance data
from an FVP run. (This PR)
2. Expose Vela's debug database. This gives us generic information on
operators in our model, and can be combined with the trace output to
provide more detailed profiling analysis.
3.
Write a script to combine the trace output and the debug database so we can visualise it in Model Explorer in Executorch. Here's a snippet of the PMU trace output: ``` { "name": "axi_enabled_cycles", "ph": "X", "ts": "1029", "pid": "DMA", "tid": "axi_enabled_cycles", "dur": "1014" } ``` cc @digantdesai @freddan80 @per @zingo @oscarandersson8218 --- backends/arm/scripts/run_fvp.sh | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/backends/arm/scripts/run_fvp.sh b/backends/arm/scripts/run_fvp.sh index 0f76d0496de..5d3088c865a 100755 --- a/backends/arm/scripts/run_fvp.sh +++ b/backends/arm/scripts/run_fvp.sh @@ -22,6 +22,7 @@ data_file="" target="ethos-u55-128" timeout="600" etrecord_file="" +trace_file="" help() { echo "Usage: $(basename $0) [options]" @@ -31,6 +32,7 @@ help() { echo " --target= Target to build and run for Default: ${target}" echo " --timeout= Maximum target runtime, used to detect hanging, might need to be higer on large models Default: ${timeout}" echo " --etrecord= If ETDump is used you can supply a ETRecord file matching the PTE" + echo " --trace_file= File to write PMU trace output to" exit 0 } @@ -42,6 +44,7 @@ for arg in "$@"; do --target=*) target="${arg#*=}";; --timeout=*) timeout="${arg#*=}";; --etrecord=*) etrecord_file="${arg#*=}";; + --trace_file=*) trace_file="${arg#*=}";; *) ;; esac @@ -86,6 +89,14 @@ fi log_file=$(mktemp) +extra_args_u55=() +extra_args_u85=() + +if [[ -n "${trace_file}" ]]; then + extra_args_u55+=(-C "ethosu.extra_args=--pmu-trace ${trace_file}") + extra_args_u85+=(-C "mps4_board.subsystem.ethosu.extra_args=--pmu-trace ${trace_file}") +fi + if [[ ${target} == *"ethos-u55"* ]]; then ${nobuf} ${fvp_model} \ -C ethosu.num_macs=${num_macs} \ @@ -93,6 +104,7 @@ if [[ ${target} == *"ethos-u55"* ]]; then -C mps3_board.telnetterminal0.start_telnet=0 \ -C mps3_board.uart0.out_file='-' \ -C mps3_board.uart0.shutdown_on_eot=1 \ + "${extra_args_u55[@]}" \ -a "${elf_file}" \ ${data_file} \ --timelimit ${timeout} 
2>&1 | sed 's/\r$//' | tee ${log_file} || true # seconds @@ -105,6 +117,7 @@ elif [[ ${target} == *"ethos-u85"* ]]; then -C mps4_board.telnetterminal0.start_telnet=0 \ -C mps4_board.uart0.out_file='-' \ -C mps4_board.uart0.shutdown_on_eot=1 \ + "${extra_args_u85[@]}" \ -a "${elf_file}" \ ${data_file} \ --timelimit ${timeout} 2>&1 | sed 's/\r$//' | tee ${log_file} || true # seconds From 2640a86ea21856c3b2980596dfd8b1a89af4d4b2 Mon Sep 17 00:00:00 2001 From: Anthony Shoumikhin Date: Thu, 18 Sep 2025 14:00:19 -0700 Subject: [PATCH 037/395] Fix message truncating logic to respect UTF8 encoding. (#14394) --- runtime/platform/log.cpp | 54 ++++++++++++++++---- runtime/platform/test/CMakeLists.txt | 8 ++- runtime/platform/test/logging_test.cpp | 69 +++++++++++++++++++++++--- runtime/platform/test/targets.bzl | 1 + 4 files changed, 115 insertions(+), 17 deletions(-) diff --git a/runtime/platform/log.cpp b/runtime/platform/log.cpp index b338ee10a71..a09987271e7 100644 --- a/runtime/platform/log.cpp +++ b/runtime/platform/log.cpp @@ -59,6 +59,38 @@ static_assert( kLevelToPal[size_t(LogLevel::Fatal)] == et_pal_log_level_t::kFatal, ""); +#if ET_LOG_ENABLED +static size_t get_valid_utf8_prefix_length(const char* bytes, size_t length) { + if (!bytes || length == 0) { + return 0; + } + const auto* data = reinterpret_cast(bytes); + size_t i = length; + while (i > 0 && (data[i - 1] & 0xC0) == 0x80) { + --i; + } + if (i == 0) { + return 0; + } + const size_t lead_pos = i - 1; + const unsigned char lead = data[lead_pos]; + size_t need = 0; + + if (lead < 0x80) { + need = 1; + } else if ((lead & 0xE0) == 0xC0) { + need = 2; + } else if ((lead & 0xF0) == 0xE0) { + need = 3; + } else if ((lead & 0xF8) == 0xF0) { + need = 4; + } else { + return lead_pos; + } + return length - lead_pos == need ? length : lead_pos; +} +#endif // ET_LOG_ENABLED + /** * Log a string message. * @@ -84,20 +116,24 @@ void vlogf( // Maximum length of a log message. 
static constexpr size_t kMaxLogMessageLength = 256; - char buf[kMaxLogMessageLength]; - size_t len = vsnprintf(buf, kMaxLogMessageLength, format, args); - if (len >= kMaxLogMessageLength - 1) { - buf[kMaxLogMessageLength - 2] = '$'; - len = kMaxLogMessageLength - 1; - } - buf[kMaxLogMessageLength - 1] = 0; + char buffer[kMaxLogMessageLength]; + + const auto write_count = + vsnprintf(buffer, kMaxLogMessageLength, format, args); + const size_t used_length = (write_count < 0) + ? 0 + : (write_count >= static_cast(kMaxLogMessageLength) + ? kMaxLogMessageLength - 1 + : static_cast(write_count)); + const auto valid_length = get_valid_utf8_prefix_length(buffer, used_length); + buffer[valid_length] = '\0'; - et_pal_log_level_t pal_level = (level < LogLevel::NumLevels) + const auto pal_level = (level < LogLevel::NumLevels) ? kLevelToPal[size_t(level)] : et_pal_log_level_t::kUnknown; pal_emit_log_message( - timestamp, pal_level, filename, function, line, buf, len); + timestamp, pal_level, filename, function, line, buffer, valid_length); #endif // ET_LOG_ENABLED } diff --git a/runtime/platform/test/CMakeLists.txt b/runtime/platform/test/CMakeLists.txt index 901fd0499cd..fee7566da3d 100644 --- a/runtime/platform/test/CMakeLists.txt +++ b/runtime/platform/test/CMakeLists.txt @@ -33,7 +33,13 @@ et_cxx_test( # # et_cxx_test(platform_death_test SOURCES executor_pal_death_test.cpp) -et_cxx_test(logging_test SOURCES logging_test.cpp) +# No weak function symbols Windows/MSVC, thus PAL intercept is not supported. 
+if(NOT WIN32) + et_cxx_test(logging_test SOURCES logging_test.cpp stub_platform.cpp) + set_source_files_properties( + logging_test.cpp PROPERTIES COMPILE_DEFINITIONS "ET_MIN_LOG_LEVEL=Debug" + ) +endif() # TODO: Re-enable this test on OSS # diff --git a/runtime/platform/test/logging_test.cpp b/runtime/platform/test/logging_test.cpp index d44cd2d5e71..3ddc506c062 100644 --- a/runtime/platform/test/logging_test.cpp +++ b/runtime/platform/test/logging_test.cpp @@ -10,24 +10,79 @@ #include #include +#include +#include using namespace executorch::runtime; -class LoggingTest : public ::testing::Test { - public: - static void SetUpTestSuite() { - // Initialize runtime. - runtime_init(); - } -}; +class LoggingTest : public ::testing::Test {}; TEST_F(LoggingTest, LogLevels) { + PalSpy spy; + InterceptWith iw(spy); + ET_LOG(Debug, "Debug log."); + EXPECT_EQ(spy.last_log_message_args.message, "Debug log."); + ET_LOG(Info, "Info log."); + EXPECT_EQ(spy.last_log_message_args.message, "Info log."); + ET_LOG(Error, "Error log."); + EXPECT_EQ(spy.last_log_message_args.message, "Error log."); + ET_LOG(Fatal, "Fatal log."); + EXPECT_EQ(spy.last_log_message_args.message, "Fatal log."); } TEST_F(LoggingTest, LogFormatting) { + PalSpy spy; + InterceptWith iw(spy); + ET_LOG(Info, "Sample log with integer: %u", 100); + EXPECT_EQ(spy.last_log_message_args.message, "Sample log with integer: 100"); +} + +static std::string get_prefix(std::size_t length, bool use_multibyte) { + if (!use_multibyte) { + return std::string(length, 'A'); + } + std::ostringstream result; + result << std::string(length % 4, 'A'); + std::size_t remaining = length - (length % 4); + while (remaining > 0) { + result << "\xF0\x9F\x91\x8D"; + remaining -= 4; + } + return result.str(); +} + +TEST_F(LoggingTest, Utf8Truncation) { + PalSpy spy; + InterceptWith iw(spy); + + const char euro[] = "\xE2\x82\xAC"; + const char thumbs_up[] = "\xF0\x9F\x91\x8D"; + const char e_acute[] = "\xC3\xA9"; + const char capital_a_tilde[] 
= "\xC3\x83"; + + struct TruncCase { + size_t prefix_length; + const char* codepoint; + }; + const TruncCase cases[] = { + {253, euro}, + {252, thumbs_up}, + {254, e_acute}, + {254, capital_a_tilde}, + }; + for (bool use_multibyte_prefix : {false, true}) { + for (const auto& c : cases) { + const std::string prefix = + get_prefix(c.prefix_length, use_multibyte_prefix); + const std::string suffix = "_SHOULD_BE_CUT"; + ET_LOG(Info, "%s%s%s", prefix.c_str(), c.codepoint, suffix.c_str()); + EXPECT_EQ(spy.last_log_message_args.message, prefix); + EXPECT_EQ(spy.last_log_message_args.length, prefix.size()); + } + } } diff --git a/runtime/platform/test/targets.bzl b/runtime/platform/test/targets.bzl index 6a46eb29f4b..a5d77ef5a4e 100644 --- a/runtime/platform/test/targets.bzl +++ b/runtime/platform/test/targets.bzl @@ -84,6 +84,7 @@ def define_common_targets(): "logging_test.cpp", ], deps = [ + ":stub_platform", "//executorch/runtime/platform:platform", ], compiler_flags = [ From f6b5380351e42021f213aa703baf4aaaeb9c02a6 Mon Sep 17 00:00:00 2001 From: Jack <32371937+jackzhxng@users.noreply.github.com> Date: Thu, 18 Sep 2025 17:26:02 -0400 Subject: [PATCH 038/395] Bump Optimum ET pin (#14333) --- .ci/docker/ci_commit_pins/optimum-executorch.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.ci/docker/ci_commit_pins/optimum-executorch.txt b/.ci/docker/ci_commit_pins/optimum-executorch.txt index ef3282ba6cc..30b9427824f 100644 --- a/.ci/docker/ci_commit_pins/optimum-executorch.txt +++ b/.ci/docker/ci_commit_pins/optimum-executorch.txt @@ -1 +1 @@ -40b02a2dc61bbf901a2df91719f47c98d65368ec +828ae02053a6e0e20a2dfd6e737ba10c6f4dee6b From 0f2206253f71b06c0dc813312a60a0ab619a67e6 Mon Sep 17 00:00:00 2001 From: Hansong Zhang <107070759+kirklandsign@users.noreply.github.com> Date: Thu, 18 Sep 2025 15:10:46 -0700 Subject: [PATCH 039/395] Use fbjni 0.7.0 and set ANDROID_SUPPORT_FLEXIBLE_PAGE_SIZES=ON (#14418) Partial fix for 
https://github.com/pytorch/executorch/issues/11597 We can upgrade to NDK 28 when fbjni upgrades. --- docs/source/using-executorch-android.md | 4 ++-- examples/demo-apps/android/LlamaDemo/app/build.gradle.kts | 2 +- extension/android/CMakeLists.txt | 2 +- extension/android/build.gradle | 2 +- extension/android/executorch_android/build.gradle | 2 +- extension/benchmark/android/benchmark/app/build.gradle.kts | 2 +- scripts/build_android_library.sh | 1 + 7 files changed, 8 insertions(+), 7 deletions(-) diff --git a/docs/source/using-executorch-android.md b/docs/source/using-executorch-android.md index 23513302063..6f0c5dad736 100644 --- a/docs/source/using-executorch-android.md +++ b/docs/source/using-executorch-android.md @@ -83,7 +83,7 @@ To add the AAR file to your app: An AAR file itself does not contain dependency info, unlike the Maven one which bundled with pom.xml. The Java package requires `fbjni` and `soloader`, and currently requires users to explicitly declare the dependency. Therefore, two more `dependencies` in gradle rule is required: ``` implementation("com.facebook.soloader:soloader:0.10.5") -implementation("com.facebook.fbjni:fbjni:0.5.1") +implementation("com.facebook.fbjni:fbjni:0.7.0") ``` ### Example usage @@ -100,7 +100,7 @@ And include it in gradle: dependencies { implementation(files("libs/executorch.aar")) implementation("com.facebook.soloader:soloader:0.10.5") - implementation("com.facebook.fbjni:fbjni:0.5.1") + implementation("com.facebook.fbjni:fbjni:0.7.0") } ``` diff --git a/examples/demo-apps/android/LlamaDemo/app/build.gradle.kts b/examples/demo-apps/android/LlamaDemo/app/build.gradle.kts index 19cfda847db..beba2696c15 100644 --- a/examples/demo-apps/android/LlamaDemo/app/build.gradle.kts +++ b/examples/demo-apps/android/LlamaDemo/app/build.gradle.kts @@ -57,7 +57,7 @@ dependencies { implementation("androidx.appcompat:appcompat:1.6.1") implementation("androidx.camera:camera-core:1.3.0-rc02") 
implementation("androidx.constraintlayout:constraintlayout:2.2.0-alpha12") - implementation("com.facebook.fbjni:fbjni:0.5.1") + implementation("com.facebook.fbjni:fbjni:0.7.0") implementation("com.google.code.gson:gson:2.8.6") implementation(files("libs/executorch.aar")) implementation("com.google.android.material:material:1.12.0") diff --git a/extension/android/CMakeLists.txt b/extension/android/CMakeLists.txt index 2599d202e61..34a1d3d2fd0 100644 --- a/extension/android/CMakeLists.txt +++ b/extension/android/CMakeLists.txt @@ -30,7 +30,7 @@ endif() # libc++ dependencies are consistent. WARNING # Users need to use the SAME fbjni # version here and in app gradle dependency for runtime compatibility! if(NOT FBJNI_VERSION) - set(FBJNI_VERSION 0.5.1) + set(FBJNI_VERSION 0.7.0) endif() set(FBJNI_AAR_URL diff --git a/extension/android/build.gradle b/extension/android/build.gradle index 3a5d42e9838..86e53d5873f 100644 --- a/extension/android/build.gradle +++ b/extension/android/build.gradle @@ -6,7 +6,7 @@ allprojects { compileSdkVersion = 34 buildToolsVersion = '33.0.1' - fbjniJavaOnlyVersion = "0.5.1" + fbjniJavaOnlyVersion = "0.7.0" soLoaderNativeLoaderVersion = "0.10.5" } diff --git a/extension/android/executorch_android/build.gradle b/extension/android/executorch_android/build.gradle index 7d91cfd1194..e36044e3da5 100644 --- a/extension/android/executorch_android/build.gradle +++ b/extension/android/executorch_android/build.gradle @@ -49,7 +49,7 @@ task copyTestRes(type: Exec) { } dependencies { - implementation 'com.facebook.fbjni:fbjni:0.5.1' + implementation 'com.facebook.fbjni:fbjni:0.7.0' implementation 'com.facebook.soloader:nativeloader:0.10.5' implementation libs.core.ktx testImplementation 'junit:junit:4.12' diff --git a/extension/benchmark/android/benchmark/app/build.gradle.kts b/extension/benchmark/android/benchmark/app/build.gradle.kts index 4ee7efd1f97..7554164583a 100644 --- a/extension/benchmark/android/benchmark/app/build.gradle.kts +++ 
b/extension/benchmark/android/benchmark/app/build.gradle.kts @@ -42,7 +42,7 @@ android { dependencies { implementation(files("libs/executorch.aar")) implementation("com.facebook.soloader:soloader:0.10.5") - implementation("com.facebook.fbjni:fbjni:0.5.1") + implementation("com.facebook.fbjni:fbjni:0.7.0") implementation("com.google.code.gson:gson:2.8.6") implementation("org.json:json:20250107") implementation("androidx.core:core-ktx:1.13.1") diff --git a/scripts/build_android_library.sh b/scripts/build_android_library.sh index a50d15709bd..f88dbd2cfc4 100755 --- a/scripts/build_android_library.sh +++ b/scripts/build_android_library.sh @@ -36,6 +36,7 @@ build_android_native_library() { cmake . -DCMAKE_INSTALL_PREFIX="${CMAKE_OUT}" \ -DCMAKE_TOOLCHAIN_FILE="${ANDROID_NDK}/build/cmake/android.toolchain.cmake" \ + -DANDROID_SUPPORT_FLEXIBLE_PAGE_SIZES=ON \ --preset "android-${ANDROID_ABI}" \ -DANDROID_PLATFORM=android-26 \ -DEXECUTORCH_ENABLE_EVENT_TRACER="${EXECUTORCH_ANDROID_PROFILING:-OFF}" \ From c2ddeec5a555915d4fd2e9e6c0e3a755a46a6576 Mon Sep 17 00:00:00 2001 From: cccclai Date: Thu, 18 Sep 2025 16:07:42 -0700 Subject: [PATCH 040/395] Removing spamming log (#14423) As title, otherwise output like ``` E 00:00:03.830554 executorch:util.h:125] second_input_sizes[0] = 1 thereE 00:00:03.883490 executorch:util.h:125] second_input_sizes[0] = 1 isE 00:00:03.929477 executorch:util.h:125] second_input_sizes[0] = 1 noE 00:00:03.983967 executorch:util.h:125] second_input_sizes[0] = 1 ultimateE 00:00:04.033875 executorch:util.h:125] second_input_sizes[0] = 1 questionE 00:00:04.088452 executorch:util.h:125] second_input_sizes[0] = 1 .E 00:00:04.139406 executorch:util.h:125] second_input_sizes[0] = 1 ItE 00:00:04.191997 executorch:util.h:125] second_input_sizes[0] = 1 isE 00:00:04.241043 executorch:util.h:125] second_input_sizes[0] = 1 aE 00:00:04.289894 executorch:util.h:125] second_input_sizes[0] = 1 questionE 00:00:04.341772 executorch:util.h:125] second_input_sizes[0] = 1 
thatE 00:00:04.399873 executorch:util.h:125] second_input_sizes[0] = 1 hasE 00:00:04.455845 executorch:util.h:125] second_input_sizes[0] = 1 noE 00:00:04.509937 executorch:util.h:125] second_input_sizes[0] = 1 questionE 00:00:04.555430 executorch:util.h:125] second_input_sizes[0] = 1 ``` 
--- extension/llm/runner/util.h | 4 ---- 1 file changed, 4 deletions(-) diff --git a/extension/llm/runner/util.h b/extension/llm/runner/util.h index 513fd109255..8fb245107ab 100644 --- a/extension/llm/runner/util.h +++ b/extension/llm/runner/util.h @@ -121,10 +121,6 @@ inline runtime::Result populate_start_pos_or_cache_position( auto second_input_sizes = second_input_info.sizes(); auto numel = second_input_sizes[0]; - for (int i = 0; i < second_input_sizes.size(); ++i) { - ET_LOG(Error, "second_input_sizes[%d] = %d", i, second_input_sizes[i]); - } - TensorPtr start_pos_tensor; if (numel > 1) { // `cache_position` goes from start_pos to start_pos + From 5fd66ee788a663e47017b8c4b4c33a2d882aac4c Mon Sep 17 00:00:00 2001 From: haowhsu-quic <111341466+haowhsu-quic@users.noreply.github.com> Date: Fri, 19 Sep 2025 07:27:46 +0800 Subject: [PATCH 041/395] Qualcomm AI Engine Direct - fix sliding attention update bug Differential Revision: D82745889 Pull Request resolved: https://github.com/pytorch/executorch/pull/14411 --- examples/qualcomm/oss_scripts/llama/runner/kv_manager.cpp | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/examples/qualcomm/oss_scripts/llama/runner/kv_manager.cpp b/examples/qualcomm/oss_scripts/llama/runner/kv_manager.cpp index a049b54abb6..bd6d27d4b85 100644 --- a/examples/qualcomm/oss_scripts/llama/runner/kv_manager.cpp +++ b/examples/qualcomm/oss_scripts/llama/runner/kv_manager.cpp @@ -242,9 +242,8 @@ void KVManager::update_attention_mask( std::fill_n( cur_ptr, std::abs(n_past + ar_len) - avalible_cache_len, neg_val); } - - cur_ptr += metadata_.context_len; } + cur_ptr += metadata_.context_len; } } From d87306352b7339269dc70a5b9880aa6e822d2847 Mon Sep 17 00:00:00 2001 From: Siddartha Pothapragada Date: Thu, 18 Sep 2025 23:12:26 -0700 Subject: [PATCH 042/395] Summary: Add Stateful FC Cortex-m linearOps (#14252) Integrate with CMSIS-NN with per-channel quantization support Test Plan: With local changes: Run e2e test on FVP simulator 
./examples/arm/run_mcu_models_fvp.sh --target=cortex-m55 --models=qlinear Co-authored-by: Github Executorch --- backends/cortex_m/CMakeLists.txt | 84 ++- .../ops/cmsis_scratch_buffer_context.h | 187 +++++ backends/cortex_m/ops/cortex_m_ops_common.h | 46 +- backends/cortex_m/ops/op_quantized_linear.cpp | 171 +++++ backends/cortex_m/ops/operators.py | 213 ++++++ backends/cortex_m/ops/operators.yaml | 12 + backends/cortex_m/passes/passes_utils.py | 59 ++ .../passes/quantized_linear_fusion_pass.py | 645 ++++++++++++++++++ .../passes/quantized_op_fusion_pass.py | 2 +- examples/arm/aot_arm_compiler.py | 32 +- 10 files changed, 1400 insertions(+), 51 deletions(-) create mode 100644 backends/cortex_m/ops/cmsis_scratch_buffer_context.h create mode 100644 backends/cortex_m/ops/op_quantized_linear.cpp create mode 100644 backends/cortex_m/passes/quantized_linear_fusion_pass.py diff --git a/backends/cortex_m/CMakeLists.txt b/backends/cortex_m/CMakeLists.txt index 1567b8b5e1c..bd12c7d8183 100644 --- a/backends/cortex_m/CMakeLists.txt +++ b/backends/cortex_m/CMakeLists.txt @@ -12,7 +12,7 @@ if(NOT CMAKE_CXX_STANDARD) set(CMAKE_CXX_STANDARD 17) endif() -# Source root directory for executorch. 
+# Source root directory for executorch if(NOT EXECUTORCH_ROOT) set(EXECUTORCH_ROOT ${CMAKE_CURRENT_SOURCE_DIR}/../..) endif() @@ -21,70 +21,90 @@ include(${EXECUTORCH_ROOT}/tools/cmake/Utils.cmake) include(${EXECUTORCH_ROOT}/tools/cmake/Codegen.cmake) include(FetchContent) -# CMSIS-NN version to download +# CMSIS-NN configuration with dynamic path detection set(CMSIS_NN_VERSION - "v4.1.0" + "v7.0.0" CACHE STRING "CMSIS-NN version to download" ) - -# Declare CMSIS-NN as a FetchContent project -FetchContent_Declare( - cmsis_nn - GIT_REPOSITORY https://github.com/ARM-software/CMSIS-NN.git - GIT_TAG ${CMSIS_NN_VERSION} +set(CMSIS_NN_LOCAL_PATH + "" + CACHE PATH "Path to existing local CMSIS-NN installation" ) -# Download and make CMSIS-NN available -FetchContent_MakeAvailable(cmsis_nn) +# Try to find existing / local CMSIS-NN installation. This is useful for +# debugging and testing with local changes. This is not common, as the CMSIS-NN +# library is downloaded via FetchContent in the default/regular case. 
+if(CMSIS_NN_LOCAL_PATH AND EXISTS "${CMSIS_NN_LOCAL_PATH}") + message(STATUS "Using CMSIS-NN from specified path: ${CMSIS_NN_LOCAL_PATH}") + add_subdirectory(${CMSIS_NN_LOCAL_PATH} cmsis_nn_build) +else() + # Use FetchContent with automatic fallback + message(STATUS "Using CMSIS-NN via FetchContent") + + FetchContent_Declare( + cmsis_nn + GIT_REPOSITORY https://github.com/ARM-software/CMSIS-NN.git + GIT_TAG ${CMSIS_NN_VERSION} + GIT_SHALLOW TRUE + ) + + FetchContent_GetProperties(cmsis_nn) + if(NOT cmsis_nn_POPULATED) + FetchContent_Populate(cmsis_nn) + add_subdirectory(${cmsis_nn_SOURCE_DIR} ${cmsis_nn_BINARY_DIR}) + endif() +endif() -# Print paths for debugging -message(STATUS "CMSIS-NN source dir: ${cmsis_nn_SOURCE_DIR}") -message(STATUS "CMSIS-NN binary dir: ${cmsis_nn_BINARY_DIR}") +# Add MVEI define to cmsis-nn target +if(TARGET cmsis-nn) + target_compile_definitions(cmsis-nn PUBLIC ARM_MATH_MVEI=1) + get_target_property(CMSIS_NN_INCLUDES cmsis-nn INTERFACE_INCLUDE_DIRECTORIES) + message(STATUS "CMSIS-NN include dirs: ${CMSIS_NN_INCLUDES}") +else() + message( + FATAL_ERROR + "CMSIS-NN target not found. Check your CMSIS_NN_LOCAL_PATH or network connection." 
+ ) +endif() # Cortex-M ops kernel sources set(_cortex_m_kernels__srcs ${CMAKE_CURRENT_SOURCE_DIR}/ops/op_quantize_per_tensor.cpp ${CMAKE_CURRENT_SOURCE_DIR}/ops/op_dequantize_per_tensor.cpp ${CMAKE_CURRENT_SOURCE_DIR}/ops/op_quantized_add.cpp + ${CMAKE_CURRENT_SOURCE_DIR}/ops/op_quantized_linear.cpp ) -# Generate C++ bindings to register kernels into Executorch (for runtime) +# Generate C++ bindings to register kernels into Executorch set(_yaml_file ${CMAKE_CURRENT_LIST_DIR}/ops/operators.yaml) gen_selected_ops(LIB_NAME "cortex_m_ops_lib" OPS_SCHEMA_YAML "${_yaml_file}") - generate_bindings_for_kernels( LIB_NAME "cortex_m_ops_lib" CUSTOM_OPS_YAML "${_yaml_file}" ) -message("Generated files ${gen_command_sources}") -# Build a library for cortex_m_kernels +# Build library for cortex_m_kernels add_library(cortex_m_kernels ${_cortex_m_kernels__srcs}) -target_compile_options(cortex_m_kernels PUBLIC ${_common_compile_options}) -# Include directories for cortex_m_kernels -target_include_directories( +# Use PRIVATE for implementation dependencies to avoid INTERFACE pollution +target_link_libraries( cortex_m_kernels - PRIVATE ${EXECUTORCH_ROOT}/.. - ${EXECUTORCH_ROOT}/runtime/core/portable_type/c10 - ${cmsis_nn_SOURCE_DIR}/Include + PRIVATE cmsis-nn + PRIVATE executorch ) -# Link directly to the CMSIS-NN static library file -target_link_libraries( - cortex_m_kernels PUBLIC ${cmsis_nn_BINARY_DIR}/libcmsis-nn.a executorch +# Include directories for cortex_m_kernels +target_include_directories( + cortex_m_kernels PRIVATE ${EXECUTORCH_ROOT}/.. + ${EXECUTORCH_ROOT}/runtime/core/portable_type/c10 ) -# Add dependency to ensure CMSIS-NN builds before we try to link. 
Use the actual -# CMSIS-NN target name (usually 'cmsis-nn') -add_dependencies(cortex_m_kernels cmsis-nn) - # cortex_m_ops_lib: Register Cortex-M ops kernels into Executorch runtime gen_operators_lib( LIB_NAME "cortex_m_ops_lib" KERNEL_LIBS cortex_m_kernels DEPS executorch ) install( - TARGETS cortex_m_kernels cortex_m_ops_lib + TARGETS cortex_m_kernels cortex_m_ops_lib cmsis-nn EXPORT ExecuTorchTargets DESTINATION lib PUBLIC_HEADER DESTINATION include/executorch/backends/cortex_m/ops/ diff --git a/backends/cortex_m/ops/cmsis_scratch_buffer_context.h b/backends/cortex_m/ops/cmsis_scratch_buffer_context.h new file mode 100644 index 00000000000..4b9fdaebdf7 --- /dev/null +++ b/backends/cortex_m/ops/cmsis_scratch_buffer_context.h @@ -0,0 +1,187 @@ +/* + * Copyright (c) Meta Platforms, Inc. and affiliates. + * All rights reserved. + * + * This source code is licensed under the BSD-style license found in the + * LICENSE file in the root directory of this source tree. + */ +#pragma once + +#include "cortex_m_ops_common.h" +extern "C" { +#include "arm_nnfunctions.h" +} + +namespace cortex_m { +namespace native { + +// During AOT phase, quantized_linear_fusion_pass allocates total buffer +// and passes in as 'Tensor'. 
(Total buffer = 8-byte header + x bytes) +// ┌─────────────────┬─────────────────────────────────────┐ +// │ KernelSum Header│ CMSIS Workspace │ +// │ (8 bytes) │ (x bytes) │ +// └─────────────────┴─────────────────────────────────────┘ +// │ │ +// │ └─> Passed to CMSIS API +// │ +// └─> State for kernel sum + +// C++ Runtime: +// ┌─────────────────┬─────────────────────────────────────┐ +// │ KernelSum Header│ CMSIS Workspace │ +// │ (8 bytes) │ (x bytes) │ +// └─────────────────┴─────────────────────────────────────┘ +// ^ ^ +// │ │ +// scratch_ptr cmsis_workspace_ptr +// │ │ +// ▼ ▼ +// arm_vector_sum_s8() writes kernel sums (with bias if avail): +// [sum₀+bias₀][sum₁+bias₁][sum₂+bias₂]...[sum_{n-1}+bias_{n-1}] +// (n * 4-byte int32_t values = x bytes) +// +// - n = out_features (number of output features) +// - x = n * 4 bytes (total CMSIS buffer size) +// - Total buffer = 8 + x bytes + +class CMSISScratchBufferContext final { + public: + CMSISScratchBufferContext( + Tensor& scratch_buffer, + const Tensor& weights, + const Tensor& weight_zero_point, + const torch::executor::optional& bias) + : scratch_ptr_(scratch_buffer.mutable_data_ptr()), + total_size_(scratch_buffer.size(0)), + base_ptr_(reinterpret_cast(scratch_ptr_)), + in_features_(weights.size(1)), + out_features_(weights.size(0)), + is_per_channel_(weight_zero_point.numel() > 1), + weight_data_offset_(calculate_offset(weights.const_data_ptr())), + weight_zp_data_offset_( + calculate_offset(weight_zero_point.const_data_ptr())), + bias_data_offset_( + bias.has_value() + ? 
calculate_offset(bias.value().const_data_ptr()) + : 0), + header_(reinterpret_cast(scratch_ptr_)), + cmsis_workspace_ptr_(scratch_ptr_ + KERNEL_SUM_HEADER_SIZE) { + cmsis_nn_dims filter_dims = {in_features_, 1, 1, out_features_}; + validate_size(filter_dims); + } + + cmsis_nn_context get_cmsis_ctx() const { + cmsis_nn_context ctx; + ET_CHECK_MSG( + reinterpret_cast(cmsis_workspace_ptr_) % 4 == 0, + "CMSIS workspace not 4-byte aligned"); + ctx.buf = cmsis_workspace_ptr_; + ctx.size = get_cmsis_workspace_size(); + return ctx; + } + + bool is_kernel_sum_updated() const { + return header_->updated; + } + + void compute_kernel_sums_if_needed() { + if (!header_->updated) { + arm_vector_sum_s8( + reinterpret_cast(cmsis_workspace_ptr_), + in_features_, + out_features_, + get_weight_data(), + get_weight_zp_data()[0], + 0, + get_bias_data()); + header_->updated = true; + ET_LOG( + Info, + "Computed kernel sums. [required_bytes : %d]", + header_->required_size); + } + } + + const int8_t* get_weight_data() const { + return reinterpret_cast(base_ptr_ + weight_data_offset_); + } + + const int32_t* get_weight_zp_data() const { + return reinterpret_cast(base_ptr_ + weight_zp_data_offset_); + } + + const int32_t* get_bias_data() const { + return bias_data_offset_ == 0 + ? 
nullptr + : reinterpret_cast(base_ptr_ + bias_data_offset_); + } + + bool is_per_channel_quant() const { + return is_per_channel_; + } + int32_t get_in_features() const { + return in_features_; + } + int32_t get_out_features() const { + return out_features_; + } + + private: + static constexpr size_t KERNEL_SUM_HEADER_SIZE = 8; + + // Header for kernel sum computation state only + struct KernelSumHeader { + bool updated = false; + int32_t required_size = 0; + }; + static_assert( + sizeof(KernelSumHeader) == KERNEL_SUM_HEADER_SIZE, + "KernelSumHeader must be exactly 8 bytes"); + + int8_t* scratch_ptr_; + size_t total_size_; + uint8_t* base_ptr_; + + // Context members + const int32_t in_features_; + const int32_t out_features_; + const bool is_per_channel_; + const uint32_t weight_data_offset_; + const uint32_t weight_zp_data_offset_; + const uint32_t bias_data_offset_; + + KernelSumHeader* header_; + int8_t* cmsis_workspace_ptr_; + + uint32_t calculate_offset(const void* ptr) const { + if (ptr == nullptr) + return 0; + + const uint8_t* ptr_bytes = reinterpret_cast(ptr); + ET_CHECK_MSG(ptr_bytes >= base_ptr_, "Pointer is before base address"); + + const std::ptrdiff_t offset = ptr_bytes - base_ptr_; + ET_CHECK_MSG( + offset >= 0 && offset <= UINT32_MAX, "Offset out of valid range"); + return static_cast(offset); + } + + size_t get_cmsis_workspace_size() const { + return total_size_ - KERNEL_SUM_HEADER_SIZE; + } + + void validate_size(const cmsis_nn_dims& filter_dims) const { + header_->required_size = + arm_fully_connected_s8_get_buffer_size(&filter_dims); + + ET_CHECK_MSG( + get_cmsis_workspace_size() >= + static_cast(header_->required_size), + "Scratch buffer size %zu insufficient for required size %d", + get_cmsis_workspace_size(), + header_->required_size); + } +}; + +} // namespace native +} // namespace cortex_m diff --git a/backends/cortex_m/ops/cortex_m_ops_common.h b/backends/cortex_m/ops/cortex_m_ops_common.h index 5ef2d9d4bf9..eaa7027e46c 100644 --- 
a/backends/cortex_m/ops/cortex_m_ops_common.h +++ b/backends/cortex_m/ops/cortex_m_ops_common.h @@ -22,6 +22,10 @@ using ScalarType = executorch::aten::ScalarType; using Scalar = torch::executor::Scalar; using Error = executorch::runtime::Error; +// From arm_nn_math_types.h +#define ARM_NN_Q31_MAX ((int32_t)(0x7FFFFFFFL)) +#define ARM_NN_Q31_MIN ((int32_t)(0x80000000L)) + // Basic tensor type / layout validation and dimension order checking inline void validate_cmsis_nn_tensor_requirements( const Tensor& input1, @@ -32,16 +36,19 @@ inline void validate_cmsis_nn_tensor_requirements( // Basic dtype validation ET_CHECK_MSG( input1.scalar_type() == expected_dtype, - "Input1 dtype must be %hhd", - expected_dtype); + "Input1 dtype must be %hhd, got %hhd", + expected_dtype, + input1.scalar_type()); ET_CHECK_MSG( input2.scalar_type() == expected_dtype, - "Input2 dtype must be %hhd", - expected_dtype); + "Input2 dtype must be %hhd, got %hhd", + expected_dtype, + input2.scalar_type()); ET_CHECK_MSG( output.scalar_type() == expected_dtype, - "Output dtype must be %hhd", - expected_dtype); + "Output dtype must be %hhd, got %hhd", + expected_dtype, + output.scalar_type()); // Dim order consistency ET_CHECK_MSG( @@ -114,6 +121,33 @@ inline void validate_quantization_params( "Single quant Output"); } +// Refer to CMSIS-NN 'arm_nn_requantize' implementation for details: +// https://github.com/ARM-software/CMSIS-NN/blob/main/Include/arm_nnsupportfunctions.h#L1625 +// multiplier: Range {ARM_NN_Q31_MIN + 1, Q32_MAX} +// shift : Range {-31, 30} +inline bool validate_per_channel_quant_params( + const int32_t* multipliers, + const int32_t* shifts, + int num_channels) { + for (int i = 0; i < num_channels; ++i) { + // Multiplier: {ARM_NN_Q31_MIN + 1, ARM_NN_Q31_MAX} + if (multipliers[i] <= ARM_NN_Q31_MIN || multipliers[i] > ARM_NN_Q31_MAX) { + ET_LOG( + Error, + "weight_multiplier[%d] out of CMSIS-NN range: %d", + i, + multipliers[i]); + return false; + } + // Shift: {-31, 30} for 
arm_nn_requantize + if (shifts[i] < -31 || shifts[i] > 30) { + ET_LOG(Error, "weight_shift[%d] out of range: %d", i, shifts[i]); + return false; + } + } + return true; +} + inline Error resize_to_broadcast_target_size( const Tensor& input1, const Tensor& input2, diff --git a/backends/cortex_m/ops/op_quantized_linear.cpp b/backends/cortex_m/ops/op_quantized_linear.cpp new file mode 100644 index 00000000000..d1ccb6d0d45 --- /dev/null +++ b/backends/cortex_m/ops/op_quantized_linear.cpp @@ -0,0 +1,171 @@ +/* + * Copyright (c) Meta Platforms, Inc. and affiliates. + * All rights reserved. + * + * This source code is licensed under the BSD-style license found in the + * LICENSE file in the root directory of this source tree. + */ + +#include "cmsis_scratch_buffer_context.h" +#include "cortex_m_ops_common.h" + +extern "C" { +#include "arm_nnfunctions.h" +} + +namespace cortex_m { +namespace native { +using KernelRuntimeContext = torch::executor::KernelRuntimeContext; + +Tensor& quantized_linear_out( + KernelRuntimeContext& context, + const Tensor& input, + const Scalar& input_zero_point, + const Scalar& input_multiplier, + const Scalar& input_shift, + const Tensor& weights, + const Tensor& weight_zero_point, + const Tensor& weight_multiplier, + const Tensor& weight_shift, + const torch::executor::optional<Tensor>& bias, + const Tensor& bias_multiplier, + const Tensor& bias_shift, + const Tensor& scratch_buffer, + const Scalar& output_zero_point, + const Scalar& in_features, + const Scalar& out_features, + Tensor& out) { + ET_LOG(Info, "quantized_linear_out: called"); + validate_cmsis_nn_tensor_requirements(input, weights, out); + + ET_CHECK_MSG( + scratch_buffer.scalar_type() == ScalarType::Char, + "Scratch buffer must be int8"); + + const int32_t batch_size = input.size(0); + const int32_t in_feat = static_cast<int32_t>(in_features.to<int64_t>()); + const int32_t out_feat = static_cast<int32_t>(out_features.to<int64_t>()); + const int32_t input_zp = static_cast<int32_t>(input_zero_point.to<int64_t>()); + const int32_t output_zp = +
static_cast<int32_t>(output_zero_point.to<int64_t>()); + const bool is_per_channel = (weight_zero_point.numel() > 1); + + const int8_t* input_data = input.const_data_ptr<int8_t>(); + const int8_t* weight_data = weights.const_data_ptr<int8_t>(); + const int32_t* bias_data = + bias.has_value() ? bias.value().const_data_ptr<int32_t>() : nullptr; + int8_t* output_data = out.mutable_data_ptr<int8_t>(); + const int32_t* weight_zp_data = weight_zero_point.const_data_ptr<int32_t>(); + const int32_t* weight_mult_data = weight_multiplier.const_data_ptr<int32_t>(); + const int32_t* weight_shift_data = weight_shift.const_data_ptr<int32_t>(); + + if (!validate_per_channel_quant_params( + weight_mult_data, weight_shift_data, out_feat)) { + context.fail(Error::InvalidArgument); + return out; + } + + // Initialize scratch buffer context (validates early) + CMSISScratchBufferContext scratch_ctx( + const_cast<Tensor&>(scratch_buffer), weights, weight_zero_point, bias); + + scratch_ctx.compute_kernel_sums_if_needed(); + cmsis_nn_context ctx = scratch_ctx.get_cmsis_ctx(); + + // Setup CMSIS-NN parameters + cmsis_nn_fc_params fc_params; + fc_params.input_offset = -input_zp; + fc_params.output_offset = output_zp; + fc_params.activation.min = std::numeric_limits<int8_t>::min(); + fc_params.activation.max = std::numeric_limits<int8_t>::max(); + + cmsis_nn_dims input_dims = {1, 1, 1, in_feat}; + cmsis_nn_dims filter_dims = {in_feat, 1, 1, out_feat}; + cmsis_nn_dims bias_dims = {1, 1, 1, out_feat}; + cmsis_nn_dims output_dims = {1, 1, 1, out_feat}; + + arm_cmsis_nn_status status; + for (int32_t b = 0; b < batch_size; b++) { + const int8_t* batch_input = input_data + b * in_feat; + int8_t* batch_output = output_data + b * out_feat; + + ET_CHECK_MSG( + batch_input != nullptr && weight_data != nullptr, + "Null input pointers"); + ET_CHECK_MSG(in_feat > 0 && out_feat > 0, "Invalid dimensions"); + + if (is_per_channel) { + cmsis_nn_per_channel_quant_params per_channel_quant_params; + per_channel_quant_params.multiplier = + const_cast<int32_t*>(weight_mult_data); + per_channel_quant_params.shift =
const_cast<int32_t*>(weight_shift_data); + + status = arm_fully_connected_per_channel_s8( + &ctx, + &fc_params, + &per_channel_quant_params, + &input_dims, + batch_input, + &filter_dims, + weight_data, + &bias_dims, + bias_data, + &output_dims, + batch_output); + } else { + fc_params.filter_offset = -weight_zp_data[0]; + cmsis_nn_per_tensor_quant_params per_tensor_quant_params; + per_tensor_quant_params.multiplier = weight_mult_data[0]; + per_tensor_quant_params.shift = weight_shift_data[0]; + + status = arm_fully_connected_s8( + &ctx, + &fc_params, + &per_tensor_quant_params, + &input_dims, + batch_input, + &filter_dims, + weight_data, + &bias_dims, + bias_data, + &output_dims, + batch_output); + } + + if (status != ARM_CMSIS_NN_SUCCESS) { + ET_LOG( + Error, + "quantized_linear_out: CMSIS-NN failed with status [%d]", + status); + context.fail(Error::Internal); + return out; + } + } + return out; +} + +// Functional variant (stub, not used at runtime) +Tensor quantized_linear( + KernelRuntimeContext& context, + const Tensor& input, + const Scalar& input_zero_point, + const Scalar& input_multiplier, + const Scalar& input_shift, + const Tensor& weights, + const Tensor& weight_zero_point, + const Tensor& weight_multiplier, + const Tensor& weight_shift, + const torch::executor::optional<Tensor>& bias, + const Tensor& bias_multiplier, + const Tensor& bias_shift, + const Tensor& scratch_buffer, + const Scalar& output_zero_point, + const Scalar& in_features, + const Scalar& out_features) { + ET_LOG(Info, "quantized_linear: called"); + assert(false); + return const_cast<Tensor&>(input); +} + +} // namespace native +} // namespace cortex_m diff --git a/backends/cortex_m/ops/operators.py b/backends/cortex_m/ops/operators.py index 926dcd85e4b..d642531e950 100644 --- a/backends/cortex_m/ops/operators.py +++ b/backends/cortex_m/ops/operators.py @@ -223,3 +223,216 @@ def quantized_add_out_impl( out.copy_(result_quantized) return out + + +#
=================================================================== +# QUANTIZED LINEAR OPERATION DEFINITION +# =================================================================== + + +def _check_per_tensor_or_per_channel(param: torch.Tensor, out_channels: int, name: str): + assert param.numel() in [ + 1, + out_channels, + ], f"{name} must be per-tensor (1) or per-channel ({out_channels}), got {param.numel()}" + + +lib.define( + "quantized_linear.out(" + "Tensor input, Scalar input_zero_point, Scalar input_multiplier, Scalar input_shift, " + "Tensor weights, " + "Tensor weight_zero_point, Tensor weight_multiplier, Tensor weight_shift, " + "Tensor? bias, Tensor bias_multiplier, Tensor bias_shift, " + "Tensor scratch_buffer, Scalar output_zero_point, Scalar in_features, Scalar out_features, " + "*, Tensor(a!) out) -> Tensor(a!)" +) + +# Define functional variant (non-out version) +lib.define( + "quantized_linear(" + "Tensor input, Scalar input_zero_point, Scalar input_multiplier, Scalar input_shift, " + "Tensor weights, " + "Tensor weight_zero_point, Tensor weight_multiplier, Tensor weight_shift, " + "Tensor? 
bias, Tensor bias_multiplier, Tensor bias_shift, " + "Tensor scratch_buffer, Scalar output_zero_point, Scalar in_features, Scalar out_features" + ") -> Tensor" +) + + +# Fake meta function for shape inference (out variant) +@register_fake("cortex_m::quantized_linear.out") +def quantized_linear_out_meta( + input: torch.Tensor, + input_zero_point: int, + input_multiplier: int, + input_shift: int, + weights: torch.Tensor, + weight_zero_point: torch.Tensor, + weight_multiplier: torch.Tensor, + weight_shift: torch.Tensor, + bias: torch.Tensor, + bias_multiplier: torch.Tensor, + bias_shift: torch.Tensor, + scratch_buffer: torch.Tensor, + output_zero_point: int, + in_features: int, + out_features: int, + out: torch.Tensor, +) -> torch.Tensor: + # Validate dimensions + batch_size = input.shape[0] + out_channels = weights.shape[0] + + # Validate weight quantization parameters dimensions + _check_per_tensor_or_per_channel( + weight_zero_point, out_channels, "weight_zero_point" + ) + _check_per_tensor_or_per_channel( + weight_multiplier, out_channels, "weight_multiplier" + ) + _check_per_tensor_or_per_channel(weight_shift, out_channels, "weight_shift") + + # Validate output shape + expected_shape = (batch_size, out_channels) + assert ( + out.shape == expected_shape + ), f"Output shape {out.shape} must be {expected_shape}" + + return out + + +# Fake meta function for shape inference (functional variant) +@register_fake("cortex_m::quantized_linear") +def quantized_linear_meta( + input: torch.Tensor, + input_zero_point: int, + input_multiplier: int, + input_shift: int, + weights: torch.Tensor, + weight_zero_point: torch.Tensor, + weight_multiplier: torch.Tensor, + weight_shift: torch.Tensor, + bias: torch.Tensor, + bias_multiplier: torch.Tensor, + bias_shift: torch.Tensor, + scratch_buffer: torch.Tensor, + output_zero_point: int, + in_features: int, + out_features: int, +) -> torch.Tensor: + # Validate dimensions (same as out variant) + batch_size = input.shape[0] + out_channels 
= weights.shape[0] + + # Validate weight quantization parameters dimensions + _check_per_tensor_or_per_channel( + weight_zero_point, out_channels, "weight_zero_point" + ) + _check_per_tensor_or_per_channel( + weight_multiplier, out_channels, "weight_multiplier" + ) + _check_per_tensor_or_per_channel(weight_shift, out_channels, "weight_shift") + + # Calculate output shape for functional variant + output_shape = (batch_size, out_channels) + return torch.empty(output_shape, dtype=input.dtype, device=input.device) + + +@impl(lib, "quantized_linear.out", "CompositeExplicitAutograd") +def quantized_linear_out_impl( + input: torch.Tensor, + input_zero_point: int, + input_multiplier: int, + input_shift: int, + weights: torch.Tensor, + weight_zero_point: torch.Tensor, + weight_multiplier: torch.Tensor, + weight_shift: torch.Tensor, + bias: torch.Tensor, + bias_multiplier: torch.Tensor, + bias_shift: torch.Tensor, + scratch_buffer: torch.Tensor, + output_zero_point: int, + in_features: int, + out_features: int, + *, + out: torch.Tensor, +) -> torch.Tensor: + """ + Fallback implementation for meta/testing + Note: This won't be called at runtime, only during compilation + """ + + # Per-channel dequantization + input_scale = input_multiplier * (2.0 ** (-input_shift)) + input_fp = (input.float() - input_zero_point) * input_scale + if weight_zero_point.numel() == 1: + # Per-tensor + weight_scale = weight_multiplier.item() * (2.0 ** (-weight_shift.item())) + weights_fp = (weights.float() - weight_zero_point.item()) * weight_scale + else: + # Per-channel + weight_scales = weight_multiplier.float() * (2.0 ** (-weight_shift.float())) + weights_fp = ( + weights.float() - weight_zero_point.float().unsqueeze(1) + ) * weight_scales.unsqueeze(1) + bias_fp = None + if bias is not None: + bias_scales = bias_multiplier.float() * (2.0 ** (-bias_shift.float())) + bias_fp = bias.float() * bias_scales + + result_fp = torch.nn.functional.linear(input_fp, weights_fp, bias_fp) + else: + result_fp = 
torch.nn.functional.linear(input_fp, weights_fp) + result_quantized = torch.clamp( + torch.round(result_fp + output_zero_point), -128, 127 + ).to(torch.int8) + out.copy_(result_quantized) + return out + + +# Functional variant implementation +@impl(lib, "quantized_linear", "CompositeExplicitAutograd") +def quantized_linear_impl( + input: torch.Tensor, + input_zero_point: int, + input_multiplier: int, + input_shift: int, + weights: torch.Tensor, + weight_zero_point: torch.Tensor, + weight_multiplier: torch.Tensor, + weight_shift: torch.Tensor, + bias: torch.Tensor, + bias_multiplier: torch.Tensor, + bias_shift: torch.Tensor, + scratch_buffer: torch.Tensor, + output_zero_point: int, + in_features: int, + out_features: int, +) -> torch.Tensor: + """ + Functional variant - creates output tensor and calls out variant + """ + # Create output tensor + batch_size = input.shape[0] + output = torch.empty( + (batch_size, out_features), dtype=torch.int8, device=input.device + ) + return quantized_linear_out_impl( + input, + input_zero_point, + input_multiplier, + input_shift, + weights, + weight_zero_point, + weight_multiplier, + weight_shift, + bias, + bias_multiplier, + bias_shift, + scratch_buffer, + output_zero_point, + in_features, + out_features, + out=output, + ) diff --git a/backends/cortex_m/ops/operators.yaml b/backends/cortex_m/ops/operators.yaml index f2615a1f525..b41c0c68fa5 100644 --- a/backends/cortex_m/ops/operators.yaml +++ b/backends/cortex_m/ops/operators.yaml @@ -27,3 +27,15 @@ kernels: - arg_meta: null kernel_name: cortex_m::quantized_add_out + +- func: cortex_m::quantized_linear(Tensor input, Scalar input_zero_point, Scalar input_multiplier, Scalar input_shift, Tensor weights, Tensor weight_zero_point, Tensor weight_multiplier, Tensor weight_shift, Tensor? 
bias, Tensor bias_multiplier, Tensor bias_shift, Tensor scratch_buffer, Scalar output_zero_point, Scalar in_features, Scalar out_features) -> Tensor + variants: function + kernels: + - arg_meta: null + kernel_name: cortex_m::quantized_linear + +- func: cortex_m::quantized_linear.out(Tensor input, Scalar input_zero_point, Scalar input_multiplier, Scalar input_shift, Tensor weights, Tensor weight_zero_point, Tensor weight_multiplier, Tensor weight_shift, Tensor? bias, Tensor bias_multiplier, Tensor bias_shift, Tensor scratch_buffer, Scalar output_zero_point, Scalar in_features, Scalar out_features, *, Tensor(a!) out) -> Tensor(a!) + variants: function + kernels: + - arg_meta: null + kernel_name: cortex_m::quantized_linear_out diff --git a/backends/cortex_m/passes/passes_utils.py b/backends/cortex_m/passes/passes_utils.py index 3f6e05fc4de..7155f997bf4 100644 --- a/backends/cortex_m/passes/passes_utils.py +++ b/backends/cortex_m/passes/passes_utils.py @@ -8,6 +8,10 @@ import torch +from executorch.exir.dialects._ops import ops as exir_ops + +from torch.fx import Node + def dequantize_per_tensor_cmsis( qtensor: torch.Tensor, zero_point: int, multiplier: int, shift: int @@ -92,3 +96,58 @@ def quantize_multiplier_aot(scale: float) -> tuple[int, int]: def cleanup_erased_nodes(graph_module: torch.fx.GraphModule): # Placeholder for any additional cleanup if needed pass + + +def transfer_metadata( + new_node: Node, source_node: Node, pass_name: str = "QuantizedPass" +) -> None: + """Transfer metadata with proper provenance tracking.""" + if hasattr(source_node, "meta") and source_node.meta: + new_node.meta = source_node.meta.copy() + if "from_node" in new_node.meta: + from_node_list = new_node.meta.get("from_node", []).copy() + from_node_list.append( + {"source": source_node.name, "pass": pass_name, "op": "fuse"} + ) + new_node.meta["from_node"] = from_node_list + for field in ["tensor_meta", "stack_trace"]: + if field in source_node.meta: + new_node.meta[field] = 
source_node.meta[field] + + +def is_dequant_node(node: Node) -> bool: + """Check if node is a dequantize operation.""" + dequant_targets = { + exir_ops.edge.cortex_m.dequantize_per_tensor.default, + exir_ops.edge.quantized_decomposed.dequantize_per_tensor.default, + exir_ops.edge.quantized_decomposed.dequantize_per_channel.default, + } + return node.op == "call_function" and node.target in dequant_targets + + +def is_quant_node(node: Node) -> bool: + """Check if node is a quantize operation.""" + quant_targets = { + exir_ops.edge.cortex_m.quantize_per_tensor.default, + exir_ops.edge.quantized_decomposed.quantize_per_tensor.default, + } + return node.op == "call_function" and node.target in quant_targets + + +def cleanup_nodes(nodes_to_erase, graph): + """Clean up marked nodes from graph.""" + failed_nodes = [] + + for node in reversed(nodes_to_erase): + if node in graph.nodes and len(node.users) == 0: + try: + graph.erase_node(node) + except Exception as e: + print(f"Warning: Failed to erase node {node}: {e}") + failed_nodes.append(node) + continue + + if failed_nodes: + print(f"Warning: {len(failed_nodes)} nodes could not be erased") + + return failed_nodes diff --git a/backends/cortex_m/passes/quantized_linear_fusion_pass.py b/backends/cortex_m/passes/quantized_linear_fusion_pass.py new file mode 100644 index 00000000000..8f8a90eec2f --- /dev/null +++ b/backends/cortex_m/passes/quantized_linear_fusion_pass.py @@ -0,0 +1,645 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. +# +# This source code is licensed under the BSD-style license found in the +# LICENSE file in the root directory of this source tree. 
+ +import logging +from typing import Optional + +import executorch.backends.cortex_m.ops.operators # noqa +import torch +import torch.fx + +from executorch.backends.cortex_m.passes.passes_utils import ( + cleanup_nodes, + is_dequant_node, + quantize_multiplier_aot, + transfer_metadata, +) + +from executorch.backends.transforms.utils import create_mutable_buffer, get_param_tensor +from executorch.exir import ExportedProgram +from executorch.exir.dialects._ops import ops as exir_ops +from executorch.exir.pass_base import ExportPass +from torch.fx import Node +from torch.fx.passes.infra.pass_manager import PassResult + +logger = logging.getLogger("quantized_linear_fusion_pass") +logger.setLevel(logging.INFO) + + +class QuantizedLinearFusionPass(ExportPass): + """ + Cortex-M backend pass that fuses quantized linear-like patterns. + Fuses: dequantize -> [linear/addmm/fc_ops] -> quantize + Into: cortex_m.quantized_linear.default with direct parameters. + """ + + SUPPORTED_OPS_MAPPING = { + exir_ops.edge.aten.addmm.default: exir_ops.edge.cortex_m.quantized_linear.default, + exir_ops.edge.aten.mm.default: exir_ops.edge.cortex_m.quantized_linear.default, + } + + requires_exported_program = True + + def __init__(self, exported_program: ExportedProgram): + super().__init__() + self._exported_program = exported_program + self.nodes_to_erase = [] + + def call(self, graph_module: torch.fx.GraphModule) -> PassResult: + logger.info("Starting QuantizedLinearFusionPass") + assert id(self._exported_program.graph_module.graph) == id( + graph_module.graph + ), "QuantizedLinearFusionPass requires same graph instance" + + try: + fusion_count = self._fuse_quantized_linear_patterns(graph_module) + if fusion_count > 0: + graph_module.graph.eliminate_dead_code() + graph_module.graph.lint() + graph_module.recompile() + logger.info(f"Linear fusion completed: {fusion_count} patterns fused") + return PassResult(graph_module, fusion_count > 0) + except Exception as e: + logger.error(f"Error in 
QuantizedLinearFusionPass: {e}") + raise e + + def _extract_linear_pattern(self, quantize_node: Node): + if not quantize_node.args: + return None + fc_node = quantize_node.args[0] + if not ( + fc_node.op == "call_function" + and fc_node.target in self.SUPPORTED_OPS_MAPPING + ): + return None + + op_name = str(fc_node.target).split(".")[-1] + + if "addmm" in str(fc_node.target): + input_dq_node = fc_node.args[1] + else: + input_dq_node = fc_node.args[0] + if not is_dequant_node(input_dq_node): + logger.info("input_dq_node is not a dequant node") + return None + weight_dq_node, bias_dq_node = self._extract_weight_bias_from_fc_op(fc_node) + if not weight_dq_node: + logger.info("No weight, bias dequantize node found") + return None + return ( + quantize_node, + fc_node, + input_dq_node, + weight_dq_node, + bias_dq_node, + op_name, + ) + + def _extract_weight_bias_from_fc_op(self, fc_node: Node): + """Generic extraction for FC-like operations.""" + + if "addmm" in str(fc_node.target): + if len(fc_node.args) >= 3: + bias_arg = fc_node.args[0] + weight_arg = fc_node.args[2] + weight_dq_node = self._trace_to_dequantize(weight_arg) + logger.info( + f"weight_arg: {weight_arg}, traced weight_dq_node: {weight_dq_node}" + ) + + if weight_dq_node is None: + logger.info("No weight dequantize node found ") + + # For bias, try to trace to dequantize but allow None (no-bias case) + bias_dq_node = self._trace_to_dequantize(bias_arg) + if bias_dq_node is None: + logger.info("No bias dequantize node found - likely no-bias linear") + return weight_dq_node, bias_dq_node + elif any(op in str(fc_node.target) for op in ["linear", "mm"]): + if len(fc_node.args) >= 2: + weight_arg = fc_node.args[1] + bias_arg = fc_node.args[2] if len(fc_node.args) > 2 else None + weight_dq_node = self._trace_to_dequantize(weight_arg) + bias_dq_node = self._trace_to_dequantize(bias_arg) if bias_arg else None + return weight_dq_node, bias_dq_node + return None, None + + def 
_extract_input_quantization_parameters( + self, input_dq_node: Node + ) -> Optional[dict]: + """Extract input quantization parameters from dequantize node.""" + try: + # Find the quantize operation that produces the int8 tensor + input_quantize_node = None + if hasattr(input_dq_node, "args") and input_dq_node.args: + quantize_candidate = input_dq_node.args[0] + if getattr( + quantize_candidate, "op", None + ) == "call_function" and "quantize" in str( + getattr(quantize_candidate, "target", "") + ): + input_quantize_node = quantize_candidate + + if not input_quantize_node: + logger.error("Could not find quantize node for input!") + return None + + # Extract input quantization parameters + input_scale = self._extract_param_value(input_dq_node.args[1]) + input_zero_point = int(self._extract_param_value(input_dq_node.args[2])) + input_multiplier, input_shift = quantize_multiplier_aot(input_scale) + + return { + "input_scale": input_scale, + "input_zero_point": input_zero_point, + "input_multiplier": input_multiplier, + "input_shift": input_shift, + "input_tensor": input_quantize_node, + } + except Exception as e: + logger.error(f"Failed to extract input quantization parameters: {e}") + return None + + def _extract_output_quantization_parameters( + self, quantize_node: Node + ) -> Optional[dict]: + """Extract output quantization parameters from quantize node.""" + try: + output_scale = self._extract_param_value(quantize_node.args[1]) + output_zero_point = int(self._extract_param_value(quantize_node.args[2])) + + return { + "output_scale": output_scale, + "output_zero_point": output_zero_point, + } + except Exception as e: + logger.error(f"Failed to extract output quantization parameters: {e}") + return None + + def _create_constant_parameter_buffer( + self, graph, quantize_node: Node, data: torch.Tensor, name: str + ): + """Create a parameter buffer""" + buffer_name = f"{name}_{id(quantize_node)}" + + setattr(graph.owning_module, buffer_name, data) + + # Create a 
get_attr node + with graph.inserting_before(quantize_node): + buffer_node = graph.create_node( + op="get_attr", target=buffer_name, name=buffer_name + ) + + # Set metadata + buffer_node.meta["val"] = data + + return buffer_node + + def _extract_weight_parameters(self, weight_dq_node: Node) -> Optional[dict]: + try: + weight_tensor = weight_dq_node.args[0] + weight_scale = weight_dq_node.args[1] + weight_zero_point = ( + weight_dq_node.args[2] if len(weight_dq_node.args) > 2 else None + ) + + weight_scale_data = self._extract_param_value(weight_scale) + weight_zp_data = ( + self._extract_param_value(weight_zero_point) + if weight_zero_point + else None + ) + + # Get actual tensor data to determine output features + weight_tensor_data = get_param_tensor(self._exported_program, weight_tensor) + out_features = weight_tensor_data.shape[0] + + # Handle both per-tensor and per-channel + if ( + isinstance(weight_scale_data, torch.Tensor) + and weight_scale_data.numel() > 1 + ): + # Per-channel: ensure we have the right number of elements + assert ( + weight_scale_data.numel() == out_features + ), f"Scale size {weight_scale_data.numel()} != out_features {out_features}" + + multipliers = [] + shifts = [] + for scale in weight_scale_data: + mult, shift = quantize_multiplier_aot(scale.item()) + multipliers.append(mult) + shifts.append(shift) + + weight_multiplier = torch.tensor(multipliers, dtype=torch.int32) + weight_shift = torch.tensor(shifts, dtype=torch.int32) + weight_zp_tensor = ( + weight_zp_data.int() + if weight_zp_data is not None + else torch.zeros(out_features, dtype=torch.int32) + ) + else: + # Per-tensor: create tensors with correct size for output features + scale_val = ( + weight_scale_data.item() + if isinstance(weight_scale_data, torch.Tensor) + else weight_scale_data + ) + mult, shift = quantize_multiplier_aot(scale_val) + + # Create tensors sized for out_features (not single element) + weight_multiplier = torch.full((out_features,), mult, 
dtype=torch.int32) + weight_shift = torch.full((out_features,), shift, dtype=torch.int32) + weight_zp_tensor = torch.full( + (out_features,), + weight_zp_data if weight_zp_data else 0, + dtype=torch.int32, + ) + + # Validate multipliers + for i, mult in enumerate(weight_multiplier): + if mult < (1 << 30) or mult > ((1 << 31) - 1): + logger.error( + f"Invalid multiplier[{i}]: {mult}, scale was: {weight_scale_data}" + ) + return None + + return { + "weight_tensor": weight_tensor, + "weight_zero_point_data": weight_zp_tensor, + "weight_multiplier_data": weight_multiplier, + "weight_shift_data": weight_shift, + } + except Exception as e: + logger.error(f"Failed to extract weight parameters: {e}") + return None + + def _extract_bias_parameters(self, bias_dq_node: Optional[Node]) -> Optional[dict]: + """ + Extract bias parameters for quantized linear fusion. + Handles both dequantized bias nodes and constant bias tensors. + Returns a dict with bias_tensor, bias_multiplier, and bias_shift. + """ + if not bias_dq_node: + # No bias present + return None + try: + # Case 1: Bias is a dequantize node + if hasattr(bias_dq_node, "op") and is_dequant_node(bias_dq_node): + bias_tensor = bias_dq_node.args[0] + bias_scale = bias_dq_node.args[1] + + bias_scale_data = self._extract_param_value(bias_scale) + + if ( + isinstance(bias_scale_data, torch.Tensor) + and bias_scale_data.numel() > 1 + ): + # Per-channel bias + bias_multipliers = [] + bias_shifts = [] + for scale_val in bias_scale_data.tolist(): + mult, shift = quantize_multiplier_aot(scale_val) + bias_multipliers.append(mult) + bias_shifts.append(shift) + return { + "bias_tensor": bias_tensor, + "bias_multiplier": bias_multipliers, + "bias_shift": bias_shifts, + } + else: + # Per-tensor bias + bias_scale_val = ( + bias_scale_data.item() + if isinstance(bias_scale_data, torch.Tensor) + else bias_scale_data + ) + bias_multiplier, bias_shift = quantize_multiplier_aot( + bias_scale_val + ) + return { + "bias_tensor": bias_tensor, 
+ "bias_multiplier": bias_multiplier, + "bias_shift": bias_shift, + } + else: + # Case 2: Bias is a constant tensor (not dequantized) + # This can happen if bias is not quantized in the model + bias_tensor = bias_dq_node + # Use default multiplier/shift for unquantized bias + bias_multiplier = 1 + bias_shift = 0 + return { + "bias_tensor": bias_tensor, + "bias_multiplier": bias_multiplier, + "bias_shift": bias_shift, + } + except Exception as e: + logger.error(f"Failed to extract bias parameters: {e}") + return None + + def _prepare_bias_tensors( + self, bias_params: Optional[dict], out_features: int + ) -> tuple[torch.Tensor, torch.Tensor]: + """ + Prepare bias multiplier and shift tensors for kernel call. + Returns (bias_multiplier_tensor, bias_shift_tensor) both sized [out_features]. + """ + if bias_params: + bias_multiplier = bias_params["bias_multiplier"] + bias_shift = bias_params["bias_shift"] + + # Convert to tensors of the right size + if isinstance(bias_multiplier, int): + bias_multiplier_tensor = torch.full( + [out_features], bias_multiplier, dtype=torch.int32 + ) + elif isinstance(bias_multiplier, list): + assert ( + len(bias_multiplier) == out_features + ), f"Bias multiplier size {len(bias_multiplier)} != out_features {out_features}" + bias_multiplier_tensor = torch.tensor( + bias_multiplier, dtype=torch.int32 + ) + elif isinstance(bias_multiplier, torch.Tensor): + assert ( + bias_multiplier.numel() == out_features + ), f"Bias multiplier size {bias_multiplier.numel()} != out_features {out_features}" + bias_multiplier_tensor = bias_multiplier + else: + raise TypeError( + f"Unsupported bias_multiplier type: {type(bias_multiplier)}" + ) + + if isinstance(bias_shift, int): + bias_shift_tensor = torch.full( + [out_features], bias_shift, dtype=torch.int32 + ) + elif isinstance(bias_shift, list): + assert ( + len(bias_shift) == out_features + ), f"Bias shift size {len(bias_shift)} != out_features {out_features}" + bias_shift_tensor = torch.tensor(bias_shift, 
dtype=torch.int32) + elif isinstance(bias_shift, torch.Tensor): + assert ( + bias_shift.numel() == out_features + ), f"Bias shift size {bias_shift.numel()} != out_features {out_features}" + bias_shift_tensor = bias_shift + else: + raise TypeError(f"Unsupported bias_shift type: {type(bias_shift)}") + + return bias_multiplier_tensor, bias_shift_tensor + else: + # No bias: return zero tensors of correct shape + return ( + torch.zeros([out_features], dtype=torch.int32), + torch.zeros([out_features], dtype=torch.int32), + ) + + def _extract_param_value(self, node_or_value): + """ + Extract a scalar value from a Node or a direct float/int. + """ + if isinstance(node_or_value, (float, int)): + return node_or_value + # If it's a tensor, get its scalar value if possible + if isinstance(node_or_value, torch.Tensor): + return node_or_value.item() if node_or_value.numel() == 1 else node_or_value + # If it's a Node, use get_param_tensor + if hasattr(node_or_value, "op"): + tensor = get_param_tensor(self._exported_program, node_or_value) + return tensor.item() if tensor.numel() == 1 else tensor + raise TypeError(f"Unsupported parameter type: {type(node_or_value)}") + + def _calculate_cmsis_scratch_size(self, weight_tensor) -> int: + """Calculate CMSIS-NN scratch buffer size for quantized linear operations. + + Source: CMSIS-NN arm_fully_connected_s8_get_buffer_size() returns filter_dims->w * sizeof(int32_t). + This buffer stores pre-computed kernel sums (weight row sums) - one int32_t per output feature. + Same buffer size applies to both per-tensor and per-channel quantization paths since both use + identical kernel sum optimization in the underlying matrix multiplication. 
+ """ + try: + print(f"weight_tensor type: {type(weight_tensor)}, value: {weight_tensor}") + weight_shape = get_param_tensor(self._exported_program, weight_tensor).shape + out_features = weight_shape[0] # filter_dims->w in CMSIS terms + + # CMSIS-NN implementation expects the following size + cmsis_buffer_size = out_features * 4 # sizeof(int32_t) + return cmsis_buffer_size + except Exception as e: + logger.error(f"Failed to calculate CMSIS scratch size: {e}") + return 2048 # Fallback + + def _create_scratch_buffer(self, graph, quantize_node: Node, weight_tensor): + cmsis_scratch = self._calculate_cmsis_scratch_size(weight_tensor) + + kernel_sum_header = 8 # sizeof(KernelSumHeader) + total_size = kernel_sum_header + cmsis_scratch + + logger.info( + f"Kernel sum header: {kernel_sum_header}, CMSIS buffer: {cmsis_scratch}, total: {total_size}" + ) + + return create_mutable_buffer( + self._exported_program, + name=f"b_cmsis_linear_scratch_{id(quantize_node)}", + data=torch.zeros((total_size,), dtype=torch.int8), + ) + + def _create_fused_node( + self, + graph, + quantize_node: Node, + quant_params: dict, + weight_params: dict, + bias_params: Optional[dict], + quantized_target, + ) -> Node: + """Generic fused node creation for any FC-like operation.""" + # Extract all parameters + input_tensor = quant_params["input_tensor"] + input_zp = quant_params["input_zero_point"] + input_multiplier = quant_params["input_multiplier"] + input_shift = quant_params["input_shift"] + weight_tensor = weight_params["weight_tensor"] + + weight_zp_node = self._create_constant_parameter_buffer( + graph, quantize_node, weight_params["weight_zero_point_data"], "weight_zp" + ) + weight_mult_node = self._create_constant_parameter_buffer( + graph, quantize_node, weight_params["weight_multiplier_data"], "weight_mult" + ) + weight_shift_node = self._create_constant_parameter_buffer( + graph, quantize_node, weight_params["weight_shift_data"], "weight_shift" + ) + # Get dimensions + weight_shape = 
get_param_tensor(self._exported_program, weight_tensor).shape
+        assert (
+            len(weight_shape) == 2
+        ), f"Weight tensor must be 2D, got shape {weight_shape}"
+        in_features = weight_shape[1]
+        out_features = weight_shape[0]
+
+        # Handle bias
+        bias_tensor = bias_params["bias_tensor"] if bias_params else None
+        bias_multiplier, bias_shift = self._prepare_bias_tensors(
+            bias_params, out_features
+        )
+        output_zp = quant_params["output_zero_point"]
+
+        scratch_buffer = self._create_scratch_buffer(
+            graph, quantize_node, weight_tensor
+        )
+
+        with graph.inserting_after(quantize_node):
+            fused = graph.create_node(
+                "call_function",
+                target=quantized_target,
+                args=(
+                    input_tensor,
+                    input_zp,
+                    input_multiplier,
+                    input_shift,
+                    weight_tensor,
+                    weight_zp_node,
+                    weight_mult_node,
+                    weight_shift_node,
+                    bias_tensor,
+                    bias_multiplier,
+                    bias_shift,
+                    scratch_buffer,
+                    output_zp,
+                    in_features,
+                    out_features,
+                ),
+                kwargs={},
+            )
+
+        transfer_metadata(fused, quantize_node, "QuantizedLinearFusionPass")
+        return fused
+
+    def _mark_for_cleanup(self, nodes):
+        for node in nodes:
+            if node is not None:
+                self.nodes_to_erase.append(node)
+
+    def _cleanup_nodes(self, graph):
+        cleanup_nodes(self.nodes_to_erase, graph)
+        self.nodes_to_erase.clear()
+
+    def _extract_linear_pattern_with_validation(self, quantize_node: Node):
+        pattern_info = self._extract_linear_pattern(quantize_node)
+        if not pattern_info:
+            return None
+        # Optionally add more validation here if needed
+        return pattern_info
+
+    def _trace_to_dequantize(self, node: Optional[Node], max_depth=3) -> Optional[Node]:
+        """Trace through transformations to find dequantize node."""
+        current_node = node
+        depth = 0
+        while current_node and depth < max_depth:
+            if is_dequant_node(current_node):
+                return current_node
+            if current_node.op == "call_function" and current_node.target in {
+                exir_ops.edge.aten.permute_copy.default,
+                exir_ops.edge.aten.view_copy.default,
+            }:
+                if current_node.args:
+                    current_node = 
current_node.args[0]
+                    depth += 1
+                    continue
+            break
+        return None
+
+    def _fuse_quantized_linear_patterns(
+        self, graph_module: torch.fx.GraphModule
+    ) -> int:
+        fusion_count = 0
+        graph = graph_module.graph
+        for node in list(graph.nodes):
+            if not (
+                node.op == "call_function" and "quantize_per_tensor" in str(node.target)
+            ):
+                continue
+            pattern_info = self._extract_linear_pattern_with_validation(node)
+            if not pattern_info:
+                continue
+
+            (
+                quantize_node,
+                fc_node,
+                input_dq_node,
+                weight_dq_node,
+                bias_dq_node,
+                op_name,
+            ) = pattern_info
+
+            # Get quantized target for this FC operation
+            quantized_target = self.SUPPORTED_OPS_MAPPING.get(fc_node.target)
+            if not quantized_target:
+                logger.warning(f"No quantized target found for {fc_node.target}")
+                continue
+
+            logger.info(f"✅ Found complete cortex_m Q/DQ + {op_name} pattern!")
+
+            try:
+                input_params = self._extract_input_quantization_parameters(
+                    input_dq_node
+                )
+                if not input_params:
+                    logger.error(
+                        "Quantization parameter extraction failed for node: %s", node
+                    )
+                    continue
+                output_params = self._extract_output_quantization_parameters(
+                    quantize_node
+                )
+                if not output_params:
+                    logger.error(
+                        "Output quantization parameter extraction failed for node: %s",
+                        node,
+                    )
+                    continue
+                quant_params = {**input_params, **output_params}
+                logger.info(f"Quantization parameters: {quant_params}")
+
+                weight_params = self._extract_weight_parameters(weight_dq_node)
+                if not weight_params:
+                    continue
+                bias_params = self._extract_bias_parameters(bias_dq_node)
+                if bias_dq_node and not bias_params:
+                    continue
+                fused_node = self._create_fused_node(
+                    graph,
+                    quantize_node,
+                    quant_params,
+                    weight_params,
+                    bias_params,
+                    quantized_target,
+                )
+                logger.info(f"Created fused {op_name} node: {fused_node}")
+
+                quantize_node.replace_all_uses_with(fused_node)
+                self._mark_for_cleanup(
+                    [
+                        quantize_node,
+                        fc_node,
+                        input_dq_node,
+                        weight_dq_node,
+                        bias_dq_node,
+                    ]
+                )
+                fusion_count += 1
+
logger.info(f"✅ Successfully fused {op_name} operation {fusion_count}") + except Exception as e: + logger.error( + f"Failed to fuse {op_name} pattern for {fc_node.name}: {e}" + ) + continue + self._cleanup_nodes(graph) + return fusion_count diff --git a/backends/cortex_m/passes/quantized_op_fusion_pass.py b/backends/cortex_m/passes/quantized_op_fusion_pass.py index ca6d8b97795..eebf6866d83 100644 --- a/backends/cortex_m/passes/quantized_op_fusion_pass.py +++ b/backends/cortex_m/passes/quantized_op_fusion_pass.py @@ -36,7 +36,7 @@ class QuantizedOpFusionPass(ExportPass): # Generic operation mapping SUPPORTED_OPS_MAPPING = { exir_ops.edge.aten.add.Tensor: exir_ops.edge.cortex_m.quantized_add.default, - # Future ops to be added here: + # Future binary ops to be added here: } def __init__(self): diff --git a/examples/arm/aot_arm_compiler.py b/examples/arm/aot_arm_compiler.py index 106ab35363c..5513529509e 100644 --- a/examples/arm/aot_arm_compiler.py +++ b/examples/arm/aot_arm_compiler.py @@ -38,6 +38,10 @@ from executorch.backends.arm.vgf import VgfCompileSpec, VgfPartitioner # To use Cortex-M backend +from executorch.backends.cortex_m.passes.quantized_linear_fusion_pass import ( + QuantizedLinearFusionPass, +) + from executorch.backends.cortex_m.passes.quantized_op_fusion_pass import ( QuantizedOpFusionPass, ) @@ -55,6 +59,7 @@ ExecutorchBackendConfig, to_edge_transform_and_lower, ) + from executorch.extension.export_util.utils import save_pte_program from tabulate import tabulate from torch.utils.data import DataLoader @@ -148,7 +153,8 @@ def quantize( evaluator_name: str | None, evaluator_config: Dict[str, Any] | None, ) -> torch.nn.Module: - """This is the official recommended flow for quantization in pytorch 2.0 export""" + """This is the official recommended flow for quantization in pytorch 2.0 + export""" logging.info("Quantizing Model...") logging.debug(f"Original model: {model}") quantizer = None @@ -605,7 +611,7 @@ def get_args(): parser.add_argument( 
"--enable_qdq_fusion_pass", action="store_true", - help="Enable the QuantizedOpFusionPass fusion step", + help="Enable the Quantized qdq fusion Op passes", ) parser.add_argument( "--enable_debug_mode", @@ -806,22 +812,24 @@ def to_edge_no_delegate(exported_program, args, model: torch.nn.Module, example_ return model_int8, edge -def transform_for_cortex_m_backend(edge, args): +def transform_for_cortex_m_backend(edge_program_manager, args): # Let's make sure we are using optimized Cortex M backend # NB: If we can't find and replace ops those are expected to be replaced, # bad things will happen at runtime, like "missing operator" errors! # Instantiate the mandatory ReplaceQuantNodesPass - passes = [ReplaceQuantNodesPass()] - - # Conditionally add the QuantizedOpFusionPass + passes = [ReplaceQuantNodesPass] if args.enable_qdq_fusion_pass: - passes.append(QuantizedOpFusionPass()) - - # Apply the passes - edge = edge.transform(passes) - - return edge + passes += [QuantizedLinearFusionPass, QuantizedOpFusionPass] + current_edge = edge_program_manager + for pass_cls in passes: + transform_pass = ( + pass_cls(current_edge.exported_program()) + if pass_cls.__name__ == "QuantizedLinearFusionPass" + else pass_cls() + ) + current_edge = current_edge.transform([transform_pass]) + return current_edge if __name__ == "__main__": # noqa: C901 From 246009b64f86a91b6909e4c2f1600319cb52de07 Mon Sep 17 00:00:00 2001 From: Mengwei Liu Date: Thu, 18 Sep 2025 23:38:58 -0700 Subject: [PATCH 043/395] Test xnnpack with pybindings (#13133) Make sure we run xnnpack delegated model using pybindings, in `test_models.sh`. 
---
 .ci/scripts/test_model.sh        |  6 +++---
 examples/xnnpack/aot_compiler.py | 29 +++++++++++++++++++++++++++++
 2 files changed, 32 insertions(+), 3 deletions(-)

diff --git a/.ci/scripts/test_model.sh b/.ci/scripts/test_model.sh
index 74eb75c6ddd..de28597b1d5 100755
--- a/.ci/scripts/test_model.sh
+++ b/.ci/scripts/test_model.sh
@@ -131,13 +131,13 @@ test_model_with_xnnpack() {
     return 0
   fi
 
-  # Delegation
+  # Delegation and test with pybindings
  if [[ ${WITH_QUANTIZATION} == true ]]; then
     SUFFIX="q8"
-    "${PYTHON_EXECUTABLE}" -m examples.xnnpack.aot_compiler --model_name="${MODEL_NAME}" --delegate --quantize
+    "${PYTHON_EXECUTABLE}" -m examples.xnnpack.aot_compiler --model_name="${MODEL_NAME}" --delegate --quantize --test_after_export
   else
     SUFFIX="fp32"
-    "${PYTHON_EXECUTABLE}" -m examples.xnnpack.aot_compiler --model_name="${MODEL_NAME}" --delegate
+    "${PYTHON_EXECUTABLE}" -m examples.xnnpack.aot_compiler --model_name="${MODEL_NAME}" --delegate --test_after_export
   fi
 
   OUTPUT_MODEL_PATH="${MODEL_NAME}_xnnpack_${SUFFIX}.pte"
diff --git a/examples/xnnpack/aot_compiler.py b/examples/xnnpack/aot_compiler.py
index 81eeb75c72c..9a78138adf3 100644
--- a/examples/xnnpack/aot_compiler.py
+++ b/examples/xnnpack/aot_compiler.py
@@ -61,6 +61,14 @@
         default="",
         help="Generate and save an ETRecord to the given file location",
     )
+    parser.add_argument(
+        "-t",
+        "--test_after_export",
+        action="store_true",
+        required=False,
+        default=False,
+        help="Test the pte with pybindings",
+    )
     parser.add_argument("-o", "--output_dir", default=".", help="output directory")
 
     args = parser.parse_args()
@@ -117,3 +125,24 @@
     quant_tag = "q8" if args.quantize else "fp32"
     model_name = f"{args.model_name}_xnnpack_{quant_tag}"
     save_pte_program(exec_prog, model_name, args.output_dir)
+
+    if args.test_after_export:
+        logging.info("Testing the pte with pybind")
+        from executorch.extension.pybindings.portable_lib import (
+            _load_for_executorch_from_buffer,
+        )
+
+        # Import custom ops. 
This requires portable_lib to be loaded first.
+        from executorch.extension.llm.custom_ops import (  # noqa: F401, F403
+            custom_ops,
+        )  # usort: skip
+
+        # Import quantized ops. This requires portable_lib to be loaded first.
+        from executorch.kernels import quantized  # usort: skip # noqa: F401, F403
+        from torch.utils._pytree import tree_flatten
+
+        m = _load_for_executorch_from_buffer(exec_prog.buffer)
+        logging.info("Successfully loaded the model")
+        flattened = tree_flatten(example_inputs)[0]
+        res = m.run_method("forward", flattened)
+        logging.info("Successfully ran the model")

From 8da822cd154db854eade5562a28608b964558d17 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Martin=20Lindstr=C3=B6m?= <33344797+martinlsm@users.noreply.github.com>
Date: Fri, 19 Sep 2025 11:12:29 +0200
Subject: [PATCH 044/395] Arm backend: Add pass order validation to ArmPassManager (#14148)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Introduce a mechanism to enforce required ordering of passes in
ArmPassManager. Each ArmPass must now declare which passes are required
to run after it, ensuring ordering constraints are always upheld. This
prevents accidental breakage when modifying pass ordering in the
manager.

Ordering constraints are verified by the new method
ArmPassManager.validate_constraints_mandatory.

We considered reusing
torch.fx.passes.infra.pass_manager.PassManager.validate_constraints, but
that utility only checks pairwise ordering and cannot enforce that a
pass is actually run, which did not meet our needs.

This patch only implements the mechanism and tests for it. Defining the
actual pass orderings is done in a later patch.
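For illustration, a minimal standalone sketch of the declared-ordering check described above. The `PassA`/`PassB` classes are hypothetical, and the sketch is simplified (the real check lives in `ArmPassManager.validate_constraints_mandatory` and resolves names via `ArmPass.get_name` on pass instances):

```python
from collections import defaultdict


class PassB:
    """Hypothetical pass with no ordering requirements."""

    _passes_required_after = set()


class PassA:
    """Hypothetical pass declaring that PassB must run at some point after it."""

    _passes_required_after = {"PassB"}


def validate_constraints_mandatory(passes):
    """Raise if any declared follow-up pass is missing or scheduled too early."""
    pending = defaultdict(list)
    for p in passes:
        for required in getattr(p, "_passes_required_after", set()):
            pending[required].append(p.__name__)
        # Reaching this pass satisfies any requirement declared by earlier passes.
        pending.pop(p.__name__, None)
    if pending:
        lines = [
            f"  - {required} must run after {requiring}"
            for required, requiring_list in pending.items()
            for requiring in requiring_list
        ]
        raise RuntimeError("Unmet pass ordering constraints:\n" + "\n".join(lines))


validate_constraints_mandatory([PassA, PassB])  # OK: PassB is scheduled after PassA

try:
    validate_constraints_mandatory([PassB, PassA])  # PassB never runs after PassA
except RuntimeError as e:
    print(e)
```

Note that this also catches a required pass that is absent from the schedule entirely, which is the property plain pairwise ordering checks cannot enforce.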
### Test plan The change comes with added unit tests in backends/arm/test/misc/test_pass_required_order.py Signed-off-by: Adrian Lundell Signed-off-by: Martin Lindström Co-authored-by: Adrian Lundell Co-authored-by: Martin Lindström --- backends/arm/_passes/add_bias_pass.py | 6 +- .../arm/_passes/annotate_decomposed_matmul.py | 4 +- .../_passes/annotate_output_dim_order_pass.py | 7 +- backends/arm/_passes/arm_pass.py | 33 ++++++- backends/arm/_passes/arm_pass_manager.py | 33 ++++++- backends/arm/_passes/broadcast_args_pass.py | 6 +- .../arm/_passes/cast_bool_to_int8_pass.py | 4 + backends/arm/_passes/cast_int64_pass.py | 3 + backends/arm/_passes/cast_to_int32_pass.py | 4 + backends/arm/_passes/conv1d_unsqueeze_pass.py | 4 + .../convert_any_default_dim_dims_pass.py | 4 + .../_passes/convert_expand_copy_to_repeat.py | 4 +- .../_passes/convert_full_like_to_full_pass.py | 4 + .../convert_int64_const_ops_to_int32.py | 3 + .../convert_int64_output_ops_to_int32.py | 3 + .../arm/_passes/convert_int_pow_to_mul.py | 5 + backends/arm/_passes/convert_minmax_pass.py | 4 + .../arm/_passes/convert_split_to_slice.py | 4 + .../arm/_passes/convert_squeezes_to_view.py | 4 + backends/arm/_passes/convert_to_clamp.py | 4 +- backends/arm/_passes/decompose_acosh_pass.py | 5 + .../decompose_adaptive_avg_pool2d_pass.py | 4 + backends/arm/_passes/decompose_addmm_pass.py | 5 + .../_passes/decompose_asin_and_acos_pass.py | 4 + backends/arm/_passes/decompose_asinh_pass.py | 5 + backends/arm/_passes/decompose_atan_pass.py | 4 + backends/arm/_passes/decompose_atanh_pass.py | 5 + backends/arm/_passes/decompose_avg_pool2d.py | 4 +- .../_passes/decompose_batch_norm_no_stats.py | 5 +- backends/arm/_passes/decompose_cosh_pass.py | 5 + .../decompose_cosine_similarity_pass.py | 4 + backends/arm/_passes/decompose_cumsum_pass.py | 5 +- backends/arm/_passes/decompose_div_pass.py | 6 +- backends/arm/_passes/decompose_elu_pass.py | 5 + .../arm/_passes/decompose_embedding_pass.py | 3 + 
backends/arm/_passes/decompose_expm1_pass.py | 5 + backends/arm/_passes/decompose_gelu_pass.py | 4 + backends/arm/_passes/decompose_glu_pass.py | 5 + .../arm/_passes/decompose_grouped_conv.py | 3 + .../arm/_passes/decompose_groupnorm_pass.py | 5 +- .../arm/_passes/decompose_layernorm_pass.py | 5 +- .../arm/_passes/decompose_leaky_relu_pass.py | 5 + .../decompose_linalg_vector_norm_pass.py | 4 + backends/arm/_passes/decompose_linear_pass.py | 6 +- backends/arm/_passes/decompose_logit_pass.py | 5 + backends/arm/_passes/decompose_masked_fill.py | 5 + .../decompose_maxpool2d_with_dilation.py | 4 + .../arm/_passes/decompose_meandim_pass.py | 4 + backends/arm/_passes/decompose_ne_pass.py | 5 + backends/arm/_passes/decompose_round_pass.py | 5 + backends/arm/_passes/decompose_select.py | 4 + backends/arm/_passes/decompose_sign_pass.py | 5 + backends/arm/_passes/decompose_silu_pass.py | 4 + backends/arm/_passes/decompose_sinh_pass.py | 5 + .../arm/_passes/decompose_softmax_pass.py | 4 + .../decompose_softmax_unstable_pass.py | 5 + backends/arm/_passes/decompose_sqrt_pass.py | 3 +- backends/arm/_passes/decompose_sum_pass.py | 4 + backends/arm/_passes/decompose_var_pass.py | 5 + .../decorate_fp32_to_int32_casting_pass.py | 5 + .../fold_qdq_with_annotated_qparams_pass.py | 8 +- backends/arm/_passes/fuse_batchnorm2d_pass.py | 4 + .../arm/_passes/fuse_constant_ops_pass.py | 5 + .../_passes/fuse_equal_placeholders_pass.py | 3 + .../_passes/fuse_quantized_activation_pass.py | 4 + backends/arm/_passes/insert_rescales_pass.py | 4 +- backends/arm/_passes/insert_table_ops.py | 4 +- backends/arm/_passes/match_arg_dtype_pass.py | 4 + backends/arm/_passes/match_arg_ranks_pass.py | 4 +- backends/arm/_passes/mm_to_bmm_pass.py | 4 + backends/arm/_passes/remove_noop_pass.py | 3 + .../arm/_passes/replace_inf_values_pass.py | 4 + .../replace_scalar_with_tensor_pass.py | 7 +- .../arm/_passes/scalars_to_attribute_pass.py | 4 +- .../arm/_passes/size_adjust_input_pass.py | 4 +- 
.../arm/_passes/to_tosa_memory_format_pass.py | 9 ++ .../_passes/unsqueeze_before_repeat_pass.py | 6 +- .../unsqueeze_scalar_placeholders_pass.py | 4 + .../arm/test/misc/test_pass_required_order.py | 95 +++++++++++++++++++ backends/transforms/decompose_sdpa.py | 3 + backends/transforms/fuse_view_copy.py | 4 + 81 files changed, 489 insertions(+), 24 deletions(-) create mode 100644 backends/arm/test/misc/test_pass_required_order.py diff --git a/backends/arm/_passes/add_bias_pass.py b/backends/arm/_passes/add_bias_pass.py index 31c0c0505cb..a8a76c0a47b 100644 --- a/backends/arm/_passes/add_bias_pass.py +++ b/backends/arm/_passes/add_bias_pass.py @@ -3,13 +3,15 @@ # This source code is licensed under the BSD-style license found in the # LICENSE file in the root directory of this source tree. +from typing import Set, Type + import torch from executorch.backends.arm._passes import ArmPass from executorch.backends.arm._passes.arm_pass_utils import get_first_fake_tensor from executorch.backends.transforms.utils import create_constant_placeholder from executorch.exir.dialects._ops import ops as exir_ops -from executorch.exir.pass_base import PassResult +from executorch.exir.pass_base import ExportPass, PassResult from torch.export.graph_signature import InputKind @@ -19,6 +21,8 @@ class AddBiasPass(ArmPass): The bias is set to zero. 
""" + _passes_required_after: Set[Type[ExportPass]] = set() + targeted_ops = (exir_ops.edge.aten.convolution.default,) def call(self, graph_module): diff --git a/backends/arm/_passes/annotate_decomposed_matmul.py b/backends/arm/_passes/annotate_decomposed_matmul.py index 8156ca0b89d..81b7b36cc0b 100644 --- a/backends/arm/_passes/annotate_decomposed_matmul.py +++ b/backends/arm/_passes/annotate_decomposed_matmul.py @@ -7,7 +7,7 @@ import itertools import operator -from typing import cast, List +from typing import cast, List, Set, Type import torch from executorch.backends.arm._passes.arm_pass_utils import create_node @@ -29,6 +29,8 @@ class AnnotateDecomposedMatmulPass(ExportPass): matmul-op (can be mm or bmm). """ + _passes_required_after: Set[Type[ExportPass]] = set() + def _match_partition_to_node( self, node: torch.fx.Node, partitioned_inputs: List[torch.fx.Node] ) -> torch.fx.Node: diff --git a/backends/arm/_passes/annotate_output_dim_order_pass.py b/backends/arm/_passes/annotate_output_dim_order_pass.py index 08f93383a9c..8dc13326e4a 100644 --- a/backends/arm/_passes/annotate_output_dim_order_pass.py +++ b/backends/arm/_passes/annotate_output_dim_order_pass.py @@ -3,9 +3,12 @@ # This source code is licensed under the BSD-style license found in the # LICENSE file in the root directory of this source tree. + +from typing import Set, Type + from executorch.backends.arm._passes import ArmPass from executorch.backends.arm._passes.arm_pass_utils import get_output_dim_orders -from executorch.exir.pass_base import PassResult +from executorch.exir.pass_base import ExportPass, PassResult class AnnotateOutputDimOrderPass(ArmPass): @@ -14,6 +17,8 @@ class AnnotateOutputDimOrderPass(ArmPass): for verifying that the dim order does not change unexpectedly in later passes. 
""" + _passes_required_after: Set[Type[ExportPass]] = set() + def call(self, graph_module): output_node = graph_module.graph.output_node() output_node.meta["original_dim_orders"] = get_output_dim_orders(graph_module) diff --git a/backends/arm/_passes/arm_pass.py b/backends/arm/_passes/arm_pass.py index 085267a174e..c76b5d157a7 100644 --- a/backends/arm/_passes/arm_pass.py +++ b/backends/arm/_passes/arm_pass.py @@ -6,7 +6,8 @@ # pyre-unsafe import traceback -from typing import Optional +from abc import abstractmethod +from typing import List, Optional, Set, Type import torch from executorch.exir.pass_base import ExportPass, NodeMetadata @@ -19,6 +20,36 @@ def __init__(self, exported_program: Optional[torch.export.ExportedProgram] = No super(ArmPass, self).__init__() self.exported_program = exported_program + @property + @abstractmethod + def _passes_required_after(self) -> Set[Type[ExportPass]]: + """The subclass defines passes that must run after it""" + pass + + @staticmethod + def get_required_passes(pass_) -> List[str]: + """ + Returns the list of passes that must be run after this pass, sorted by name. + """ + if hasattr(pass_, "_passes_required_after"): + return sorted([ArmPass.get_name(p) for p in pass_._passes_required_after]) + else: + return [] + + @staticmethod + def get_name(pass_) -> str: + """ + Returns the name of the pass. + """ + if isinstance(pass_, ExportPass): + return pass_.__class__.__name__ + elif hasattr(pass_, "__name__"): + return pass_.__name__ + else: + raise ValueError( + f"Cannot get name for pass: {pass_}. It must be an instance of ExportPass or have a __name__ attribute." 
+ ) + def call_operator(self, op, args, kwargs, meta, updated: Optional[bool] = False): if not updated: return super().call_operator(op, args, kwargs, meta) diff --git a/backends/arm/_passes/arm_pass_manager.py b/backends/arm/_passes/arm_pass_manager.py index f49206da67e..c6530357f3b 100644 --- a/backends/arm/_passes/arm_pass_manager.py +++ b/backends/arm/_passes/arm_pass_manager.py @@ -7,6 +7,9 @@ # pyre-unsafe + +from collections import defaultdict + import executorch.backends.arm.tosa.dialect # noqa: unused from executorch.backends.arm._passes import ( AddBiasPass, @@ -94,6 +97,7 @@ UnsqueezeScalarPlaceholdersPass, ) +from executorch.backends.arm._passes.arm_pass import ArmPass from executorch.backends.arm.tosa.specification import ( TosaLoweringContext, TosaSpecification, @@ -115,6 +119,32 @@ def __init__(self, tosa_spec: TosaSpecification) -> None: self.tosa_spec = tosa_spec super().__init__() + def validate_constraints_mandatory(self): + """ + Validates that necessary passes have run before transforming to backend. + + Note that this differs from the original validate_constraints function, which + only checks the order of passes. 
+ """ + passes_to_run = defaultdict(list) + + for current_pass in self.passes: + current_pass_name = ArmPass.get_name(current_pass) + for required_pass_name in ArmPass.get_required_passes(current_pass): + passes_to_run[required_pass_name].append(current_pass_name) + + passes_to_run.pop(current_pass_name, None) + + if len(passes_to_run) > 0: + error_msg = "The following constraints for passes are not met:\n" + for required_pass, requiring_passes in passes_to_run.items(): + for requiring_pass in requiring_passes: + error_msg += ( + f" - {required_pass} must run after {requiring_pass}\n" + ) + + raise RuntimeError(error_msg) + def _transform(self, graph_module: GraphModule): with TosaLoweringContext(self.tosa_spec): return self(graph_module).graph_module @@ -125,7 +155,6 @@ def _tosa_INT_pipeline(self, exported_program: ExportedProgram) -> GraphModule: self.add_pass(RemoveGetItemPass()) self.add_pass(ConvertSplitToSlicePass()) self.add_pass(ConvertMmToBmmPass()) - self.add_pass(DecomposeLinearVectorNormPass()) self.add_pass( DecomposeMeanDimPass(exported_program.graph_module, self.tosa_spec) ) @@ -175,6 +204,7 @@ def _tosa_INT_pipeline(self, exported_program: ExportedProgram) -> GraphModule: self.add_pass(RemoveNoopPass()) self.add_pass(InsertRescalePass()) + self.validate_constraints_mandatory() return self._transform(exported_program.graph_module) def _tosa_FP_pipeline(self, exported_program: ExportedProgram) -> GraphModule: @@ -258,6 +288,7 @@ def _tosa_FP_pipeline(self, exported_program: ExportedProgram) -> GraphModule: self.add_pass(RemoveNoopPass()) self.add_pass(InsertRescalePass()) + self.validate_constraints_mandatory() return self._transform(exported_program.graph_module) def transform_to_backend_pipeline(self, exported_program: ExportedProgram): diff --git a/backends/arm/_passes/broadcast_args_pass.py b/backends/arm/_passes/broadcast_args_pass.py index f125ba13ff4..659e6aca686 100644 --- a/backends/arm/_passes/broadcast_args_pass.py +++ 
b/backends/arm/_passes/broadcast_args_pass.py @@ -3,6 +3,8 @@ # This source code is licensed under the BSD-style license found in the # LICENSE file in the root directory of this source tree. +from typing import Set, Type + from executorch.backends.arm._passes import ArmPass from executorch.backends.arm._passes.arm_pass_utils import ( @@ -12,7 +14,7 @@ from executorch.exir.dialects._ops import ops as exir_ops -from executorch.exir.pass_base import PassResult +from executorch.exir.pass_base import ExportPass, PassResult from torch.fx import GraphModule, Node @@ -22,6 +24,8 @@ class BroadcastArgsPass(ArmPass): This is done when more than one arg needs broadcasting. """ + _passes_required_after: Set[Type[ExportPass]] = set() + targeted_ops = { exir_ops.edge.aten.add.Tensor, exir_ops.edge.aten.sub.Tensor, diff --git a/backends/arm/_passes/cast_bool_to_int8_pass.py b/backends/arm/_passes/cast_bool_to_int8_pass.py index 1352671b01e..771b6d9e174 100644 --- a/backends/arm/_passes/cast_bool_to_int8_pass.py +++ b/backends/arm/_passes/cast_bool_to_int8_pass.py @@ -6,6 +6,8 @@ # The TOSA BITWISE_AND, BITWISE_OR, and BITWISE_XOR don't handle bool as input # If input/output is bool lest add a cast/conversion pass before/after to/from int8. 
+from typing import Set, Type + import torch from executorch.exir.dialects._ops import ops as exir_ops @@ -15,6 +17,8 @@ class CastBoolToInt8Pass(ExportPass): """Casts the input to int8 if it is not already and casts back the output to the original input dtype.""" + _passes_required_after: Set[Type[ExportPass]] = set() + targeted_ops = { exir_ops.edge.aten.bitwise_and.Tensor, exir_ops.edge.aten.bitwise_or.Tensor, diff --git a/backends/arm/_passes/cast_int64_pass.py b/backends/arm/_passes/cast_int64_pass.py index 8052c8fd2ce..d7b2a6b6b43 100644 --- a/backends/arm/_passes/cast_int64_pass.py +++ b/backends/arm/_passes/cast_int64_pass.py @@ -6,6 +6,7 @@ # pyre-unsafe import logging +from typing import Set, Type import torch from executorch.exir.pass_base import ExportPass, PassResult @@ -19,6 +20,8 @@ class CastInt64BuffersToInt32Pass(ExportPass): Cast int64 buffers to int32 if the int64 data is in int32 range. """ + _passes_required_after: Set[Type[ExportPass]] = set() + def __init__(self, exported_program: torch.export.ExportedProgram): super(CastInt64BuffersToInt32Pass, self).__init__() self.exported_program = exported_program diff --git a/backends/arm/_passes/cast_to_int32_pass.py b/backends/arm/_passes/cast_to_int32_pass.py index c4b009e2b88..2e574568235 100644 --- a/backends/arm/_passes/cast_to_int32_pass.py +++ b/backends/arm/_passes/cast_to_int32_pass.py @@ -3,6 +3,8 @@ # This source code is licensed under the BSD-style license found in the # LICENSE file in the root directory of this source tree. 
+from typing import Set, Type + import torch from executorch.exir.dialects._ops import ops as exir_ops @@ -12,6 +14,8 @@ class CastToInt32Pass(ExportPass): """Casts the input to int32 if it is not already and casts back the output to the original input dtype.""" + _passes_required_after: Set[Type[ExportPass]] = set() + targeted_ops = { exir_ops.edge.aten.bitwise_left_shift.Tensor, exir_ops.edge.aten.bitwise_right_shift.Tensor, diff --git a/backends/arm/_passes/conv1d_unsqueeze_pass.py b/backends/arm/_passes/conv1d_unsqueeze_pass.py index 56f674e9066..718c94fc196 100644 --- a/backends/arm/_passes/conv1d_unsqueeze_pass.py +++ b/backends/arm/_passes/conv1d_unsqueeze_pass.py @@ -6,6 +6,8 @@ # LICENSE file in the root directory of this source tree. +from typing import Set, Type + from executorch.exir.dialects._ops import ops as exir_ops from executorch.exir.pass_base import ExportPass @@ -21,6 +23,8 @@ class Conv1dUnsqueezePass(ExportPass): 3) squeeze the output back down to 3d. """ + _passes_required_after: Set[Type[ExportPass]] = set() + def call_operator(self, op, args, kwargs, meta): if op != exir_ops.edge.aten.convolution.default: return super().call_operator(op, args, kwargs, meta) diff --git a/backends/arm/_passes/convert_any_default_dim_dims_pass.py b/backends/arm/_passes/convert_any_default_dim_dims_pass.py index 7085f17add0..f4ec0c57b2a 100644 --- a/backends/arm/_passes/convert_any_default_dim_dims_pass.py +++ b/backends/arm/_passes/convert_any_default_dim_dims_pass.py @@ -3,6 +3,8 @@ # This source code is licensed under the BSD-style license found in the # LICENSE file in the root directory of this source tree. 
+from typing import Set, Type + import torch from executorch.exir.dialects._ops import ( # type: ignore[import-not-found] ops as exir_ops, @@ -44,6 +46,8 @@ class ConvertAnyDefaultDimDimsPass(ExportPass): squeeze(dim = [dim1, dim2]) """ + _passes_required_after: Set[Type[ExportPass]] = set() + def call(self, graph_module: torch.fx.GraphModule): modified = False for node in graph_module.graph.nodes: diff --git a/backends/arm/_passes/convert_expand_copy_to_repeat.py b/backends/arm/_passes/convert_expand_copy_to_repeat.py index ee509c7ebb5..1c6b52b150a 100644 --- a/backends/arm/_passes/convert_expand_copy_to_repeat.py +++ b/backends/arm/_passes/convert_expand_copy_to_repeat.py @@ -6,7 +6,7 @@ # pyre-unsafe import logging -from typing import cast +from typing import cast, Set, Type import torch @@ -50,6 +50,8 @@ class ConvertExpandCopyToRepeatPass(ExportPass): Replace expand copy with repeat since it is a repeat that can only repeat singleton dimensions. """ + _passes_required_after: Set[Type[ExportPass]] = set() + expand_copy = exir_ops.edge.aten.expand_copy.default repeat = exir_ops.edge.aten.repeat.default diff --git a/backends/arm/_passes/convert_full_like_to_full_pass.py b/backends/arm/_passes/convert_full_like_to_full_pass.py index 234e2ecda82..2f46e19005a 100644 --- a/backends/arm/_passes/convert_full_like_to_full_pass.py +++ b/backends/arm/_passes/convert_full_like_to_full_pass.py @@ -3,6 +3,8 @@ # This source code is licensed under the BSD-style license found in the # LICENSE file in the root directory of this source tree. +from typing import Set, Type + from executorch.exir.dialects._ops import ops as exir_ops from executorch.exir.pass_base import ExportPass @@ -19,6 +21,8 @@ class ConvertFullLikeToFullPass(ExportPass): Skip layout and device since it's not relevant for our backend. 
""" + _passes_required_after: Set[Type[ExportPass]] = set() + def call_operator(self, op, args, kwargs, meta): if op not in [ exir_ops.edge.aten.full_like.default, diff --git a/backends/arm/_passes/convert_int64_const_ops_to_int32.py b/backends/arm/_passes/convert_int64_const_ops_to_int32.py index 704c89dbd78..9af44f56f11 100644 --- a/backends/arm/_passes/convert_int64_const_ops_to_int32.py +++ b/backends/arm/_passes/convert_int64_const_ops_to_int32.py @@ -7,6 +7,7 @@ import logging +from typing import Set, Type import torch from executorch.backends.arm._passes.fuse_constant_ops_pass import ComputeConstantOpsAOT @@ -30,6 +31,8 @@ class ConvertInt64ConstOpsToInt32Pass(ExportPass): 5. `torch.tensor` """ + _passes_required_after: Set[Type[ExportPass]] = set() + torch_ops = [ torch.ops.aten.full.default, torch.ops.aten.arange.default, diff --git a/backends/arm/_passes/convert_int64_output_ops_to_int32.py b/backends/arm/_passes/convert_int64_output_ops_to_int32.py index 788201be6c8..d0d29d14e30 100644 --- a/backends/arm/_passes/convert_int64_output_ops_to_int32.py +++ b/backends/arm/_passes/convert_int64_output_ops_to_int32.py @@ -7,6 +7,7 @@ import logging +from typing import Set, Type import torch from executorch.backends.arm._passes.arm_pass_utils import ( @@ -44,6 +45,8 @@ class ConvertInt64OutputOpsToInt32Pass(ExportPass): the int32 range. 
""" + _passes_required_after: Set[Type[ExportPass]] = set() + aten_cast_ops = ( torch.ops.aten.to.dtype, torch.ops.aten.to.dtype_layout, diff --git a/backends/arm/_passes/convert_int_pow_to_mul.py b/backends/arm/_passes/convert_int_pow_to_mul.py index f22a2fd0b3c..8f9b3a9cb4b 100644 --- a/backends/arm/_passes/convert_int_pow_to_mul.py +++ b/backends/arm/_passes/convert_int_pow_to_mul.py @@ -5,8 +5,11 @@ # pyre-unsafe +from typing import Set, Type + from executorch.backends.arm._passes import ArmPass from executorch.exir.dialects._ops import ops as exir_ops +from executorch.exir.pass_base import ExportPass class ConvertIntPowToMuls(ArmPass): @@ -16,6 +19,8 @@ class ConvertIntPowToMuls(ArmPass): Needs to be run before doing scalar to tensor conversion. """ + _passes_required_after: Set[Type[ExportPass]] = set() + def call_operator(self, op, args, kwargs, meta): if op != exir_ops.edge.aten.pow.Tensor_Scalar: return super().call_operator(op, args, kwargs, meta) diff --git a/backends/arm/_passes/convert_minmax_pass.py b/backends/arm/_passes/convert_minmax_pass.py index 9f409632c20..2cf59ab2300 100644 --- a/backends/arm/_passes/convert_minmax_pass.py +++ b/backends/arm/_passes/convert_minmax_pass.py @@ -3,6 +3,8 @@ # This source code is licensed under the BSD-style license found in the # LICENSE file in the root directory of this source tree. +from typing import Set, Type + import torch from executorch.exir.dialects._ops import ops as exir_ops from executorch.exir.pass_base import ExportPass, PassResult @@ -29,6 +31,8 @@ class ConvertMinMaxPass(ExportPass): squeeze(dim = [dim1, dim2]) """ + _passes_required_after: Set[Type[ExportPass]] = set() + def check_argmax(self, node): """ Raises a RuntimeError if the argmax value returned by the min/max op is used in the graph. 
diff --git a/backends/arm/_passes/convert_split_to_slice.py b/backends/arm/_passes/convert_split_to_slice.py index 67bd9d73e81..7578c07ca53 100644 --- a/backends/arm/_passes/convert_split_to_slice.py +++ b/backends/arm/_passes/convert_split_to_slice.py @@ -5,6 +5,8 @@ # pyre-unsafe +from typing import Set, Type + import torch.fx from executorch.backends.arm._passes.arm_pass_utils import ( create_node, @@ -19,6 +21,8 @@ class ConvertSplitToSlicePass(ExportPass): Replace a split operation with many slice operations. """ + _passes_required_after: Set[Type[ExportPass]] = set() + split_ops = ( exir_ops.edge.aten.split_with_sizes_copy.default, exir_ops.edge.aten.split_copy.Tensor, diff --git a/backends/arm/_passes/convert_squeezes_to_view.py b/backends/arm/_passes/convert_squeezes_to_view.py index 889dbe74172..9c5d26a7c22 100644 --- a/backends/arm/_passes/convert_squeezes_to_view.py +++ b/backends/arm/_passes/convert_squeezes_to_view.py @@ -6,6 +6,8 @@ # pyre-unsafe +from typing import Set, Type + from executorch.exir.dialects._ops import ops as exir_ops from executorch.exir.pass_base import ExportPass @@ -15,6 +17,8 @@ class ConvertSqueezesToViewPass(ExportPass): Replaces squeeze/unsqueeze operators with view. These are simply special cases of the view op, so removing them gives us less cases to handle in the node visitiors. """ + _passes_required_after: Set[Type[ExportPass]] = set() + def call_operator(self, op, args, kwargs, meta): if op not in [ exir_ops.edge.aten.squeeze_copy.dims, diff --git a/backends/arm/_passes/convert_to_clamp.py b/backends/arm/_passes/convert_to_clamp.py index 8f2c9b16f9a..3f8cac30b96 100644 --- a/backends/arm/_passes/convert_to_clamp.py +++ b/backends/arm/_passes/convert_to_clamp.py @@ -3,7 +3,7 @@ # This source code is licensed under the BSD-style license found in the # LICENSE file in the root directory of this source tree. 
-from typing import Tuple +from typing import Set, Tuple, Type from executorch.exir.dialects._ops import ops as exir_ops from executorch.exir.pass_base import ExportPass @@ -24,6 +24,8 @@ def get_clamp_params(op, args) -> Tuple[float | None, float | None]: class ConvertToClampPass(ExportPass): + _passes_required_after: Set[Type[ExportPass]] = set() + def call_operator(self, op, args, kwargs, meta): if op not in edge_operators: return super().call_operator(op, args, kwargs, meta) diff --git a/backends/arm/_passes/decompose_acosh_pass.py b/backends/arm/_passes/decompose_acosh_pass.py index 1d92dd68c4a..30c5c137482 100644 --- a/backends/arm/_passes/decompose_acosh_pass.py +++ b/backends/arm/_passes/decompose_acosh_pass.py @@ -5,8 +5,11 @@ # pyre-unsafe +from typing import Set, Type + from executorch.backends.arm._passes import ArmPass from executorch.exir.dialects._ops import ops as exir_ops +from executorch.exir.pass_base import ExportPass # For MI case edge_acosh_op = exir_ops.edge.aten.acosh.default @@ -19,6 +22,8 @@ class DecomposeAcoshPass(ArmPass): acosh(x) = log(x + sqrt((x-1)(x+1)) """ + _passes_required_after: Set[Type[ExportPass]] = set() + def call_operator(self, op, args, kwargs, meta, updated=False): if op is not edge_acosh_op: diff --git a/backends/arm/_passes/decompose_adaptive_avg_pool2d_pass.py b/backends/arm/_passes/decompose_adaptive_avg_pool2d_pass.py index abfcc8e3945..f1623b4aca7 100644 --- a/backends/arm/_passes/decompose_adaptive_avg_pool2d_pass.py +++ b/backends/arm/_passes/decompose_adaptive_avg_pool2d_pass.py @@ -4,12 +4,14 @@ # LICENSE file in the root directory of this source tree. 
from math import ceil, floor +from typing import Set, Type import torch from executorch.backends.arm._passes import ArmPass from executorch.exir.dialects._ops import ops as exir_ops +from executorch.exir.pass_base import ExportPass edge_ops = (exir_ops.edge.aten._adaptive_avg_pool2d.default,) aten_ops = (torch.ops.aten.adaptive_avg_pool2d.default,) @@ -41,6 +43,8 @@ class DecomposeAdaptiveAvgPool2dPass(ArmPass): The output is of size output_size_h x output_size_w for any input. """ + _passes_required_after: Set[Type[ExportPass]] = set() + def call_operator(self, op, args, kwargs, meta, updated=False): if op not in (edge_ops + aten_ops): return super().call_operator(op, args, kwargs, meta, updated) diff --git a/backends/arm/_passes/decompose_addmm_pass.py b/backends/arm/_passes/decompose_addmm_pass.py index b59a8cb02d3..142f3143f38 100644 --- a/backends/arm/_passes/decompose_addmm_pass.py +++ b/backends/arm/_passes/decompose_addmm_pass.py @@ -3,10 +3,13 @@ # This source code is licensed under the BSD-style license found in the # LICENSE file in the root directory of this source tree. 
+from typing import Set, Type + import torch from executorch.backends.arm._passes import ArmPass from executorch.exir.dialects._ops import ops as exir_ops +from executorch.exir.pass_base import ExportPass # For MI case @@ -36,6 +39,8 @@ def get_ops(op): class DecomposeAddmmPass(ArmPass): """Decomposes the addmm operator into tensor multiplication and addition.""" + _passes_required_after: Set[Type[ExportPass]] = set() + def call_operator(self, op, args, kwargs, meta): if op not in [edge_addmm, aten_addmm]: return super().call_operator(op, args, kwargs, meta) diff --git a/backends/arm/_passes/decompose_asin_and_acos_pass.py b/backends/arm/_passes/decompose_asin_and_acos_pass.py index e067f17b0ca..c083cc669c2 100644 --- a/backends/arm/_passes/decompose_asin_and_acos_pass.py +++ b/backends/arm/_passes/decompose_asin_and_acos_pass.py @@ -7,11 +7,13 @@ import logging from math import pi +from typing import Set, Type import torch from executorch.backends.arm._passes import ArmPass from executorch.exir.dialects._ops import ops as exir_ops +from executorch.exir.pass_base import ExportPass # For MI case edge_asin_op = (exir_ops.edge.aten.asin.default,) @@ -54,6 +56,8 @@ class DecomposeAsinAndAcosPass(ArmPass): """ + _passes_required_after: Set[Type[ExportPass]] = set() + def _build_polynomial( self, coefficients: list[float], variable: torch.Tensor, meta: dict[str, str] ) -> torch.Tensor: diff --git a/backends/arm/_passes/decompose_asinh_pass.py b/backends/arm/_passes/decompose_asinh_pass.py index a0b78c51a77..b8f7300beb5 100644 --- a/backends/arm/_passes/decompose_asinh_pass.py +++ b/backends/arm/_passes/decompose_asinh_pass.py @@ -6,8 +6,11 @@ # pyre-unsafe +from typing import Set, Type + from executorch.backends.arm._passes import ArmPass from executorch.exir.dialects._ops import ops as exir_ops +from executorch.exir.pass_base import ExportPass # For MI case edge_asinh_op = (exir_ops.edge.aten.asinh.default,) @@ -20,6 +23,8 @@ class DecomposeAsinhPass(ArmPass): asinh(x) 
= log(x + sqrt(x^2 + 1)) """ + _passes_required_after: Set[Type[ExportPass]] = set() + def call_operator(self, op, args, kwargs, meta): if op not in edge_asinh_op: return super().call_operator(op, args, kwargs, meta) diff --git a/backends/arm/_passes/decompose_atan_pass.py b/backends/arm/_passes/decompose_atan_pass.py index 57b9dde5216..7faef26a245 100644 --- a/backends/arm/_passes/decompose_atan_pass.py +++ b/backends/arm/_passes/decompose_atan_pass.py @@ -5,9 +5,11 @@ import logging from math import pi +from typing import Set, Type from executorch.backends.arm._passes import ArmPass from executorch.exir.dialects._ops import ops as exir_ops +from executorch.exir.pass_base import ExportPass edge_atan = exir_ops.edge.aten.atan.default # MI case @@ -35,6 +37,8 @@ def _get_atan_ops(op): class DecomposeAtanPass(ArmPass): """Decomposes the atan operator into a rational (Padé) approximation.""" + _passes_required_after: Set[Type[ExportPass]] = set() + def _rational_approximation(self, z, ops, meta): """Creates a (2,1) Padé approximation for atan(x) on [-1, 1].""" diff --git a/backends/arm/_passes/decompose_atanh_pass.py b/backends/arm/_passes/decompose_atanh_pass.py index dfdad41e556..d06598923b3 100644 --- a/backends/arm/_passes/decompose_atanh_pass.py +++ b/backends/arm/_passes/decompose_atanh_pass.py @@ -3,8 +3,11 @@ # This source code is licensed under the BSD-style license found in the # LICENSE file in the root directory of this source tree. 
+from typing import Set, Type + from executorch.backends.arm._passes import ArmPass from executorch.exir.dialects._ops import ops as exir_ops +from executorch.exir.pass_base import ExportPass edge_atanh = exir_ops.edge.aten.atanh.default # MI case @@ -30,6 +33,8 @@ class DecomposeAtanhPass(ArmPass): atanh(x) = 0.5 * log((1 + x) / (1 - x)) """ + _passes_required_after: Set[Type[ExportPass]] = set() + def call_operator(self, op, args, kwargs, meta): if op is not edge_atanh: return super().call_operator(op, args, kwargs, meta, updated=False) diff --git a/backends/arm/_passes/decompose_avg_pool2d.py b/backends/arm/_passes/decompose_avg_pool2d.py index 21ed6b518c7..0240661053b 100644 --- a/backends/arm/_passes/decompose_avg_pool2d.py +++ b/backends/arm/_passes/decompose_avg_pool2d.py @@ -4,6 +4,8 @@ # LICENSE file in the root directory of this source tree. +from typing import Set, Type + import torch from executorch.backends.arm.operators.operator_validation_utils import ( adjust_pooling_pad_if_needed, @@ -34,7 +36,7 @@ def get_decomposition(op) -> tuple: class DecomposeAvgPool2d(ExportPass): - """ """ + _passes_required_after: Set[Type[ExportPass]] = set() def call_operator(self, op, args, kwargs, meta): if op not in (edge_div_ops + aten_div_ops): diff --git a/backends/arm/_passes/decompose_batch_norm_no_stats.py b/backends/arm/_passes/decompose_batch_norm_no_stats.py index 5fdb8db2d7c..82937241369 100644 --- a/backends/arm/_passes/decompose_batch_norm_no_stats.py +++ b/backends/arm/_passes/decompose_batch_norm_no_stats.py @@ -6,12 +6,13 @@ # pyre-unsafe import operator +from typing import Set, Type import torch from executorch.backends.arm._passes import ArmPass from executorch.backends.arm._passes.arm_pass_utils import create_node from executorch.exir.dialects._ops import ops as exir_ops -from executorch.exir.pass_base import PassResult +from executorch.exir.pass_base import ExportPass, PassResult class DecomposeBatchNormNoStatsPass(ArmPass): @@ -33,6 +34,8 @@ class 
DecomposeBatchNormNoStatsPass(ArmPass): Source: https://pytorch.org/docs/stable/generated/torch.nn.BatchNorm2d.html """ + _passes_required_after: Set[Type[ExportPass]] = set() + def call(self, graph_module: torch.fx.GraphModule) -> PassResult: # noqa: C901 bn_ops = ( exir_ops.edge.aten._native_batch_norm_legit.no_stats, diff --git a/backends/arm/_passes/decompose_cosh_pass.py b/backends/arm/_passes/decompose_cosh_pass.py index a94cf9ecff0..b71ca388651 100644 --- a/backends/arm/_passes/decompose_cosh_pass.py +++ b/backends/arm/_passes/decompose_cosh_pass.py @@ -3,8 +3,11 @@ # This source code is licensed under the BSD-style license found in the # LICENSE file in the root directory of this source tree. +from typing import Set, Type + from executorch.backends.arm._passes import ArmPass from executorch.exir.dialects._ops import ops as exir_ops +from executorch.exir.pass_base import ExportPass # For MI case edge_cosh = exir_ops.edge.aten.cosh.default @@ -19,6 +22,8 @@ class DecomposeCoshPass(ArmPass): """ + _passes_required_after: Set[Type[ExportPass]] = set() + def call_operator(self, op, args, kwargs, meta, updated=False): if op is not edge_cosh: return super().call_operator(op, args, kwargs, meta, updated) diff --git a/backends/arm/_passes/decompose_cosine_similarity_pass.py b/backends/arm/_passes/decompose_cosine_similarity_pass.py index 9978e653408..e2ab01b345f 100644 --- a/backends/arm/_passes/decompose_cosine_similarity_pass.py +++ b/backends/arm/_passes/decompose_cosine_similarity_pass.py @@ -3,6 +3,8 @@ # This source code is licensed under the BSD-style license found in the # LICENSE file in the root directory of this source tree. 
+from typing import Set, Type + import torch from executorch.exir.pass_base import ExportPass @@ -22,6 +24,8 @@ class DecomposeCosineSimilarityPass(ExportPass): out = div(dot, denom) """ + _passes_required_after: Set[Type[ExportPass]] = set() + def call_operator(self, op, args, kwargs, meta): if op not in torch_cosine_similarity: return super().call_operator(op, args, kwargs, meta) diff --git a/backends/arm/_passes/decompose_cumsum_pass.py b/backends/arm/_passes/decompose_cumsum_pass.py index 155ccd11594..04e6275c6c1 100644 --- a/backends/arm/_passes/decompose_cumsum_pass.py +++ b/backends/arm/_passes/decompose_cumsum_pass.py @@ -4,6 +4,7 @@ # LICENSE file in the root directory of this source tree. from math import prod +from typing import Set, Type import torch from executorch.backends.arm._passes import ArmPass @@ -12,7 +13,7 @@ from executorch.backends.transforms.utils import create_constant_placeholder from executorch.exir.dialects._ops import ops as exir_ops -from executorch.exir.pass_base import PassResult +from executorch.exir.pass_base import ExportPass, PassResult from torch.export.graph_signature import InputKind @@ -39,6 +40,8 @@ class DecomposeCumsumPass(ArmPass): And the convolution is applied over dimension H. """ + _passes_required_after: Set[Type[ExportPass]] = set() + def call(self, graph_module): graph = graph_module.graph targets = (exir_ops.edge.aten.cumsum.default, torch.ops.aten.cumsum.default) diff --git a/backends/arm/_passes/decompose_div_pass.py b/backends/arm/_passes/decompose_div_pass.py index 893531dac69..b6e289ff049 100644 --- a/backends/arm/_passes/decompose_div_pass.py +++ b/backends/arm/_passes/decompose_div_pass.py @@ -1,4 +1,4 @@ -# Copyright 2024 Arm Limited and/or its affiliates. +# Copyright 2024-2025 Arm Limited and/or its affiliates. # All rights reserved. 
# # This source code is licensed under the BSD-style license found in the @@ -6,6 +6,8 @@ # pyre-unsafe +from typing import Set, Type + import torch from executorch.exir.dialects._ops import ops as exir_ops from executorch.exir.pass_base import ExportPass @@ -37,6 +39,8 @@ class DecomposeDivPass(ExportPass): y = mul(a,x) """ + _passes_required_after: Set[Type[ExportPass]] = set() + def call_operator(self, op, args, kwargs, meta): if op not in (edge_div_ops + aten_div_ops): return super().call_operator(op, args, kwargs, meta) diff --git a/backends/arm/_passes/decompose_elu_pass.py b/backends/arm/_passes/decompose_elu_pass.py index 743f1b46f4d..ba3d32b7529 100644 --- a/backends/arm/_passes/decompose_elu_pass.py +++ b/backends/arm/_passes/decompose_elu_pass.py @@ -3,8 +3,11 @@ # This source code is licensed under the BSD-style license found in the # LICENSE file in the root directory of this source tree. +from typing import Set, Type + from executorch.backends.arm._passes import ArmPass from executorch.exir.dialects._ops import ops as exir_ops +from executorch.exir.pass_base import ExportPass edge_elu_ops = (exir_ops.edge.aten.elu.default,) @@ -55,6 +58,8 @@ class DecomposeEluPass(ArmPass): - exir_ops.edge.aten.mul.Scalar """ + _passes_required_after: Set[Type[ExportPass]] = set() + def call_operator(self, op, args, kwargs, meta): if op not in edge_elu_ops: return super().call_operator(op, args, kwargs, meta, updated=False) diff --git a/backends/arm/_passes/decompose_embedding_pass.py b/backends/arm/_passes/decompose_embedding_pass.py index 6de971f402f..5b2ad27eaf6 100644 --- a/backends/arm/_passes/decompose_embedding_pass.py +++ b/backends/arm/_passes/decompose_embedding_pass.py @@ -8,6 +8,7 @@ import logging from math import prod +from typing import Set, Type import torch from executorch.exir.dialects._ops import ops as exir_ops @@ -33,6 +34,8 @@ class DecomposeEmbeddingPass(ExportPass): i = indices is expected to be int32 before this pass """ + 
_passes_required_after: Set[Type[ExportPass]] = set() + aten_ops = (torch.ops.aten.embedding.default,) edge_ops = (exir_ops.edge.aten.embedding.default,) diff --git a/backends/arm/_passes/decompose_expm1_pass.py b/backends/arm/_passes/decompose_expm1_pass.py index 5b1b90495b5..21d3c975de3 100644 --- a/backends/arm/_passes/decompose_expm1_pass.py +++ b/backends/arm/_passes/decompose_expm1_pass.py @@ -3,8 +3,11 @@ # This source code is licensed under the BSD-style license found in the # LICENSE file in the root directory of this source tree. +from typing import Set, Type + from executorch.backends.arm._passes import ArmPass from executorch.exir.dialects._ops import ops as exir_ops +from executorch.exir.pass_base import ExportPass edge_expm1_ops = (exir_ops.edge.aten.expm1.default,) # MI case @@ -68,6 +71,8 @@ class DecomposeExpm1Pass(ArmPass): - exir_ops.edge.aten.logical_and.default """ + _passes_required_after: Set[Type[ExportPass]] = set() + def call_operator(self, op, args, kwargs, meta): if op not in edge_expm1_ops: return super().call_operator(op, args, kwargs, meta, updated=False) diff --git a/backends/arm/_passes/decompose_gelu_pass.py b/backends/arm/_passes/decompose_gelu_pass.py index 6e72175e68b..ef6a4753b8c 100644 --- a/backends/arm/_passes/decompose_gelu_pass.py +++ b/backends/arm/_passes/decompose_gelu_pass.py @@ -3,6 +3,8 @@ # This source code is licensed under the BSD-style license found in the # LICENSE file in the root directory of this source tree. 
+from typing import Set, Type + import torch from executorch.backends.arm._passes.arm_pass_utils import get_node_arg from executorch.exir.dialects._ops import ops as exir_ops @@ -77,6 +79,8 @@ class DecomposeGeluPass(ExportPass): %op7 = mul(%op6, %FULL_0_5) """ + _passes_required_after: Set[Type[ExportPass]] = set() + def call_operator(self, op, args, kwargs, meta): if op not in torch_gelu + edge_gelu: return super().call_operator(op, args, kwargs, meta) diff --git a/backends/arm/_passes/decompose_glu_pass.py b/backends/arm/_passes/decompose_glu_pass.py index 183dc89cf61..6b53609c951 100644 --- a/backends/arm/_passes/decompose_glu_pass.py +++ b/backends/arm/_passes/decompose_glu_pass.py @@ -3,9 +3,12 @@ # This source code is licensed under the BSD-style license found in the # LICENSE file in the root directory of this source tree. +from typing import Set, Type + import torch from executorch.backends.arm._passes import ArmPass from executorch.exir.dialects._ops import ops as exir_ops +from executorch.exir.pass_base import ExportPass # For FP case @@ -36,6 +39,8 @@ def get_ops(op): class DecomposeGluPass(ArmPass): """Decomposes the GLU operator into hadamard product and sigmoid.""" + _passes_required_after: Set[Type[ExportPass]] = set() + def call_operator(self, op, args, kwargs, meta): if op not in [edge_glu, aten_glu]: return super().call_operator(op, args, kwargs, meta) diff --git a/backends/arm/_passes/decompose_grouped_conv.py b/backends/arm/_passes/decompose_grouped_conv.py index ce9fe9c9937..2f0d7b4d72c 100644 --- a/backends/arm/_passes/decompose_grouped_conv.py +++ b/backends/arm/_passes/decompose_grouped_conv.py @@ -4,6 +4,7 @@ # LICENSE file in the root directory of this source tree. 
from copy import copy +from typing import Set, Type import torch from executorch.backends.arm._passes.quant_args import QuantArgs @@ -33,6 +34,8 @@ class DecomposeGroupedConv(ExportPass): x = cat(x1, x2) """ + _passes_required_after: Set[Type[ExportPass]] = set() + @staticmethod def _get_decomposition(op): match op: diff --git a/backends/arm/_passes/decompose_groupnorm_pass.py b/backends/arm/_passes/decompose_groupnorm_pass.py index c6cb1b05e40..7f0d7fdeafd 100644 --- a/backends/arm/_passes/decompose_groupnorm_pass.py +++ b/backends/arm/_passes/decompose_groupnorm_pass.py @@ -6,12 +6,13 @@ # pyre-unsafe import operator +from typing import Set, Type import torch from executorch.backends.arm._passes import ArmPass from executorch.backends.arm._passes.arm_pass_utils import create_node from executorch.exir.dialects._ops import ops as exir_ops -from executorch.exir.pass_base import PassResult +from executorch.exir.pass_base import ExportPass, PassResult def get_group_norm_decomposition(op) -> tuple: @@ -57,6 +58,8 @@ class DecomposeGroupNormPass(ArmPass): Source: https://pytorch.org/docs/stable/generated/torch.nn.GroupNorm.html """ + _passes_required_after: Set[Type[ExportPass]] = set() + def call(self, graph_module: torch.fx.GraphModule): modified = False for node in graph_module.graph.nodes: diff --git a/backends/arm/_passes/decompose_layernorm_pass.py b/backends/arm/_passes/decompose_layernorm_pass.py index e6cbdfb91a0..0710ed37b45 100644 --- a/backends/arm/_passes/decompose_layernorm_pass.py +++ b/backends/arm/_passes/decompose_layernorm_pass.py @@ -6,12 +6,13 @@ # pyre-unsafe import operator +from typing import Set, Type import torch from executorch.backends.arm._passes import ArmPass from executorch.backends.arm._passes.arm_pass_utils import create_node from executorch.exir.dialects._ops import ops as exir_ops -from executorch.exir.pass_base import PassResult +from executorch.exir.pass_base import ExportPass, PassResult def get_layer_norm_decomposition(op) -> 
tuple: @@ -56,6 +57,8 @@ class DecomposeLayerNormPass(ArmPass): Source: https://pytorch.org/docs/stable/generated/torch.nn.LayerNorm.html """ + _passes_required_after: Set[Type[ExportPass]] = set() + def call(self, graph_module: torch.fx.GraphModule): for node in graph_module.graph.nodes: if node.op != "call_function" or node.target not in ( diff --git a/backends/arm/_passes/decompose_leaky_relu_pass.py b/backends/arm/_passes/decompose_leaky_relu_pass.py index e896cc584be..8ae13a76eb0 100644 --- a/backends/arm/_passes/decompose_leaky_relu_pass.py +++ b/backends/arm/_passes/decompose_leaky_relu_pass.py @@ -6,9 +6,12 @@ # pyre-unsafe +from typing import Set, Type + import torch from executorch.backends.arm._passes import ArmPass from executorch.exir.dialects._ops import ops as exir_ops +from executorch.exir.pass_base import ExportPass edge_ops = (exir_ops.edge.aten.leaky_relu.default,) torch_ops = (torch.ops.aten.leaky_relu.default,) @@ -46,6 +49,8 @@ class DecomposeLeakyReLUPass(ArmPass): %op5 = add(%op1,%op4) """ + _passes_required_after: Set[Type[ExportPass]] = set() + def call_operator(self, op, args, kwargs, meta): if op not in (edge_ops + torch_ops): return super().call_operator(op, args, kwargs, meta) diff --git a/backends/arm/_passes/decompose_linalg_vector_norm_pass.py b/backends/arm/_passes/decompose_linalg_vector_norm_pass.py index 9f036c0524f..17441981654 100644 --- a/backends/arm/_passes/decompose_linalg_vector_norm_pass.py +++ b/backends/arm/_passes/decompose_linalg_vector_norm_pass.py @@ -3,6 +3,8 @@ # This source code is licensed under the BSD-style license found in the # LICENSE file in the root directory of this source tree. +from typing import Set, Type + import torch from executorch.exir.pass_base import ExportPass @@ -28,6 +30,8 @@ class DecomposeLinearVectorNormPass(ExportPass): dtype prior, but we dont know this from FX graph. 
""" + _passes_required_after: Set[Type[ExportPass]] = set() + torch_linalg_vector_norm = (torch.ops.aten.linalg_vector_norm.default,) def call_operator(self, op, args, kwargs, meta): diff --git a/backends/arm/_passes/decompose_linear_pass.py b/backends/arm/_passes/decompose_linear_pass.py index 3d154d9b81e..70268c77a1d 100644 --- a/backends/arm/_passes/decompose_linear_pass.py +++ b/backends/arm/_passes/decompose_linear_pass.py @@ -5,6 +5,8 @@ # pyre-unsafe +from typing import Set, Type + import numpy as np from executorch.backends.arm._passes import ArmPass from executorch.backends.arm._passes.arm_pass_utils import ( @@ -12,7 +14,7 @@ get_first_fake_tensor, ) from executorch.exir.dialects._ops import ops as exir_ops -from executorch.exir.pass_base import PassResult +from executorch.exir.pass_base import ExportPass, PassResult class DecomposeLinearPass(ArmPass): @@ -25,6 +27,8 @@ class DecomposeLinearPass(ArmPass): output = view(conv2d) """ + _passes_required_after: Set[Type[ExportPass]] = set() + def call(self, graph_module): for node in graph_module.graph.nodes: if node.op != "call_function": diff --git a/backends/arm/_passes/decompose_logit_pass.py b/backends/arm/_passes/decompose_logit_pass.py index 40e2b22cb54..a82650f0b9e 100644 --- a/backends/arm/_passes/decompose_logit_pass.py +++ b/backends/arm/_passes/decompose_logit_pass.py @@ -3,10 +3,13 @@ # This source code is licensed under the BSD-style license found in the # LICENSE file in the root directory of this source tree. 
+from typing import Set, Type + import torch from executorch.backends.arm._passes import ArmPass from executorch.exir.dialects._ops import ops as exir_ops +from executorch.exir.pass_base import ExportPass # For FP case @@ -60,6 +63,8 @@ class DecomposeLogitPass(ArmPass): log(y * reciprocal((-1) * y + 1)) """ + _passes_required_after: Set[Type[ExportPass]] = set() + def call_operator(self, op, args, kwargs, meta): if op not in [edge_logit, aten_logit]: return super().call_operator(op, args, kwargs, meta) diff --git a/backends/arm/_passes/decompose_masked_fill.py b/backends/arm/_passes/decompose_masked_fill.py index fbf3079c92b..ced58aa3920 100644 --- a/backends/arm/_passes/decompose_masked_fill.py +++ b/backends/arm/_passes/decompose_masked_fill.py @@ -6,10 +6,13 @@ # pyre-unsafe +from typing import Set, Type + import torch from executorch.backends.arm._passes import ArmPass from executorch.exir.dialects._ops import ops as exir_ops +from executorch.exir.pass_base import ExportPass edge_ops = (exir_ops.edge.aten.masked_fill.Scalar,) @@ -37,6 +40,8 @@ class DecomposeMaskedFill(ArmPass): Decomposed to a where and a full_like operator. 
""" + _passes_required_after: Set[Type[ExportPass]] = set() + def call_operator(self, op, args, kwargs, meta, updated=False): if op not in (edge_ops + aten_ops): return super().call_operator(op, args, kwargs, meta, updated) diff --git a/backends/arm/_passes/decompose_maxpool2d_with_dilation.py b/backends/arm/_passes/decompose_maxpool2d_with_dilation.py index ff6db260099..1df062ddb57 100644 --- a/backends/arm/_passes/decompose_maxpool2d_with_dilation.py +++ b/backends/arm/_passes/decompose_maxpool2d_with_dilation.py @@ -6,9 +6,11 @@ # pyre-unsafe import operator +from typing import Set, Type from executorch.backends.arm._passes import ArmPass from executorch.exir.dialects._ops import ops as exir_ops +from executorch.exir.pass_base import ExportPass # We'll decompose only the EXIR edge max_pool2d ops when dilation > 1 EDGE_MAXPOOL2D = ( @@ -22,6 +24,8 @@ class DecomposeMaxPool2DPass(ArmPass): Decompose dilated max_pool2d (EXIR edge ops) into space-to-batch -> maxpool -> batch-to-space. """ + _passes_required_after: Set[Type[ExportPass]] = set() + def call_operator(self, op, args, kwargs, meta): # Only intercept EXIR edge max_pool2d ops if op not in EDGE_MAXPOOL2D: diff --git a/backends/arm/_passes/decompose_meandim_pass.py b/backends/arm/_passes/decompose_meandim_pass.py index a78514b6af5..716924dfbf2 100644 --- a/backends/arm/_passes/decompose_meandim_pass.py +++ b/backends/arm/_passes/decompose_meandim_pass.py @@ -5,12 +5,14 @@ from copy import copy from math import prod +from typing import Set, Type import torch from executorch.backends.arm._passes import ArmPass from executorch.backends.arm._passes.arm_pass_utils import get_node_arg from executorch.exir.backend.utils import WhyNoPartitionReporter from executorch.exir.dialects._ops import ops as exir_ops +from executorch.exir.pass_base import ExportPass def get_meandim_decomposition(op) -> tuple: @@ -62,6 +64,8 @@ class DecomposeMeanDimPass(ArmPass): x = view_copy.default(x, new_shape=(h)) # Squeeze dims since 
keepdims = False """ + _passes_required_after: Set[Type[ExportPass]] = set() + def __init__(self, graph_module, tosa_spec): super().__init__() self._graph_module = graph_module diff --git a/backends/arm/_passes/decompose_ne_pass.py b/backends/arm/_passes/decompose_ne_pass.py index 16443d5d2fb..3bd4f4540bb 100644 --- a/backends/arm/_passes/decompose_ne_pass.py +++ b/backends/arm/_passes/decompose_ne_pass.py @@ -3,9 +3,12 @@ # This source code is licensed under the BSD-style license found in the # LICENSE file in the root directory of this source tree. +from typing import Set, Type + import torch from executorch.backends.arm._passes import ArmPass from executorch.exir.dialects._ops import ops as exir_ops +from executorch.exir.pass_base import ExportPass edge_ne_ops = (exir_ops.edge.aten.ne.Tensor,) aten_ne_ops = (torch.ops.aten.ne.Tensor, torch.ops.aten.ne_.Tensor) @@ -53,6 +56,8 @@ class DecomposeNotEqualPass(ArmPass): - followed by aten.logical_not.default or its edge equivalent """ + _passes_required_after: Set[Type[ExportPass]] = set() + def call_operator(self, op, args, kwargs, meta): if op not in (edge_ne_ops + aten_ne_ops): return super().call_operator(op, args, kwargs, meta) diff --git a/backends/arm/_passes/decompose_round_pass.py b/backends/arm/_passes/decompose_round_pass.py index edfa3817064..35d36e80396 100644 --- a/backends/arm/_passes/decompose_round_pass.py +++ b/backends/arm/_passes/decompose_round_pass.py @@ -3,10 +3,13 @@ # This source code is licensed under the BSD-style license found in the # LICENSE file in the root directory of this source tree. 
+from typing import Set, Type + import torch from executorch.backends.arm._passes import ArmPass from executorch.exir.dialects._ops import ops as exir_ops from executorch.exir.dialects.edge._ops import EdgeOpOverload +from executorch.exir.pass_base import ExportPass from torch._ops import OpOverload @@ -56,6 +59,8 @@ class DecomposeRoundPass(ArmPass): %result = where(%is_non_negative, %floor, %ceil) """ + _passes_required_after: Set[Type[ExportPass]] = set() + def call_operator(self, op, args, kwargs, meta, updated=False): if op not in (exir_ops.edge.aten.round.default, torch.ops.aten.round.default): return super().call_operator(op, args, kwargs, meta, updated) diff --git a/backends/arm/_passes/decompose_select.py b/backends/arm/_passes/decompose_select.py index 99c89f474ea..9c65cd1c0a8 100644 --- a/backends/arm/_passes/decompose_select.py +++ b/backends/arm/_passes/decompose_select.py @@ -6,6 +6,8 @@ # pyre-unsafe +from typing import Set, Type + import torch from executorch.backends.arm._passes.arm_pass_utils import ( create_node, @@ -20,6 +22,8 @@ class DecomposeSelectPass(ExportPass): This pass decomposes select into slice + squeeze to ensure that Aten and TOSA outputs has the same rank (input rank -1) """ + _passes_required_after: Set[Type[ExportPass]] = set() + def call(self, graph_module: torch.fx.GraphModule): for node in graph_module.graph.nodes: diff --git a/backends/arm/_passes/decompose_sign_pass.py b/backends/arm/_passes/decompose_sign_pass.py index 1038ff0f3fa..c4cb964316d 100644 --- a/backends/arm/_passes/decompose_sign_pass.py +++ b/backends/arm/_passes/decompose_sign_pass.py @@ -3,10 +3,13 @@ # This source code is licensed under the BSD-style license found in the # LICENSE file in the root directory of this source tree. 
+from typing import Set, Type + import torch from executorch.backends.arm._passes import ArmPass from executorch.exir.dialects._ops import ops as exir_ops +from executorch.exir.pass_base import ExportPass # For MI case @@ -42,6 +45,8 @@ def get_ops(op): class DecomposeSignPass(ArmPass): """Decomposes the sign operator into a sequence of operations that are supported by the Arm backend.""" + _passes_required_after: Set[Type[ExportPass]] = set() + def call_operator(self, op, args, kwargs, meta): if op not in (edge_sign, aten_sign): return super().call_operator(op, args, kwargs, meta) diff --git a/backends/arm/_passes/decompose_silu_pass.py b/backends/arm/_passes/decompose_silu_pass.py index 68ebb3f4515..cb7b55be520 100644 --- a/backends/arm/_passes/decompose_silu_pass.py +++ b/backends/arm/_passes/decompose_silu_pass.py @@ -5,6 +5,8 @@ # pyre-unsafe +from typing import Set, Type + import torch from executorch.exir.pass_base import ExportPass @@ -22,6 +24,8 @@ class DecomposeSiluPass(ExportPass): y = mul(a,x) """ + _passes_required_after: Set[Type[ExportPass]] = set() + def call_operator(self, op, args, kwargs, meta): if op not in (aten_silu_ops): return super().call_operator(op, args, kwargs, meta) diff --git a/backends/arm/_passes/decompose_sinh_pass.py b/backends/arm/_passes/decompose_sinh_pass.py index 7192eb9bf74..473a263e9a5 100644 --- a/backends/arm/_passes/decompose_sinh_pass.py +++ b/backends/arm/_passes/decompose_sinh_pass.py @@ -4,8 +4,11 @@ # LICENSE file in the root directory of this source tree. +from typing import Set, Type + from executorch.backends.arm._passes import ArmPass from executorch.exir.dialects._ops import ops as exir_ops +from executorch.exir.pass_base import ExportPass # For MI case @@ -24,6 +27,8 @@ class DecomposeSinhPass(ArmPass): and scalar multiplication. 
""" + _passes_required_after: Set[Type[ExportPass]] = set() + def call_operator(self, op, args, kwargs, meta): if op is not edge_sinh: return super().call_operator(op, args, kwargs, meta) diff --git a/backends/arm/_passes/decompose_softmax_pass.py b/backends/arm/_passes/decompose_softmax_pass.py index a735501f711..47f448ae851 100644 --- a/backends/arm/_passes/decompose_softmax_pass.py +++ b/backends/arm/_passes/decompose_softmax_pass.py @@ -3,6 +3,8 @@ # This source code is licensed under the BSD-style license found in the # LICENSE file in the root directory of this source tree. +from typing import Set, Type + import torch from executorch.exir.dialects._ops import ops as exir_ops from executorch.exir.pass_base import ExportPass @@ -62,6 +64,8 @@ class DecomposeSoftmaxPass(ExportPass): (in logsoftmax case: %op7 = log(%op6)) """ + _passes_required_after: Set[Type[ExportPass]] = set() + def call_operator(self, op, args, kwargs, meta): if op not in torch_softmax + edge_softmax: return super().call_operator(op, args, kwargs, meta) diff --git a/backends/arm/_passes/decompose_softmax_unstable_pass.py b/backends/arm/_passes/decompose_softmax_unstable_pass.py index b6f5e11b66b..5e704585eb0 100644 --- a/backends/arm/_passes/decompose_softmax_unstable_pass.py +++ b/backends/arm/_passes/decompose_softmax_unstable_pass.py @@ -5,9 +5,12 @@ # pyre-unsafe +from typing import Set, Type + import torch from executorch.backends.arm._passes import ArmPass from executorch.exir.dialects._ops import ops as exir_ops +from executorch.exir.pass_base import ExportPass # For BI case torch_softmax = (torch.ops.aten.softmax.int, torch.ops.aten.log_softmax.int) @@ -57,6 +60,8 @@ class DecomposeSoftmaxUnstablePass(ArmPass): (in logsoftmax case: %op5 = log(%op4)) """ + _passes_required_after: Set[Type[ExportPass]] = set() + def call_operator(self, op, args, kwargs, meta): if op not in torch_softmax + edge_softmax: return super().call_operator(op, args, kwargs, meta) diff --git 
a/backends/arm/_passes/decompose_sqrt_pass.py b/backends/arm/_passes/decompose_sqrt_pass.py index 547d0091e90..c93686901d5 100644 --- a/backends/arm/_passes/decompose_sqrt_pass.py +++ b/backends/arm/_passes/decompose_sqrt_pass.py @@ -4,7 +4,7 @@ # LICENSE file in the root directory of this source tree. # pyre-unsafe -from typing import Tuple, Union +from typing import Set, Tuple, Type, Union import torch from executorch.exir.dialects._ops import ops as exir_ops @@ -27,6 +27,7 @@ def get_sqrt_decomposition(op) -> Union[Tuple, torch._ops.OpOverload]: class DecomposeSqrtPass(ExportPass): + _passes_required_after: Set[Type[ExportPass]] = set() def call_operator(self, op, args, kwargs, meta): """ diff --git a/backends/arm/_passes/decompose_sum_pass.py b/backends/arm/_passes/decompose_sum_pass.py index 52b9c10c49f..16027ccec2b 100644 --- a/backends/arm/_passes/decompose_sum_pass.py +++ b/backends/arm/_passes/decompose_sum_pass.py @@ -3,6 +3,8 @@ # This source code is licensed under the BSD-style license found in the # LICENSE file in the root directory of this source tree. 
+from typing import Set, Type + import torch from executorch.exir.dialects._ops import ops as exir_ops from executorch.exir.pass_base import ExportPass @@ -40,6 +42,8 @@ class DecomposeSumPass(ExportPass): view(shape = squeezed_shape) -> squeezed_shape """ + _passes_required_after: Set[Type[ExportPass]] = set() + def call_operator(self, op, args, kwargs, meta): if op not in [ exir_ops.edge.aten.sum.dim_IntList, diff --git a/backends/arm/_passes/decompose_var_pass.py b/backends/arm/_passes/decompose_var_pass.py index 15872738f3e..f8396da0420 100644 --- a/backends/arm/_passes/decompose_var_pass.py +++ b/backends/arm/_passes/decompose_var_pass.py @@ -7,10 +7,13 @@ # pyre-unsafe +from typing import Set, Type + import torch from executorch.backends.arm._passes import ArmPass from executorch.backends.arm._passes.arm_pass_utils import get_node_arg from executorch.exir.dialects._ops import ops as exir_ops +from executorch.exir.pass_base import ExportPass def get_var_decomposition(op) -> tuple: @@ -47,6 +50,8 @@ class DecomposeVarPass(ArmPass): y = div(sum, max(0, N-correction)) """ + _passes_required_after: Set[Type[ExportPass]] = set() + def call_operator(self, op, args, kwargs, meta): if op not in ( exir_ops.edge.aten.var.correction, diff --git a/backends/arm/_passes/decorate_fp32_to_int32_casting_pass.py b/backends/arm/_passes/decorate_fp32_to_int32_casting_pass.py index 17a682c0a8e..9d704520302 100644 --- a/backends/arm/_passes/decorate_fp32_to_int32_casting_pass.py +++ b/backends/arm/_passes/decorate_fp32_to_int32_casting_pass.py @@ -6,10 +6,13 @@ # pyre-unsafe +from typing import Set, Type + import torch from executorch.backends.arm._passes import ArmPass from executorch.backends.arm._passes.arm_pass_utils import get_node_arg from executorch.exir.dialects._ops import ops as exir_ops +from executorch.exir.pass_base import ExportPass def _get_decorated_ops(op): @@ -40,6 +43,8 @@ class DecorateFp32toInt32CastingPass(ArmPass): output = to_dim_order_copy(decorated_x, 
dtype=torch.int32) """ + _passes_required_after: Set[Type[ExportPass]] = set() + targets = [ exir_ops.edge.dim_order_ops._to_dim_order_copy.default, ] diff --git a/backends/arm/_passes/fold_qdq_with_annotated_qparams_pass.py b/backends/arm/_passes/fold_qdq_with_annotated_qparams_pass.py index 491b404f0a4..714543d3908 100644 --- a/backends/arm/_passes/fold_qdq_with_annotated_qparams_pass.py +++ b/backends/arm/_passes/fold_qdq_with_annotated_qparams_pass.py @@ -8,7 +8,7 @@ import copy -from typing import cast, Dict, Set, Tuple +from typing import cast, Dict, Set, Tuple, Type from executorch.backends.arm._passes import ArmPass from executorch.backends.arm._passes.arm_pass_utils import ( @@ -100,6 +100,8 @@ class FoldAndAnnotateQParamsPass(ArmPass): """ + _passes_required_after: Set[Type[ExportPass]] = set() + def fold_and_annotate_arg( self, graph_module: GraphModule, node: Node, arg_list: list[Node], i: int ) -> None: @@ -210,6 +212,8 @@ class QuantizeOperatorArguments(ExportPass): - Makes sure the min and max values to clamp.default are quantized, if it's a quantized operator. """ + _passes_required_after: Set[Type[ExportPass]] = set() + def call(self, graph_module: GraphModule) -> PassResult: modified = False # Loop over the graph nodes and find full.default nodes. @@ -257,6 +261,8 @@ class RetraceFoldedDtypesPass(ExportPass): the output type of that matches the type of the output_qparams. 
""" + _passes_required_after: Set[Type[ExportPass]] = set() + targeted_ops: Set[EdgeOpOverload] = { exir_ops.edge.aten.sum.dim_IntList, } diff --git a/backends/arm/_passes/fuse_batchnorm2d_pass.py b/backends/arm/_passes/fuse_batchnorm2d_pass.py index 2dbdfa84cec..be884585d4d 100644 --- a/backends/arm/_passes/fuse_batchnorm2d_pass.py +++ b/backends/arm/_passes/fuse_batchnorm2d_pass.py @@ -5,6 +5,8 @@ # pyre-unsafe +from typing import Set, Type + import torch from executorch.backends.arm._passes.arm_pass_utils import ( create_node, @@ -28,6 +30,8 @@ class FuseBatchnorm2DPass(ExportPass): the weights and bias of the convolution and removing the batchnorm. """ + _passes_required_after: Set[Type[ExportPass]] = set() + def __init__(self, exported_program: ExportedProgram): self.exported_program = exported_program super().__init__() diff --git a/backends/arm/_passes/fuse_constant_ops_pass.py b/backends/arm/_passes/fuse_constant_ops_pass.py index f49565e3c38..07f3a4af245 100644 --- a/backends/arm/_passes/fuse_constant_ops_pass.py +++ b/backends/arm/_passes/fuse_constant_ops_pass.py @@ -4,6 +4,7 @@ # LICENSE file in the root directory of this source tree. 
import logging +from typing import Set, Type import torch._export.utils import torch.fx @@ -41,6 +42,8 @@ def f(): return x """ + _passes_required_after: Set[Type[ExportPass]] = set() + def __init__(self, exported_program: ExportedProgram) -> None: super().__init__() self.exported_program = exported_program @@ -168,6 +171,8 @@ def f(node_name_pre_computed): return node_name_pre_computed """ + _passes_required_after: Set[Type[ExportPass]] = set() + targeted_ops = [ exir_ops.edge.aten.full.default, exir_ops.edge.aten.arange.start_step, diff --git a/backends/arm/_passes/fuse_equal_placeholders_pass.py b/backends/arm/_passes/fuse_equal_placeholders_pass.py index 5631e2f32e9..cf1177a0448 100644 --- a/backends/arm/_passes/fuse_equal_placeholders_pass.py +++ b/backends/arm/_passes/fuse_equal_placeholders_pass.py @@ -5,6 +5,7 @@ import hashlib from collections import defaultdict +from typing import Set, Type import torch from executorch.backends.arm._passes.arm_pass_utils import ( @@ -27,6 +28,8 @@ class FuseEqualPlaceholdersPass(ExportPass): with multiple users, using a cache for faster comparison. 
""" + _passes_required_after: Set[Type[ExportPass]] = set() + def __init__(self, exported_program: ExportedProgram): self.exported_program = exported_program super().__init__() diff --git a/backends/arm/_passes/fuse_quantized_activation_pass.py b/backends/arm/_passes/fuse_quantized_activation_pass.py index 46a7d7f6f98..d39d7135f9c 100644 --- a/backends/arm/_passes/fuse_quantized_activation_pass.py +++ b/backends/arm/_passes/fuse_quantized_activation_pass.py @@ -5,6 +5,8 @@ # pyre-unsafe +from typing import Set, Type + import torch from executorch.backends.arm._passes.quant_args import QuantArgs from executorch.backends.arm.constants import Q_OPS @@ -14,6 +16,8 @@ class FuseQuantizedActivationPass(ExportPass): + _passes_required_after: Set[Type[ExportPass]] = set() + @staticmethod def _is_fuseable_quantized_activation(node: Node): """Fuse activations that have a 0 lower bound and quantized with a qmin zero-point""" diff --git a/backends/arm/_passes/insert_rescales_pass.py b/backends/arm/_passes/insert_rescales_pass.py index 7f75aecf24c..100ac03c2b0 100644 --- a/backends/arm/_passes/insert_rescales_pass.py +++ b/backends/arm/_passes/insert_rescales_pass.py @@ -4,7 +4,7 @@ # LICENSE file in the root directory of this source tree. from copy import copy -from typing import cast +from typing import cast, Set, Type from executorch.backends.arm._passes.arm_pass_utils import create_node from executorch.backends.arm._passes.quant_args import QuantArgs @@ -24,6 +24,8 @@ class InsertRescalePass(ExportPass): in the fake implementation of. 
""" + _passes_required_after: Set[Type[ExportPass]] = set() + def fold_dq_q_to_rescale(self, node: Node, user: Node, graph_module: GraphModule): dq_args = QuantArgs.from_operator(node.target, node.args) q_args = QuantArgs.from_operator(user.target, user.args) diff --git a/backends/arm/_passes/insert_table_ops.py b/backends/arm/_passes/insert_table_ops.py index fb5d7de5e12..d838ddc823d 100644 --- a/backends/arm/_passes/insert_table_ops.py +++ b/backends/arm/_passes/insert_table_ops.py @@ -6,7 +6,7 @@ # pyre-unsafe from itertools import chain -from typing import Callable, cast, Dict, Iterator, Set +from typing import Callable, cast, Dict, Iterator, Set, Type import torch from executorch.backends.arm._passes.arm_pass_utils import create_node @@ -117,6 +117,8 @@ class InsertTableOpsPass(ExportPass): which will be used to produce the table values in operators/op_table.py. """ + _passes_required_after: Set[Type[ExportPass]] = set() + def __init__(self, exported_program: ExportedProgram) -> None: super().__init__() self.exported_program = exported_program diff --git a/backends/arm/_passes/match_arg_dtype_pass.py b/backends/arm/_passes/match_arg_dtype_pass.py index e7bf3b2d60e..d482614b03f 100644 --- a/backends/arm/_passes/match_arg_dtype_pass.py +++ b/backends/arm/_passes/match_arg_dtype_pass.py @@ -3,6 +3,8 @@ # This source code is licensed under the BSD-style license found in the # LICENSE file in the root directory of this source tree. 
+from typing import Set, Type + import torch from executorch.backends.arm._passes.arm_pass_utils import create_node, get_node_arg from executorch.exir.dialects._ops import ops as exir_ops @@ -38,6 +40,8 @@ class MatchArgDtypePass(ExportPass): """ + _passes_required_after: Set[Type[ExportPass]] = set() + targeted_ops = {exir_ops.edge.aten.sub.Tensor, exir_ops.edge.aten.where.self} def call(self, graph_module: torch.fx.GraphModule): diff --git a/backends/arm/_passes/match_arg_ranks_pass.py b/backends/arm/_passes/match_arg_ranks_pass.py index d6cdfacb612..c411f3b8083 100644 --- a/backends/arm/_passes/match_arg_ranks_pass.py +++ b/backends/arm/_passes/match_arg_ranks_pass.py @@ -7,7 +7,7 @@ # pyre-unsafe -from typing import cast +from typing import cast, Set, Type from executorch.backends.arm._passes.arm_pass_utils import ( create_node, @@ -36,6 +36,8 @@ class MatchArgRanksPass(ExportPass): input2 = shape(1, 3, 1) """ + _passes_required_after: Set[Type[ExportPass]] = set() + def __init__(self, exported_program): super().__init__() self.exported_program = exported_program diff --git a/backends/arm/_passes/mm_to_bmm_pass.py b/backends/arm/_passes/mm_to_bmm_pass.py index 69d8573013e..6be0b9e2ac4 100644 --- a/backends/arm/_passes/mm_to_bmm_pass.py +++ b/backends/arm/_passes/mm_to_bmm_pass.py @@ -6,6 +6,8 @@ # pyre-unsafe +from typing import Set, Type + import torch from executorch.backends.arm._passes.arm_pass_utils import ( create_node, @@ -28,6 +30,8 @@ class ConvertMmToBmmPass(ExportPass): 3) Squeeze output tensor to rank 2. 
""" + _passes_required_after: Set[Type[ExportPass]] = set() + def call(self, graph_module: torch.fx.GraphModule): modified_graph = False graph = graph_module.graph diff --git a/backends/arm/_passes/remove_noop_pass.py b/backends/arm/_passes/remove_noop_pass.py index 623517aac59..55c4f71f0a8 100644 --- a/backends/arm/_passes/remove_noop_pass.py +++ b/backends/arm/_passes/remove_noop_pass.py @@ -7,6 +7,7 @@ # pyre-unsafe import logging +from typing import Set, Type from executorch.exir.dialects._ops import ops as exir_ops from executorch.exir.pass_base import ExportPass @@ -17,6 +18,8 @@ class RemoveNoopPass(ExportPass): """Remove no-ops from graph_module""" + _passes_required_after: Set[Type[ExportPass]] = set() + def call_operator(self, op, args, kwargs, meta): if op not in ( exir_ops.edge.dim_order_ops._clone_dim_order.default, diff --git a/backends/arm/_passes/replace_inf_values_pass.py b/backends/arm/_passes/replace_inf_values_pass.py index 8c721eda3d8..506030d82d7 100644 --- a/backends/arm/_passes/replace_inf_values_pass.py +++ b/backends/arm/_passes/replace_inf_values_pass.py @@ -7,6 +7,8 @@ # This pass is based on backends/qualcomm/_passes/replace_inf_values.py # with some modifications to replace inf values. +from typing import Set, Type + import torch from executorch.exir.pass_base import ExportPass, PassResult @@ -16,6 +18,8 @@ class ReplaceInfValues(ExportPass): Due to a limitation in the Quantizer, we need to change inf/-inf to more quantizable values.
""" + _passes_required_after: Set[Type[ExportPass]] = set() + def __init__(self): super(ReplaceInfValues, self).__init__() diff --git a/backends/arm/_passes/replace_scalar_with_tensor_pass.py b/backends/arm/_passes/replace_scalar_with_tensor_pass.py index 249eb9ffd41..f6ef056f677 100644 --- a/backends/arm/_passes/replace_scalar_with_tensor_pass.py +++ b/backends/arm/_passes/replace_scalar_with_tensor_pass.py @@ -6,7 +6,7 @@ # pyre-unsafe -from typing import Dict, Union +from typing import Dict, Set, Type, Union import torch from executorch.backends.transforms.replace_scalar_with_tensor import ( @@ -15,6 +15,7 @@ from executorch.exir.dialects._ops import ops as exir_ops from executorch.exir.dialects.edge._ops import EdgeOpOverload +from executorch.exir.pass_base import ExportPass # Operators that are included for both TOSA profiles @@ -56,6 +57,8 @@ class ReplaceScalarWithTensorArgPassTOSAMI(ReplaceScalarWithTensorArgPass): + _passes_required_after: Set[Type[ExportPass]] = set() + scalar_to_tensor_ops = _common_ops | { exir_ops.edge.aten.pow.Tensor_Scalar: exir_ops.edge.aten.pow.Tensor_Tensor, torch.ops.aten.pow.Tensor_Scalar: torch.ops.aten.pow.Tensor_Tensor, @@ -66,6 +69,8 @@ def __init__(self): class ReplaceScalarWithTensorArgPassTOSABI(ReplaceScalarWithTensorArgPass): + _passes_required_after: Set[Type[ExportPass]] = set() + scalar_to_tensor_ops = _common_ops def __init__(self): diff --git a/backends/arm/_passes/scalars_to_attribute_pass.py b/backends/arm/_passes/scalars_to_attribute_pass.py index 89468bff1ff..bb2a02cc679 100644 --- a/backends/arm/_passes/scalars_to_attribute_pass.py +++ b/backends/arm/_passes/scalars_to_attribute_pass.py @@ -6,7 +6,7 @@ # pyre-unsafe -from typing import cast, Union +from typing import cast, Set, Type, Union import torch from executorch.backends.arm._passes.arm_pass_utils import get_first_fake_tensor @@ -22,6 +22,8 @@ class ScalarsToAttributePass(ExportPass): to attribute Nodes that output the same value. 
""" + _passes_required_after: Set[Type[ExportPass]] = set() + targeted_ops = [ torch.ops.aten.add.Tensor, torch.ops.aten.add_.Tensor, diff --git a/backends/arm/_passes/size_adjust_input_pass.py b/backends/arm/_passes/size_adjust_input_pass.py index e87d65c450f..5eb77dc56df 100644 --- a/backends/arm/_passes/size_adjust_input_pass.py +++ b/backends/arm/_passes/size_adjust_input_pass.py @@ -5,7 +5,7 @@ # pyre-unsafe -from typing import cast, TypeAlias +from typing import cast, Set, Type, TypeAlias import torch.fx from executorch.backends.arm._passes.arm_pass_utils import create_node @@ -185,6 +185,8 @@ class SizeAdjustInputPass(ExportPass): input. """ + _passes_required_after: Set[Type[ExportPass]] = set() + def call(self, graph_module: torch.fx.GraphModule) -> PassResult: graph = graph_module.graph modified_graph = False diff --git a/backends/arm/_passes/to_tosa_memory_format_pass.py b/backends/arm/_passes/to_tosa_memory_format_pass.py index ac16cbaf8cb..dcbdfb03f7b 100644 --- a/backends/arm/_passes/to_tosa_memory_format_pass.py +++ b/backends/arm/_passes/to_tosa_memory_format_pass.py @@ -7,6 +7,7 @@ import logging +from typing import Set, Type import torch from executorch.backends.arm._passes.annotate_decomposed_matmul import ( @@ -48,6 +49,14 @@ class ToTosaMemoryFormatPass(ExportPass): The annotated tosa_dim_order is used to permute the node's shape such that it gives a TOSA-compliant shape. 
""" + _passes_required_after: Set[Type[ExportPass]] = set() + + NHWC_order = (0, 2, 3, 1) + NHWC_inverse_order = (0, 3, 1, 2) + HWCM_order = (2, 3, 0, 1) + NNHWC_order = (0, 1, 3, 4, 2) + NNHWC_inverse_order = (0, 1, 4, 2, 3) + def __init__(self, exported_program: ExportedProgram) -> None: self.exported_program = exported_program super().__init__() diff --git a/backends/arm/_passes/unsqueeze_before_repeat_pass.py b/backends/arm/_passes/unsqueeze_before_repeat_pass.py index 01983baa9ab..66286b6a954 100644 --- a/backends/arm/_passes/unsqueeze_before_repeat_pass.py +++ b/backends/arm/_passes/unsqueeze_before_repeat_pass.py @@ -1,9 +1,11 @@ -# Copyright 2024 Arm Limited and/or its affiliates. +# Copyright 2024-2025 Arm Limited and/or its affiliates. # All rights reserved. # # This source code is licensed under the BSD-style license found in the # LICENSE file in the root directory of this source tree. # pyre-unsafe +from typing import Set, Type + import torch import torch.fx from executorch.backends.arm._passes.arm_pass_utils import ( @@ -29,6 +31,8 @@ class UnsqueezeBeforeRepeatPass(ExportPass): repeat(multiples) """ + _passes_required_after: Set[Type[ExportPass]] = set() + def call(self, graph_module: torch.fx.GraphModule): modified_graph = False for node in graph_module.graph.nodes: diff --git a/backends/arm/_passes/unsqueeze_scalar_placeholders_pass.py b/backends/arm/_passes/unsqueeze_scalar_placeholders_pass.py index ccae9b503cf..d3932dd1217 100644 --- a/backends/arm/_passes/unsqueeze_scalar_placeholders_pass.py +++ b/backends/arm/_passes/unsqueeze_scalar_placeholders_pass.py @@ -5,6 +5,8 @@ # pyre-unsafe +from typing import Set, Type + import torch from executorch.exir.pass_base import ExportPass, PassResult from torch._export.utils import is_buffer, is_param @@ -16,6 +18,8 @@ class UnsqueezeScalarPlaceholdersPass(ExportPass): This pass unsqueezes the placeholders to make sure shape is at least (1,). 
""" + _passes_required_after: Set[Type[ExportPass]] = set() + def __init__(self, exported_program): self.exported_program = exported_program super().__init__() diff --git a/backends/arm/test/misc/test_pass_required_order.py b/backends/arm/test/misc/test_pass_required_order.py new file mode 100644 index 00000000000..2745d25a498 --- /dev/null +++ b/backends/arm/test/misc/test_pass_required_order.py @@ -0,0 +1,95 @@ +# Copyright 2025 Arm Limited and/or its affiliates. +# +# This source code is licensed under the BSD-style license found in the +# LICENSE file in the root directory of this source tree. + +import re +from typing import List, Set, Type + +import pytest +from executorch.backends.arm._passes.arm_pass_manager import ArmPass, ArmPassManager +from executorch.backends.arm.tosa.specification import TosaSpecification +from executorch.exir.pass_base import ExportPass + + +class PassC(ArmPass): + _passes_required_after: Set[Type[ExportPass]] = set() + + +class PassB(ArmPass): + _passes_required_after = {PassC} + + +class PassA(ArmPass): + _passes_required_after = {PassB, PassC} + + +class IndependentPass(ArmPass): + _passes_required_after: Set[Type[ExportPass]] = set() + + +def _setup_pass_manager(passes: List[ArmPass] | None = None): + tosa_spec = TosaSpecification.create_from_string("TOSA-1.00+INT") + pass_manager = ArmPassManager(tosa_spec) + if passes is not None: + for p in passes: + pass_manager.add_pass(p) + return pass_manager + + +def test_no_passes(): + pass_manager = _setup_pass_manager() + pass_manager.validate_constraints_mandatory() + + +def test_correct_order(): + pass_manager = _setup_pass_manager([PassA(), PassB(), PassC()]) + pass_manager.validate_constraints_mandatory() + + +def test_run_pass_twice(): + pass_manager = _setup_pass_manager([PassA(), PassB(), PassB(), PassC()]) + pass_manager.validate_constraints_mandatory() + + +def test_independent_pass(): + pass_manager = _setup_pass_manager( + [ + IndependentPass(), + PassA(), + 
IndependentPass(), + PassB(), + IndependentPass(), + PassC(), + IndependentPass(), + ] + ) + pass_manager.validate_constraints_mandatory() + + +def test_duplicated_requiring_pass_put_last(): + error_msg = """The following constraints for passes are not met: + - PassC must run after PassB +""" + pass_manager = _setup_pass_manager([PassA(), PassB(), PassC(), PassB()]) + with pytest.raises(RuntimeError, match=re.escape(error_msg)): + pass_manager.validate_constraints_mandatory() + + +def test_two_passes_wrong_order(): + error_msg = """The following constraints for passes are not met: + - PassC must run after PassB +""" + pass_manager = _setup_pass_manager([PassC(), PassB()]) + with pytest.raises(RuntimeError, match=re.escape(error_msg)): + pass_manager.validate_constraints_mandatory() + + +def test_missing_passes(): + error_msg = """The following constraints for passes are not met: + - PassC must run after PassA + - PassC must run after PassB +""" + pass_manager = _setup_pass_manager([PassA(), PassB()]) + with pytest.raises(RuntimeError, match=re.escape(error_msg)): + pass_manager.validate_constraints_mandatory() diff --git a/backends/transforms/decompose_sdpa.py b/backends/transforms/decompose_sdpa.py index d49e0da0c9b..6c36d1803fc 100644 --- a/backends/transforms/decompose_sdpa.py +++ b/backends/transforms/decompose_sdpa.py @@ -7,6 +7,7 @@ # pyre-strict import math +from typing import Set, Type import torch from executorch.exir.pass_base import ExportPass, PassResult @@ -19,6 +20,8 @@ class DecomposeScaledDotProductAttention(ExportPass): Decompose from scaled_dot_product_attention to multiple nodes. 
""" + _passes_required_after: Set[Type[ExportPass]] = set() + def __init__(self, allow_non_fake_inputs: bool = True) -> None: super().__init__() # With allow_non_fake_inputs=False, we don't get _unsafe_view ops diff --git a/backends/transforms/fuse_view_copy.py b/backends/transforms/fuse_view_copy.py index c740515cdcc..1972513d2ef 100644 --- a/backends/transforms/fuse_view_copy.py +++ b/backends/transforms/fuse_view_copy.py @@ -7,6 +7,8 @@ # pyre-strict +from typing import Set, Type + import torch from executorch.exir.dialects._ops import ops as exir_ops from executorch.exir.pass_base import ExportPass, PassResult @@ -62,6 +64,8 @@ def remove_noop_view_copy(graph: torch.fx.Graph) -> tuple[torch.fx.Graph, bool]: class FuseViewCopyTransform(ExportPass): + _passes_required_after: Set[Type[ExportPass]] = set() + def call(self, graph_module: torch.fx.GraphModule) -> PassResult: graph_module.graph, merge_modified = merge_view_copy_chains(graph_module.graph) graph_module.graph, noop_modified = remove_noop_view_copy(graph_module.graph) From 02bacccacbec53e2e469c2aad9e7ce76475c3e2a Mon Sep 17 00:00:00 2001 From: Sebastian Larsson <38941629+Sebastian-Larsson@users.noreply.github.com> Date: Fri, 19 Sep 2025 16:51:57 +0200 Subject: [PATCH 045/395] Arm backend: Convert remaining asserts to exceptions in tosa/ (#14369) In `tosa/quant_utils.py`, add message to assert. In `tosa/backend.py` and `tosa/mapping.py` convert asserts to exceptions. Signed-off-by: Sebastian Larsson --- backends/arm/tosa/backend.py | 9 +++++++-- backends/arm/tosa/mapping.py | 3 ++- backends/arm/tosa/quant_utils.py | 9 +++++++-- 3 files changed, 16 insertions(+), 5 deletions(-) diff --git a/backends/arm/tosa/backend.py b/backends/arm/tosa/backend.py index afae6f8163f..7596573be84 100644 --- a/backends/arm/tosa/backend.py +++ b/backends/arm/tosa/backend.py @@ -104,10 +104,15 @@ def _preprocess( # noqa: C901 # const data directly. Path created and data written only in debug builds. 
tosa_graph = ts.TosaSerializer(artifact_path) - assert ( + if not ( tosa_spec.version.major == ts.TOSA_VERSION_MAJOR and tosa_spec.version.minor == ts.TOSA_VERSION_MINOR - ), f"TOSA serializer version ({ts.TOSA_VERSION_MAJOR}.{ts.TOSA_VERSION_MINOR}) doesn't match specification {tosa_spec}" + ): + raise RuntimeError( + f"TOSA serializer version " + f"({ts.TOSA_VERSION_MAJOR}.{ts.TOSA_VERSION_MINOR}) " + f"doesn't match specification {tosa_spec}" + ) # TODO: Fix the need to lazily import this. from executorch.backends.arm._passes import ArmPassManager diff --git a/backends/arm/tosa/mapping.py b/backends/arm/tosa/mapping.py index a36b4cf3ebc..935d9f8da77 100644 --- a/backends/arm/tosa/mapping.py +++ b/backends/arm/tosa/mapping.py @@ -84,7 +84,8 @@ def extract_tensor_meta(meta, tosa_spec: TosaSpecification): ValueError: If ``meta['val']`` is not a ``FakeTensor``. """ - assert meta.get("val") is not None + if meta.get("val") is None: + raise ValueError("Expected node.meta['val'] to be set to a FakeTensor") val = meta["val"] if type(val) is tuple: # TODO: should use first concrete representation diff --git a/backends/arm/tosa/quant_utils.py b/backends/arm/tosa/quant_utils.py index 86e8e5bad8b..c87424ad0cc 100644 --- a/backends/arm/tosa/quant_utils.py +++ b/backends/arm/tosa/quant_utils.py @@ -245,7 +245,9 @@ def compute_multiplier_and_shift( const_2_power_15_or_31 = 1 << offset shifted_mantissa = round(mantissa * const_2_power_15_or_31) - assert shifted_mantissa <= const_2_power_15_or_31 + assert ( + shifted_mantissa <= const_2_power_15_or_31 + ), f"Mantissa {shifted_mantissa} exceeds limit {const_2_power_15_or_31}" if shifted_mantissa == const_2_power_15_or_31: shifted_mantissa = shifted_mantissa // 2 @@ -255,7 +257,10 @@ def compute_multiplier_and_shift( shift = offset - shift # INT32_MAX, 2^31 - 1 - assert shifted_mantissa <= (const_2_power_15_or_31 - 1) + assert shifted_mantissa <= (const_2_power_15_or_31 - 1), ( + f"Mantissa {shifted_mantissa} exceeds signed max " 
+ f"{const_2_power_15_or_31 - 1}" + ) multiplier = shifted_mantissa From 6fed7624eb37a4033e49dfd825a05b255b84686e Mon Sep 17 00:00:00 2001 From: Rohan Joshi Date: Fri, 19 Sep 2025 08:43:59 -0700 Subject: [PATCH 046/395] Add prefill API to MultimodalRunner (#14429) Add a prefill function to MultimodalRunner; this is useful, for example, to prefill chat history. --- extension/llm/runner/multimodal_runner.cpp | 10 ++++++++++ extension/llm/runner/multimodal_runner.h | 9 +++++++++ 2 files changed, 19 insertions(+) diff --git a/extension/llm/runner/multimodal_runner.cpp b/extension/llm/runner/multimodal_runner.cpp index b63277c82d2..6928a9b2827 100644 --- a/extension/llm/runner/multimodal_runner.cpp +++ b/extension/llm/runner/multimodal_runner.cpp @@ -62,6 +62,16 @@ Error MultimodalRunner::load() { ET_LOG(Info, format, __VA_ARGS__); \ } +Error MultimodalRunner::prefill(std::vector& inputs) { + if (!is_loaded()) { + ET_CHECK_OK_OR_RETURN_ERROR(load()); + } + for (auto& input : inputs) { + ET_UNWRAP(multimodal_prefiller_->prefill(input, pos_)); + } + return Error::Ok; +} + Error MultimodalRunner::generate( const std::vector& inputs, const GenerationConfig& config, diff --git a/extension/llm/runner/multimodal_runner.h b/extension/llm/runner/multimodal_runner.h index fe5d1d7f1d7..4a824fd4d9c 100644 --- a/extension/llm/runner/multimodal_runner.h +++ b/extension/llm/runner/multimodal_runner.h @@ -119,6 +119,15 @@ class ET_EXPERIMENTAL MultimodalRunner { std::function token_callback = {}, std::function stats_callback = {}); + /** + * Prefill multimodal inputs, for example to reload chat history. + * @param inputs A vector of MultimodalInput objects containing images and + * text. + * @return The error code. KV cache position is tracked internally in pos_.
+ */ + virtual ::executorch::runtime::Error prefill( + std::vector& inputs); + inline void stop() { text_token_generator_->stop(); } From a548635bf7fbcf3b2a1679eae65b5f0a28439c42 Mon Sep 17 00:00:00 2001 From: Abhinayk Date: Fri, 19 Sep 2025 08:44:12 -0700 Subject: [PATCH 047/395] Add android target recipes and extensive model tests using ios and android recipes (#14290) --- .ci/scripts/test_wheel_package_qnn.sh | 1 + backends/qualcomm/_passes/TARGETS | 1 + export/TARGETS | 10 + export/target_recipes.py | 120 ++++-- export/tests/TARGETS | 13 + export/tests/test_target_recipes.py | 513 ++++++++++++++++++++++---- export/utils.py | 51 +++ 7 files changed, 607 insertions(+), 102 deletions(-) create mode 100644 export/utils.py diff --git a/.ci/scripts/test_wheel_package_qnn.sh b/.ci/scripts/test_wheel_package_qnn.sh index 39c52a4a396..4a50b8e2c36 100644 --- a/.ci/scripts/test_wheel_package_qnn.sh +++ b/.ci/scripts/test_wheel_package_qnn.sh @@ -145,6 +145,7 @@ run_core_tests () { echo "=== [$LABEL] Import smoke tests ===" "$PYBIN" -c "import executorch; print('executorch imported successfully')" "$PYBIN" -c "import executorch.backends.qualcomm; print('executorch.backends.qualcomm imported successfully')" + "$PYBIN" -c "from executorch.export.target_recipes import get_android_recipe; recipe = get_android_recipe('android-arm64-snapdragon-fp16'); print(f'executorch.export.target_recipes imported successfully: {recipe}')" echo "=== [$LABEL] List installed executorch/backends/qualcomm/python ===" local SITE_DIR diff --git a/backends/qualcomm/_passes/TARGETS b/backends/qualcomm/_passes/TARGETS index 62a0fc43a78..876b51d3863 100644 --- a/backends/qualcomm/_passes/TARGETS +++ b/backends/qualcomm/_passes/TARGETS @@ -15,5 +15,6 @@ runtime.python_library( "//executorch/backends/transforms:decompose_sdpa", "//executorch/exir/backend:backend_details", "//executorch/exir/backend:compile_spec_schema", + "//executorch/backends/qualcomm/quantizer:quantizer", ], ) diff --git 
a/export/TARGETS b/export/TARGETS index ae41393d883..50afa6db6ed 100644 --- a/export/TARGETS +++ b/export/TARGETS @@ -117,9 +117,19 @@ runtime.python_library( "target_recipes.py", ], deps = [ + ":export_utils", "fbsource//third-party/pypi/coremltools:coremltools", "//executorch/export:recipe", "//executorch/backends/xnnpack/recipes:xnnpack_recipes", "//executorch/backends/apple/coreml:coreml_recipes", + "//executorch/backends/qualcomm/recipes:qnn_recipes", + ] +) + +runtime.python_library( + name = "export_utils", + srcs = ["utils.py"], + deps = [ + "//caffe2:torch", ] ) diff --git a/export/target_recipes.py b/export/target_recipes.py index 0a5ae9ce754..2d2eba46b0a 100644 --- a/export/target_recipes.py +++ b/export/target_recipes.py @@ -11,31 +11,14 @@ selection and combine multiple backends optimally for target hardware. """ -import sys +import os from typing import Dict, List -if sys.platform != "win32": - import coremltools as ct - from executorch.backends.apple.coreml.recipes import CoreMLRecipeType - -# pyre-ignore from executorch.backends.xnnpack.recipes import XNNPackRecipeType from executorch.export.recipe import ExportRecipe, RecipeType - - -## IOS Target configs -# The following list of recipes are not exhaustive for CoreML; refer to CoreMLRecipeType for more detailed recipes. -IOS_CONFIGS: Dict[str, List[RecipeType]] = ( - { - # pyre-ignore - "ios-arm64-coreml-fp32": [CoreMLRecipeType.FP32, XNNPackRecipeType.FP32], - # pyre-ignore - "ios-arm64-coreml-fp16": [CoreMLRecipeType.FP16], - # pyre-ignore - "ios-arm64-coreml-int8": [CoreMLRecipeType.PT2E_INT8_STATIC], - } - if sys.platform != "win32" - else {} +from executorch.export.utils import ( + is_supported_platform_for_coreml_lowering, + is_supported_platform_for_qnn_lowering, ) @@ -46,7 +29,7 @@ def _create_target_recipe( Create a combined recipe for a target. 
     Args:
-        target: Human-readable hardware configuration name
+        target_config: Human-readable hardware configuration name
         recipes: List of backend recipe types to combine
         **kwargs: Additional parameters - each backend will use what it needs

@@ -67,7 +50,6 @@ def _create_target_recipe(
             f"Failed to create {recipe_type.value} recipe for {target_config}: {e}"
         ) from e

-    # Combine into single recipe
     if len(backend_recipes) == 1:
         return backend_recipes[0]

@@ -100,8 +82,24 @@ def get_ios_recipe(
         recipe = get_ios_recipe('ios-arm64-coreml-int8')
         session = export(model, recipe, example_inputs)
     """
-    if target_config not in IOS_CONFIGS:
-        supported = list(IOS_CONFIGS.keys())
+
+    if not is_supported_platform_for_coreml_lowering():
+        raise ValueError("CoreML is not supported on this platform")
+
+    import coremltools as ct
+    from executorch.backends.apple.coreml.recipes import CoreMLRecipeType
+
+    ios_configs: Dict[str, List[RecipeType]] = {
+        # pyre-ignore
+        "ios-arm64-coreml-fp32": [CoreMLRecipeType.FP32, XNNPackRecipeType.FP32],
+        # pyre-ignore
+        "ios-arm64-coreml-fp16": [CoreMLRecipeType.FP16],
+        # pyre-ignore
+        "ios-arm64-coreml-int8": [CoreMLRecipeType.PT2E_INT8_STATIC],
+    }
+
+    if target_config not in ios_configs:
+        supported = list(ios_configs.keys())
         raise ValueError(
             f"Unsupported iOS configuration: '{target_config}'. "
             f"Supported: {supported}"
@@ -113,5 +111,75 @@ def get_ios_recipe(
     if "minimum_deployment_target" not in kwargs:
         kwargs["minimum_deployment_target"] = ct.target.iOS17

-    backend_recipes = IOS_CONFIGS[target_config]
+    backend_recipes = ios_configs[target_config]
+    return _create_target_recipe(target_config, backend_recipes, **kwargs)
+
+
+# Android Recipe
+def get_android_recipe(
+    target_config: str = "android-arm64-snapdragon-fp16", **kwargs
+) -> ExportRecipe:
+    """
+    Get Android-optimized recipe for specified hardware configuration.
+
+    Supported configurations:
+    - 'android-arm64-snapdragon-fp16': QNN fp16 recipe
+
+    Args:
+        target_config: Android configuration string
+        **kwargs: Additional parameters for backend recipes
+
+    Returns:
+        ExportRecipe configured for Android deployment
+
+    Raises:
+        ValueError: If target configuration is not supported
+
+    Example:
+        recipe = get_android_recipe('android-arm64-snapdragon-fp16')
+        session = export(model, recipe, example_inputs)
+    """
+
+    if not is_supported_platform_for_qnn_lowering():
+        raise ValueError(
+            "QNN is not supported or not properly configured on this platform"
+        )
+
+    try:
+        # Qualcomm QNN backend runs QNN sdk download on first use
+        # with a pip install, so wrap it in a try/except
+        # pyre-ignore
+        from executorch.backends.qualcomm.recipes import QNNRecipeType
+
+        # (1) if this is called from a pip install, the QNN SDK will be available
+        # (2) if this is called from a source build, check if qnn is available otherwise, had to run build.sh
+        if os.getenv("QNN_SDK_ROOT", None) is None:
+            raise ValueError(
+                "QNN SDK not found, cannot use QNN recipes. First run `./backends/qualcomm/scripts/build.sh`, if building from source"
+            )
+    except Exception as e:
+        raise ValueError(
+            "QNN backend is not available. Please ensure the Qualcomm backend "
+            "is properly installed and configured, "
+        ) from e
+
+    android_configs: Dict[str, List[RecipeType]] = {
+        # pyre-ignore
+        "android-arm64-snapdragon-fp16": [QNNRecipeType.FP16],
+    }
+
+    if target_config not in android_configs:
+        supported = list(android_configs.keys())
+        raise ValueError(
+            f"Unsupported Android configuration: '{target_config}'. "
+            f"Supported: {supported}"
+        )
+
+    kwargs = kwargs or {}
+
+    if target_config == "android-arm64-snapdragon-fp16":
+        if "soc_model" not in kwargs:
+            kwargs["soc_model"] = "SM8650"
+
+    backend_recipes = android_configs[target_config]
+    return _create_target_recipe(target_config, backend_recipes, **kwargs)

diff --git a/export/tests/TARGETS b/export/tests/TARGETS
index 71f28b64df7..7b1578ce508 100644
--- a/export/tests/TARGETS
+++ b/export/tests/TARGETS
@@ -1,4 +1,5 @@
 load("@fbsource//xplat/executorch/build:runtime_wrapper.bzl", "runtime")
+load("@fbsource//xplat/executorch/backends/qualcomm/qnn_version.bzl", "get_qnn_library_version")

 oncall("executorch")

@@ -37,11 +38,23 @@ runtime.python_test(
     srcs = [
         "test_target_recipes.py",
     ],
+    env = {
+        "LD_LIBRARY_PATH": "$(location fbsource//third-party/qualcomm/qnn/qnn-{0}:qnn_offline_compile_libs)".format(get_qnn_library_version()),
+        "QNN_SDK_ROOT": "$(location fbsource//third-party/qualcomm/qnn/qnn-{0}:__dir__)".format(get_qnn_library_version()),
+        "HTTP_PROXY": "http://fwdproxy:8080",
+        "HTTPS_PROXY": "http://fwdproxy:8080",
+    },
+    labels = ["long_running"],
     deps = [
         "//executorch/export:lib",
         "//executorch/export:target_recipes",
+        "//executorch/export:export_utils",
         "//executorch/runtime:runtime",
         "//executorch/backends/xnnpack/recipes:xnnpack_recipes",
         "//executorch/backends/apple/coreml:coreml_recipes",
+        "//executorch/backends/qualcomm/recipes:qnn_recipes",
+        "//executorch/examples/models:models",
+        "//executorch/backends/xnnpack/test/tester:tester",
+        "fbsource//third-party/pypi/coremltools:coremltools"
     ]
 )

diff --git a/export/tests/test_target_recipes.py b/export/tests/test_target_recipes.py
index 7a2a7c87342..61725e58f3a 100644
--- a/export/tests/test_target_recipes.py
+++ b/export/tests/test_target_recipes.py
@@ -7,54 +7,182 @@
 # pyre-strict

 import logging
-import sys
+import os
 import unittest
+from typing import Any, Dict, List, Optional, Tuple

 import torch

 from executorch.backends.xnnpack.recipes.xnnpack_recipe_provider import (
     XNNPACKRecipeProvider,
 )
-from executorch.export import export, recipe_registry
-from executorch.export.target_recipes import get_ios_recipe
+from executorch.backends.xnnpack.test.tester import Tester
+from executorch.examples.models import MODEL_NAME_TO_MODEL
+from executorch.examples.models.model_factory import EagerModelFactory
+from executorch.exir.schema import DelegateCall, Program
+from executorch.export import (
+    export,
+    ExportRecipe,
+    ExportSession,
+    recipe_registry,
+    StageType,
+)
+from executorch.export.utils import (
+    is_fbcode,
+    is_supported_platform_for_coreml_lowering,
+    is_supported_platform_for_qnn_lowering,
+)
 from executorch.runtime import Runtime
-
-if sys.platform != "win32":
-    from executorch.backends.apple.coreml.recipes import (  # pyre-ignore
-        CoreMLRecipeProvider,
-    )
+from torch import nn, Tensor
+from torch.testing import FileCheck
+from torchao.quantization.utils import compute_error


 class TestTargetRecipes(unittest.TestCase):
     """Test target recipes."""

+    class Model(torch.nn.Module):
+        def __init__(self) -> None:
+            super().__init__()
+            self.linear1 = torch.nn.Linear(4, 4)
+            self.linear2 = torch.nn.Linear(4, 2)
+
+        def forward(self, x: Tensor, y: Tensor) -> Tensor:
+            a = self.linear1(x)
+            b = a + y
+            c = b - x
+            result = self.linear2(c)
+            return result
+
     def setUp(self) -> None:
         torch._dynamo.reset()
         super().setUp()
         recipe_registry.register_backend_recipe_provider(XNNPACKRecipeProvider())
-        if sys.platform != "win32":
+        if is_supported_platform_for_coreml_lowering():
+            from executorch.backends.apple.coreml.recipes import (  # pyre-ignore
+                CoreMLRecipeProvider,
+            )
+
             # pyre-ignore
             recipe_registry.register_backend_recipe_provider(CoreMLRecipeProvider())
+        if is_fbcode() and is_supported_platform_for_qnn_lowering():
+            from executorch.backends.qualcomm.recipes import (  # pyre-ignore
+                QNNRecipeProvider,
+            )
+
+            # pyre-ignore
+            recipe_registry.register_backend_recipe_provider(QNNRecipeProvider())
+        self.model = TestTargetRecipes.Model()

     def tearDown(self) -> None:
         super().tearDown()

-    @unittest.skipIf(sys.platform == "win32", "Core ML is not available on Windows.")
+    def check_delegated(
+        self, program: Program, expected_backends: Optional[List[str]] = None
+    ) -> None:
+        """Check if the program has been delegated to expected backends."""
+        instructions = program.execution_plan[0].chains[0].instructions
+        assert instructions is not None
+
+        if expected_backends is None:
+            # Just check that there's at least one delegate call
+            self.assertGreater(len(instructions), 0)
+            for instruction in instructions:
+                self.assertIsInstance(instruction.instr_args, DelegateCall)
+        else:
+            # Check for specific backends
+            delegates = program.execution_plan[0].delegates
+            delegate_ids = [delegate.id for delegate in delegates]
+            for expected_backend in expected_backends:
+                self.assertIn(
+                    expected_backend,
+                    delegate_ids,
+                    f"Expected backend {expected_backend} not found in delegates: {delegate_ids}",
+                )
+
+    def check_num_partitions(
+        self, executorch_program: Program, expected_num_partitions: int
+    ) -> None:
+        """Check if the program has the expected number of partitions."""
+        self.assertEqual(
+            len(executorch_program.execution_plan[0].delegates),
+            expected_num_partitions,
+        )
+
+    def _check_lowering_error(
+        self,
+        # pyre-ignore[11]
+        session: ExportSession,
+        example_inputs: List[Tuple[Tensor]],
+        model_name: str,
+        recipe_key: str,
+        atol: float = 1e-3,
+        rtol: float = 1e-3,
+    ) -> None:
+        """Compare original model output with session output using tolerance."""
+        quantized_model = session.get_stage_artifacts()[StageType.QUANTIZE].data[
+            "forward"
+        ]
+        lowered_output = session.run_method("forward", *example_inputs)[0]
+        quantized_output = quantized_model(*example_inputs[0])
+
+        try:
+            Tester._assert_outputs_equal(
+                lowered_output, quantized_output, atol=atol, rtol=rtol
+            )
+            logging.info(
+                f"Tolerance check passed for {model_name} with atol={atol}, rtol={rtol}"
+            )
+        except AssertionError as e:
+            raise AssertionError(
+                f"Model '{model_name}' Recipe: {recipe_key}, tolerance check failed"
+            ) from e
+
+    def _check_quantization_error(
+        self,
+        session: ExportSession,
+        eager_model: nn.Module,
+        example_inputs: List[Tuple[Tensor]],
+        model_name: str,
+        recipe_key: str,
+        sqnr_threshold: float = 20.0,
+    ) -> None:
+        """Compare original model output with session output using SQNR."""
+        eager_output = eager_model(*example_inputs[0])
+
+        # get quantized model from session
+        all_artifacts = session.get_stage_artifacts()
+        quantized_model = all_artifacts[StageType.QUANTIZE].data["forward"]
+        quantized_output = quantized_model(*example_inputs[0])
+
+        error = compute_error(eager_output, quantized_output)
+        logging.info(f"SQNR for {model_name}: {error} dB")
+        self.assertTrue(
+            error > sqnr_threshold,
+            f"Model {model_name}, recipe: {recipe_key} SQNR check failed. Expected > {sqnr_threshold}, got {error}",
+        )
+
+    def _check_delegation_with_filecheck(self, session: ExportSession) -> None:
+        """Check that the lowered module contains expected delegate calls."""
+        all_artifacts = session.get_stage_artifacts()
+        edge_program_manager = all_artifacts[StageType.TO_EDGE_TRANSFORM_AND_LOWER].data
+        lowered_module = edge_program_manager.exported_program().module()
+
+        # Check if model got lowered
+        FileCheck().check("torch.ops.higher_order.executorch_call_delegate").run(
+            lowered_module.code
+        )
+
+    # pyre-ignore
+    @unittest.skipIf(
+        not is_supported_platform_for_coreml_lowering(),
+        "Skip test, coreml lowering not supported",
+    )
     def test_ios_fp32_recipe_with_xnnpack_fallback(self) -> None:
+        from executorch.export.target_recipes import get_ios_recipe
+
         # Linear ops skipped by coreml but handled by xnnpack
-        class Model(torch.nn.Module):
-            def __init__(self):
-                super().__init__()
-                self.linear1 = torch.nn.Linear(4, 4)
-                self.linear2 = torch.nn.Linear(4, 2)
-
-            def forward(self, x, y):
-                a = self.linear1(x)
-                b = a + y
-                c = b - x
-                result = self.linear2(c)
-                return result
-
-        model = Model()
+        model = self.model
         model.eval()

         example_inputs = [(torch.randn(2, 4), torch.randn(2, 4))]
@@ -114,65 +242,298 @@ def forward(self, x, y):
             et_output = session.run_method("forward", example_inputs[0])
             logging.info(f"et output {et_output}")

-    @unittest.skipIf(sys.platform == "win32", "Core ML is not available on Windows.")
-    def test_ios_quant_recipes(self) -> None:
-        class Model(torch.nn.Module):
-            def __init__(self):
-                super().__init__()
-                self.linear1 = torch.nn.Linear(4, 4)
-                self.linear2 = torch.nn.Linear(4, 2)
-
-            def forward(self, x, y):
-                a = self.linear1(x)
-                b = a + y
-                c = b - x
-                result = self.linear2(c)
-                return result
-
-        model = Model()
-        model.eval()
+    def _test_model_with_target_recipes(
+        self,
+        model_name: str,
+        recipe: ExportRecipe,
+        expected_backend_name: str,
+        eager_model: nn.Module,
+        example_inputs: Tuple[Tensor],
+        recipe_key: str,
+        dynamic_shapes: Optional[Dict[str, Tuple[int, ...]]],
+        atol: Optional[float] = 1e-1,
+        rtol: Optional[float] = 1e-1,
+        sqnr_threshold: Optional[int] = 20,
+    ) -> None:
+        """Test a model with a specific target recipe and expected backend."""
+        logging.info(f"Testing model {model_name} with {expected_backend_name} backend")
+
+        # Export with the provided recipe
+        session = export(
+            model=eager_model,
+            example_inputs=[example_inputs],
+            export_recipe=recipe,
+            dynamic_shapes=dynamic_shapes,
+        )
+        logging.info(f"Exporting done for {model_name}-{recipe_key}")

-        example_inputs = [(torch.randn(2, 4), torch.randn(2, 4))]
+        executorch_program = session.get_executorch_program()
+        self.assertIsNotNone(
+            executorch_program,
+            f"ExecuTorch program should not be None for {expected_backend_name}",
+        )

-        for recipe in [
-            get_ios_recipe("ios-arm64-coreml-fp16"),
-            get_ios_recipe("ios-arm64-coreml-int8"),
-        ]:
-            # Export the model
-            session = export(
-                model=model, example_inputs=example_inputs, export_recipe=recipe
-            )
+        # Check delegation for the expected backend
+        self.check_delegated(executorch_program, [expected_backend_name])

-            # Verify we can create executable
-            executorch_program = session.get_executorch_program()
-            session.print_delegation_info()
+        # Check number of partitions created
+        self.check_num_partitions(executorch_program, 1)

-            self.assertIsNotNone(
-                executorch_program, "ExecuTorch program should not be None"
-            )
+        # Run the model if the backend is available
+        et_runtime: Runtime = Runtime.get()
+        backend_registry = et_runtime.backend_registry

-            # Assert there is an execution plan
-            self.assertTrue(len(executorch_program.execution_plan) == 1)
+        logging.info(
+            f"backends registered: {et_runtime.backend_registry.registered_backend_names}"
+        )

-            # Check number of partitions created
-            self.assertTrue(len(executorch_program.execution_plan[0].delegates) == 1)
+        if backend_registry.is_available(expected_backend_name):
+            logging.info(f"Running with {expected_backend_name} backend")
+            if atol is not None and rtol is not None:
+                self._check_lowering_error(
+                    session,
+                    [example_inputs],
+                    model_name,
+                    recipe_key,
+                    atol=atol,
+                    rtol=rtol,
+                )
+                logging.info(
+                    f"Accuracy checks passed for {model_name} with {expected_backend_name} with atol={atol}, rtol={rtol}"
+                )
+
+            # Test SQNR if specified
+            if sqnr_threshold is not None:
+                self._check_quantization_error(
+                    session,
+                    eager_model,
+                    [example_inputs],
+                    model_name,
+                    recipe_key,
+                    sqnr_threshold=sqnr_threshold,
+                )
+
+                logging.info(
+                    f"SQNR check passed for {model_name} with {expected_backend_name} with sqnr={sqnr_threshold}"
+                )
+
+    @classmethod
+    def _get_model_test_configs(
+        cls,
+    ) -> Dict[str, Dict[str, Tuple[Optional[float], Optional[float], Optional[int]]]]:
+        """Get model-specific test configurations for different recipes."""
+        # Format: {model_name: {target_recipe_name: (atol, rtol, sqnr_threshold)}}
+        # If a model/recipe combination is present in this config, the model will be lowered for that recipe.
+        # A value of `None` for any of atol, rtol, or sqnr_threshold means the corresponding accuracy check will be skipped after lowering.
+        return {
+            "linear": {
+                "ios-arm64-coreml-fp16": (1e-3, 1e-3, 20),
+                "ios-arm64-coreml-int8": (1e-2, 1e-2, 20),
+                "android-arm64-snapdragon-fp16": (1e-3, 1e-3, None),
+            },
+            "add": {
+                "ios-arm64-coreml-fp16": (1e-3, 1e-3, 20),
+                "ios-arm64-coreml-int8": (1e-3, 1e-3, 20),
+                "android-arm64-snapdragon-fp16": (1e-3, 1e-3, None),
+            },
+            "add_mul": {
+                "ios-arm64-coreml-fp16": (1e-3, 1e-3, 20),
+                "ios-arm64-coreml-int8": (1e-3, 1e-3, 20),
+                "android-arm64-snapdragon-fp16": (1e-3, 1e-3, None),
+            },
+            "ic3": {
+                "ios-arm64-coreml-fp16": (1e-1, 1.0, 20),
+                "ios-arm64-coreml-int8": (None, None, None),
+                "android-arm64-snapdragon-fp16": (5e-1, 1e-1, None),
+            },
+            "ic4": {
+                "ios-arm64-coreml-fp16": (1e-1, 1e-1, 20),
+                "ios-arm64-coreml-int8": (None, None, None),
+                "android-arm64-snapdragon-fp16": (None, None, None),
+            },
+            "mv2": {
+                "ios-arm64-coreml-fp16": (5e-2, 5e-2, 20),
+                "ios-arm64-coreml-int8": (2e-1, 2e-1, 20),
+                "android-arm64-snapdragon-fp16": (1e-2, 5e-2, None),
+            },
+            "mv3": {
+                "ios-arm64-coreml-fp16": (2e-1, 2e-1, 20),
+                "ios-arm64-coreml-int8": (None, None, None),
+                "android-arm64-snapdragon-fp16": (None, None, None),
+            },
+            "resnet18": {
+                "ios-arm64-coreml-fp16": (1e-1, 1e-1, 20),
+                "ios-arm64-coreml-int8": (None, None, None),
+                "android-arm64-snapdragon-fp16": (2e-1, 2e-1, None),
+            },
+            "resnet50": {
+                "ios-arm64-coreml-fp16": (1e-2, 1e-2, 20),
+                "ios-arm64-coreml-int8": (None, None, None),
+                "android-arm64-snapdragon-fp16": (5e-1, 2e-1, None),
+            },
+            "vit": {
+                "ios-arm64-coreml-fp16": (None, None, None),  # only lower
+                "ios-arm64-coreml-int8": (None, None, None),  # only lower
+                # Couldn't lower it to qnn
+                # "android-arm64-snapdragon-fp16": (None, None, None),
+            },
+            "w2l": {
+                "ios-arm64-coreml-fp16": (1e-2, 1e-2, 20),
+                "ios-arm64-coreml-int8": (1e-1, 1e-1, 20),
+                "android-arm64-snapdragon-fp16": (1e-2, 1e-2, None),
+            },
+        }
+
+    @classmethod
+    def _get_recipes(cls) -> Dict[str, Tuple[ExportRecipe, str]]:
+        """Get available recipes with their configurations based on platform."""
+        all_recipes = {}
+
+        # Add iOS recipes
+        if is_supported_platform_for_coreml_lowering():
+            from executorch.export.target_recipes import get_ios_recipe
+
+            all_recipes = {
+                "ios-arm64-coreml-fp16": (get_ios_recipe(), "CoreMLBackend"),
+                "ios-arm64-coreml-int8": (
+                    get_ios_recipe("ios-arm64-coreml-int8"),
+                    "CoreMLBackend",
+                ),
+            }
+
+        # Add android recipes
+        if is_fbcode() and is_supported_platform_for_qnn_lowering():
+            from executorch.export.target_recipes import get_android_recipe
+
+            all_recipes["android-arm64-snapdragon-fp16"] = (
+                get_android_recipe(),
+                "QnnBackend",
+            )

-            # Delegate backend is CoreML
-            self.assertEqual(
-                executorch_program.execution_plan[0].delegates[0].id,
-                "CoreMLBackend",
+        return all_recipes
+
+    def _run_model_with_recipe(
+        self,
+        model_name: str,
+        recipe_key: str,
+        eager_model: nn.Module,
+        example_inputs: Tuple[Tensor],
+        # pyre-ignore
+        dynamic_shapes: Any,
+    ) -> None:
+        model_configs = self._get_model_test_configs()
+        recipes = self._get_recipes()
+
+        if model_name not in model_configs:
+            raise ValueError(f"Model {model_name} not found in test configurations")
+
+        if recipe_key not in recipes:
+            raise ValueError(f"Recipe {recipe_key} not found in recipe configurations")
+
+        recipe_tolerances = model_configs[model_name]
+
+        if recipe_key not in recipe_tolerances:
+            raise ValueError(f"Model {model_name} does not support recipe {recipe_key}")
+
+        atol, rtol, sqnr_threshold = recipe_tolerances[recipe_key]
+        recipe, expected_backend = recipes[recipe_key]
+
+        with torch.no_grad():
+            logging.info(f"Running model {model_name} with recipe {recipe_key}")
+            self._test_model_with_target_recipes(
+                model_name=model_name,
+                recipe=recipe,
+                expected_backend_name=expected_backend,
+                eager_model=eager_model,
+                example_inputs=example_inputs,
+                dynamic_shapes=dynamic_shapes,
+                recipe_key=recipe_key,
+                atol=atol,
+                rtol=rtol,
+                sqnr_threshold=sqnr_threshold,
             )

-            # Check number of instructions
-            instructions = executorch_program.execution_plan[0].chains[0].instructions
-            self.assertIsNotNone(instructions)
-            self.assertEqual(len(instructions), 1)
+    def _run_model_with_all_recipes(self, model_name: str) -> None:
+        if model_name not in MODEL_NAME_TO_MODEL:
+            self.skipTest(f"Model {model_name} not found in MODEL_NAME_TO_MODEL")
+            return

-            et_runtime: Runtime = Runtime.get()
-            backend_registry = et_runtime.backend_registry
-            logging.info(
-                f"backends registered: {et_runtime.backend_registry.registered_backend_names}"
-            )
-            if backend_registry.is_available("CoreMLBackend"):
-                et_output = session.run_method("forward", example_inputs[0])
-                logging.info(f"et output {et_output}")
+        eager_model, example_inputs, _example_kwarg_inputs, dynamic_shapes = (
+            EagerModelFactory.create_model(*MODEL_NAME_TO_MODEL[model_name])
+        )
+        eager_model = eager_model.eval()
+
+        recipes = self._get_recipes()
+        model_configs = self._get_model_test_configs()
+
+        try:
+            # Pre-filter recipes to only those supported by the model
+            supported_recipes = []
+            for recipe_key in recipes.keys():
+                if (
+                    model_name in model_configs
+                    and recipe_key in model_configs[model_name]
+                ):
+                    supported_recipes.append(recipe_key)
+
+            if not supported_recipes:
+                self.skipTest(f"Model {model_name} has no supported recipes")
+                return
+
+            for recipe_key in supported_recipes:
+                with self.subTest(recipe=recipe_key):
+                    self._run_model_with_recipe(
+                        model_name,
+                        recipe_key,
+                        eager_model,
+                        example_inputs,
+                        dynamic_shapes,
+                    )
+        finally:
+            # Clean up dog.jpg file if it exists
+            if os.path.exists("dog.jpg"):
+                os.remove("dog.jpg")
+
+    def test_linear_model(self) -> None:
+        """Test linear model with all applicable recipes."""
+        self._run_model_with_all_recipes("linear")
+
+    def test_add_model(self) -> None:
+        """Test add model with all applicable recipes."""
+        self._run_model_with_all_recipes("add")
+
+    def test_add_mul_model(self) -> None:
+        """Test add_mul model with all applicable recipes."""
+        self._run_model_with_all_recipes("add_mul")
+
+    def test_ic3_model(self) -> None:
+        """Test ic3 model with all applicable recipes."""
+        self._run_model_with_all_recipes("ic3")
+
+    def test_ic4_model(self) -> None:
+        """Test ic4 model with all applicable recipes."""
+        self._run_model_with_all_recipes("ic4")
+
+    def test_mv2_model(self) -> None:
+        """Test mv2 model with all applicable recipes."""
+        self._run_model_with_all_recipes("mv2")
+
+    def test_mv3_model(self) -> None:
+        """Test mv3 model with all applicable recipes."""
+        self._run_model_with_all_recipes("mv3")
+
+    def test_resnet18_model(self) -> None:
+        """Test resnet18 model with all applicable recipes."""
+        self._run_model_with_all_recipes("resnet18")
+
+    def test_resnet50_model(self) -> None:
+        """Test resnet50 model with all applicable recipes."""
+        self._run_model_with_all_recipes("resnet50")
+
+    def test_vit_model(self) -> None:
+        """Test vit model with all applicable recipes."""
+        self._run_model_with_all_recipes("vit")
+
+    def test_w2l_model(self) -> None:
+        """Test w2l model with all applicable recipes."""
+        self._run_model_with_all_recipes("w2l")

diff --git a/export/utils.py b/export/utils.py
new file mode 100644
index 00000000000..da2c30443c4
--- /dev/null
+++ b/export/utils.py
@@ -0,0 +1,51 @@
+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+
+# pyre-strict
+import logging
+import platform
+
+import torch
+
+
+def is_fbcode() -> bool:
+    return not hasattr(torch.version, "git_version")
+
+
+# Check if lowering for CoreML is supported on the current platform
+def is_supported_platform_for_coreml_lowering() -> bool:
+    system = platform.system()
+    machine = platform.machine().lower()
+
+    # Check for Linux x86_64
+    if system == "Linux" and machine == "x86_64":
+        return True
+
+    # Check for macOS aarch64
+    if system == "Darwin" and machine in ("arm64", "aarch64"):
+        return True
+
+    logging.info(f"Unsupported platform: {system} {machine}")
+
+    return False
+
+
+# Check if lowering for QNN is supported on the current platform
+def is_supported_platform_for_qnn_lowering() -> bool:
+    system = platform.system()
+    machine = platform.machine().lower()
+
+    # Check for Linux x86_64
+    if platform.system().lower() == "linux" and platform.machine().lower() in (
+        "x86_64",
+        "amd64",
+        "i386",
+        "i686",
+    ):
+        return True
+
+    logging.error(f"Unsupported platform for QNN lowering: {system} {machine}")
+    return False

From 641e737706138edb38033c32fbeb8eaa076b1e70 Mon Sep 17 00:00:00 2001
From: Gregory Comer
Date: Fri, 19 Sep 2025 12:33:09 -0600
Subject: [PATCH 048/395] RPATH Fix for portable_lib Python Extension (#14422)

**Note: This is an attempt to cherry-pick Mergen's RPATH fix from #13254 onto
main. Fixes https://github.com/pytorch/executorch/issues/14421. Original
description below.**

Problem: The _portable_lib.so Python extension built on CI couldn't find
PyTorch libraries when installed locally because it had hardcoded absolute
paths from the CI build environment.
Error:
ImportError: dlopen(.../_portable_lib.cpython-311-darwin.so, 0x0002): Library not loaded: @rpath/libtorch_python.dylib
  Referenced from: .../executorch/extension/pybindings/_portable_lib.cpython-311-darwin.so
  Reason: tried: '/Users/runner/work/_temp/.../torch/lib/libtorch_python.dylib' (no such file)

Root Cause: The CMake build was linking to PyTorch libraries using absolute
paths from the build environment, without setting proper relative RPATHs for
runtime library resolution.

Solution: Added platform-specific relative RPATH settings to the portable_lib
target in /Users/mnachin/executorch/CMakeLists.txt (lines 657-669):
- macOS: Uses @loader_path/../../../torch/lib to find PyTorch libraries
  relative to the .so file location
- Linux: Uses $ORIGIN/../../../torch/lib for the same purpose
- Sets both BUILD_RPATH and INSTALL_RPATH to ensure consistency

Impact: This allows the wheel-packaged _portable_lib.so to find PyTorch
libraries regardless of the installation location, fixing the runtime linking
issue when using ExecuTorch wheels built on CI.

Note: The same fix may be needed for _training_lib if it experiences similar
issues.
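As a sketch of why three parent-directory hops appear in the RPATH (not part of the original patch; the site-packages prefix below is illustrative), the relative path can be resolved by hand:

```python
import os

# "@loader_path" (macOS) and "$ORIGIN" (Linux) both mean: the directory
# containing the shared object itself. Resolving "../../../torch/lib" from
# there climbs pybindings -> extension -> executorch -> site-packages,
# then descends into torch/lib.
so_path = "/site-packages/executorch/extension/pybindings/_portable_lib.so"
loader_dir = os.path.dirname(so_path)
resolved = os.path.normpath(os.path.join(loader_dir, "../../../torch/lib"))
print(resolved)  # -> /site-packages/torch/lib
```

This is why the RPATH works for any install prefix: the torch libraries are always located relative to wherever the wheel lands.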
Test Plan:
```
# Build the wheel locally
python setup.py bdist_wheel

# create fresh conda env
conda create -yn executorch_test_11 python=3.11.0 && conda activate executorch_test_11

# install
pip install ./dist/executorch-*.whl

# Verify
python -c "from executorch.extension.pybindings._portable_lib import _load_for_executorch; print('Success!')"
```

Co-authored-by: Mergen Nachin
---
 .ci/scripts/wheel/test_base.py | 12 ++++++++++++
 CMakeLists.txt                 | 15 +++++++++++++++
 2 files changed, 27 insertions(+)

diff --git a/.ci/scripts/wheel/test_base.py b/.ci/scripts/wheel/test_base.py
index f8a7309a6c2..278e46fe75a 100644
--- a/.ci/scripts/wheel/test_base.py
+++ b/.ci/scripts/wheel/test_base.py
@@ -41,6 +41,18 @@ class ModelTest:


 def run_tests(model_tests: List[ModelTest]) -> None:
+    # Test that we can import the portable_lib module - verifies RPATH is correct
+    print("Testing portable_lib import...")
+    try:
+        from executorch.extension.pybindings._portable_lib import (  # noqa: F401
+            _load_for_executorch,
+        )
+
+        print("✓ Successfully imported _load_for_executorch from portable_lib")
+    except ImportError as e:
+        print(f"✗ Failed to import portable_lib: {e}")
+        raise
+
     # Why are we doing this envvar shenanigans? Since we build the testers, which
     # uses buck, we cannot run as root. This is a sneaky of getting around that
     # test.
diff --git a/CMakeLists.txt b/CMakeLists.txt
index fc427d517a9..e419a45a879 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -869,6 +869,21 @@ if(EXECUTORCH_BUILD_PYBIND)
   target_compile_options(portable_lib PUBLIC ${_pybind_compile_options})
   target_link_libraries(portable_lib PRIVATE ${_dep_libs})

+  # Set RPATH to find PyTorch libraries relative to the installation location
+  # This goes from executorch/extension/pybindings up to site-packages, then to
+  # torch/lib
+  if(APPLE)
+    set_target_properties(
+      portable_lib PROPERTIES BUILD_RPATH "@loader_path/../../../torch/lib"
+                              INSTALL_RPATH "@loader_path/../../../torch/lib"
+    )
+  else()
+    set_target_properties(
+      portable_lib PROPERTIES BUILD_RPATH "$ORIGIN/../../../torch/lib"
+                              INSTALL_RPATH "$ORIGIN/../../../torch/lib"
+    )
+  endif()
+
   install(
     TARGETS portable_lib
     EXPORT ExecuTorchTargets

From 3f17a936ec1ef3a5106bea5a6aaf9e7c0c8d9cbf Mon Sep 17 00:00:00 2001
From: Anthony Shoumikhin
Date: Fri, 19 Sep 2025 12:57:38 -0700
Subject: [PATCH 049/395] Run logging test in debug mode only (#14441)

---
 runtime/platform/test/CMakeLists.txt | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/runtime/platform/test/CMakeLists.txt b/runtime/platform/test/CMakeLists.txt
index fee7566da3d..dd480ee0953 100644
--- a/runtime/platform/test/CMakeLists.txt
+++ b/runtime/platform/test/CMakeLists.txt
@@ -33,8 +33,9 @@ et_cxx_test(
 #
 # et_cxx_test(platform_death_test SOURCES executor_pal_death_test.cpp)

-# No weak function symbols Windows/MSVC, thus PAL intercept is not supported.
-if(NOT WIN32)
+# No weak function symbols on Windows/MSVC, thus PAL intercept doesn't work.
+# Skip logging tests in Release mode.
+if(NOT WIN32 AND NOT CMAKE_BUILD_TYPE STREQUAL "Release")
   et_cxx_test(logging_test SOURCES logging_test.cpp stub_platform.cpp)
   set_source_files_properties(
     logging_test.cpp PROPERTIES COMPILE_DEFINITIONS "ET_MIN_LOG_LEVEL=Debug"

From c780f05c4dac3a155bf52988b03927b79b4d0917 Mon Sep 17 00:00:00 2001
From: Siddartha Pothapragada
Date: Fri, 19 Sep 2025 13:39:03 -0700
Subject: [PATCH 050/395] Summary: MCU Tests: Add two basic qadd, qlinear qdq
 tests (#14440)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Enable on CI for consistent signals
- Note that this is an interim solution until a proper MCU testing standalone
  pipeline is ready

Test Plan:
- examples/arm/run_mcu_models_fvp.sh --target=cortex-m55 --models=qadd,qlinear

════════════════════════════════════════════════════════════════
🏁 MCU MODEL VALIDATION SUMMARY - TARGET: cortex-m55
════════════════════════════════════════════════════════════════
qadd       : ✅ Passed
qlinear    : ✅ Passed

Reviewers:

Subscribers:

Tasks:

Tags:
Co-authored-by: Github Executorch
---
 examples/arm/aot_arm_compiler.py   | 17 +++++++++++++++++
 examples/arm/run_mcu_models_fvp.sh |  5 +++--
 2 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/examples/arm/aot_arm_compiler.py b/examples/arm/aot_arm_compiler.py
index 5513529509e..8b6e1d4b85e 100644
--- a/examples/arm/aot_arm_compiler.py
+++ b/examples/arm/aot_arm_compiler.py
@@ -297,6 +297,19 @@ def forward(self, x: torch.Tensor, y: torch.Tensor):
     can_delegate = True


+class QuantLinearTest(torch.nn.Module):
+    def __init__(self):
+        super().__init__()
+        # Define a simple linear layer
+        self.linear = torch.nn.Linear(61, 37)
+
+    def forward(self, x):
+        return self.linear(x)
+
+    example_input = (torch.randn([8, 61], dtype=torch.float32),)
+    can_delegate = True
+
+
 models = {
     "add": AddModule,
     "add2": AddModule2,
@@ -306,6 +319,9 @@ def forward(self, x: torch.Tensor, y: torch.Tensor):
     "qops": QuantOpTest,
     "softmax": SoftmaxModule,
     "MultipleOutputsModule": MultipleOutputsModule,
+    # TODO: Remove this from here, once we have dedicated MCU test pipeline ready. This is an interim solution.
+    # See https://github.com/pytorch/executorch/discussions/13944
+ # See https://github.com/pytorch/executorch/discussions/13944 + "qlinear": QuantLinearTest, } calibration_data = { @@ -330,6 +346,7 @@ def forward(self, x: torch.Tensor, y: torch.Tensor): torch.randn(32, 2, 1) * 1000, ), "softmax": (torch.randn(32, 2, 2),), + "qlinear": (torch.randn(37, 61),), } evaluators = { diff --git a/examples/arm/run_mcu_models_fvp.sh b/examples/arm/run_mcu_models_fvp.sh index 68d5ec03003..3fa980c506b 100755 --- a/examples/arm/run_mcu_models_fvp.sh +++ b/examples/arm/run_mcu_models_fvp.sh @@ -24,9 +24,9 @@ VALID_TARGETS=( ) # Default models for MCU validation with portable kernels -DEFAULT_MODELS=(mv2 mv3 lstm) +DEFAULT_MODELS=(mv2 mv3 lstm qadd qlinear) # Available models (on FVP) -AVAILABLE_MODELS=(mv2 mv3 lstm) +AVAILABLE_MODELS=(mv2 mv3 lstm qadd qlinear) # Add the following models if you want to enable them later (atm they are not working on FVP) # edsr w2l ic3 ic4 resnet18 resnet50 @@ -257,6 +257,7 @@ for model in "${MODELS[@]}"; do -m "$model" \ --target="$ETHOS_TARGET" \ --quantize \ + --enable_qdq_fusion_pass \ --output="arm_test/$model"; then echo "❌ AOT compilation failed for $model" MODEL_SUCCESS=false From 8b114180ef143abb06b0441c0788edec5461e5ad Mon Sep 17 00:00:00 2001 From: Mengwei Liu Date: Fri, 19 Sep 2025 14:51:52 -0700 Subject: [PATCH 051/395] [multimodal] Let Audio take float data blob (#14427) If the processed audio went through Mel transform, the spectrogram are float values. We should allow `Audio` class to be able to take this, since multimodal runner pybind API will have to be able to take processed input. 
Once we have the pybind API we can do something like:

```python
model_id = "mistralai/Voxtral-Mini-3B-2507"
processor = AutoProcessor.from_pretrained(model_id)

audio_url = "https://huggingface.co/datasets/eustlb/audio-samples/resolve/main/dude_where_is_my_car.wav"
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "audio", "url": audio_url},
            {
                "type": "text",
                "text": "What can you tell me about this audio?",
            },
        ],
    },
]
inputs = processor.apply_chat_template(
    conversation, tokenize=True, return_dict=True, return_tensors="pt"
)

inputs_combined = [
    make_text_input("[INST][BEGIN_AUDIO]"),
    make_audio_input(inputs["input_features"]),
    make_text_input("\nWhat can you tell me about this audio?[/INST]"),
]

runner = MultimodalRunner("voxtral.pte", "tekken.json", None)
config = GenerationConfig()
config.max_new_tokens = 100
runner.generate(inputs_combined, config)
```
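The size invariant that the new `Audio` constructors assert (the `ET_CHECK_MSG` calls in the diff below) can be sketched in plain Python; `validate_audio` here is a hypothetical illustration, not an API added by this patch:

```python
# Hypothetical sketch (not part of this patch) of the check the new Audio
# constructors perform: the flat data buffer must contain exactly
# batch_size * n_bins * n_frames elements, whether samples are uint8 or float.
def validate_audio(data, batch_size, n_bins, n_frames):
    expected = batch_size * n_bins * n_frames
    if len(data) != expected:
        raise ValueError(
            f"data.size() ({len(data)}) does not match "
            f"batch_size * n_bins * n_frames ({expected})"
        )
    return {"batch_size": batch_size, "n_bins": n_bins, "n_frames": n_frames}
```

Both the uint8_t and float constructors run this same check before storing the buffer in the `std::variant`.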
--- examples/models/voxtral/multimodal.cpp | 47 +++---- extension/llm/runner/audio.h | 129 +++++++++++++++++- extension/llm/runner/multimodal_prefiller.cpp | 29 ++-- 3 files changed, 158 insertions(+), 47 deletions(-) diff --git a/examples/models/voxtral/multimodal.cpp b/examples/models/voxtral/multimodal.cpp index 17013df96e1..081df27cd67 100644 --- a/examples/models/voxtral/multimodal.cpp +++ b/examples/models/voxtral/multimodal.cpp @@ -103,15 +103,13 @@ MultimodalInput loadPreprocessedAudio(const std::string& audio_path) { ET_LOG(Info, "audio_data len = %zu", n_floats); - // Create Audio multimodal input - auto audio = std::make_unique<::executorch::extension::llm::Audio>(); - audio->batch_size = batch_size; - audio->n_bins = n_bins; - audio->n_frames = n_frames; - audio->data.resize(n_floats * sizeof(float)); - f.read(reinterpret_cast(audio->data.data()), n_floats * sizeof(float)); + std::vector audio_data(n_floats); + f.read(reinterpret_cast(audio_data.data()), n_floats * sizeof(float)); f.close(); - return ::executorch::extension::llm::make_audio_input(std::move(*audio)); + + auto audio = ::executorch::extension::llm::Audio( + std::move(audio_data), batch_size, n_bins, n_frames); + return ::executorch::extension::llm::make_audio_input(std::move(audio)); } /** @@ -206,32 +204,21 @@ MultimodalInput processRawAudioFile( static_cast(sizes[2])); // Create Audio multimodal input from processed features - auto processed_audio = - std::make_unique<::executorch::extension::llm::Audio>(); - processed_audio->batch_size = - static_cast(sizes[0]); // Note: batching for s > 30 doesn't work - // yet, so this will just be = 1. - processed_audio->n_bins = static_cast(sizes[1]); - processed_audio->n_frames = - static_cast(sizes[2]); // And this will just be = 3000. 
- - size_t total_elements = processed_audio->batch_size * - processed_audio->n_bins * processed_audio->n_frames; - processed_audio->data.resize(total_elements * sizeof(float)); - std::memcpy( - processed_audio->data.data(), - processed_data, - total_elements * sizeof(float)); - + int32_t batch_size = static_cast(sizes[0]); + int32_t n_bins = static_cast(sizes[1]); + int32_t n_frames = static_cast(sizes[2]); + size_t total_elements = batch_size * n_bins * n_frames; + std::vector audio_vec(processed_data, processed_data + total_elements); + auto processed_audio = ::executorch::extension::llm::Audio( + std::move(audio_vec), batch_size, n_bins, n_frames); ET_LOG( Info, "Created processed Audio: batch_size=%d, n_bins=%d, n_frames=%d", - processed_audio->batch_size, - processed_audio->n_bins, - processed_audio->n_frames); - + batch_size, + n_bins, + n_frames); return ::executorch::extension::llm::make_audio_input( - std::move(*processed_audio)); + std::move(processed_audio)); } /** diff --git a/extension/llm/runner/audio.h b/extension/llm/runner/audio.h index 868765950af..ce71513ed17 100644 --- a/extension/llm/runner/audio.h +++ b/extension/llm/runner/audio.h @@ -11,8 +11,11 @@ #pragma once #include #include +#include #include +#include + namespace executorch { namespace extension { namespace llm { @@ -29,14 +32,126 @@ struct ET_EXPERIMENTAL RawAudio { }; /** - * Pre-processed audio inputs, ready to feed directly into an audio - * encoder. + * Pre-processed audio inputs, ready to feed directly into an audio encoder. + * + * The data can be either uint8_t or float. If the audio has gone through a Mel + * transform, we expect the data type to be float (i.e., std::vector), as + * Mel spectrograms are typically represented as floating point values. For raw + * or quantized audio, uint8_t may be used instead. 
*/ -struct ET_EXPERIMENTAL Audio { - std::vector data; - int32_t batch_size; - int32_t n_bins; - int32_t n_frames; +class ET_EXPERIMENTAL Audio final { + public: + // Default constructor + Audio() : batch_size_(0), n_bins_(0), n_frames_(0) {} + + // Constructor for uint8_t data + Audio( + std::vector&& data, + int32_t batch_size, + int32_t n_bins, + int32_t n_frames) + : data_(std::move(data)), + batch_size_(batch_size), + n_bins_(n_bins), + n_frames_(n_frames) { + ET_CHECK_MSG( + data_.index() == 0 && + std::get>(data_).size() == + static_cast(batch_size * n_bins * n_frames), + "data.size() (%zu) does not match batch_size * n_bins * n_frames (%d)", + std::get>(data_).size(), + batch_size * n_bins * n_frames); + } + + // Constructor for float data + Audio( + std::vector&& data, + int32_t batch_size, + int32_t n_bins, + int32_t n_frames) + : data_(std::move(data)), + batch_size_(batch_size), + n_bins_(n_bins), + n_frames_(n_frames) { + ET_CHECK_MSG( + data_.index() == 1 && + std::get>(data_).size() == + static_cast(batch_size * n_bins * n_frames), + "data.size() (%zu) does not match batch_size * n_bins * n_frames (%d)", + std::get>(data_).size(), + batch_size * n_bins * n_frames); + } + + // Type checkers + bool is_uint8() const { + return std::holds_alternative>(data_); + } + + bool is_float() const { + return std::holds_alternative>(data_); + } + + // Data access + const std::vector& get_uint8_data() const& { + return std::get>(data_); + } + + std::vector& get_uint8_data() & { + return std::get>(data_); + } + + const std::vector& get_float_data() const& { + return std::get>(data_); + } + + std::vector& get_float_data() & { + return std::get>(data_); + } + + int32_t get_batch_size() const { + return batch_size_; + } + int32_t get_n_bins() const { + return n_bins_; + } + int32_t get_n_frames() const { + return n_frames_; + } + /** + * Convert the audio data to a TensorPtr, with optional batch dimension. 
+ * The tensor will have shape (batch_size, n_bins, n_frames) or (1, + * batch_size, n_bins, n_frames) if with_batch is true. + */ + executorch::runtime::Result toTensor( + bool with_batch = false) const { + std::vector sizes = { + get_batch_size(), get_n_bins(), get_n_frames()}; + if (with_batch) { + sizes.insert(sizes.begin(), 1); + } + if (is_float()) { + return executorch::extension::from_blob( + const_cast(get_float_data().data()), + sizes, + ::executorch::aten::ScalarType::Float); + } else if (is_uint8()) { + return executorch::extension::from_blob( + const_cast(get_uint8_data().data()), + sizes, + ::executorch::aten::ScalarType::Byte); + } + ET_LOG( + Error, + "Shouldn't reach here, audio data is not initialized with uint8_t or float vector."); + return ::executorch::runtime::Error::NotSupported; + } + + private: + // Members + std::variant, std::vector> data_; + int32_t batch_size_; + int32_t n_bins_; + int32_t n_frames_; }; } // namespace llm diff --git a/extension/llm/runner/multimodal_prefiller.cpp b/extension/llm/runner/multimodal_prefiller.cpp index f9645667f24..824fdf943a9 100644 --- a/extension/llm/runner/multimodal_prefiller.cpp +++ b/extension/llm/runner/multimodal_prefiller.cpp @@ -47,8 +47,9 @@ Result MultimodalPrefiller::prefill( "Failed to get method_meta for %s", kVisionEncoderMethod); - ET_CHECK_MSG( + ET_CHECK_OR_RETURN_ERROR( method_meta.num_inputs() > 0, + InvalidArgument, "Image encoder should have at least 1 input"); auto input_meta = ET_UNWRAP( method_meta.input_tensor_meta(0), @@ -56,12 +57,14 @@ Result MultimodalPrefiller::prefill( auto expected_dtype = input_meta.scalar_type(); if (expected_dtype == ::executorch::aten::ScalarType::Float) { - ET_CHECK_MSG( + ET_CHECK_OR_RETURN_ERROR( image.is_float(), + InvalidArgument, "Model expects float image data, but image has uint8_t data."); } else if (expected_dtype == ::executorch::aten::ScalarType::Byte) { - ET_CHECK_MSG( + ET_CHECK_OR_RETURN_ERROR( image.is_uint8(), + InvalidArgument, 
"Model expects uint8_t image data, but image has float data."); } else { ET_LOG( @@ -77,7 +80,11 @@ Result MultimodalPrefiller::prefill( auto image_tensor = ET_UNWRAP( image.toTensor(/*with_batch*/ expected_dims.size() == 4), "Failed to convert image to tensor"); - + ET_LOG( + Info, + "Image tensor dim: %zu, dtype: %s", + image_tensor->dim(), + ::executorch::runtime::toString(image_tensor->scalar_type())); // Run image encoder auto image_encoder_outputs = ET_UNWRAP(module_->execute(kVisionEncoderMethod, image_tensor)); @@ -86,12 +93,14 @@ Result MultimodalPrefiller::prefill( } else if (input.is_audio()) { Audio audio = input.get_audio(); - // Use the original tensor shape as intended - auto audio_tensor = executorch::extension::from_blob( - audio.data.data(), - {audio.batch_size, audio.n_bins, audio.n_frames}, - ::executorch::aten::ScalarType::Float); - + // Use Audio::toTensor() for tensor creation + auto audio_tensor = + ET_UNWRAP(audio.toTensor(), "Failed to convert audio to tensor"); + ET_LOG( + Info, + "Audio tensor dim: %zu, dtype: %s", + audio_tensor->dim(), + ::executorch::runtime::toString(audio_tensor->scalar_type())); // Run audio encoder auto audio_encoder_result = module_->execute(kAudioEncoderMethod, audio_tensor); From 07d1092dd06537c55a8345a0e1994670fa748fac Mon Sep 17 00:00:00 2001 From: Kimish Patel Date: Fri, 19 Sep 2025 14:52:23 -0700 Subject: [PATCH 052/395] Add selective build support for prim ops Differential Revision: D81648030 Pull Request resolved: https://github.com/pytorch/executorch/pull/14332 --- codegen/tools/combine_prim_ops_headers.py | 164 +++++++++++++++++++ codegen/tools/gen_all_oplist.py | 20 ++- codegen/tools/gen_oplist.py | 20 ++- codegen/tools/gen_selected_prim_ops.py | 96 +++++++++++ codegen/tools/targets.bzl | 41 +++++ codegen/tools/test/test_gen_oplist.py | 11 +- examples/selective_build/targets.bzl | 114 +++++++++++++ kernels/prim_ops/register_prim_ops.cpp | 91 +++++++++- kernels/prim_ops/selective_build_prim_ops.h | 12 
++ kernels/prim_ops/targets.bzl | 27 ++- shim_et/xplat/executorch/codegen/codegen.bzl | 160 +++++++++++++++++- 11 files changed, 734 insertions(+), 22 deletions(-) create mode 100644 codegen/tools/combine_prim_ops_headers.py create mode 100644 codegen/tools/gen_selected_prim_ops.py create mode 100644 kernels/prim_ops/selective_build_prim_ops.h diff --git a/codegen/tools/combine_prim_ops_headers.py b/codegen/tools/combine_prim_ops_headers.py new file mode 100644 index 00000000000..b579de2047d --- /dev/null +++ b/codegen/tools/combine_prim_ops_headers.py @@ -0,0 +1,164 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. +# +# This source code is licensed under the BSD-style license found in the +# LICENSE file in the root directory of this source tree. + +""" +Script to combine multiple selected_prim_ops.h header files into a single header. +This is used by selected_prim_operators_genrule to merge prim ops headers from dependencies. +""" + +import argparse +import os +import sys +from pathlib import Path +from typing import List, Set + + +def read_header_file(file_path: Path) -> Set[str]: + """ + Read a selected_prim_ops.h file and extract the macros and comments. + + Args: + file_path: Path to the header file + + Returns: + macros_set where macros_set contains unique macro defines + """ + macros = set() + + try: + with open(file_path, "r") as f: + for line in f: + line = line.strip() + + # Extract #define statements for prim ops + if line.startswith("#define INCLUDE_") and not line.startswith( + "#define EXECUTORCH_ENABLE" + ): + macros.add(line) + except FileNotFoundError: + print(f"Warning: Header file not found: {file_path}", file=sys.stderr) + except Exception as e: + print(f"Error reading {file_path}: {e}", file=sys.stderr) + + return macros + + +def combine_prim_ops_headers(header_file_paths: List[str], output_path: str) -> None: + """ + Combine multiple selected_prim_ops.h files into a single header. 
+ + Args: + header_files: List of paths to header files to combine + output_path: Path to output the combined header + """ + all_macros = set() + has_selective_build = False + + # Read all header files and collect unique macros + for header_file_path in header_file_paths: + header_file = Path(header_file_path) / "selected_prim_ops.h" + if os.path.exists(header_file): + macros = read_header_file(header_file) + all_macros.update(macros) + if len(all_macros) > 0: + has_selective_build = True + else: + print( + f"Warning: Header file does not exist: {header_file}", file=sys.stderr + ) + + # Generate combined header + header_content = [ + "// Combined header for selective prim ops build", + "// This file is auto-generated by combining multiple selected_prim_ops.h files", + "// Do not edit manually.", + "", + "#pragma once", + "", + ] + + if all_macros and has_selective_build: + header_content.extend( + [ + "// Enable selective build for prim ops", + "#define EXECUTORCH_ENABLE_PRIM_OPS_SELECTIVE_BUILD", + "", + "// Combined prim ops macros from all dependencies", + ] + ) + + # Sort macros for deterministic output + sorted_macros = sorted(all_macros) + header_content.extend(sorted_macros) + else: + header_content.extend( + [ + "// No prim ops found in dependencies - all prim ops will be included", + "// Selective build is disabled", + ] + ) + + header_content.append("") + + # Write the combined header + os.makedirs(os.path.dirname(output_path), exist_ok=True) + with open(output_path, "w") as f: + f.write("\n".join(header_content)) + + +def _get_header_file_paths_from_query_output(query_output_file: str) -> List[str]: + """ + Parse the output of a Buck query command to extract header file paths. + + Args: + query_output_file: Path to the file containing the query output + + Returns: + List of header file paths + """ + header_file_paths = [] + assert ( + query_output_file[0] == "@" + ), "query_output_file is not a valid file path, or it doesn't start with '@'." 
+ query_output_file = query_output_file[1:] + + with open(query_output_file, "r") as f: + for line in f: + # Extract the header file path from the query output + header_file_paths += line.split() + return header_file_paths + + +def main(): + parser = argparse.ArgumentParser( + description="Combine multiple selected_prim_ops.h header files" + ) + parser.add_argument( + "--header_files", + required=True, + help="Comma-separated list of header file paths", + ) + parser.add_argument( + "--output_dir", required=True, help="Output directory for combined header" + ) + + args = parser.parse_args() + import os + + header_file_paths = _get_header_file_paths_from_query_output(args.header_files) + + if not header_file_paths: + print("Error: No header files provided", file=sys.stderr) + sys.exit(1) + + # Generate output path + output_path = os.path.join(args.output_dir, "selected_prim_ops.h") + + combine_prim_ops_headers(header_file_paths, output_path) + + +if __name__ == "__main__": + main() diff --git a/codegen/tools/gen_all_oplist.py b/codegen/tools/gen_all_oplist.py index 5cb93bb9153..f33c3dc935d 100644 --- a/codegen/tools/gen_all_oplist.py +++ b/codegen/tools/gen_all_oplist.py @@ -10,7 +10,7 @@ import sys from functools import reduce from pathlib import Path -from typing import Any, List +from typing import Any, Dict, List import yaml from torchgen.selective_build.selector import ( @@ -72,6 +72,19 @@ def _raise_if_check_prim_ops_fail(options): raise Exception(error) +def _selected_ops_model_dict_is_empty(model_dict: Dict[str, Any]) -> bool: + return ( + not model_dict.get("build_features", []) + and not model_dict.get("custom_classes", []) + and not model_dict.get("et_kernel_metadata", None) + and not model_dict.get("include_all_non_op_selectives", False) + and not model_dict.get("include_all_operators", False) + and not model_dict.get("kernel_metadata", {}) + and not model_dict.get("operators", {}) + ) + + +# flake8: noqa: C901 def main(argv: List[Any]) -> None: """This 
binary generates 3 files: @@ -171,6 +184,11 @@ def main(argv: List[Any]) -> None: ), f"{model_file_name} is not a valid file path. This is likely a BUCK issue." with open(model_file_name, "rb") as model_file: model_dict = yaml.safe_load(model_file) + # It is possible that we created an empty yaml file. + # This is because et_operator_library may only contain prim ops. + # In that case selected_operators.yaml will be empty. + if _selected_ops_model_dict_is_empty(model_dict): + continue resolved = resolve_model_file_path_to_buck_target(model_file_name) for op in model_dict["operators"]: model_dict["operators"][op]["debug_info"] = [resolved] diff --git a/codegen/tools/gen_oplist.py b/codegen/tools/gen_oplist.py index cca5bf1b1d2..28506050a8e 100644 --- a/codegen/tools/gen_oplist.py +++ b/codegen/tools/gen_oplist.py @@ -9,6 +9,7 @@ import os import sys from enum import IntEnum +from pathlib import Path from typing import Any, Dict, List, Optional, Set import yaml @@ -158,7 +159,7 @@ def _get_et_kernel_metadata_from_ops_yaml(ops_yaml_path: str) -> Dict[str, List[ def _dump_yaml( op_list: List[str], - output_path: str, + output_path: Path, model_name: Optional[str] = None, et_kernel_metadata: Optional[Dict[str, List[str]]] = None, include_all_operators: bool = False, @@ -212,20 +213,23 @@ def create_kernel_key(maybe_kernel_key: str) -> str: def gen_oplist( - output_path: str, + output_path: Path, model_file_path: Optional[str] = None, ops_schema_yaml_path: Optional[str] = None, root_ops: Optional[str] = None, ops_dict: Optional[str] = None, include_all_operators: bool = False, ): - assert ( + if not ( model_file_path or ops_schema_yaml_path or root_ops or ops_dict or include_all_operators - ), "Need to provide either model_file_path or ops_schema_yaml_path or root_ops or ops_dict or include_all_operators." + ): + # dump empty yaml file + _dump_yaml([], output_path) + return assert output_path, "Need to provide output_path for dumped yaml file." 
op_set = set() @@ -326,9 +330,15 @@ def main(args: List[Any]) -> None: ) options = parser.parse_args(args) + # check if the output_path is a directory, then generate operators + # under selected_operators.yaml + if Path(options.output_path).is_dir(): + output_path = Path(options.output_path) / "selected_operators.yaml" + else: + output_path = Path(options.output_path) try: gen_oplist( - output_path=options.output_path, + output_path=output_path, model_file_path=options.model_file_path, ops_schema_yaml_path=options.ops_schema_yaml_path, root_ops=options.root_ops, diff --git a/codegen/tools/gen_selected_prim_ops.py b/codegen/tools/gen_selected_prim_ops.py new file mode 100644 index 00000000000..4535ffaa57a --- /dev/null +++ b/codegen/tools/gen_selected_prim_ops.py @@ -0,0 +1,96 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. +# +# This source code is licensed under the BSD-style license found in the +# LICENSE file in the root directory of this source tree. + +# pyre-unsafe + +import argparse +import os +import sys +from typing import Any, List + +from torchgen.code_template import CodeTemplate # type: ignore[import-not-found] + + +selected_prim_ops_h_template_str = """#pragma once +/** + * Generated by executorch/codegen/tools/gen_selected_prim_ops.py + */ + +$defines +""" +selected_prim_ops_h_template = CodeTemplate(selected_prim_ops_h_template_str) + + +def normalize_op_name(op_name: str) -> str: + """ + Normalize an operator name to a macro-safe format. 
+ Convert op names like "executorch_prim::et_view.default" to "EXECUTORCH_PRIM_ET_VIEW_DEFAULT" + or "aten::sym_size.int" to "ATEN_SYM_SIZE_INT" + """ + # Remove namespace separator and replace with underscore + normalized = op_name.replace("::", "_") + # Replace dots with underscores + normalized = normalized.replace(".", "_") + # Convert to uppercase + normalized = normalized.upper() + # Add INCLUDE_ prefix + normalized = f"INCLUDE_{normalized}" + return normalized + + +def write_selected_prim_ops(prim_op_names: List[str], output_dir: str) -> None: + """ + Generate selected_prim_ops.h from a list of prim op names. + + Args: + prim_op_names: List of prim op names like ["executorch_prim::et_view.default", "aten::sym_size.int"] + output_dir: Directory where to write selected_prim_ops.h + """ + # Generate #define statements for each op + defines = [] + for op_name in prim_op_names: + macro_name = normalize_op_name(op_name) + defines.append(f"#define {macro_name}") + + # Join all defines with newlines + defines_str = "\n".join(defines) + + # Generate header content + header_contents = selected_prim_ops_h_template.substitute(defines=defines_str) + + # Write to file + selected_prim_ops_path = os.path.join(output_dir, "selected_prim_ops.h") + with open(selected_prim_ops_path, "wb") as out_file: + out_file.write(header_contents.encode("utf-8")) + + +def main(argv: List[Any]) -> None: + parser = argparse.ArgumentParser(description="Generate selected prim ops header") + parser.add_argument( + "--prim-op-names", + "--prim_op_names", + help="Comma-separated list of prim op names to include", + required=True, + ) + parser.add_argument( + "--output-dir", + "--output_dir", + help="The directory to store the output header file (selected_prim_ops.h)", + required=True, + ) + + options = parser.parse_args(argv) + + # Parse comma-separated prim op names + prim_op_names = [ + name.strip() for name in options.prim_op_names.split(",") if name.strip() + ] + + 
write_selected_prim_ops(prim_op_names, options.output_dir) + + +if __name__ == "__main__": + main(sys.argv[1:]) diff --git a/codegen/tools/targets.bzl b/codegen/tools/targets.bzl index acea3370e7d..d594b7178b8 100644 --- a/codegen/tools/targets.bzl +++ b/codegen/tools/targets.bzl @@ -103,6 +103,26 @@ def define_common_targets(is_fbcode = False): _is_external_target = True, ) + runtime.python_library( + name = "combine_prim_ops_headers_lib", + srcs = ["combine_prim_ops_headers.py"], + base_module = "executorch.codegen.tools", + visibility = ["//executorch/..."], + ) + + runtime.python_binary( + name = "combine_prim_ops_headers", + main_module = "executorch.codegen.tools.combine_prim_ops_headers", + package_style = "inplace", + visibility = [ + "PUBLIC", + ], + deps = [ + ":combine_prim_ops_headers_lib", + ], + _is_external_target = True, + ) + runtime.python_test( name = "test_gen_all_oplist", srcs = [ @@ -155,6 +175,27 @@ def define_common_targets(is_fbcode = False): _is_external_target = True, ) + runtime.python_library( + name = "gen_selected_prim_ops_lib", + srcs = ["gen_selected_prim_ops.py"], + base_module = "executorch.codegen.tools", + visibility = ["//executorch/..."], + external_deps = ["torchgen"], + ) + + runtime.python_binary( + name = "gen_selected_prim_ops", + main_module = "executorch.codegen.tools.gen_selected_prim_ops", + package_style = "inplace", + visibility = [ + "PUBLIC", + ], + deps = [ + ":gen_selected_prim_ops_lib", + ], + _is_external_target = True, + ) + if not runtime.is_oss: runtime.cxx_python_extension( name = "selective_build", diff --git a/codegen/tools/test/test_gen_oplist.py b/codegen/tools/test/test_gen_oplist.py index f5c6829d6a0..18689cd2505 100644 --- a/codegen/tools/test/test_gen_oplist.py +++ b/codegen/tools/test/test_gen_oplist.py @@ -8,6 +8,7 @@ import os import tempfile import unittest +from pathlib import Path from typing import Dict, List from unittest.mock import NonCallableMock, patch @@ -77,7 +78,7 @@ def 
test_gen_op_list_with_valid_root_ops( gen_oplist.main(args) mock_dump_yaml.assert_called_once_with( ["aten::add", "aten::mul"], - output_path, + Path(output_path), None, {"aten::add": ["default"], "aten::mul": ["default"]}, False, @@ -100,7 +101,7 @@ def test_gen_op_list_with_root_ops_and_dtypes( gen_oplist.main(args) mock_dump_yaml.assert_called_once_with( ["aten::add", "aten::mul"], - output_path, + Path(output_path), None, { "aten::add": [ @@ -129,7 +130,7 @@ def test_gen_op_list_with_both_op_list_and_ops_schema_yaml_merges( gen_oplist.main(args) mock_dump_yaml.assert_called_once_with( ["aten::add.out", "aten::mul.out", "aten::relu.out"], - output_path, + Path(output_path), test_path, { "aten::relu.out": ["default"], @@ -153,7 +154,7 @@ def test_gen_op_list_with_include_all_operators( gen_oplist.main(args) mock_dump_yaml.assert_called_once_with( ["aten::add", "aten::mul"], - output_path, + Path(output_path), None, {"aten::add": ["default"], "aten::mul": ["default"]}, True, @@ -164,7 +165,7 @@ def test_get_custom_build_selector_with_both_allowlist_and_yaml( ) -> None: op_list = ["aten::add", "aten::mul"] filename = os.path.join(self.temp_dir.name, "selected_operators.yaml") - gen_oplist._dump_yaml(op_list, filename, "model.pte") + gen_oplist._dump_yaml(op_list, Path(filename), "model.pte") self.assertTrue(os.path.isfile(filename)) with open(filename) as f: es = yaml.safe_load(f) diff --git a/examples/selective_build/targets.bzl b/examples/selective_build/targets.bzl index 72639fef842..bd11a53e3e0 100644 --- a/examples/selective_build/targets.bzl +++ b/examples/selective_build/targets.bzl @@ -1,6 +1,118 @@ load("@fbsource//xplat/executorch/build:runtime_wrapper.bzl", "get_oss_build_kwargs", "is_xplat", "runtime") load("@fbsource//xplat/executorch/codegen:codegen.bzl", "et_operator_library", "executorch_generated_lib", "ScalarType") +def define_selective_build_prim_ops_example(): + """ + Example showing how selected_prim_operators_genrule works to combine + prim 
ops headers from multiple dependencies. + """ + + # Define several operator libraries with automatic prim ops extraction + et_operator_library( + name = "model_a_ops", + ops = [ + "aten::add.out", + "aten::mul.out", + "executorch_prim::et_view.default", # Auto-extracted to prim ops + "aten::sym_size.int", # Auto-extracted to prim ops + ], + visibility = ["//executorch/..."], + ) + # This creates: "model_a_ops" + "model_a_ops_selected_prim_ops" + + et_operator_library( + name = "model_b_ops", + ops = [ + "aten::sub.out", + "aten::div.out", + "executorch_prim::add.Scalar", # Auto-extracted to prim ops + "aten::sym_numel.int", # Auto-extracted to prim ops + ], + visibility = ["//executorch/..."], + ) + # This creates: "model_b_ops" + "model_b_ops_selected_prim_ops" + + # Define a manual prim ops target as well + et_operator_library( + name = "extra_prim_ops", + ops = [ + "executorch_prim::mul.Scalar", + "executorch_prim::sym_max.Scalar", + ], + visibility = ["//executorch/..."], + ) + # Use the combined header in an executorch_generated_lib + executorch_generated_lib( + name = "library_with_combined_prim_ops", + deps = [ + ":model_a_ops", + ":model_b_ops", + ":extra_prim_ops", + ], + kernel_deps = [ + "//executorch/kernels/portable:operators", + ], + functions_yaml_target = "//executorch/kernels/portable:functions.yaml", + aten_mode = False, + visibility = ["PUBLIC"], + include_all_prim_ops = False, + ) + + # Prim ops selected separately + et_operator_library( + name = "model_b_ops_no_prim_ops", + ops = [ + "aten::sub.out", + "aten::div.out", + ], + visibility = ["//executorch/..."], + ) + + # Use the combined header in an executorch_generated_lib + executorch_generated_lib( + name = "library_with_combined_prim_ops_1", + deps = [ + ":model_b_ops_no_prim_ops", + ":extra_prim_ops", + ], + kernel_deps = [ + "//executorch/kernels/portable:operators", + ], + functions_yaml_target = "//executorch/kernels/portable:functions.yaml", + aten_mode = False, + visibility = 
["PUBLIC"], + include_all_prim_ops = False, + ) + + # No prim ops selected. So include all prim ops. + executorch_generated_lib( + name = "library_with_combined_prim_ops_2", + deps = [ + ":model_b_ops_no_prim_ops", + ], + kernel_deps = [ + "//executorch/kernels/portable:operators", + ], + functions_yaml_target = "//executorch/kernels/portable:functions.yaml", + aten_mode = False, + visibility = ["PUBLIC"], + include_all_prim_ops = False, + ) + + # default to selecting all prim ops + executorch_generated_lib( + name = "library_with_all_prim_ops", + deps = [ + ":model_b_ops", + ], + kernel_deps = [ + "//executorch/kernels/portable:operators", + ], + functions_yaml_target = "//executorch/kernels/portable:functions.yaml", + aten_mode = False, + visibility = ["PUBLIC"], + ) + def define_common_targets(): """Defines targets that should be shared between fbcode and xplat. @@ -165,3 +277,5 @@ def define_common_targets(): define_static_target = True, **get_oss_build_kwargs() ) + + define_selective_build_prim_ops_example() diff --git a/kernels/prim_ops/register_prim_ops.cpp b/kernels/prim_ops/register_prim_ops.cpp index 8607c36204d..dc6ed9ac26f 100644 --- a/kernels/prim_ops/register_prim_ops.cpp +++ b/kernels/prim_ops/register_prim_ops.cpp @@ -12,6 +12,18 @@ #include #include +/* +For internal builds using buck rules, the target that depends on +selective prim ops will manage its own artifacts. It is in the +artifacts directory where the generated selected_prim_ops.h resides +and thus compilation sources must be copied there including +selective_build_prim_ops.h. Hence it does not have a fully qualified 
+*/ +#ifdef ET_PRIM_OPS_SELECTIVE_BUILD +#include "selective_build_prim_ops.h" +#endif + #include #include @@ -87,6 +99,8 @@ void floor_div_double(double a, double b, EValue& out) { } static Kernel prim_ops[] = { +#if !defined(EXECUTORCH_ENABLE_PRIM_OPS_SELECTIVE_BUILD) || \ + defined(INCLUDE_ATEN_SYM_SIZE_INT) // aten::sym_size.int(Tensor self, int dim) -> SymInt Kernel( "aten::sym_size.int", @@ -108,6 +122,9 @@ static Kernel prim_ops[] = { int64_t size = self_tensor.size(dim_val); out = EValue(size); }), +#endif +#if !defined(EXECUTORCH_ENABLE_PRIM_OPS_SELECTIVE_BUILD) || \ + defined(INCLUDE_ATEN_LOCAL_SCALAR_DENSE) // aten::_local_scalar_dense(Tensor self) -> Scalar Kernel( "aten::_local_scalar_dense", @@ -134,6 +151,9 @@ static Kernel prim_ops[] = { out = EValue(Scalar(self_tensor.const_data_ptr()[0])); }); }), +#endif +#if !defined(EXECUTORCH_ENABLE_PRIM_OPS_SELECTIVE_BUILD) || \ + defined(INCLUDE_ATEN_SYM_NUMEL) // aten::sym_numel(Tensor self) -> SymInt Kernel( "aten::sym_numel", @@ -153,6 +173,9 @@ static Kernel prim_ops[] = { int64_t numel = self_tensor.numel(); out = EValue(numel); }), +#endif +#if !defined(EXECUTORCH_ENABLE_PRIM_OPS_SELECTIVE_BUILD) || \ + defined(INCLUDE_EXECUTORCH_PRIM_SYM_MAX_SCALAR) // executorch_prim::sym_max.Scalar(SymInt a, SymInt b) -> SymInt Kernel( "executorch_prim::sym_max.Scalar", @@ -182,6 +205,9 @@ static Kernel prim_ops[] = { (size_t)b.tag); } }), +#endif +#if !defined(EXECUTORCH_ENABLE_PRIM_OPS_SELECTIVE_BUILD) || \ + defined(INCLUDE_EXECUTORCH_PRIM_SYM_MIN_SCALAR) // executorch_prim::sym_min.Scalar(SymInt a, SymInt b) -> SymInt Kernel( "executorch_prim::sym_min.Scalar", @@ -210,27 +236,39 @@ static Kernel prim_ops[] = { (size_t)b.tag); } }), +#endif +#if !defined(EXECUTORCH_ENABLE_PRIM_OPS_SELECTIVE_BUILD) || \ + defined(INCLUDE_EXECUTORCH_PRIM_ADD_SCALAR) // executorch_prim::add.Scalar(Scalar, Scalar) -> Scalar Kernel( "executorch_prim::add.Scalar", [](KernelRuntimeContext& context, Span stack) { ALGEBRA_ET_PRIM_OP(+, 
stack, context); }), +#endif +#if !defined(EXECUTORCH_ENABLE_PRIM_OPS_SELECTIVE_BUILD) || \ + defined(INCLUDE_EXECUTORCH_PRIM_SUB_SCALAR) // executorch_prim::sub.Scalar(Scalar, Scalar) -> Scalar Kernel( "executorch_prim::sub.Scalar", [](KernelRuntimeContext& context, Span stack) { ALGEBRA_ET_PRIM_OP(-, stack, context); }), +#endif +#if !defined(EXECUTORCH_ENABLE_PRIM_OPS_SELECTIVE_BUILD) || \ + defined(INCLUDE_EXECUTORCH_PRIM_MUL_SCALAR) // executorch_prim::mul.Scalar(Scalar, Scalar) -> Scalar Kernel( "executorch_prim::mul.Scalar", [](KernelRuntimeContext& context, Span stack) { ALGEBRA_ET_PRIM_OP(*, stack, context); }), +#endif +#if !defined(EXECUTORCH_ENABLE_PRIM_OPS_SELECTIVE_BUILD) || \ + defined(INCLUDE_EXECUTORCH_PRIM_FLOORDIV_SCALAR) /** * Python's __floordiv__ operator is more complicated than just floor(a / * b). It aims to maintain the property: a == (a // b) * b + remainder(a, b) @@ -280,8 +318,11 @@ static Kernel prim_ops[] = { (size_t)b.tag); } }), +#endif - // executorch_prim::floordiv.Scalar(Scalar, Scalar) -> Scalar +#if !defined(EXECUTORCH_ENABLE_PRIM_OPS_SELECTIVE_BUILD) || \ + defined(INCLUDE_EXECUTORCH_PRIM_TRUEDIV_SCALAR) + // executorch_prim::truediv.Scalar(Scalar, Scalar) -> Scalar Kernel( "executorch_prim::truediv.Scalar", [](KernelRuntimeContext& context, Span stack) { @@ -318,7 +359,10 @@ static Kernel prim_ops[] = { (size_t)b.tag); } }), +#endif +#if !defined(EXECUTORCH_ENABLE_PRIM_OPS_SELECTIVE_BUILD) || \ + defined(INCLUDE_EXECUTORCH_PRIM_SYM_FLOAT_SCALAR) // executorch_prim::sym_float.Scalar(Scalar) -> Scalar Kernel( "executorch_prim::sym_float.Scalar", @@ -346,41 +390,60 @@ static Kernel prim_ops[] = { context, false, InvalidType, /* void */, "%zu", (size_t)a.tag); } }), +#endif +#if !defined(EXECUTORCH_ENABLE_PRIM_OPS_SELECTIVE_BUILD) || \ + defined(INCLUDE_EXECUTORCH_PRIM_EQ_SCALAR) // executorch_prim::eq.Scalar(Scalar, Scalar) -> bool Kernel( "executorch_prim::eq.Scalar", [](KernelRuntimeContext& context, Span stack) { 
BOOLEAN_ET_PRIM_OP(==, stack, context); }), +#endif +#if !defined(EXECUTORCH_ENABLE_PRIM_OPS_SELECTIVE_BUILD) || \ + defined(INCLUDE_EXECUTORCH_PRIM_GT_SCALAR) // executorch_prim::gt.Scalar(Scalar, Scalar) -> bool Kernel( "executorch_prim::gt.Scalar", [](KernelRuntimeContext& context, Span stack) { BOOLEAN_ET_PRIM_OP(>, stack, context); }), +#endif +#if !defined(EXECUTORCH_ENABLE_PRIM_OPS_SELECTIVE_BUILD) || \ + defined(INCLUDE_EXECUTORCH_PRIM_LT_SCALAR) // executorch_prim::lt.Scalar(Scalar, Scalar) -> bool Kernel( "executorch_prim::lt.Scalar", [](KernelRuntimeContext& context, Span stack) { BOOLEAN_ET_PRIM_OP(<, stack, context); }), +#endif +#if !defined(EXECUTORCH_ENABLE_PRIM_OPS_SELECTIVE_BUILD) || \ + defined(INCLUDE_EXECUTORCH_PRIM_GE_SCALAR) // executorch_prim::ge.Scalar(Scalar, Scalar) -> bool Kernel( "executorch_prim::ge.Scalar", [](KernelRuntimeContext& context, Span stack) { BOOLEAN_ET_PRIM_OP(>=, stack, context); }), +#endif +#if !defined(EXECUTORCH_ENABLE_PRIM_OPS_SELECTIVE_BUILD) || \ + defined(INCLUDE_EXECUTORCH_PRIM_LE_SCALAR) // executorch_prim::le.Scalar(Scalar, Scalar) -> bool Kernel( "executorch_prim::le.Scalar", [](KernelRuntimeContext& context, Span stack) { BOOLEAN_ET_PRIM_OP(<=, stack, context); }), +#endif + +#if !defined(EXECUTORCH_ENABLE_PRIM_OPS_SELECTIVE_BUILD) || \ + defined(INCLUDE_EXECUTORCH_PRIM_NEG_SCALAR) // executorch_prim::neg.Scalar(Scalar) -> Scalar Kernel( "executorch_prim::neg.Scalar", @@ -404,7 +467,10 @@ static Kernel prim_ops[] = { context, false, InvalidType, /* void */, "%zu", (size_t)a.tag); } }), +#endif +#if !defined(EXECUTORCH_ENABLE_PRIM_OPS_SELECTIVE_BUILD) || \ + defined(INCLUDE_EXECUTORCH_PRIM_FLOORDIV_INT) // executorch_prim::floordiv.int(int, int) -> int Kernel( "executorch_prim::floordiv.int", @@ -422,7 +488,10 @@ static Kernel prim_ops[] = { EValue& out = *stack[2]; out = EValue(a.toInt() / b.toInt()); }), +#endif +#if !defined(EXECUTORCH_ENABLE_PRIM_OPS_SELECTIVE_BUILD) || \ + 
defined(INCLUDE_EXECUTORCH_PRIM_MOD_INT) // executorch_prim::mod.int(int, int) -> int Kernel( "executorch_prim::mod.int", @@ -440,7 +509,10 @@ static Kernel prim_ops[] = { EValue& out = *stack[2]; out = EValue(a.toInt() % b.toInt()); }), +#endif +#if !defined(EXECUTORCH_ENABLE_PRIM_OPS_SELECTIVE_BUILD) || \ + defined(INCLUDE_EXECUTORCH_PRIM_MOD_SCALAR) // executorch_prim::mod.Scalar(Scalar, Scalar) -> Scalar Kernel( "executorch_prim::mod.Scalar", @@ -469,7 +541,10 @@ static Kernel prim_ops[] = { (size_t)b.tag); } }), +#endif +#if !defined(EXECUTORCH_ENABLE_PRIM_OPS_SELECTIVE_BUILD) || \ + defined(INCLUDE_EXECUTORCH_PRIM_CEIL_SCALAR) // ceil.Scalar(Scalar a) -> Scalar Kernel( "executorch_prim::ceil.Scalar", @@ -496,7 +571,10 @@ static Kernel prim_ops[] = { (size_t)a.tag); } }), +#endif +#if !defined(EXECUTORCH_ENABLE_PRIM_OPS_SELECTIVE_BUILD) || \ + defined(INCLUDE_EXECUTORCH_PRIM_ROUND_SCALAR) // round.Scalar(Scalar a) -> Scalar Kernel( "executorch_prim::round.Scalar", @@ -540,7 +618,10 @@ static Kernel prim_ops[] = { (size_t)a.tag); } }), +#endif +#if !defined(EXECUTORCH_ENABLE_PRIM_OPS_SELECTIVE_BUILD) || \ + defined(INCLUDE_EXECUTORCH_PRIM_TRUNC_SCALAR) // trunc.Scalar(Scalar a) -> Scalar Kernel( "executorch_prim::trunc.Scalar", @@ -562,19 +643,27 @@ static Kernel prim_ops[] = { context, false, InvalidType, /* void */, "%zu", (size_t)a.tag); } }), +#endif +#if !defined(EXECUTORCH_ENABLE_PRIM_OPS_SELECTIVE_BUILD) || \ + defined(INCLUDE_EXECUTORCH_PRIM_ET_COPY_INDEX_TENSOR) // executorch_prim::et_copy_index.tensor(tensor, tensor) -> tensor Kernel( "executorch_prim::et_copy_index.tensor", [](KernelRuntimeContext& context, Span stack) { et_copy_index(context, stack); }), +#endif + +#if !defined(EXECUTORCH_ENABLE_PRIM_OPS_SELECTIVE_BUILD) || \ + defined(INCLUDE_EXECUTORCH_PRIM_ET_VIEW_DEFAULT) // executorch_prim::et_view.default(Tensor, int[]) -> Tensor Kernel( "executorch_prim::et_view.default", [](KernelRuntimeContext& context, Span stack) { et_view(context, 
stack); }), +#endif }; diff --git a/kernels/prim_ops/selective_build_prim_ops.h b/kernels/prim_ops/selective_build_prim_ops.h new file mode 100644 index 00000000000..78181405b11 --- /dev/null +++ b/kernels/prim_ops/selective_build_prim_ops.h @@ -0,0 +1,12 @@ +#pragma once +/** + * Generated by executorch/kernels/prim_ops/selective_build_prim_ops.h + * This header conditionally includes selected_prim_ops.h when selective build + * for prim ops is enabled. + */ + +// If no prim ops are selected, then the header is empty. +// that would mean all prim ops are enabled. +#ifdef ET_PRIM_OPS_SELECTIVE_BUILD +#include "selected_prim_ops.h" +#endif diff --git a/kernels/prim_ops/targets.bzl b/kernels/prim_ops/targets.bzl index 8bdc44fe553..eea66c1afa7 100644 --- a/kernels/prim_ops/targets.bzl +++ b/kernels/prim_ops/targets.bzl @@ -7,13 +7,31 @@ def define_common_targets(): TARGETS and BUCK files that call this function. """ + # Define the filegroup once outside the loop since it doesn't vary by aten mode + runtime.filegroup( + name = "prim_ops_sources", + srcs = ["register_prim_ops.cpp"], + visibility = ["//executorch/...", "@EXECUTORCH_CLIENTS"], + ) + + runtime.filegroup( + name = "selective_build_prim_ops.h", + srcs = ["selective_build_prim_ops.h"], + visibility = ["//executorch/...", "@EXECUTORCH_CLIENTS"], + ) + for aten_mode in get_aten_mode_options(): aten_suffix = ("_aten" if aten_mode else "") runtime.cxx_library( name = "et_copy_index" + aten_suffix, srcs = ["et_copy_index.cpp"], - visibility = [], # Private + # To allow for selective prim ops to depend on this library. 
+ # Used by selective_build.bzl + visibility = [ + "//executorch/...", + "@EXECUTORCH_CLIENTS", + ], exported_headers = ["et_copy_index.h"], deps = [ "//executorch/runtime/kernel:kernel_includes" + aten_suffix, @@ -28,7 +46,12 @@ def define_common_targets(): runtime.cxx_library( name = "et_view" + aten_suffix, srcs = ["et_view.cpp"], - visibility = [], # Private + # To allow for selective prim ops to depend on this library. + # Used by selective_build.bzl + visibility = [ + "//executorch/...", + "@EXECUTORCH_CLIENTS", + ], exported_headers = ["et_view.h"], deps = [ "//executorch/runtime/kernel:kernel_includes" + aten_suffix, diff --git a/shim_et/xplat/executorch/codegen/codegen.bzl b/shim_et/xplat/executorch/codegen/codegen.bzl index ae6b42e2d8f..3546b64cdb6 100644 --- a/shim_et/xplat/executorch/codegen/codegen.bzl +++ b/shim_et/xplat/executorch/codegen/codegen.bzl @@ -7,6 +7,7 @@ load( "get_vec_deps", "get_vec_preprocessor_flags", ) +load("@fbsource//xplat/executorch/kernels/prim_ops:selective_build.bzl", "prim_ops_registry_selective") # Headers that declare the function signatures of the C++ functions that # map to entries in functions.yaml and custom_ops.yaml. @@ -81,6 +82,83 @@ ScalarType = enum( "Uint64", ) +def _get_prim_ops_registry_target(name, deps, aten_suffix, platforms): + """ + Helper function to determine which prim ops registry target to use. + + Args: + name: Base name for creating selective registry target + deps: List of dependencies for the selective registry target, it will filter out + the deps with label et_operator_library + aten_suffix: Suffix for aten mode (e.g. 
"_aten") + platforms: Platforms configuration + + Returns: + String: Target name for the appropriate prim ops registry + """ + # If selective build targets are specified, create a selective prim ops registry + # Create a selective prim ops registry using the existing function + selective_prim_ops_registry_name = name + "_selected_prim_ops_registry" + combined_prim_ops_header_target_name = name + "_combined_prim_ops_header" + selected_prim_operators_genrule(combined_prim_ops_header_target_name, deps, platforms) + # Use the existing prim_ops_registry_selective function + prim_ops_registry_selective( + name = selective_prim_ops_registry_name, + selected_prim_ops_header_target = ":"+combined_prim_ops_header_target_name, + aten_suffix = aten_suffix, + platforms = platforms, + ) + + # Return the selective registry target + return ":" + selective_prim_ops_registry_name + +def _extract_prim_ops_from_lists(ops, ops_dict): + """ + Utility function to extract prim ops from ops list and ops_dict. + + Args: + ops: List of operator names + ops_dict: Dictionary mapping ops to metadata + + Returns: + Tuple of (prim_ops, remaining_ops, remaining_ops_dict) + """ + def _is_aten_prim_op(op_name): + if not op_name.startswith("aten::"): + return False + for prim_suffix in [ + "sym_size", "sym_numel", "sym_max", "sym_min", "sym_float" + ]: + if prim_suffix in op_name: + return True + return False + + def _is_prim_op(op_name): + """Check if an operator is a primitive operation.""" + return op_name.startswith("executorch_prim::") or ( + _is_aten_prim_op(op_name) + ) + + prim_ops = [] + remaining_ops = [] + remaining_ops_dict = {} + + # Extract from ops list + for op in ops: + if _is_prim_op(op): + prim_ops.append(op) + else: + remaining_ops.append(op) + + # Extract from ops_dict + for op, metadata in ops_dict.items(): + if _is_prim_op(op): + prim_ops.append(op) + else: + remaining_ops_dict[op] = metadata + + return prim_ops, remaining_ops, remaining_ops_dict + # Hide the dependency to 
caffe2 internally. def et_operator_library( name, ops = [], ops_dict = {}, model = None, include_all_operators = False, ops_schema_yaml_target = None, server_generated_yaml_target = None, **kwargs): + + # Check if we should extract prim ops from the operator lists + # Note that selective build for prim ops doesn't support model or ops_schema_yaml_target or server_generated_yaml_target + # TODO: Add support for selective build for prim ops with model or ops_schema_yaml_target or server_generated_yaml_target + should_extract_prim_ops = (ops or ops_dict) and not (model or ops_schema_yaml_target or server_generated_yaml_target or include_all_operators) + + if should_extract_prim_ops: + # Extract prim ops from ops and ops_dict + prim_ops, remaining_ops, remaining_ops_dict = _extract_prim_ops_from_lists(ops, ops_dict) + # Use the remaining ops (with prim ops removed) for the main et_operator_library + final_ops = remaining_ops + final_ops_dict = remaining_ops_dict + else: + # No prim ops extraction needed - use original ops and ops_dict + prim_ops = [] + final_ops = ops + final_ops_dict = ops_dict + + selected_operator_yaml_filename = "selected_operators.yaml" + selected_prim_ops_filename = "selected_prim_ops.h" + # Generate the main operator library with the final ops # do a dummy copy if server_generated_yaml_target is set if server_generated_yaml_target: if include_all_operators or ops_schema_yaml_target or model or ops or ops_dict: @@ -98,7 +197,7 @@ def et_operator_library( genrule_cmd = [ "cp", "$(location {})".format(server_generated_yaml_target), - "$OUT", + "$OUT/{}".format(selected_operator_yaml_filename), ] else: genrule_cmd = [ @@ -109,12 +208,12 @@ def et_operator_library( genrule_cmd.append( "--ops_schema_yaml_path=$(location {})".format(ops_schema_yaml_target), ) - if ops: + if final_ops: genrule_cmd.append( - "--root_ops=" + ",".join(ops), + "--root_ops=" + ",".join(final_ops), ) - if ops_dict: - ops_dict_json = struct_to_json(ops_dict) + if final_ops_dict: + ops_dict_json = 
struct_to_json(final_ops_dict) genrule_cmd.append( "--ops_dict='{}'".format(ops_dict_json), ) @@ -127,6 +226,15 @@ def et_operator_library( "--include_all_operators", ) + prim_ops_genrule_cmd = [ + "$(exe //executorch/codegen/tools:gen_selected_prim_ops)", + "--prim_op_names=" + ",".join(prim_ops), + "--output_dir=${OUT}", + ] + # Here we generate the selected_prim_ops.h and the selected_operators.yaml file + # both with single genrule + genrule_cmd = genrule_cmd + [" && "] + prim_ops_genrule_cmd + # TODO(larryliu0820): Remove usages of this flag. if "define_static_targets" in kwargs: kwargs.pop("define_static_targets") @@ -134,7 +242,8 @@ def et_operator_library( name = name, macros_only = False, cmd = " ".join(genrule_cmd), - out = "selected_operators.yaml", + outs = {selected_operator_yaml_filename: [selected_operator_yaml_filename], selected_prim_ops_filename: [selected_prim_ops_filename]}, + default_outs = ["."], labels = ["et_operator_library"], **kwargs ) @@ -615,6 +724,31 @@ def selected_operators_genrule( platforms = platforms, ) +def selected_prim_operators_genrule( + name, + deps, + platforms = get_default_executorch_platforms(), +): + """Generates selected_prim_ops.h from the list of deps. We look into the transitive closure of all the deps, + and look for targets with label `et_operator_library`. + + `combine_prim_ops_headers` is the python binary we use to aggregate all the `selected_prim_ops.h` headers + from `et_prim_ops_library` targets into a single combined `selected_prim_ops.h` file. + + This file can be used to enable selective build for prim ops across multiple dependencies. 
+ """ + cmd = ("$(exe //executorch/codegen/tools:combine_prim_ops_headers) " + + "--header_files $(@query_outputs \'attrfilter(labels, et_operator_library, deps(set({deps})))\') " + + "--output_dir $OUT ").format(deps = " ".join(["\"{}\"".format(d) for d in deps])) + runtime.genrule( + name = name, + macros_only = False, + cmd = cmd, + outs = {"selected_prim_ops.h": ["selected_prim_ops.h"]}, + default_outs = ["."], + platforms = platforms, + ) + def dtype_header_genrule( name, visibility, @@ -677,7 +811,8 @@ def executorch_generated_lib( dtype_selective_build = False, feature = None, expose_operator_symbols = False, - support_exceptions = True): + support_exceptions = True, + include_all_prim_ops = True): """Emits 0-3 C++ library targets (in fbcode or xplat) containing code to dispatch the operators specified in the provided yaml files. @@ -738,6 +873,9 @@ def executorch_generated_lib( support_exceptions: enable try/catch wrapper around operator implementations to make sure exceptions thrown will not bring down the process. Disable if your use case disables exceptions in the build. + include_all_prim_ops: If true, include all prim ops in the generated library. This option + allows for selecting only some prim ops to reduce code size for extremely constrained + environments. For selecting only some prim ops, see examples in //executorch/examples/selective_build """ if functions_yaml_target and aten_mode: fail("{} is providing functions_yaml_target in ATen mode, it will be ignored. 
`native_functions.yaml` will be the source of truth.".format(name)) @@ -903,6 +1041,12 @@ def executorch_generated_lib( if name in libs: lib_name = name + + if include_all_prim_ops: + prim_ops_registry_target = "//executorch/kernels/prim_ops:prim_ops_registry" + aten_suffix + else: + prim_ops_registry_target = _get_prim_ops_registry_target(name, deps, aten_suffix, platforms) + runtime.cxx_library( name = lib_name, srcs = [ @@ -927,7 +1071,7 @@ def executorch_generated_lib( }) + compiler_flags, deps = [ "//executorch/runtime/kernel:operator_registry" + aten_suffix, - "//executorch/kernels/prim_ops:prim_ops_registry" + aten_suffix, + prim_ops_registry_target, # Use the appropriate prim ops registry "//executorch/runtime/core:evalue" + aten_suffix, "//executorch/codegen:macros", ] + deps + kernel_deps, From 8c74545866772621b79217d5b25c7dac2c6fa2c0 Mon Sep 17 00:00:00 2001 From: Hansong Zhang <107070759+kirklandsign@users.noreply.github.com> Date: Fri, 19 Sep 2025 16:47:34 -0700 Subject: [PATCH 053/395] Add a CP category examples and test/ci. (#14386) Guideline says we can CP `Bug fixes in demos/examples. 
No new features/experiments` and `Test/CI fixes` --- .github/scripts/cherry_pick.py | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/.github/scripts/cherry_pick.py b/.github/scripts/cherry_pick.py index 1239ee030dd..8de5279f51b 100755 --- a/.github/scripts/cherry_pick.py +++ b/.github/scripts/cherry_pick.py @@ -39,7 +39,15 @@ def parse_args() -> Any: ) parser.add_argument( "--classification", - choices=["regression", "critical", "fixnewfeature", "docs", "release"], + choices=[ + "regression", + "critical", + "fixnewfeature", + "docs", + "release", + "examples", + "testci", + ], required=True, help="the cherry pick category", ) From 088836c53b7217314dd25d84958b54a8333cc6b5 Mon Sep 17 00:00:00 2001 From: tmsl Date: Fri, 19 Sep 2025 17:00:17 -0700 Subject: [PATCH 054/395] getMethodMetadata should contain used backend name (#14397) ### Summary This change added the metadata for used backend name for API getMethodMetadata() ### Test plan Run E2E test locally --------- Co-authored-by: Haiting Pu --- .../java/org/pytorch/executorch/ModuleE2ETest.kt | 3 ++- .../src/main/java/org/pytorch/executorch/Module.java | 8 +++++++- 2 files changed, 9 insertions(+), 2 deletions(-) diff --git a/extension/android/executorch_android/src/androidTest/java/org/pytorch/executorch/ModuleE2ETest.kt b/extension/android/executorch_android/src/androidTest/java/org/pytorch/executorch/ModuleE2ETest.kt index e269f4aa38f..45476dac43f 100644 --- a/extension/android/executorch_android/src/androidTest/java/org/pytorch/executorch/ModuleE2ETest.kt +++ b/extension/android/executorch_android/src/androidTest/java/org/pytorch/executorch/ModuleE2ETest.kt @@ -10,7 +10,6 @@ package org.pytorch.executorch import android.Manifest import android.graphics.Bitmap import android.graphics.BitmapFactory -import androidx.test.InstrumentationRegistry import androidx.test.ext.junit.runners.AndroidJUnit4 import androidx.test.rule.GrantPermissionRule import java.io.File @@ -18,6 +17,7 @@ import 
java.io.IOException import java.net.URISyntaxException import org.apache.commons.io.FileUtils import org.junit.Assert +import org.junit.Assert.assertArrayEquals import org.junit.Rule import org.junit.Test import org.junit.runner.RunWith @@ -70,6 +70,7 @@ class ModuleE2ETest { val module = Module.load(getTestFilePath("/mv3_xnnpack_fp32.pte")) val expectedBackends = arrayOf("XnnpackBackend") + assertArrayEquals(expectedBackends, module.getMethodMetadata("forward").backends) } @Test diff --git a/extension/android/executorch_android/src/main/java/org/pytorch/executorch/Module.java b/extension/android/executorch_android/src/main/java/org/pytorch/executorch/Module.java index 5a546eb18bc..3ad02f50d13 100644 --- a/extension/android/executorch_android/src/main/java/org/pytorch/executorch/Module.java +++ b/extension/android/executorch_android/src/main/java/org/pytorch/executorch/Module.java @@ -203,7 +203,13 @@ public MethodMetadata getMethodMetadata(String name) { if (!mMethodMetadata.containsKey(name)) { throw new RuntimeException("method " + name + "does not exist for this module"); } - return mMethodMetadata.get(name); + + MethodMetadata methodMetadata = mMethodMetadata.get(name); + if (methodMetadata != null) { + methodMetadata.setBackends(getUsedBackends(name)); + + } + return methodMetadata; } /** Retrieve the in-memory log buffer, containing the most recent ExecuTorch log entries. */ From 9e7a264b9b4dbf6900fceb556437e8d01641a846 Mon Sep 17 00:00:00 2001 From: Mengwei Liu Date: Fri, 19 Sep 2025 17:51:34 -0700 Subject: [PATCH 055/395] [multimodal] Add token support to MultimodalInput (#14451) This pull request adds support for tokenizer-encoded input (as vectors of token IDs) to the `MultimodalInput` class, enabling more flexible and efficient handling of multimodal data. The update includes new constructors, type checks, getters, and safe accessors for token inputs, as well as unit tests to ensure correct behavior and compatibility with existing code paths. 
**MultimodalInput class changes:** * Added a new `TOKENS` type to the `MultimodalInput::Type` enum and updated the internal `std::variant` to support storing `std::vector` as token data. [[1]](diffhunk://#diff-db31b7448019ab4684675434f5b6e8054ff5d995ffa18e7adee15b5a694a7fb1R34-R73) [[2]](diffhunk://#diff-db31b7448019ab4684675434f5b6e8054ff5d995ffa18e7adee15b5a694a7fb1L290-R367) * Implemented new constructors, type checks (`is_tokens()`), getters (`get_tokens()`), and safe accessors (`try_get_tokens()`) for token inputs, along with static and instance methods for type name conversion. [[1]](diffhunk://#diff-db31b7448019ab4684675434f5b6e8054ff5d995ffa18e7adee15b5a694a7fb1R34-R73) [[2]](diffhunk://#diff-db31b7448019ab4684675434f5b6e8054ff5d995ffa18e7adee15b5a694a7fb1R101-R107) [[3]](diffhunk://#diff-db31b7448019ab4684675434f5b6e8054ff5d995ffa18e7adee15b5a694a7fb1R151-R159) [[4]](diffhunk://#diff-db31b7448019ab4684675434f5b6e8054ff5d995ffa18e7adee15b5a694a7fb1R187-R201) [[5]](diffhunk://#diff-db31b7448019ab4684675434f5b6e8054ff5d995ffa18e7adee15b5a694a7fb1R319-R328) * Added factory functions `make_token_input` for easily creating token-based inputs. **Integration and logging:** * Updated `MultimodalPrefiller::prefill` to handle both text and token inputs, bypassing tokenization when tokens are provided directly. * Added logging in `MultimodalRunner::generate` to include the type name of each input for easier debugging. **Tests:** * Introduced a comprehensive suite of unit tests covering construction, type checking, getters, copy/move semantics, and edge cases for the new token input functionality in `MultimodalInput`. 
--- extension/llm/runner/multimodal_input.h | 89 +++++- extension/llm/runner/multimodal_prefiller.cpp | 12 +- extension/llm/runner/multimodal_runner.cpp | 6 + .../llm/runner/test/test_multimodal_input.cpp | 255 ++++++++++++++++++ 4 files changed, 357 insertions(+), 5 deletions(-) diff --git a/extension/llm/runner/multimodal_input.h b/extension/llm/runner/multimodal_input.h index 728d8aef08f..737821f51e9 100644 --- a/extension/llm/runner/multimodal_input.h +++ b/extension/llm/runner/multimodal_input.h @@ -14,8 +14,10 @@ #include #include #include +#include #include #include +#include namespace executorch::extension::llm { @@ -29,15 +31,46 @@ class ET_EXPERIMENTAL MultimodalInput { /// Type of multimodal input data enum class Type { TEXT, ///< Text string input + TOKENS, ///< Pre-tokenized input (vector of token IDs) IMAGE, ///< Processed image input AUDIO, ///< Processed audio input RAW_AUDIO, ///< Raw unprocessed audio input (straight from audio file) UNSUPPORTED ///< Unsupported input type }; + /** + * Return a human-readable name for a MultimodalInput::Type. + * Preferred for logging and debugging; returns string literals. + */ + static constexpr const char* TypeName(Type t) noexcept { + switch (t) { + case Type::TEXT: + return "text"; + case Type::TOKENS: + return "tokens"; + case Type::IMAGE: + return "image"; + case Type::AUDIO: + return "audio"; + case Type::RAW_AUDIO: + return "raw_audio"; + default: + return "unknown"; + } + } + + /** Convenience wrapper that returns a std::string. 
*/ + static inline std::string TypeToString(Type t) { + return TypeName(t); + } + // Constructors explicit MultimodalInput(const std::string& text) : data_(text) {} explicit MultimodalInput(std::string&& text) : data_(std::move(text)) {} + explicit MultimodalInput(const std::vector& tokens) + : data_(tokens) {} + explicit MultimodalInput(std::vector&& tokens) + : data_(std::move(tokens)) {} explicit MultimodalInput(const Image& image) : data_(image) {} explicit MultimodalInput(Image&& image) : data_(std::move(image)) {} explicit MultimodalInput(const Audio& audio) : data_(audio) {} @@ -65,6 +98,13 @@ class ET_EXPERIMENTAL MultimodalInput { return std::holds_alternative(data_); } + /** + * Check if this input contains pre-tokenized data. + */ + bool is_tokens() const noexcept { + return std::holds_alternative>(data_); + } + /** * Check if this input contains image data. * @return true if this input contains an image, false otherwise. @@ -97,6 +137,8 @@ class ET_EXPERIMENTAL MultimodalInput { Type get_type() const noexcept { if (is_text()) return Type::TEXT; + if (is_tokens()) + return Type::TOKENS; if (is_image()) return Type::IMAGE; if (is_audio()) @@ -106,6 +148,15 @@ class ET_EXPERIMENTAL MultimodalInput { return Type::UNSUPPORTED; } + /** + * Get a human-readable name for the contained input type. + * Returns one of: "text", "tokens", "image", "audio", "raw_audio", or + * "unknown". + */ + const char* type_name() const noexcept { + return TypeName(get_type()); + } + /** * Get the text data from this input. * @return Reference to the stored text string. @@ -133,6 +184,21 @@ class ET_EXPERIMENTAL MultimodalInput { return std::get(std::move(data_)); } + /** + * Get the token vector from this input. + */ + const std::vector& get_tokens() const& { + return std::get>(data_); + } + + std::vector& get_tokens() & { + return std::get>(data_); + } + + std::vector&& get_tokens() && { + return std::get>(std::move(data_)); + } + /** * Get the image data from this input. 
* @return Reference to the stored Image object. @@ -250,6 +316,16 @@ class ET_EXPERIMENTAL MultimodalInput { return std::get_if(&data_); } + /** Try to get the tokens from this input safely. */ + const std::vector* try_get_tokens() const noexcept { + return std::get_if>(&data_); + } + + /** Try to get the tokens from this input safely (mutable). */ + std::vector* try_get_tokens() noexcept { + return std::get_if>(&data_); + } + /** * Try to get the audio data from this input safely. * @return Pointer to the Audio object if this input contains audio, @@ -287,7 +363,8 @@ class ET_EXPERIMENTAL MultimodalInput { } private: - std::variant data_; + std::variant, Image, Audio, RawAudio> + data_; }; // Convenience factory functions @@ -307,6 +384,16 @@ inline MultimodalInput make_image_input(Image&& image) noexcept { return MultimodalInput(std::move(image)); } +inline MultimodalInput make_token_input( + const std::vector& tokens) noexcept { + return MultimodalInput(tokens); +} + +inline MultimodalInput make_token_input( + std::vector&& tokens) noexcept { + return MultimodalInput(std::move(tokens)); +} + inline MultimodalInput make_audio_input(const Audio& audio) noexcept { return MultimodalInput(audio); } diff --git a/extension/llm/runner/multimodal_prefiller.cpp b/extension/llm/runner/multimodal_prefiller.cpp index 824fdf943a9..2c83df24f55 100644 --- a/extension/llm/runner/multimodal_prefiller.cpp +++ b/extension/llm/runner/multimodal_prefiller.cpp @@ -110,10 +110,14 @@ Result MultimodalPrefiller::prefill( auto audio_encoder_outputs = audio_encoder_result.get(); encoder_output = audio_encoder_outputs[0]; - } else if (input.is_text()) { - auto& text = input.get_text(); - std::vector tokens = - ET_UNWRAP_TOKENIZER(tokenizer_->encode(text)); + } else if (input.is_text() || input.is_tokens()) { + std::vector tokens; + if (input.is_text()) { + auto& text = input.get_text(); + tokens = ET_UNWRAP_TOKENIZER(tokenizer_->encode(text)); + } else { + tokens = input.get_tokens(); + } 
 auto text_tensor = executorch::extension::from_blob(
     tokens.data(),
diff --git a/extension/llm/runner/multimodal_runner.cpp b/extension/llm/runner/multimodal_runner.cpp
index 6928a9b2827..a5de59cbe98 100644
--- a/extension/llm/runner/multimodal_runner.cpp
+++ b/extension/llm/runner/multimodal_runner.cpp
@@ -116,6 +116,12 @@ Error MultimodalRunner::generate(
   // Process multimodal inputs in order
   for (size_t i = 0; i < inputs.size(); ++i) {
     const MultimodalInput& input = inputs[i];
+    ET_LOG(
+        Info,
+        "Prefilling input %zu/%zu, type: %s",
+        i,
+        inputs.size(),
+        input.type_name());
     if (config.echo && i == inputs.size() - 1 && input.is_text()) {
       wrapped_callback(input.get_text());
     }
diff --git a/extension/llm/runner/test/test_multimodal_input.cpp b/extension/llm/runner/test/test_multimodal_input.cpp
index 486515175e8..85d45d69173 100644
--- a/extension/llm/runner/test/test_multimodal_input.cpp
+++ b/extension/llm/runner/test/test_multimodal_input.cpp
@@ -14,6 +14,7 @@
 using namespace ::testing;
 using executorch::extension::llm::Image;
 using executorch::extension::llm::make_image_input;
 using executorch::extension::llm::make_text_input;
+using executorch::extension::llm::make_token_input;
 using executorch::extension::llm::MultimodalInput;
 
 class MultimodalInputTest : public Test {
@@ -415,3 +416,257 @@ TEST_F(MultimodalInputTest, AssignmentBetweenTypes) {
   EXPECT_TRUE(input.is_text());
   EXPECT_EQ(input.get_text(), text);
 }
+
+// Token-related tests
+class MultimodalInputTokenTest : public Test {
+ protected:
+  std::vector<uint64_t> createTestTokens() {
+    return {1, 2, 3, 4, 5};
+  }
+};
+
+// Test token constructors
+TEST_F(MultimodalInputTokenTest, TokenConstructorFromVector) {
+  std::vector<uint64_t> tokens = createTestTokens();
+  MultimodalInput input(tokens);
+
+  EXPECT_TRUE(input.is_tokens());
+  EXPECT_FALSE(input.is_text());
+  EXPECT_FALSE(input.is_image());
+  EXPECT_EQ(input.get_type(), MultimodalInput::Type::TOKENS);
+  EXPECT_EQ(input.get_tokens(), tokens);
+  EXPECT_EQ(input.get_tokens().size(), 5);
+}
+
+TEST_F(MultimodalInputTokenTest, TokenConstructorFromRvalueVector) {
+  std::vector<uint64_t> tokens = createTestTokens();
+  std::vector<uint64_t> original_tokens = tokens;
+  MultimodalInput input(std::move(tokens));
+
+  EXPECT_TRUE(input.is_tokens());
+  EXPECT_FALSE(input.is_text());
+  EXPECT_FALSE(input.is_image());
+  EXPECT_EQ(input.get_type(), MultimodalInput::Type::TOKENS);
+  EXPECT_EQ(input.get_tokens(), original_tokens);
+  EXPECT_EQ(input.get_tokens().size(), 5);
+}
+
+// Test token type checking
+TEST_F(MultimodalInputTokenTest, TokenTypeChecking) {
+  std::vector<uint64_t> tokens = createTestTokens();
+  MultimodalInput input(tokens);
+
+  EXPECT_TRUE(input.is_tokens());
+  EXPECT_FALSE(input.is_text());
+  EXPECT_FALSE(input.is_image());
+  EXPECT_FALSE(input.is_audio());
+  EXPECT_FALSE(input.is_raw_audio());
+  EXPECT_EQ(input.get_type(), MultimodalInput::Type::TOKENS);
+  EXPECT_STREQ(input.type_name(), "tokens");
+}
+
+// Test token getters
+TEST_F(MultimodalInputTokenTest, GetTokensWithTokenInput) {
+  std::vector<uint64_t> tokens = createTestTokens();
+  MultimodalInput input(tokens);
+
+  // Test const lvalue reference version
+  const MultimodalInput& const_input = input;
+  EXPECT_EQ(const_input.get_tokens(), tokens);
+  EXPECT_EQ(const_input.get_tokens().size(), 5);
+
+  // Test mutable lvalue reference version
+  std::vector<uint64_t>& mutable_tokens = input.get_tokens();
+  mutable_tokens.push_back(6);
+  EXPECT_EQ(input.get_tokens().size(), 6);
+  EXPECT_EQ(input.get_tokens().back(), 6);
+
+  // Test rvalue reference version
+  std::vector<uint64_t> moved_tokens = std::move(input).get_tokens();
+  EXPECT_EQ(moved_tokens.size(), 6);
+  EXPECT_EQ(moved_tokens.back(), 6);
+}
+
+// Test token getters with wrong types (should throw)
+TEST_F(MultimodalInputTokenTest, GetTokensWithTextInputThrows) {
+  std::string text = "Hello";
+  MultimodalInput input(text);
+
+  EXPECT_THROW(input.get_tokens(), std::bad_variant_access);
+  EXPECT_THROW(std::move(input).get_tokens(), std::bad_variant_access);
+}
+
+TEST_F(MultimodalInputTokenTest, GetTextWithTokenInputThrows) {
+  std::vector<uint64_t> tokens = createTestTokens();
+  MultimodalInput input(tokens);
+
+  EXPECT_THROW(input.get_text(), std::bad_variant_access);
+  EXPECT_THROW(std::move(input).get_text(), std::bad_variant_access);
+}
+
+// Test safe token getters (try_get_*)
+TEST_F(MultimodalInputTokenTest, TryGetTokensWithTokenInput) {
+  std::vector<uint64_t> tokens = createTestTokens();
+  MultimodalInput input(tokens);
+
+  // Test const version
+  const MultimodalInput& const_input = input;
+  const std::vector<uint64_t>* tokens_ptr = const_input.try_get_tokens();
+  ASSERT_NE(tokens_ptr, nullptr);
+  EXPECT_EQ(*tokens_ptr, tokens);
+
+  // Test mutable version
+  std::vector<uint64_t>* mutable_tokens_ptr = input.try_get_tokens();
+  ASSERT_NE(mutable_tokens_ptr, nullptr);
+  EXPECT_EQ(*mutable_tokens_ptr, tokens);
+
+  // Modify through pointer
+  mutable_tokens_ptr->push_back(100);
+  EXPECT_EQ(input.get_tokens().size(), 6);
+  EXPECT_EQ(input.get_tokens().back(), 100);
+}
+
+TEST_F(MultimodalInputTokenTest, TryGetTokensWithTextInput) {
+  std::string text = "Hello";
+  MultimodalInput input(text);
+
+  // Should return nullptr for wrong type
+  EXPECT_EQ(input.try_get_tokens(), nullptr);
+
+  const MultimodalInput& const_input = input;
+  EXPECT_EQ(const_input.try_get_tokens(), nullptr);
+}
+
+// Test token convenience factory functions
+TEST_F(MultimodalInputTokenTest, MakeTokenInputFromVector) {
+  std::vector<uint64_t> tokens = createTestTokens();
+  MultimodalInput input = make_token_input(tokens);
+
+  EXPECT_TRUE(input.is_tokens());
+  EXPECT_EQ(input.get_tokens(), tokens);
+  EXPECT_EQ(input.get_tokens().size(), 5);
+}
+
+TEST_F(MultimodalInputTokenTest, MakeTokenInputFromRvalueVector) {
+  std::vector<uint64_t> tokens = createTestTokens();
+  std::vector<uint64_t> original_tokens = tokens;
+  MultimodalInput input = make_token_input(std::move(tokens));
+
+  EXPECT_TRUE(input.is_tokens());
+  EXPECT_EQ(input.get_tokens(), original_tokens);
+  EXPECT_EQ(input.get_tokens().size(), 5);
+}
+
+// Test token copy semantics
+TEST_F(MultimodalInputTokenTest, TokenCopyConstructor) {
+  std::vector<uint64_t> tokens = createTestTokens();
+  MultimodalInput original(tokens);
+  MultimodalInput copy(original);
+
+  EXPECT_TRUE(copy.is_tokens());
+  EXPECT_EQ(copy.get_tokens(), tokens);
+  EXPECT_EQ(original.get_tokens(), tokens);  // Original should be unchanged
+
+  // Modify copy, original should be unaffected
+  copy.get_tokens().push_back(999);
+  EXPECT_EQ(copy.get_tokens().size(), 6);
+  EXPECT_EQ(original.get_tokens().size(), 5);
+}
+
+TEST_F(MultimodalInputTokenTest, TokenCopyAssignment) {
+  std::vector<uint64_t> tokens = createTestTokens();
+  MultimodalInput original(tokens);
+  MultimodalInput copy("initial text");  // Start with different type
+
+  copy = original;
+
+  EXPECT_TRUE(copy.is_tokens());
+  EXPECT_EQ(copy.get_tokens(), tokens);
+  EXPECT_EQ(original.get_tokens(), tokens);  // Original should be unchanged
+}
+
+// Test token move semantics
+TEST_F(MultimodalInputTokenTest, TokenMoveConstructor) {
+  std::vector<uint64_t> tokens = createTestTokens();
+  std::vector<uint64_t> original_tokens = tokens;
+  MultimodalInput original(std::move(tokens));
+  MultimodalInput moved(std::move(original));
+
+  EXPECT_TRUE(moved.is_tokens());
+  EXPECT_EQ(moved.get_tokens(), original_tokens);
+}
+
+TEST_F(MultimodalInputTokenTest, TokenMoveAssignment) {
+  std::vector<uint64_t> tokens = createTestTokens();
+  std::vector<uint64_t> original_tokens = tokens;
+  MultimodalInput original(std::move(tokens));
+  MultimodalInput moved("initial text");  // Start with different type
+
+  moved = std::move(original);
+
+  EXPECT_TRUE(moved.is_tokens());
+  EXPECT_EQ(moved.get_tokens(), original_tokens);
+}
+
+// Test TypeName and TypeToString static methods for TOKENS
+TEST_F(MultimodalInputTokenTest, TypeNameAndToString) {
+  EXPECT_STREQ(
+      MultimodalInput::TypeName(MultimodalInput::Type::TOKENS), "tokens");
+  EXPECT_EQ(
+      MultimodalInput::TypeToString(MultimodalInput::Type::TOKENS), "tokens");
+
+  std::vector<uint64_t> tokens = createTestTokens();
+  MultimodalInput input(tokens);
+  EXPECT_STREQ(input.type_name(), "tokens");
+}
+
+// Test assignment between token and other types
+TEST_F(MultimodalInputTokenTest, AssignmentBetweenTokensAndOtherTypes) {
+  std::vector<uint64_t> tokens = createTestTokens();
+  std::string text = "Hello";
+
+  MultimodalInput input(tokens);
+  EXPECT_TRUE(input.is_tokens());
+
+  // Assign text to token input
+  input = MultimodalInput(text);
+  EXPECT_TRUE(input.is_text());
+  EXPECT_EQ(input.get_text(), text);
+
+  // Assign tokens back to text input
+  input = MultimodalInput(tokens);
+  EXPECT_TRUE(input.is_tokens());
+  EXPECT_EQ(input.get_tokens(), tokens);
+}
+
+// Test token values with specific patterns
+TEST_F(MultimodalInputTokenTest, SpecificTokenValues) {
+  std::vector<uint64_t> tokens = {
+      0, 1, 2, 65535, 4294967295ULL, 18446744073709551615ULL};
+  MultimodalInput input(tokens);
+
+  EXPECT_TRUE(input.is_tokens());
+  EXPECT_EQ(input.get_tokens().size(), 6);
+  EXPECT_EQ(input.get_tokens()[0], 0);
+  EXPECT_EQ(input.get_tokens()[1], 1);
+  EXPECT_EQ(input.get_tokens()[2], 2);
+  EXPECT_EQ(input.get_tokens()[3], 65535);
+  EXPECT_EQ(input.get_tokens()[4], 4294967295ULL);
+  EXPECT_EQ(input.get_tokens()[5], 18446744073709551615ULL);  // Max uint64_t
+}
+
+// Test token modification through reference
+TEST_F(MultimodalInputTokenTest, TokenModificationThroughReference) {
+  std::vector<uint64_t> tokens = createTestTokens();
+  MultimodalInput input(tokens);
+
+  // Get mutable reference and modify
+  std::vector<uint64_t>& token_ref = input.get_tokens();
+  token_ref[0] = 999;
+  token_ref.push_back(1000);
+
+  // Verify changes
+  EXPECT_EQ(input.get_tokens()[0], 999);
+  EXPECT_EQ(input.get_tokens().size(), 6);
+  EXPECT_EQ(input.get_tokens().back(), 1000);
+}
From 18498bf9c9527380f729ef3ede04a4a6130cb384 Mon Sep 17 00:00:00 2001
From: Rohan Joshi
Date: Fri, 19 Sep 2025 18:04:46 -0700
Subject: [PATCH 056/395] Fix eval_llama_qnn (#14439)

Reviewed By: cccclai

Differential Revision: D82790290
---
 .../oss_scripts/llama/decoder_utils.py        |  4 +-
 .../oss_scripts/llama/eval_llama_qnn.py       | 37 +++++++++----------
 2 files changed, 20 insertions(+), 21 deletions(-)

diff --git a/examples/qualcomm/oss_scripts/llama/decoder_utils.py b/examples/qualcomm/oss_scripts/llama/decoder_utils.py
index 76cf85c6e9c..ab13912f5b3 100644
--- a/examples/qualcomm/oss_scripts/llama/decoder_utils.py
+++ b/examples/qualcomm/oss_scripts/llama/decoder_utils.py
@@ -494,8 +494,8 @@ def prefill_inference(
         if collect_logits:
             result_logits = logits[:, :pos]
         pos += 1
-
-    logging.info(f"prefill inference result:\n{tokenizer.decode(token_list)}")
+    if isinstance(prompt, str):
+        logging.info(f"prefill inference result:\n{tokenizer.decode(token_list)}")

     return result_logits

diff --git a/examples/qualcomm/oss_scripts/llama/eval_llama_qnn.py b/examples/qualcomm/oss_scripts/llama/eval_llama_qnn.py
index 5fa0cd3fedf..9af9cdf9549 100644
--- a/examples/qualcomm/oss_scripts/llama/eval_llama_qnn.py
+++ b/examples/qualcomm/oss_scripts/llama/eval_llama_qnn.py
@@ -108,7 +108,7 @@ def prepare_tokenizer(args):
             args.tokenizer_bin is not None
         ), "Please provide tokenizer_bin for stories."
         runtime_tokenizer_path = args.tokenizer_bin
-    elif args.decoder_model == "llama3_2":
+    elif "llama3_2" in args.decoder_model:
         tokenizer = get_tokenizer(args.tokenizer_model)
         assert isinstance(
             tokenizer, TiktokenTokenizer
@@ -240,7 +240,7 @@ def prequant_algorithm(model, prefill_config, args):
     if args.range_setting == "mse_with_act_loss":
         wrapped_model = WrappedLlamaModel(
-            model, atten_mask, args.use_kv_cache, args.max_seq_length, args.device
+            model, *atten_mask, args.use_kv_cache, args.max_seq_length, args.device
         )
         act_bits, weight_bits = {
             "8a8w": (8, 8),
@@ -355,20 +355,20 @@ def eval_llm(args):
     logging.info("Quantizing the model...")
     model = convert_pt2e(model)
-    logging.info("Quantization complete! Here is some sample generated text:")
-
-    graph_module_inference(
-        use_kv_cache=False,
-        get_example_inputs=lambda use_kv_cache=False: inputs,
-        module=model,
-        tokenizer=tokenizer,
-        ar_len=args.max_seq_len,
-        max_seq_len=args.max_seq_len,
-        kv_updater=args.kv_updater,
-        prompt="Can you tell me about Facebook?",
-        use_i64_token=use_i64_token,
-        event_name="convert_pt2e_prompt",
-    )
+    # logging.info("Quantization complete! Here is some sample generated text:")
+
+    # graph_module_inference(
+    #     use_kv_cache=False,
+    #     get_example_inputs=lambda use_kv_cache=False: inputs,
+    #     module=model,
+    #     tokenizer=tokenizer,
+    #     ar_len=args.max_seq_len,
+    #     max_seq_len=args.max_seq_len,
+    #     kv_updater=args.kv_updater,
+    #     prompt="Can you tell me about Facebook?",
+    #     use_i64_token=use_i64_token,
+    #     event_name="convert_pt2e_prompt",
+    # )

     logging.info("Evaluation of QDQ model:")
     graph_module_inference(
@@ -380,6 +380,7 @@ def eval_llm(args):
         max_seq_len=args.max_seq_len,
         kv_updater=args.kv_updater,
         tasks=["wikitext"],
+        tasks_limit=0.1,
         use_i64_token=use_i64_token,
         event_name="convert_pt2e_prompt",
     )
@@ -424,9 +425,7 @@ def main() -> None:
     )
     parser.add_argument(
         "--decoder_model",
-        choices=["stories260k", "stories110m", "llama3_2"]
-        + list(SUPPORTED_LLM_MODELS.keys()),
-        help=f"The Llama model to export. Current available options are: [stories260k, stories110m, llama3_2] + {SUPPORTED_LLM_MODELS.keys()}",
+        help=f"The Llama model to export. Current available options are: {SUPPORTED_LLM_MODELS.keys()}",
         required=True,
     )
     parser.add_argument(
From 46d7591b9410684e8222279c9c73d5393d8ae4f8 Mon Sep 17 00:00:00 2001
From: mcremon-meta <134334895+mcremon-meta@users.noreply.github.com>
Date: Fri, 19 Sep 2025 18:14:27 -0700
Subject: [PATCH 057/395] Introduce strongly typed quant/dequant ops

Differential Revision: D82183474

Pull Request resolved: https://github.com/pytorch/executorch/pull/14268
---
 backends/cadence/aot/TARGETS                  |   1 -
 backends/cadence/aot/functions.yaml           |  48 ++++++
 backends/cadence/aot/functions_hifi.yaml      |  49 ++++++
 backends/cadence/aot/ops_registrations.py     | 148 ++++++++++++++++++
 backends/cadence/aot/type_dispatch.py         |  24 +++
 .../operators/dequantize_per_tensor.cpp       |  67 +++++++-
 .../generic/operators/quantize_per_tensor.cpp |  85 ++++++++--
 .../cadence/generic/operators/targets.bzl     |   1 +
 .../operators/op_dequantize_per_tensor.cpp    |  45 ++++++
 .../op_dequantize_per_tensor_asym8s.cpp       |  40 +++++
 .../hifi/operators/op_quantize_per_tensor.cpp |  63 ++++++--
 .../op_quantize_per_tensor_asym8s.cpp         |  44 ++++++
 backends/cadence/hifi/operators/targets.bzl   |   2 +
 13 files changed, 594 insertions(+), 23 deletions(-)
 create mode 100644 backends/cadence/hifi/operators/op_dequantize_per_tensor_asym8s.cpp
 create mode 100644 backends/cadence/hifi/operators/op_quantize_per_tensor_asym8s.cpp

diff --git a/backends/cadence/aot/TARGETS b/backends/cadence/aot/TARGETS
index 9b2bd087d8e..94ab6de0e29 100644
--- a/backends/cadence/aot/TARGETS
+++ b/backends/cadence/aot/TARGETS
@@ -144,7 +144,6 @@ executorch_generated_lib(
     visibility = ["PUBLIC"],
     deps = [
         "//executorch/backends/cadence/generic/kernels:cadence_kernels",
-        # Individual operator targets instead of combined cadence_generic_ops
         "//executorch/backends/cadence/generic/operators:op_requantize_out",
         "//executorch/backends/cadence/generic/operators:im2row_out",
         "//executorch/backends/cadence/generic/operators:dequantize_per_tensor",
diff --git a/backends/cadence/aot/functions.yaml b/backends/cadence/aot/functions.yaml
index 95c35055e9c..2e9e187168f 100644
--- a/backends/cadence/aot/functions.yaml
+++ b/backends/cadence/aot/functions.yaml
@@ -184,12 +184,60 @@
     - arg_meta: null
       kernel_name: impl::generic::quantize_per_tensor_out

+- func: cadence::quantize_per_tensor_asym8s.out(Tensor input, float scale, int zero_point, int quant_min, int quant_max, ScalarType dtype, *, Tensor(a!) out) -> Tensor(a!)
+  variants: function
+  kernels:
+    - arg_meta: null
+      kernel_name: impl::generic::quantize_per_tensor_asym8s_out
+
+- func: cadence::quantize_per_tensor_asym8u.out(Tensor input, float scale, int zero_point, int quant_min, int quant_max, ScalarType dtype, *, Tensor(a!) out) -> Tensor(a!)
+  variants: function
+  kernels:
+    - arg_meta: null
+      kernel_name: impl::generic::quantize_per_tensor_asym8u_out
+
+- func: cadence::quantize_per_tensor_asym16s.out(Tensor input, float scale, int zero_point, int quant_min, int quant_max, ScalarType dtype, *, Tensor(a!) out) -> Tensor(a!)
+  variants: function
+  kernels:
+    - arg_meta: null
+      kernel_name: impl::generic::quantize_per_tensor_asym16s_out
+
+- func: cadence::quantize_per_tensor_asym16u.out(Tensor input, float scale, int zero_point, int quant_min, int quant_max, ScalarType dtype, *, Tensor(a!) out) -> Tensor(a!)
+  variants: function
+  kernels:
+    - arg_meta: null
+      kernel_name: impl::generic::quantize_per_tensor_asym16u_out
+
 - func: cadence::dequantize_per_tensor.out(Tensor input, float scale, int zero_point, int quant_min, int quant_max, ScalarType dtype, *, Tensor(a!) out) -> Tensor(a!)
   variants: function
   kernels:
     - arg_meta: null
       kernel_name: impl::generic::dequantize_per_tensor_out

+- func: cadence::dequantize_per_tensor_asym8s.out(Tensor input, float scale, int zero_point, int quant_min, int quant_max, ScalarType dtype, *, Tensor(a!) out) -> Tensor(a!)
+  variants: function
+  kernels:
+    - arg_meta: null
+      kernel_name: impl::generic::dequantize_per_tensor_asym8s_out
+
+- func: cadence::dequantize_per_tensor_asym8u.out(Tensor input, float scale, int zero_point, int quant_min, int quant_max, ScalarType dtype, *, Tensor(a!) out) -> Tensor(a!)
+  variants: function
+  kernels:
+    - arg_meta: null
+      kernel_name: impl::generic::dequantize_per_tensor_asym8u_out
+
+- func: cadence::dequantize_per_tensor_asym16s.out(Tensor input, float scale, int zero_point, int quant_min, int quant_max, ScalarType dtype, *, Tensor(a!) out) -> Tensor(a!)
+  variants: function
+  kernels:
+    - arg_meta: null
+      kernel_name: impl::generic::dequantize_per_tensor_asym16s_out
+
+- func: cadence::dequantize_per_tensor_asym16u.out(Tensor input, float scale, int zero_point, int quant_min, int quant_max, ScalarType dtype, *, Tensor(a!) out) -> Tensor(a!)
+  variants: function
+  kernels:
+    - arg_meta: null
+      kernel_name: impl::generic::dequantize_per_tensor_asym16u_out
+
 - func: cadence::quantized_conv2d_nchw.out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, Tensor weight_zero_point, Tensor bias_scale, float out_scale, int out_zero_point, Tensor out_multiplier, Tensor out_shift, *, Tensor(a!) out) -> Tensor(a!)
   kernels:
     - arg_meta: null
diff --git a/backends/cadence/aot/functions_hifi.yaml b/backends/cadence/aot/functions_hifi.yaml
index a0e84d94300..c48aac8686a 100644
--- a/backends/cadence/aot/functions_hifi.yaml
+++ b/backends/cadence/aot/functions_hifi.yaml
@@ -284,12 +284,61 @@
     - arg_meta: null
       kernel_name: impl::HiFi::quantize_per_tensor_out

+- func: cadence::quantize_per_tensor_asym8s.out(Tensor input, float scale, int zero_point, int quant_min, int quant_max, ScalarType dtype, *, Tensor(a!) out) -> Tensor(a!)
+  variants: function
+  kernels:
+    - arg_meta: null
+      kernel_name: impl::HiFi::quantize_per_tensor_asym8s_out
+
+- func: cadence::quantize_per_tensor_asym8u.out(Tensor input, float scale, int zero_point, int quant_min, int quant_max, ScalarType dtype, *, Tensor(a!) out) -> Tensor(a!)
+  variants: function
+  kernels:
+    - arg_meta: null
+      kernel_name: impl::HiFi::quantize_per_tensor_asym8u_out
+
+- func: cadence::quantize_per_tensor_asym16s.out(Tensor input, float scale, int zero_point, int quant_min, int quant_max, ScalarType dtype, *, Tensor(a!) out) -> Tensor(a!)
+  variants: function
+  kernels:
+    - arg_meta: null
+      kernel_name: impl::HiFi::quantize_per_tensor_asym16s_out
+
+- func: cadence::quantize_per_tensor_asym16u.out(Tensor input, float scale, int zero_point, int quant_min, int quant_max, ScalarType dtype, *, Tensor(a!) out) -> Tensor(a!)
+  variants: function
+  kernels:
+    - arg_meta: null
+      kernel_name: impl::HiFi::quantize_per_tensor_asym16s_out
+
+
 - func: cadence::dequantize_per_tensor.out(Tensor input, float scale, int zero_point, int quant_min, int quant_max, ScalarType dtype, *, Tensor(a!) out) -> Tensor(a!)
   variants: function
   kernels:
     - arg_meta: null
      kernel_name: impl::HiFi::dequantize_per_tensor_out

+- func: cadence::dequantize_per_tensor_asym8s.out(Tensor input, float scale, int zero_point, int quant_min, int quant_max, ScalarType dtype, *, Tensor(a!) out) -> Tensor(a!)
+  variants: function
+  kernels:
+    - arg_meta: null
+      kernel_name: impl::HiFi::dequantize_per_tensor_asym8s_out
+
+- func: cadence::dequantize_per_tensor_asym8u.out(Tensor input, float scale, int zero_point, int quant_min, int quant_max, ScalarType dtype, *, Tensor(a!) out) -> Tensor(a!)
+  variants: function
+  kernels:
+    - arg_meta: null
+      kernel_name: impl::HiFi::dequantize_per_tensor_asym8u_out
+
+- func: cadence::dequantize_per_tensor_asym16s.out(Tensor input, float scale, int zero_point, int quant_min, int quant_max, ScalarType dtype, *, Tensor(a!) out) -> Tensor(a!)
+  variants: function
+  kernels:
+    - arg_meta: null
+      kernel_name: impl::HiFi::dequantize_per_tensor_asym16s_out
+
+- func: cadence::dequantize_per_tensor_asym16u.out(Tensor input, float scale, int zero_point, int quant_min, int quant_max, ScalarType dtype, *, Tensor(a!) out) -> Tensor(a!)
+  variants: function
+  kernels:
+    - arg_meta: null
+      kernel_name: impl::HiFi::dequantize_per_tensor_asym16u_out
+
 - func: cadence::quantized_conv2d_nchw.out(Tensor input, Tensor weight, Tensor bias, int[] stride, SymInt[] padding, int[] dilation, int groups, int input_zero_point, Tensor weight_zero_point, Tensor bias_scale, float out_scale, int out_zero_point, Tensor out_multiplier, Tensor out_shift, *, Tensor(a!) out) -> Tensor(a!)
   kernels:
     - arg_meta: null
diff --git a/backends/cadence/aot/ops_registrations.py b/backends/cadence/aot/ops_registrations.py
index e483bea79d1..567d86af457 100644
--- a/backends/cadence/aot/ops_registrations.py
+++ b/backends/cadence/aot/ops_registrations.py
@@ -28,12 +28,64 @@
     "quantize_per_tensor.out(Tensor input, float scale, int zero_point, int quant_min, int quant_max, ScalarType dtype, *, Tensor(a!) out) -> Tensor(a!)"
 )

+lib.define(
+    "quantize_per_tensor_asym8s(Tensor input, float scale, int zero_point, int quant_min, int quant_max, ScalarType dtype) -> (Tensor Z)"
+)
+lib.define(
+    "quantize_per_tensor_asym8s.out(Tensor input, float scale, int zero_point, int quant_min, int quant_max, ScalarType dtype, *, Tensor(a!) out) -> Tensor(a!)"
+)
+
+lib.define(
+    "quantize_per_tensor_asym8u(Tensor input, float scale, int zero_point, int quant_min, int quant_max, ScalarType dtype) -> (Tensor Z)"
+)
+lib.define(
+    "quantize_per_tensor_asym8u.out(Tensor input, float scale, int zero_point, int quant_min, int quant_max, ScalarType dtype, *, Tensor(a!) out) -> Tensor(a!)"
+)
+
+lib.define(
+    "quantize_per_tensor_asym16s(Tensor input, float scale, int zero_point, int quant_min, int quant_max, ScalarType dtype) -> (Tensor Z)"
+)
+lib.define(
+    "quantize_per_tensor_asym16s.out(Tensor input, float scale, int zero_point, int quant_min, int quant_max, ScalarType dtype, *, Tensor(a!) out) -> Tensor(a!)"
+)
+
+lib.define(
+    "quantize_per_tensor_asym16u(Tensor input, float scale, int zero_point, int quant_min, int quant_max, ScalarType dtype) -> (Tensor Z)"
+)
+lib.define(
+    "quantize_per_tensor_asym16u.out(Tensor input, float scale, int zero_point, int quant_min, int quant_max, ScalarType dtype, *, Tensor(a!) out) -> Tensor(a!)"
+)
+
 lib.define(
     "dequantize_per_tensor(Tensor input, float scale, int zero_point, int quant_min, int quant_max, ScalarType dtype) -> (Tensor Z)"
 )
 lib.define(
     "dequantize_per_tensor.out(Tensor input, float scale, int zero_point, int quant_min, int quant_max, ScalarType dtype, *, Tensor(a!) out) -> Tensor(a!)"
 )
+lib.define(
+    "dequantize_per_tensor_asym8s(Tensor input, float scale, int zero_point, int quant_min, int quant_max, ScalarType dtype) -> (Tensor Z)"
+)
+lib.define(
+    "dequantize_per_tensor_asym8s.out(Tensor input, float scale, int zero_point, int quant_min, int quant_max, ScalarType dtype, *, Tensor(a!) out) -> Tensor(a!)"
+)
+lib.define(
+    "dequantize_per_tensor_asym8u(Tensor input, float scale, int zero_point, int quant_min, int quant_max, ScalarType dtype) -> (Tensor Z)"
+)
+lib.define(
+    "dequantize_per_tensor_asym8u.out(Tensor input, float scale, int zero_point, int quant_min, int quant_max, ScalarType dtype, *, Tensor(a!) out) -> Tensor(a!)"
+)
+lib.define(
+    "dequantize_per_tensor_asym16s(Tensor input, float scale, int zero_point, int quant_min, int quant_max, ScalarType dtype) -> (Tensor Z)"
+)
+lib.define(
+    "dequantize_per_tensor_asym16s.out(Tensor input, float scale, int zero_point, int quant_min, int quant_max, ScalarType dtype, *, Tensor(a!) out) -> Tensor(a!)"
+)
+lib.define(
+    "dequantize_per_tensor_asym16u(Tensor input, float scale, int zero_point, int quant_min, int quant_max, ScalarType dtype) -> (Tensor Z)"
+)
+lib.define(
+    "dequantize_per_tensor_asym16u.out(Tensor input, float scale, int zero_point, int quant_min, int quant_max, ScalarType dtype, *, Tensor(a!) out) -> Tensor(a!)"
+)

 lib.define(
     "quantized_layer_norm(Tensor X, Tensor X_scale, Tensor X_zero_point, int[] normalized_shape, Tensor weight, Tensor bias, float eps, float output_scale, int output_zero_point) -> (Tensor Y)"
@@ -541,6 +593,54 @@ def quantize_per_tensor_meta(
     return input.new_empty(input.size(), dtype=dtype)


+@register_fake("cadence::quantize_per_tensor_asym8s")
+def quantize_per_tensor_asym8s_meta(
+    input: torch.Tensor,
+    scale: float,
+    zero_point: int,
+    quant_min: int,
+    quant_max: int,
+    dtype: torch.dtype,
+) -> torch.Tensor:
+    return input.new_empty(input.size(), dtype=dtype)
+
+
+@register_fake("cadence::quantize_per_tensor_asym8u")
+def quantize_per_tensor_asym8u_meta(
+    input: torch.Tensor,
+    scale: float,
+    zero_point: int,
+    quant_min: int,
+    quant_max: int,
+    dtype: torch.dtype,
+) -> torch.Tensor:
+    return input.new_empty(input.size(), dtype=dtype)
+
+
+@register_fake("cadence::quantize_per_tensor_asym16s")
+def quantize_per_tensor_asym16s_meta(
+    input: torch.Tensor,
+    scale: float,
+    zero_point: int,
+    quant_min: int,
+    quant_max: int,
+    dtype: torch.dtype,
+) -> torch.Tensor:
+    return input.new_empty(input.size(), dtype=dtype)
+
+
+@register_fake("cadence::quantize_per_tensor_asym16u")
+def quantize_per_tensor_asym16u_meta(
+    input: torch.Tensor,
+    scale: float,
+    zero_point: int,
+    quant_min: int,
+    quant_max: int,
+    dtype: torch.dtype,
+) -> torch.Tensor:
+    return input.new_empty(input.size(), dtype=dtype)
+
+
 @register_fake("cadence::dequantize_per_tensor")
 def dequantize_per_tensor_meta(
     input: torch.Tensor,
     scale: float,
@@ -553,6 +653,54 @@ def dequantize_per_tensor_meta(
     return input.new_empty(input.size(), dtype=torch.float)


+@register_fake("cadence::dequantize_per_tensor_asym8s")
+def dequantize_per_tensor_asym8s_meta(
+    input: torch.Tensor,
+    scale: float,
+    zero_point: int,
+    quant_min: int,
+    quant_max: int,
+    dtype: torch.dtype,
+) -> torch.Tensor:
+    return input.new_empty(input.size(), dtype=torch.float)
+
+
+@register_fake("cadence::dequantize_per_tensor_asym8u")
+def dequantize_per_tensor_asym8u_meta(
+    input: torch.Tensor,
+    scale: float,
+    zero_point: int,
+    quant_min: int,
+    quant_max: int,
+    dtype: torch.dtype,
+) -> torch.Tensor:
+    return input.new_empty(input.size(), dtype=torch.float)
+
+
+@register_fake("cadence::dequantize_per_tensor_asym16s")
+def dequantize_per_tensor_asym16s_meta(
+    input: torch.Tensor,
+    scale: float,
+    zero_point: int,
+    quant_min: int,
+    quant_max: int,
+    dtype: torch.dtype,
+) -> torch.Tensor:
+    return input.new_empty(input.size(), dtype=torch.float)
+
+
+@register_fake("cadence::dequantize_per_tensor_asym16u")
+def dequantize_per_tensor_asym16u_meta(
+    input: torch.Tensor,
+    scale: float,
+    zero_point: int,
+    quant_min: int,
+    quant_max: int,
+    dtype: torch.dtype,
+) -> torch.Tensor:
+    return input.new_empty(input.size(), dtype=torch.float)
+
+
 @register_fake("cadence::quantized_add")
 def quantized_add_meta(
     X: torch.Tensor,
diff --git a/backends/cadence/aot/type_dispatch.py b/backends/cadence/aot/type_dispatch.py
index 3bf86ad2e50..97a25938e8d 100644
--- a/backends/cadence/aot/type_dispatch.py
+++ b/backends/cadence/aot/type_dispatch.py
@@ -27,6 +27,7 @@ class OpConfig:
     base_name: str
     type_dispatch_suffixes: dict[tuple[torch.dtype, ...], str]
     weight_arg_idx: Optional[int] = None
+    is_quant_op: bool = False
     variant: str = "per_tensor"


@@ -100,6 +101,27 @@ class CompileTimeTypeDispatchPass(ExportPass):
             },
             variant="default",
         ),
+        exir_ops.edge.cadence.quantize_per_tensor.default: OpConfig(
+            "quantize_per_tensor",
+            type_dispatch_suffixes={
+                (torch.int8,): "asym8s",
+                (torch.uint8,): "asym8u",
+                (torch.int16,): "asym16s",
+                (torch.uint16,): "asym16s",
+            },
+            variant="default",
+            is_quant_op=True,
+        ),
+        exir_ops.edge.cadence.dequantize_per_tensor.default: OpConfig(
+            "dequantize_per_tensor",
+            type_dispatch_suffixes={
+                (torch.int8,): "asym8s",
+                (torch.uint8,): "asym8u",
+                (torch.int16,): "asym16s",
+                (torch.uint16,): "asym16s",
+            },
+            variant="default",
+        ),
     }

     def call_operator(
@@ -120,6 +142,8 @@ def call_operator(
         if config.weight_arg_idx is not None:
             weight_dtype = args[config.weight_arg_idx].to_tensor().dtype
             dtype_key = (input_dtype, weight_dtype)
+        elif config.is_quant_op:
+            dtype_key = (args[5],)
         else:
             dtype_key = (input_dtype,)

diff --git a/backends/cadence/generic/operators/dequantize_per_tensor.cpp b/backends/cadence/generic/operators/dequantize_per_tensor.cpp
index 1481981ee0b..aedc6e10309 100644
--- a/backends/cadence/generic/operators/dequantize_per_tensor.cpp
+++ b/backends/cadence/generic/operators/dequantize_per_tensor.cpp
@@ -18,7 +18,7 @@
 using ::executorch::aten::Tensor;
 using ::executorch::runtime::KernelRuntimeContext;
 using ::impl::generic::kernels::dequantize;

-void dequantize_per_tensor_out(
+Tensor& dequantize_per_tensor_out(
     KernelRuntimeContext& context,
     const Tensor& input,
     double scale,
@@ -50,6 +50,71 @@
         "Unhandled input dtype %hhd",
         static_cast<int8_t>(input.scalar_type()));
   }
+  return out;
+}
+
+Tensor& dequantize_per_tensor_asym8s_out(
+    KernelRuntimeContext& context,
+    const Tensor& input,
+    double scale,
+    int64_t zero_point,
+    int64_t quant_min,
+    int64_t quant_max,
+    ScalarType dtype,
+    Tensor& out) {
+  float* out_data = out.mutable_data_ptr<float>();
+  size_t numel = out.numel();
+  const int8_t* input_data = input.const_data_ptr<int8_t>();
+  dequantize(out_data, input_data, scale, zero_point, numel);
+  return out;
+}
+
+Tensor& dequantize_per_tensor_asym8u_out(
+    KernelRuntimeContext& context,
+    const Tensor& input,
+    double scale,
+    int64_t zero_point,
+    int64_t quant_min,
+    int64_t quant_max,
+    ScalarType dtype,
+    Tensor& out) {
float* out_data = out.mutable_data_ptr(); + size_t numel = out.numel(); + const uint8_t* input_data = input.const_data_ptr(); + dequantize(out_data, input_data, scale, zero_point, numel); + return out; +} + +Tensor& dequantize_per_tensor_asym16s_out( + KernelRuntimeContext& context, + const Tensor& input, + double scale, + int64_t zero_point, + int64_t quant_min, + int64_t quant_max, + ScalarType dtype, + Tensor& out) { + float* out_data = out.mutable_data_ptr(); + size_t numel = out.numel(); + const int16_t* input_data = input.const_data_ptr(); + dequantize(out_data, input_data, scale, zero_point, numel); + return out; +} + +Tensor& dequantize_per_tensor_asym16u_out( + KernelRuntimeContext& context, + const Tensor& input, + double scale, + int64_t zero_point, + int64_t quant_min, + int64_t quant_max, + ScalarType dtype, + Tensor& out) { + float* out_data = out.mutable_data_ptr(); + size_t numel = out.numel(); + const uint16_t* input_data = input.const_data_ptr(); + dequantize(out_data, input_data, scale, zero_point, numel); + return out; } } // namespace native diff --git a/backends/cadence/generic/operators/quantize_per_tensor.cpp b/backends/cadence/generic/operators/quantize_per_tensor.cpp index 29b233dab09..f2a413be35d 100644 --- a/backends/cadence/generic/operators/quantize_per_tensor.cpp +++ b/backends/cadence/generic/operators/quantize_per_tensor.cpp @@ -20,7 +20,7 @@ using ::impl::generic::kernels::quantize; // Quantize the input tensor (PT2 version). Note that quant_ are not // used in any computation. -void quantize_per_tensor_out( +Tensor& quantize_per_tensor_out( KernelRuntimeContext& context, const Tensor& input, double scale, @@ -34,30 +34,91 @@ void quantize_per_tensor_out( if (out.scalar_type() == ScalarType::Byte) { uint8_t* out_data = out.mutable_data_ptr(); - impl::generic::kernels::quantize( - out_data, input_data, 1. / scale, zero_point, numel); + quantize(out_data, input_data, 1. 
/ scale, zero_point, numel); } else if (out.scalar_type() == ScalarType::Char) { int8_t* out_data = out.mutable_data_ptr(); - impl::generic::kernels::quantize( - out_data, input_data, 1. / scale, zero_point, numel); + quantize(out_data, input_data, 1. / scale, zero_point, numel); } else if ( out.scalar_type() == ScalarType::Bits16 || out.scalar_type() == ScalarType::UInt16) { uint16_t* out_data = out.mutable_data_ptr(); - impl::generic::kernels::quantize( - out_data, input_data, 1. / scale, zero_point, numel); + quantize(out_data, input_data, 1. / scale, zero_point, numel); } else if (out.scalar_type() == ScalarType::Short) { int16_t* out_data = out.mutable_data_ptr(); - impl::generic::kernels::quantize( - out_data, input_data, 1. / scale, zero_point, numel); + quantize(out_data, input_data, 1. / scale, zero_point, numel); } else { ET_CHECK_MSG( false, "Unhandled input dtype %hhd", static_cast(out.scalar_type())); } + return out; } -} // namespace native -} // namespace generic -} // namespace impl +Tensor& quantize_per_tensor_asym8s_out( + KernelRuntimeContext& context, + const Tensor& input, + double scale, + int64_t zero_point, + int64_t quant_min, + int64_t quant_max, + ScalarType dtype, + Tensor& out) { + const float* input_data = input.const_data_ptr(); + size_t numel = out.numel(); + int8_t* out_data = out.mutable_data_ptr(); + quantize(out_data, input_data, 1. / scale, zero_point, numel); + return out; +} + +Tensor& quantize_per_tensor_asym8u_out( + KernelRuntimeContext& context, + const Tensor& input, + double scale, + int64_t zero_point, + int64_t quant_min, + int64_t quant_max, + ScalarType dtype, + Tensor& out) { + const float* input_data = input.const_data_ptr(); + size_t numel = out.numel(); + uint8_t* out_data = out.mutable_data_ptr(); + quantize(out_data, input_data, 1. 
/ scale, zero_point, numel); + return out; +} + +Tensor& quantize_per_tensor_asym16s_out( + KernelRuntimeContext& context, + const Tensor& input, + double scale, + int64_t zero_point, + int64_t quant_min, + int64_t quant_max, + ScalarType dtype, + Tensor& out) { + const float* input_data = input.const_data_ptr(); + size_t numel = out.numel(); + int16_t* out_data = out.mutable_data_ptr(); + quantize(out_data, input_data, 1. / scale, zero_point, numel); + return out; +} + +Tensor& quantize_per_tensor_asym16u_out( + KernelRuntimeContext& context, + const Tensor& input, + double scale, + int64_t zero_point, + int64_t quant_min, + int64_t quant_max, + ScalarType dtype, + Tensor& out) { + const float* input_data = input.const_data_ptr(); + size_t numel = out.numel(); + uint16_t* out_data = out.mutable_data_ptr(); + quantize(out_data, input_data, 1. / scale, zero_point, numel); + return out; +} + +}; // namespace native +}; // namespace generic +}; // namespace impl diff --git a/backends/cadence/generic/operators/targets.bzl b/backends/cadence/generic/operators/targets.bzl index 193b43c2b6d..fa0f128b229 100644 --- a/backends/cadence/generic/operators/targets.bzl +++ b/backends/cadence/generic/operators/targets.bzl @@ -44,6 +44,7 @@ def define_common_targets(): ], visibility = [ "//executorch/backends/cadence/...", + "@EXECUTORCH_CLIENTS", ], ) diff --git a/backends/cadence/hifi/operators/op_dequantize_per_tensor.cpp b/backends/cadence/hifi/operators/op_dequantize_per_tensor.cpp index f416082b10f..317e7ed8ef9 100644 --- a/backends/cadence/hifi/operators/op_dequantize_per_tensor.cpp +++ b/backends/cadence/hifi/operators/op_dequantize_per_tensor.cpp @@ -53,6 +53,51 @@ void dequantize_per_tensor_out( } } +void dequantize_per_tensor_asym8u_out( + KernelRuntimeContext& context, + const Tensor& input, + double scale, + int64_t zero_point, + int64_t quant_min, + int64_t quant_max, + ScalarType dtype, + Tensor& out) { + float* out_data = out.mutable_data_ptr(); + size_t numel = 
out.numel();
+  const uint8_t* input_data = input.const_data_ptr<uint8_t>();
+  dequantize(out_data, input_data, scale, zero_point, numel);
+}
+
+void dequantize_per_tensor_asym16s_out(
+    KernelRuntimeContext& context,
+    const Tensor& input,
+    double scale,
+    int64_t zero_point,
+    int64_t quant_min,
+    int64_t quant_max,
+    ScalarType dtype,
+    Tensor& out) {
+  float* out_data = out.mutable_data_ptr<float>();
+  size_t numel = out.numel();
+  const int16_t* input_data = input.const_data_ptr<int16_t>();
+  dequantize(out_data, input_data, scale, zero_point, numel);
+}
+
+void dequantize_per_tensor_asym16u_out(
+    KernelRuntimeContext& context,
+    const Tensor& input,
+    double scale,
+    int64_t zero_point,
+    int64_t quant_min,
+    int64_t quant_max,
+    ScalarType dtype,
+    Tensor& out) {
+  float* out_data = out.mutable_data_ptr<float>();
+  size_t numel = out.numel();
+  const uint16_t* input_data = input.const_data_ptr<uint16_t>();
+  dequantize(out_data, input_data, scale, zero_point, numel);
+}
+
 } // namespace native
 } // namespace HiFi
 } // namespace impl
diff --git a/backends/cadence/hifi/operators/op_dequantize_per_tensor_asym8s.cpp b/backends/cadence/hifi/operators/op_dequantize_per_tensor_asym8s.cpp
new file mode 100644
index 00000000000..d1099b1a4db
--- /dev/null
+++ b/backends/cadence/hifi/operators/op_dequantize_per_tensor_asym8s.cpp
@@ -0,0 +1,40 @@
+/*
+ * Copyright (c) Meta Platforms, Inc. and affiliates.
+ * All rights reserved.
+ *
+ * This source code is licensed under the BSD-style license found in the
+ * LICENSE file in the root directory of this source tree.
+ */
+
+#include
+
+#include
+#include
+
+namespace impl {
+namespace HiFi {
+namespace native {
+
+using ::executorch::aten::ScalarType;
+using ::executorch::aten::Tensor;
+using ::executorch::runtime::KernelRuntimeContext;
+
+void dequantize_per_tensor_asym8s_out(
+    KernelRuntimeContext& ctx,
+    const Tensor& input,
+    double scale,
+    int64_t zero_point,
+    __ET_UNUSED int64_t quant_min,
+    __ET_UNUSED int64_t quant_max,
+    ScalarType dtype,
+    Tensor& out) {
+  float* out_data = out.mutable_data_ptr<float>();
+  const size_t numel = out.numel();
+  const int8_t* input_data = input.const_data_ptr<int8_t>();
+  xa_nn_elm_dequantize_asym8s_f32(
+      out_data, input_data, zero_point, scale, numel);
+}
+
+}; // namespace native
+}; // namespace HiFi
+}; // namespace impl
diff --git a/backends/cadence/hifi/operators/op_quantize_per_tensor.cpp b/backends/cadence/hifi/operators/op_quantize_per_tensor.cpp
index b2f47619f05..9bc3d48699e 100644
--- a/backends/cadence/hifi/operators/op_quantize_per_tensor.cpp
+++ b/backends/cadence/hifi/operators/op_quantize_per_tensor.cpp
@@ -19,10 +19,13 @@ namespace impl {
 namespace HiFi {
 namespace native {
+
 namespace {
+
 using ::executorch::aten::ScalarType;
 using ::executorch::aten::Tensor;
 using ::executorch::runtime::KernelRuntimeContext;
+using ::impl::HiFi::kernels::quantize;

 // Add checks for dtype quant min/max bounds.
 template
@@ -92,22 +95,19 @@ void quantize_per_tensor_out(
   const size_t numel = out.numel();
   if (out.scalar_type() == ScalarType::Byte) {
     uint8_t* out_data = out.mutable_data_ptr<uint8_t>();
-    impl::HiFi::kernels::quantize(
-        out_data, input_data, 1. / scale, zero_point, numel);
+    quantize(out_data, input_data, 1.
/ scale, zero_point, numel);
   } else if (out.scalar_type() == ScalarType::Char) {
     int8_t* out_data = out.mutable_data_ptr<int8_t>();
     xa_nn_elm_quantize_f32_asym8s(
         out_data, input_data, scale, zero_point, numel);
   } else if (out.scalar_type() == ScalarType::Short) {
     int16_t* out_data = out.mutable_data_ptr<int16_t>();
-    impl::HiFi::kernels::quantize(
-        out_data, input_data, 1. / scale, zero_point, numel);
+    quantize(out_data, input_data, 1. / scale, zero_point, numel);
   } else if (
       out.scalar_type() == ScalarType::Bits16 ||
       out.scalar_type() == ScalarType::UInt16) {
     uint16_t* out_data = out.mutable_data_ptr<uint16_t>();
-    impl::HiFi::kernels::quantize(
-        out_data, input_data, 1. / scale, zero_point, numel);
+    quantize(out_data, input_data, 1. / scale, zero_point, numel);
   } else {
     ET_KERNEL_CHECK_MSG(
         ctx,
@@ -119,6 +119,51 @@ void quantize_per_tensor_out(
   }
 }

-} // namespace native
-} // namespace HiFi
-} // namespace impl
+void quantize_per_tensor_asym8u_out(
+    KernelRuntimeContext& context,
+    const Tensor& input,
+    double scale,
+    int64_t zero_point,
+    int64_t quant_min,
+    int64_t quant_max,
+    ScalarType dtype,
+    Tensor& out) {
+  const float* input_data = input.const_data_ptr<float>();
+  size_t numel = out.numel();
+  uint8_t* out_data = out.mutable_data_ptr<uint8_t>();
+  quantize(out_data, input_data, 1. / scale, zero_point, numel);
+}
+
+void quantize_per_tensor_asym16s_out(
+    KernelRuntimeContext& context,
+    const Tensor& input,
+    double scale,
+    int64_t zero_point,
+    int64_t quant_min,
+    int64_t quant_max,
+    ScalarType dtype,
+    Tensor& out) {
+  const float* input_data = input.const_data_ptr<float>();
+  size_t numel = out.numel();
+  int16_t* out_data = out.mutable_data_ptr<int16_t>();
+  quantize(out_data, input_data, 1.
/ scale, zero_point, numel);
+}
+
+void quantize_per_tensor_asym16u_out(
+    KernelRuntimeContext& context,
+    const Tensor& input,
+    double scale,
+    int64_t zero_point,
+    int64_t quant_min,
+    int64_t quant_max,
+    ScalarType dtype,
+    Tensor& out) {
+  const float* input_data = input.const_data_ptr<float>();
+  size_t numel = out.numel();
+  uint16_t* out_data = out.mutable_data_ptr<uint16_t>();
+  quantize(out_data, input_data, 1. / scale, zero_point, numel);
+}
+
+}; // namespace native
+}; // namespace HiFi
+}; // namespace impl
diff --git a/backends/cadence/hifi/operators/op_quantize_per_tensor_asym8s.cpp b/backends/cadence/hifi/operators/op_quantize_per_tensor_asym8s.cpp
new file mode 100644
index 00000000000..552b6acf150
--- /dev/null
+++ b/backends/cadence/hifi/operators/op_quantize_per_tensor_asym8s.cpp
@@ -0,0 +1,44 @@
+/*
+ * Copyright (c) Meta Platforms, Inc. and affiliates.
+ * All rights reserved.
+ *
+ * This source code is licensed under the BSD-style license found in the
+ * LICENSE file in the root directory of this source tree.
+ */
+
+#include
+
+#include
+
+#include
+#include
+#include
+#include
+#include
+
+namespace impl {
+namespace HiFi {
+namespace native {
+
+using ::executorch::aten::ScalarType;
+using ::executorch::aten::Tensor;
+using ::executorch::runtime::KernelRuntimeContext;
+
+void quantize_per_tensor_asym8s_out(
+    KernelRuntimeContext& context,
+    const Tensor& input,
+    double scale,
+    int64_t zero_point,
+    int64_t quant_min,
+    int64_t quant_max,
+    ScalarType dtype,
+    Tensor& out) {
+  const float* input_data = input.const_data_ptr<float>();
+  size_t numel = out.numel();
+  int8_t* out_data = out.mutable_data_ptr<int8_t>();
+  xa_nn_elm_quantize_f32_asym8s(out_data, input_data, scale, zero_point, numel);
+}
+
+} // namespace native
+} // namespace HiFi
+} // namespace impl
diff --git a/backends/cadence/hifi/operators/targets.bzl b/backends/cadence/hifi/operators/targets.bzl
index ca474e8183b..1f9814c4a4e 100644
--- a/backends/cadence/hifi/operators/targets.bzl
+++ b/backends/cadence/hifi/operators/targets.bzl
@@ -44,6 +44,7 @@ OPERATORS = [
     "cat",
     "clamp",
     "dequantize_per_tensor",
+    "dequantize_per_tensor_asym8s",
    "div",
     "embedding",
     "eq",
@@ -95,6 +96,7 @@ OPERATORS = [
    "quantized_relu_asym8s_asym8s_per_tensor_out",
    "quantized_relu_asym8u_asym8u_per_tensor_out",
    "quantize_per_tensor",
+    "quantize_per_tensor_asym8s",
    "remainder",
    "rsqrt",
    "select_copy",

From cf1c4bc65d61b0dbfed687ef8ba399e3668f5ec3 Mon Sep 17 00:00:00 2001
From: Gregory Comer
Date: Fri, 19 Sep 2025 23:17:32 -0600
Subject: [PATCH 058/395] Use weight cache for quantized tensor scale data

Differential Revision: D82862629

Pull Request resolved: https://github.com/pytorch/executorch/pull/14448
---
 backends/xnnpack/runtime/XNNCompiler.cpp | 65 ++++++++++++------------
 1 file changed, 33 insertions(+), 32 deletions(-)

diff --git a/backends/xnnpack/runtime/XNNCompiler.cpp b/backends/xnnpack/runtime/XNNCompiler.cpp
index 78eaaf6d039..1ed7db80d84 100644
--- a/backends/xnnpack/runtime/XNNCompiler.cpp
+++
b/backends/xnnpack/runtime/XNNCompiler.cpp
@@ -174,13 +174,12 @@ payload (deprecated) or via offsets to the constant_data_ptr.
 If no constant data associated with the tensor value, then returns nullptr.
 */
 const uint8_t* getConstantDataPtr(
-    const fb_xnnpack::XNNTensorValue* tensor_value,
+    uint32_t buffer_idx,
     GraphPtr flatbuffer_graph,
     const uint8_t* constant_data_ptr,
     const NamedDataMap* named_data_map,
     std::vector& freeable_buffers,
     XNNWeightsCache* weights_cache) {
-  auto buffer_idx = tensor_value->constant_buffer_idx();
   if (buffer_idx) {
     if (!constant_data_ptr) {
       // TODO(T172265611): Remove constant_buffer in flatbuffer path after BC
@@ -230,6 +229,22 @@ const uint8_t* getConstantDataPtr(
   return nullptr;
 }

+const uint8_t* getConstantDataPtr(
+    const fb_xnnpack::XNNTensorValue* tensor_value,
+    GraphPtr flatbuffer_graph,
+    const uint8_t* constant_data_ptr,
+    const NamedDataMap* named_data_map,
+    std::vector& freeable_buffers,
+    XNNWeightsCache* weights_cache) {
+  return getConstantDataPtr(
+      tensor_value->constant_buffer_idx(),
+      flatbuffer_graph,
+      constant_data_ptr,
+      named_data_map,
+      freeable_buffers,
+      weights_cache);
+}
+
 /**
 Define serialized tensor value into the subgraph.
While also keeping track of the remapped ids from
@@ -434,22 +449,15 @@ Error defineTensor(
       const float* scale = qparams->scale()->data();

       if (qparams->scale_buffer_idx() != 0) {
-        // if scales are stored in named data, then retrieve it
-        ConstantDataOffsetPtr scale_buffer_offset =
-            flatbuffer_graph->constant_data()->Get(
-                qparams->scale_buffer_idx());
-        const std::string& data_name =
-            scale_buffer_offset->named_key()->str();
-        Result scale_buffer =
-            named_data_map->get_data(data_name.c_str());
+        scale = reinterpret_cast<const float*>(getConstantDataPtr(
+            qparams->scale_buffer_idx(),
+            flatbuffer_graph,
+            constant_data_ptr,
+            named_data_map,
+            freeable_buffers,
+            weights_cache));
         ET_CHECK_OR_RETURN_ERROR(
-            scale_buffer.ok(),
-            Internal,
-            "Failed to get constant data for key %s from named_data_map. Error code: %u",
-            data_name.c_str(),
-            static_cast(scale_buffer.error()));
-        scale = reinterpret_cast<const float*>(scale_buffer.get().data());
-        freeable_buffers.push_back(std::move(scale_buffer.get()));
+            scale != nullptr, Internal, "Failed to load scale data.");
       }
       status = xnn_define_channelwise_quantized_tensor_value_v2(
           /*subgraph=*/subgraph_ptr,
@@ -483,22 +491,15 @@ Error defineTensor(
       // Block scales are preferably serialized as bf16 but can also be
       // serialized as fp32 for backwards compatability.
       if (qparams->scale_buffer_idx() != 0) {
-        ConstantDataOffsetPtr scale_buffer_offset =
-            flatbuffer_graph->constant_data()->Get(
-                qparams->scale_buffer_idx());
-        const std::string& data_name =
-            scale_buffer_offset->named_key()->str();
-        Result scale_buffer =
-            named_data_map->get_data(data_name.c_str());
+        scale_data = reinterpret_cast(getConstantDataPtr(
+            qparams->scale_buffer_idx(),
+            flatbuffer_graph,
+            constant_data_ptr,
+            named_data_map,
+            freeable_buffers,
+            weights_cache));
         ET_CHECK_OR_RETURN_ERROR(
-            scale_buffer.ok(),
-            Internal,
-            "Failed to get constant data for key %s from named_data_map.
Error code: %u",
-            data_name.c_str(),
-            static_cast(scale_buffer.error()));
-        scale_data =
-            reinterpret_cast(scale_buffer.get().data());
-        freeable_buffers.push_back(std::move(scale_buffer.get()));
+            scale_data != nullptr, Internal, "Failed to load scale data.");
         scale_numel = qparams->num_scales();
       } else {
         // Read fp32 scales, convert to bf16.

From ce8916fd6639a0a5ce6f698e2c2f9d174f44eda3 Mon Sep 17 00:00:00 2001
From: Hansong Zhang <107070759+kirklandsign@users.noreply.github.com>
Date: Fri, 19 Sep 2025 22:24:55 -0700
Subject: [PATCH 059/395] Remove Android LlamaDemo

Differential Revision: D82868456

Pull Request resolved: https://github.com/pytorch/executorch/pull/14450
---
 .github/workflows/_android.yml | 15 -
 .github/workflows/lint.yml | 2 -
 README-wheel.md | 2 +-
 docs/source/index.md | 4 +-
 docs/source/llm/getting-started.md | 2 +-
 docs/source/llm/llama-demo-android.md | 2 -
 docs/source/using-executorch-android.md | 4 +-
 .../demo-apps/android/LlamaDemo/.gitignore | 12 -
 .../demo-apps/android/LlamaDemo/README.md | 174 ----
 .../LlamaDemo/SDK-quick-setup-guide.md | 94 --
 .../android/LlamaDemo/app/.gitignore | 1 -
 .../android/LlamaDemo/app/build.gradle.kts | 103 ---
 .../android/LlamaDemo/app/proguard-rules.pro | 21 -
 .../example/executorchllamademo/PerfTest.java | 92 --
 .../app/src/main/AndroidManifest.xml | 85 --
 .../android/LlamaDemo/app/src/main/BUCK | 67 --
 .../example/executorchllamademo/AppLog.java | 49 -
 .../executorchllamademo/BackendType.java | 7 -
 .../DemoSharedPreferences.java | 90 --
 .../example/executorchllamademo/ETImage.java | 126 ---
 .../executorchllamademo/ETLogging.java | 54 --
 .../LlmBenchmarkRunner.java | 223 -----
 .../executorchllamademo/LogsActivity.java | 92 --
 .../executorchllamademo/LogsAdapter.java | 45 -
 .../executorchllamademo/MainActivity.java | 847 ------------------
 .../example/executorchllamademo/Message.java | 94 --
 .../executorchllamademo/MessageAdapter.java | 135 ---
 .../executorchllamademo/MessageType.java | 15 -
 .../executorchllamademo/ModelRunner.java | 109 ---
 .../ModelRunnerCallback.java | 24 -
 .../executorchllamademo/ModelType.java | 18 -
 .../executorchllamademo/ModelUtils.java | 47 -
 .../executorchllamademo/PromptFormat.java | 162 ----
 .../executorchllamademo/SettingsActivity.java | 463 ----------
 .../executorchllamademo/SettingsFields.java | 148 ---
 .../src/main/res/drawable/banner_shape.xml | 5 -
 .../src/main/res/drawable/baseline_add_24.xml | 5 -
 .../baseline_add_photo_alternate_24.xml | 5 -
 .../main/res/drawable/baseline_article_24.xml | 6 -
 .../main/res/drawable/baseline_close_24.xml | 6 -
 .../drawable/baseline_delete_forever_24.xml | 5 -
 .../res/drawable/baseline_lightbulb_24.xml | 5 -
 .../res/drawable/baseline_restart_alt_24.xml | 6 -
 .../main/res/drawable/baseline_send_24.xml | 6 -
 .../res/drawable/baseline_settings_24.xml | 11 -
 .../main/res/drawable/baseline_stop_24.xml | 6 -
 .../main/res/drawable/blue_lightbulb_24.xml | 5 -
 .../app/src/main/res/drawable/btn.xml | 8 -
 .../src/main/res/drawable/chat_background.xml | 21 -
 .../main/res/drawable/custom_button_round.xml | 7 -
 .../main/res/drawable/expand_circle_down.xml | 9 -
 .../res/drawable/ic_launcher_background.xml | 170 ----
 .../res/drawable/ic_launcher_foreground.xml | 30 -
 .../main/res/drawable/input_text_shape.xml | 7 -
 .../app/src/main/res/drawable/logo.png | Bin 33036 -> 0 bytes
 .../main/res/drawable/outline_add_box_48.xml | 6 -
 .../res/drawable/outline_camera_alt_48.xml | 5 -
 .../main/res/drawable/outline_image_48.xml | 5 -
 .../src/main/res/drawable/prompt_shape.xml | 6 -
 .../main/res/drawable/received_message.xml | 6 -
 .../src/main/res/drawable/sent_message.xml | 6 -
 .../app/src/main/res/drawable/three_dots.xml | 5 -
 .../main/res/layout/activity_benchmarking.xml | 16 -
 .../app/src/main/res/layout/activity_logs.xml | 55 --
 .../app/src/main/res/layout/activity_main.xml | 241 -----
 .../src/main/res/layout/activity_settings.xml | 338 -------
 .../app/src/main/res/layout/logs_message.xml | 16 -
 .../src/main/res/layout/received_message.xml | 70 --
 .../app/src/main/res/layout/sent_message.xml | 63 --
 .../src/main/res/layout/system_message.xml | 23 -
 .../res/mipmap-anydpi-v26/ic_launcher.xml | 6 -
 .../mipmap-anydpi-v26/ic_launcher_round.xml | 6 -
 .../src/main/res/mipmap-hdpi/ic_launcher.webp | Bin 1404 -> 0 bytes
 .../res/mipmap-hdpi/ic_launcher_round.webp | Bin 2898 -> 0 bytes
 .../src/main/res/mipmap-mdpi/ic_launcher.webp | Bin 982 -> 0 bytes
 .../res/mipmap-mdpi/ic_launcher_round.webp | Bin 1772 -> 0 bytes
 .../main/res/mipmap-xhdpi/ic_launcher.webp | Bin 1900 -> 0 bytes
 .../res/mipmap-xhdpi/ic_launcher_round.webp | Bin 3918 -> 0 bytes
 .../main/res/mipmap-xxhdpi/ic_launcher.webp | Bin 2884 -> 0 bytes
 .../res/mipmap-xxhdpi/ic_launcher_round.webp | Bin 5914 -> 0 bytes
 .../main/res/mipmap-xxxhdpi/ic_launcher.webp | Bin 3844 -> 0 bytes
 .../res/mipmap-xxxhdpi/ic_launcher_round.webp | Bin 7778 -> 0 bytes
 .../app/src/main/res/values/colors.xml | 10 -
 .../app/src/main/res/values/strings.xml | 7 -
 .../app/src/main/res/values/styles.xml | 14 -
 .../app/src/main/res/values/themes.xml | 4 -
 .../app/src/main/res/xml/backup_rules.xml | 13 -
 .../main/res/xml/data_extraction_rules.xml | 19 -
 .../android/LlamaDemo/build.gradle.kts | 13 -
 .../docs/delegates/mediatek_README.md | 185 ----
 .../docs/delegates/qualcomm_README.md | 243 -----
 .../docs/delegates/xnnpack_README.md | 199 ----
 .../LlamaDemo/download_prebuilt_lib.sh | 19 -
 .../android/LlamaDemo/gradle.properties | 23 -
 .../gradle/wrapper/gradle-wrapper.jar | Bin 43583 -> 0 bytes
 .../gradle/wrapper/gradle-wrapper.properties | 7 -
 examples/demo-apps/android/LlamaDemo/gradlew | 252 ------
 .../demo-apps/android/LlamaDemo/gradlew.bat | 94 --
 .../LlamaDemo/run_instrumentation_test.sh | 27 -
 .../android/LlamaDemo/settings.gradle.kts | 27 -
 .../android/LlamaDemo/setup-with-qnn.sh | 19 -
 examples/demo-apps/android/LlamaDemo/setup.sh | 17 -
 examples/models/llama/README.md | 2 +-
 examples/models/llava/README.md | 2 +-
 104 files
changed, 8 insertions(+), 5812 deletions(-)
 delete mode 100644 docs/source/llm/llama-demo-android.md
 delete mode 100644 examples/demo-apps/android/LlamaDemo/.gitignore
 delete mode 100644 examples/demo-apps/android/LlamaDemo/README.md
 delete mode 100644 examples/demo-apps/android/LlamaDemo/SDK-quick-setup-guide.md
 delete mode 100644 examples/demo-apps/android/LlamaDemo/app/.gitignore
 delete mode 100644 examples/demo-apps/android/LlamaDemo/app/build.gradle.kts
 delete mode 100644 examples/demo-apps/android/LlamaDemo/app/proguard-rules.pro
 delete mode 100644 examples/demo-apps/android/LlamaDemo/app/src/androidTest/java/com/example/executorchllamademo/PerfTest.java
 delete mode 100644 examples/demo-apps/android/LlamaDemo/app/src/main/AndroidManifest.xml
 delete mode 100644 examples/demo-apps/android/LlamaDemo/app/src/main/BUCK
 delete mode 100644 examples/demo-apps/android/LlamaDemo/app/src/main/java/com/example/executorchllamademo/AppLog.java
 delete mode 100644 examples/demo-apps/android/LlamaDemo/app/src/main/java/com/example/executorchllamademo/BackendType.java
 delete mode 100644 examples/demo-apps/android/LlamaDemo/app/src/main/java/com/example/executorchllamademo/DemoSharedPreferences.java
 delete mode 100644 examples/demo-apps/android/LlamaDemo/app/src/main/java/com/example/executorchllamademo/ETImage.java
 delete mode 100644 examples/demo-apps/android/LlamaDemo/app/src/main/java/com/example/executorchllamademo/ETLogging.java
 delete mode 100644 examples/demo-apps/android/LlamaDemo/app/src/main/java/com/example/executorchllamademo/LlmBenchmarkRunner.java
 delete mode 100644 examples/demo-apps/android/LlamaDemo/app/src/main/java/com/example/executorchllamademo/LogsActivity.java
 delete mode 100644 examples/demo-apps/android/LlamaDemo/app/src/main/java/com/example/executorchllamademo/LogsAdapter.java
 delete mode 100644 examples/demo-apps/android/LlamaDemo/app/src/main/java/com/example/executorchllamademo/MainActivity.java
 delete mode 100644 examples/demo-apps/android/LlamaDemo/app/src/main/java/com/example/executorchllamademo/Message.java
 delete mode 100644 examples/demo-apps/android/LlamaDemo/app/src/main/java/com/example/executorchllamademo/MessageAdapter.java
 delete mode 100644 examples/demo-apps/android/LlamaDemo/app/src/main/java/com/example/executorchllamademo/MessageType.java
 delete mode 100644 examples/demo-apps/android/LlamaDemo/app/src/main/java/com/example/executorchllamademo/ModelRunner.java
 delete mode 100644 examples/demo-apps/android/LlamaDemo/app/src/main/java/com/example/executorchllamademo/ModelRunnerCallback.java
 delete mode 100644 examples/demo-apps/android/LlamaDemo/app/src/main/java/com/example/executorchllamademo/ModelType.java
 delete mode 100644 examples/demo-apps/android/LlamaDemo/app/src/main/java/com/example/executorchllamademo/ModelUtils.java
 delete mode 100644 examples/demo-apps/android/LlamaDemo/app/src/main/java/com/example/executorchllamademo/PromptFormat.java
 delete mode 100644 examples/demo-apps/android/LlamaDemo/app/src/main/java/com/example/executorchllamademo/SettingsActivity.java
 delete mode 100644 examples/demo-apps/android/LlamaDemo/app/src/main/java/com/example/executorchllamademo/SettingsFields.java
 delete mode 100644 examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/banner_shape.xml
 delete mode 100644 examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/baseline_add_24.xml
 delete mode 100644 examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/baseline_add_photo_alternate_24.xml
 delete mode 100644 examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/baseline_article_24.xml
 delete mode 100644 examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/baseline_close_24.xml
 delete mode 100644 examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/baseline_delete_forever_24.xml
 delete mode 100644 examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/baseline_lightbulb_24.xml
 delete mode 100644 examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/baseline_restart_alt_24.xml
 delete mode 100644 examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/baseline_send_24.xml
 delete mode 100644 examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/baseline_settings_24.xml
 delete mode 100644 examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/baseline_stop_24.xml
 delete mode 100644 examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/blue_lightbulb_24.xml
 delete mode 100644 examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/btn.xml
 delete mode 100644 examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/chat_background.xml
 delete mode 100644 examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/custom_button_round.xml
 delete mode 100644 examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/expand_circle_down.xml
 delete mode 100644 examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/ic_launcher_background.xml
 delete mode 100644 examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/ic_launcher_foreground.xml
 delete mode 100644 examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/input_text_shape.xml
 delete mode 100644 examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/logo.png
 delete mode 100644 examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/outline_add_box_48.xml
 delete mode 100644 examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/outline_camera_alt_48.xml
 delete mode 100644 examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/outline_image_48.xml
 delete mode 100644 examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/prompt_shape.xml
 delete mode 100644 examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/received_message.xml
 delete mode 100644 examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/sent_message.xml
 delete mode 100644 examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/three_dots.xml
 delete mode 100644 examples/demo-apps/android/LlamaDemo/app/src/main/res/layout/activity_benchmarking.xml
 delete mode 100644 examples/demo-apps/android/LlamaDemo/app/src/main/res/layout/activity_logs.xml
 delete mode 100644 examples/demo-apps/android/LlamaDemo/app/src/main/res/layout/activity_main.xml
 delete mode 100644 examples/demo-apps/android/LlamaDemo/app/src/main/res/layout/activity_settings.xml
 delete mode 100644 examples/demo-apps/android/LlamaDemo/app/src/main/res/layout/logs_message.xml
 delete mode 100644 examples/demo-apps/android/LlamaDemo/app/src/main/res/layout/received_message.xml
 delete mode 100644 examples/demo-apps/android/LlamaDemo/app/src/main/res/layout/sent_message.xml
 delete mode 100644 examples/demo-apps/android/LlamaDemo/app/src/main/res/layout/system_message.xml
 delete mode 100644 examples/demo-apps/android/LlamaDemo/app/src/main/res/mipmap-anydpi-v26/ic_launcher.xml
 delete mode 100644 examples/demo-apps/android/LlamaDemo/app/src/main/res/mipmap-anydpi-v26/ic_launcher_round.xml
 delete mode 100644 examples/demo-apps/android/LlamaDemo/app/src/main/res/mipmap-hdpi/ic_launcher.webp
 delete mode 100644 examples/demo-apps/android/LlamaDemo/app/src/main/res/mipmap-hdpi/ic_launcher_round.webp
 delete mode 100644 examples/demo-apps/android/LlamaDemo/app/src/main/res/mipmap-mdpi/ic_launcher.webp
 delete mode 100644 examples/demo-apps/android/LlamaDemo/app/src/main/res/mipmap-mdpi/ic_launcher_round.webp
 delete mode 100644 examples/demo-apps/android/LlamaDemo/app/src/main/res/mipmap-xhdpi/ic_launcher.webp
 delete mode 100644 examples/demo-apps/android/LlamaDemo/app/src/main/res/mipmap-xhdpi/ic_launcher_round.webp
 delete mode 100644 examples/demo-apps/android/LlamaDemo/app/src/main/res/mipmap-xxhdpi/ic_launcher.webp
 delete mode 100644 examples/demo-apps/android/LlamaDemo/app/src/main/res/mipmap-xxhdpi/ic_launcher_round.webp
 delete mode 100644 examples/demo-apps/android/LlamaDemo/app/src/main/res/mipmap-xxxhdpi/ic_launcher.webp
 delete mode 100644 examples/demo-apps/android/LlamaDemo/app/src/main/res/mipmap-xxxhdpi/ic_launcher_round.webp
 delete mode 100644 examples/demo-apps/android/LlamaDemo/app/src/main/res/values/colors.xml
 delete mode 100644 examples/demo-apps/android/LlamaDemo/app/src/main/res/values/strings.xml
 delete mode 100644 examples/demo-apps/android/LlamaDemo/app/src/main/res/values/styles.xml
 delete mode 100644 examples/demo-apps/android/LlamaDemo/app/src/main/res/values/themes.xml
 delete mode 100644 examples/demo-apps/android/LlamaDemo/app/src/main/res/xml/backup_rules.xml
 delete mode 100644 examples/demo-apps/android/LlamaDemo/app/src/main/res/xml/data_extraction_rules.xml
 delete mode 100644 examples/demo-apps/android/LlamaDemo/build.gradle.kts
 delete mode 100644 examples/demo-apps/android/LlamaDemo/docs/delegates/mediatek_README.md
 delete mode 100644 examples/demo-apps/android/LlamaDemo/docs/delegates/qualcomm_README.md
 delete mode 100644 examples/demo-apps/android/LlamaDemo/docs/delegates/xnnpack_README.md
 delete mode 100644 examples/demo-apps/android/LlamaDemo/download_prebuilt_lib.sh
 delete mode 100644 examples/demo-apps/android/LlamaDemo/gradle.properties
 delete mode 100644 examples/demo-apps/android/LlamaDemo/gradle/wrapper/gradle-wrapper.jar
 delete mode 100644 examples/demo-apps/android/LlamaDemo/gradle/wrapper/gradle-wrapper.properties
 delete mode 100755 examples/demo-apps/android/LlamaDemo/gradlew
 delete mode 100644 examples/demo-apps/android/LlamaDemo/gradlew.bat
 delete mode 100644 examples/demo-apps/android/LlamaDemo/run_instrumentation_test.sh
 delete mode 100644 examples/demo-apps/android/LlamaDemo/settings.gradle.kts
 delete mode 100644 examples/demo-apps/android/LlamaDemo/setup-with-qnn.sh
 delete mode 100644 examples/demo-apps/android/LlamaDemo/setup.sh

diff --git a/.github/workflows/_android.yml b/.github/workflows/_android.yml
index 2449e94b2af..94e3cc84f1e 100644
---
a/.github/workflows/_android.yml
+++ b/.github/workflows/_android.yml
@@ -48,19 +48,6 @@ jobs:
           bash examples/models/llama/install_requirements.sh
           bash ".ci/scripts/test_llama.sh" -model stories110M -build_tool cmake -dtype fp16 -mode portable -upload ${ARTIFACTS_DIR_NAME}/fp32-xnnpack-custom

-          mkdir -p examples/demo-apps/android/LlamaDemo/app/libs
-          cp aar-out/executorch.aar examples/demo-apps/android/LlamaDemo/app/libs
-          pushd examples/demo-apps/android/LlamaDemo
-          ANDROID_HOME="${ANDROID_SDK:-/opt/android/sdk}" ./gradlew build assembleAndroidTest
-          popd
-
-          DEMO_APP_DIR="${ARTIFACTS_DIR_NAME}/llm_demo"
-          # The app directory is named using its build flavor as a suffix.
-          mkdir -p "${DEMO_APP_DIR}"
-          # Collect the app and its test suite
-          cp examples/demo-apps/android/LlamaDemo/app/build/outputs/apk/debug/*.apk "${DEMO_APP_DIR}"
-          cp examples/demo-apps/android/LlamaDemo/app/build/outputs/apk/androidTest/debug/*.apk "${DEMO_APP_DIR}"
-
   # Running Android emulator directly on the runner and not using Docker
   run-emulator:
     needs: build-llm-demo
@@ -103,8 +90,6 @@ jobs:
         shell: bash
         run: |
           set -eux
-          curl -O https://gha-artifacts.s3.amazonaws.com/${{ github.repository }}/${{ github.run_id }}/artifacts/llm_demo/app-debug.apk
-          curl -O https://gha-artifacts.s3.amazonaws.com/${{ github.repository }}/${{ github.run_id }}/artifacts/llm_demo/app-debug-androidTest.apk
          curl -O https://gha-artifacts.s3.amazonaws.com/${{ github.repository }}/${{ github.run_id }}/artifacts/fp32-xnnpack-custom/model.zip
          curl -o android-test-debug-androidTest.apk https://gha-artifacts.s3.amazonaws.com/${{ github.repository }}/${{ github.run_id }}/artifacts/library_test_dir/executorch_android-debug-androidTest.apk
          unzip model.zip
diff --git a/.github/workflows/lint.yml b/.github/workflows/lint.yml
index ac9d1c7e6a0..a9d0f466e55 100644
--- a/.github/workflows/lint.yml
+++ b/.github/workflows/lint.yml
@@ -148,8 +148,6 @@ jobs:
           extension/android/executorch_android/src/main/java/org/pytorch/executorch/extension/llm/*.java \
           extension/android/executorch_android/src/main/java/org/pytorch/executorch/annotations/*.java \
           extension/android/executorch_android/src/androidTest/java/org/pytorch/executorch/*.java \
-          examples/demo-apps/android/LlamaDemo/app/src/main/java/com/example/executorchllamademo/*.java \
-          examples/demo-apps/android/LlamaDemo/app/src/androidTest/java/com/example/executorchllamademo/*.java \
          extension/benchmark/android/benchmark/app/src/main/java/org/pytorch/minibench/*.java \
          extension/benchmark/android/benchmark/app/src/androidTest/java/org/pytorch/minibench/*.java)
         if [ -n "$FILES_NEEDS_FORMAT" ]; then
diff --git a/README-wheel.md b/README-wheel.md
index a59af8ea05f..7ae9b0aa2e0 100644
--- a/README-wheel.md
+++ b/README-wheel.md
@@ -25,6 +25,6 @@ tutorials and documentation. Here are some starting points:
 * [Exporting to ExecuTorch](https://pytorch.org/executorch/main/tutorials/export-to-executorch-tutorial)
   * Learn the fundamentals of exporting a PyTorch `nn.Module` to ExecuTorch, and optimizing its performance using quantization and hardware delegation.
-* Running etLLM on [iOS](https://github.com/meta-pytorch/executorch-examples/tree/main/llm/apple) and [Android](docs/source/llm/llama-demo-android.md) devices.
+* Running etLLM on [iOS](https://github.com/meta-pytorch/executorch-examples/tree/main/llm/apple) and [Android](https://github.com/meta-pytorch/executorch-examples/tree/main/llm/android) devices.
   * Build and run LLaMA in a demo mobile app, and learn how to integrate models with your own apps.
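An aside for readers skimming the series: every Cadence `quantize_per_tensor_*` variant added in the earlier patches passes a precomputed `1. / scale` into a shared `quantize` kernel, and the `dequantize_per_tensor_*` variants apply the inverse mapping. A minimal self-contained sketch of that affine math follows; the function names, the `nearbyint` rounding choice, and the saturation to the full range of `T` are assumptions inferred from the call sites, not the backend's actual kernel code.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical scalar core of an affine quantize kernel:
// q = clamp(round(x * inv_scale) + zero_point) to the range of T.
// Note that inv_scale = 1 / scale is precomputed by the caller,
// mirroring the "1. / scale" arguments in the diffs above.
template <typename T>
T affine_quantize(float x, float inv_scale, int32_t zero_point) {
  int32_t q = static_cast<int32_t>(std::nearbyint(x * inv_scale)) + zero_point;
  q = std::max<int32_t>(q, std::numeric_limits<T>::min());
  q = std::min<int32_t>(q, std::numeric_limits<T>::max());
  return static_cast<T>(q);
}

// Inverse mapping: x is approximately (q - zero_point) * scale.
template <typename T>
float affine_dequantize(T q, float scale, int32_t zero_point) {
  return static_cast<float>(static_cast<int32_t>(q) - zero_point) * scale;
}
```

The per-dtype `asym8s`/`asym8u`/`asym16s`/`asym16u` entry points then differ only in the `T` they instantiate, which is why the generic dispatch above collapses to one `quantize(...)` call per branch.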
diff --git a/docs/source/index.md b/docs/source/index.md
index 1c2fdbcc110..b308041b609 100644
--- a/docs/source/index.md
+++ b/docs/source/index.md
@@ -93,7 +93,7 @@ ExecuTorch provides support for:
 - [Exporting LLMs](llm/export-llm.md)
 - [Exporting custom LLMs](llm/export-custom-llm.md)
 - [Running with C++](llm/run-with-c-plus-plus.md)
-- [Running on Android (XNNPack)](llm/llama-demo-android.md)
+- [Running on Android (XNNPack)](https://github.com/meta-pytorch/executorch-examples/tree/main/llm/android)
 - [Running on Android (QNN)](llm/build-run-llama3-qualcomm-ai-engine-direct-backend.md)
 - [Running on iOS](llm/run-on-ios.md)

 #### Backend Development
@@ -251,7 +251,7 @@ Getting Started
 Exporting LLMs with export_llm
 Exporting custom LLMs
 Running with C++
-Running on Android
+Running on Android
 Running on Android
 Running on iOS
 ```
diff --git a/docs/source/llm/getting-started.md b/docs/source/llm/getting-started.md
index 849418342b6..6b6f9d96df7 100644
--- a/docs/source/llm/getting-started.md
+++ b/docs/source/llm/getting-started.md
@@ -21,6 +21,6 @@ Deploying LLMs to ExecuTorch can be boiled down to a two-step process: (1) expor
 - [Exporting LLMs](export-llm.md)
 - [Exporting custom LLMs](export-custom-llm.md)
 - [Running with C++](run-with-c-plus-plus.md)
-- [Running on Android (XNNPack)](llama-demo-android.md)
+- [Running on Android (XNNPack)](https://github.com/meta-pytorch/executorch-examples/tree/main/llm/android)
 - [Running on Android (Qualcomm)](build-run-llama3-qualcomm-ai-engine-direct-backend.md)
 - [Running on iOS](https://github.com/meta-pytorch/executorch-examples/tree/main/llm/apple)
diff --git a/docs/source/llm/llama-demo-android.md b/docs/source/llm/llama-demo-android.md
deleted file mode 100644
index 023f82baf33..00000000000
--- a/docs/source/llm/llama-demo-android.md
+++ /dev/null
@@ -1,2 +0,0 @@
-```{include} ../../../examples/demo-apps/android/LlamaDemo/README.md
-```
diff --git a/docs/source/using-executorch-android.md
b/docs/source/using-executorch-android.md index 6f0c5dad736..7b89baa4d4a 100644 --- a/docs/source/using-executorch-android.md +++ b/docs/source/using-executorch-android.md @@ -88,7 +88,7 @@ implementation("com.facebook.fbjni:fbjni:0.7.0") ### Example usage -In your app working directory, such as executorch/examples/demo-apps/android/LlamaDemo, +In your app working directory, such as executorch-examples/llm/android/LlamaDemo, ``` mkdir -p app/libs curl https://ossci-android.s3.amazonaws.com/executorch/release/${executorch_version}/executorch.aar -o app/libs/executorch.aar @@ -202,7 +202,7 @@ adb push extension/module/test/resources/add.pte /data/local/tmp/ This example loads an ExecuTorch module, prepares input data, runs inference, and processes the output data. Please use [DeepLabV3AndroidDemo](https://github.com/meta-pytorch/executorch-examples/tree/main/dl3/android/DeepLabV3Demo) -and [LlamaDemo](https://github.com/pytorch/executorch/tree/main/examples/demo-apps/android/LlamaDemo) for the code examples +and [LlamaDemo](https://github.com/meta-pytorch/executorch-examples/tree/main/llm/android/LlamaDemo) for the code examples using ExecuTorch AAR package. ## Java API reference diff --git a/examples/demo-apps/android/LlamaDemo/.gitignore b/examples/demo-apps/android/LlamaDemo/.gitignore deleted file mode 100644 index 41853c0472c..00000000000 --- a/examples/demo-apps/android/LlamaDemo/.gitignore +++ /dev/null @@ -1,12 +0,0 @@ -*.iml -.gradle -/local.properties -.idea -.DS_Store -/build -/captures -.externalNativeBuild -.cxx -local.properties -*.so -*.aar diff --git a/examples/demo-apps/android/LlamaDemo/README.md b/examples/demo-apps/android/LlamaDemo/README.md deleted file mode 100644 index 9a6b3b020e7..00000000000 --- a/examples/demo-apps/android/LlamaDemo/README.md +++ /dev/null @@ -1,174 +0,0 @@ -# ExecuTorch Llama Android Demo App - -**[UPDATE - 2025-05-15]** We have added support for running Qwen3 0.6B and 4B model. 
Please see [this tutorial](https://github.com/pytorch/executorch/tree/main/examples/models/qwen3#summary) for export. Loading and running Qwen3 with this app is the same as Llama, as in this doc. - -We’re excited to share that the newly revamped Android demo app is live and includes many new updates to provide a more intuitive and smoother user experience with a chat use case! The primary goal of this app is to showcase how easily ExecuTorch can be integrated into an Android demo app and how to exercise the many features ExecuTorch and Llama models have to offer. - -This app serves as a valuable resource to inspire your creativity and provide foundational code that you can customize and adapt for your particular use case. - -Please dive in and start exploring our demo app today! We look forward to any feedback and are excited to see your innovative ideas. - - -## Key Concepts -From this demo app, you will learn many key concepts such as: -* How to prepare Llama models, build the ExecuTorch library, and run model inference across delegates -* How to expose the ExecuTorch library via a JNI layer -* Familiarity with current ExecuTorch app-facing capabilities - -The goal is for you to see the type of support ExecuTorch provides and feel comfortable leveraging it for your use cases. - -## Supported Models -As a whole, the models that this app supports are (varies by delegate): -* Llama 3.2 Quantized 1B/3B -* Llama 3.2 1B/3B in BF16 -* Llama Guard 3 1B -* Llama 3.1 8B -* Llama 3 8B -* Llama 2 7B -* LLaVA-1.5 vision model (only XNNPACK) -* Qwen 3 0.6B, 1.7B, and 4B - - -## Building the APK -First, it’s important to note that ExecuTorch currently provides support across 3 delegates.
Once you identify the delegate of your choice, select the README link for complete end-to-end instructions, from environment set-up and model export to building the ExecuTorch libraries and apps to run on device: - -| Delegate | Resource | -| ------------- | ------------- | -| XNNPACK (CPU-based library) | [link](https://github.com/pytorch/executorch/blob/main/examples/demo-apps/android/LlamaDemo/docs/delegates/xnnpack_README.md) | -| QNN (Qualcomm AI Accelerators) | [link](https://github.com/pytorch/executorch/blob/main/examples/demo-apps/android/LlamaDemo/docs/delegates/qualcomm_README.md) | -| MediaTek (MediaTek AI Accelerators) | [link](https://github.com/pytorch/executorch/blob/main/examples/demo-apps/android/LlamaDemo/docs/delegates/mediatek_README.md) | - - -## How to Use the App - -This section will provide the main steps to use the app, along with a code snippet of the ExecuTorch API. - -For loading the app, development, and running on device, we recommend Android Studio: -1. Open Android Studio and select "Open an existing Android Studio project" to open examples/demo-apps/android/LlamaDemo. -2. Run the app (^R). This builds and launches the app on the phone. - -### Opening the App - -Below are the UI features for the app. - -Select the settings widget to get started with picking a model, its parameters and any prompts. -
- - - -### Select Models and Parameters - -Once you've selected the model, tokenizer, and model type you are ready to click on "Load Model" to have the app load the model and go back to the main Chat activity. -
- - - Optional Parameters: -* Temperature: Defaulted to 0, you can adjust the temperature for the model as well. The model will reload upon any adjustments. -* System Prompt: Without any formatting, you can enter a system prompt. For example, "you are a travel assistant" or "give me a response in a few sentences". -* User Prompt: More for the advanced user, if you would like to manually input a prompt then you can do so by modifying the `{{user prompt}}`. You can modify the special tokens as well. Once changed, go back to the main Chat activity to send. - -#### ExecuTorch App API - -```java -// Upon returning to the Main Chat Activity -mModule = new LlmModule( - ModelUtils.getModelCategory(mCurrentSettingsFields.getModelType()), - modelPath, - tokenizerPath, - temperature); -int loadResult = mModule.load(); -``` - -* `modelCategory`: Indicates whether it’s a text-only or vision model -* `modelPath`: path to the .pte file -* `tokenizerPath`: path to the tokenizer file -* `temperature`: model parameter to adjust the randomness of the model’s output - - -### User Prompt -Once the model is successfully loaded, enter any prompt and click the send (i.e. generate) button to send it to the model. -
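The `modelCategory` value passed above comes from `ModelUtils.getModelCategory(...)`, which tells the runner whether image inputs are expected. A rough, hypothetical sketch of that idea (the `ModelCategory` class and its constant values below are illustrative, not the demo app's actual implementation):

```java
import java.util.Locale;

class ModelCategory {
    // Illustrative constants; the demo's real values live in ModelUtils.
    static final int TEXT_MODEL = 1;
    static final int VISION_MODEL = 2;

    // Hypothetical analogue of ModelUtils.getModelCategory(): vision models
    // such as LLaVA expect image inputs; everything else is text-only.
    static int forModel(String modelName) {
        return modelName.toLowerCase(Locale.ROOT).contains("llava")
            ? VISION_MODEL
            : TEXT_MODEL;
    }
}
```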
- - You can ask it more follow-up questions as well. -
- - #### ExecuTorch App API - -```java -mModule.generate(prompt, sequence_length, MainActivity.this); -``` -* `prompt`: User formatted prompt -* `sequence_length`: Number of tokens to generate in response to a prompt -* `MainActivity.this`: Indicates that the callback functions (onResult(), onStats()) are present in this class. - -[*LLaVA-1.5: Only for XNNPACK delegate*] - -For the LLaVA-1.5 implementation, select the exported LLaVA .pte and tokenizer file in the Settings menu and load the model. After this, you can send an image from your gallery or take a live picture along with a text prompt to the model. -
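Because `onResult()` is invoked once per generated token, a caller typically accumulates the pieces into the full response; a minimal sketch of that pattern (`ResponseCollector` is a hypothetical name, not a class in the demo app):

```java
import java.util.ArrayList;
import java.util.List;

class ResponseCollector {
    // Accumulates the tokens streamed through onResult(); the runtime
    // keeps invoking the callback until the response is complete.
    private final List<String> tokens = new ArrayList<>();

    public void onResult(String token) {
        tokens.add(token);
    }

    public String fullResponse() {
        return String.join("", tokens);
    }
}
```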
- - -### Output Generated -To show completion of the follow-up question, here is the complete detailed response from the model. -
- - #### ExecuTorch App API - -Ensure you have the following functions in your callback class that you provided to `mModule.generate()`. For this example, it is `MainActivity.this`. -```java - @Override - public void onResult(String result) { - //...result contains token from response - //.. onResult will continue to be invoked until response is complete - } - - @Override - public void onStats(String stats) { - //... will be a json. See extension/llm/stats.h for the field definitions - } - -``` - -## Instrumentation Test -You can run the instrumentation test as a sanity check. The test loads a model pte file and tokenizer.bin file -under `/data/local/tmp/llama`. - -### Model preparation -Go to ExecuTorch root, -```sh -curl -C - -Ls "https://huggingface.co/karpathy/tinyllamas/resolve/main/stories110M.pt" --output stories110M.pt -curl -C - -Ls "https://raw.githubusercontent.com/karpathy/llama2.c/master/tokenizer.model" --output tokenizer.model -# Create params.json file -touch params.json -echo '{"dim": 768, "multiple_of": 32, "n_heads": 12, "n_layers": 12, "norm_eps": 1e-05, "vocab_size": 32000}' > params.json -python -m extension.llm.export.export_llm base.checkpoint=stories110M.pt base.params=params.json model.dtype_override="fp16" export.output_name=stories110m_h.pte model.use_kv_cache=True -python -m pytorch_tokenizers.tools.llama2c.convert -t tokenizer.model -o tokenizer.bin -``` -### Push model -```sh -adb shell mkdir -p /data/local/tmp/llama -adb push stories110m_h.pte /data/local/tmp/llama -adb push tokenizer.bin /data/local/tmp/llama -``` - -### Run test -Go to `examples/demo-apps/android/LlamaDemo`, -```sh -./gradlew connectedAndroidTest -``` - -## Reporting Issues -If you encounter any bugs or issues while following this tutorial, please file a bug/issue here on [Github](https://github.com/pytorch/executorch/issues/new), or join our discord [here](https://lnkd.in/gWCM4ViK). 
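The `onStats` JSON above carries timing fields such as `generated_tokens`, `inference_end_ms`, and `prompt_eval_end_ms` (see `extension/llm/stats.h`); the tokens-per-second arithmetic that this patch's `PerfTest` applies to them can be sketched as follows (`TokenStats` is a hypothetical helper name):

```java
class TokenStats {
    // Tokens/second over the decode phase only: the generation window runs
    // from the end of prompt evaluation to the end of inference.
    static float tokensPerSecond(int generatedTokens, long inferenceEndMs, long promptEvalEndMs) {
        return (float) generatedTokens / (inferenceEndMs - promptEvalEndMs) * 1000f;
    }
}
```

For example, 50 tokens generated over a 2-second decode window works out to 25 tokens/second.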
diff --git a/examples/demo-apps/android/LlamaDemo/SDK-quick-setup-guide.md b/examples/demo-apps/android/LlamaDemo/SDK-quick-setup-guide.md deleted file mode 100644 index 9ae79e96763..00000000000 --- a/examples/demo-apps/android/LlamaDemo/SDK-quick-setup-guide.md +++ /dev/null @@ -1,94 +0,0 @@ -# Guide to set up Java/SDK/NDK for Android - -Follow this doc if you haven't set up Java/SDK/NDK for Android development -already. -This doc provides a CLI tutorial to set them up. Otherwise, you can do the same -thing with the Android Studio GUI. - -## Set up Java 17 -1. Download the archive from the Oracle website. -Make sure you have read and agreed to the terms and conditions from the website before downloading. -```bash -export DEV_HOME= -cd $DEV_HOME -``` -Linux: -```bash -curl https://download.oracle.com/java/17/archive/jdk-17.0.10_linux-x64_bin.tar.gz -o jdk-17.0.10.tar.gz -``` -macOS: -```bash -curl https://download.oracle.com/java/17/archive/jdk-17.0.10_macos-aarch64_bin.tar.gz -o jdk-17.0.10.tar.gz -``` -2. Unzip the archive. The directory named `jdk-17.0.10` is the Java root directory. -```bash -tar xf jdk-17.0.10.tar.gz -``` -3. Set `JAVA_HOME` and update `PATH`. - -Linux: -```bash -export JAVA_HOME="$DEV_HOME"/jdk-17.0.10 -export PATH="$JAVA_HOME/bin:$PATH" -``` -macOS: -```bash -export JAVA_HOME="$DEV_HOME"/jdk-17.0.10.jdk/Contents/Home -export PATH="$JAVA_HOME/bin:$PATH" -``` - -Note: Oracle has tutorials for installing Java on -[Linux](https://docs.oracle.com/en/java/javase/17/install/installation-jdk-linux-platforms.html#GUID-4A6BD592-1840-4BB4-A758-4CD49E9EE88B) -and [macOS](https://docs.oracle.com/en/java/javase/17/install/installation-jdk-macos.html#GUID-E8A251B6-D9A9-4276-ABC8-CC0DAD62EA33). -Some Linux distributions have a JDK package in their package manager. For example, Debian users can install the -openjdk-17-jdk package. 
- -## Set up Android SDK/NDK -Android has a command line tool [sdkmanager](https://developer.android.com/tools/sdkmanager) which -helps users managing SDK and other tools related to Android development. - -1. Go to https://developer.android.com/studio and download the archive from "Command line tools -only" section. Make sure you have read and agree with the terms and conditions from the website. - -Linux: -```bash -curl https://dl.google.com/android/repository/commandlinetools-linux-11076708_latest.zip -o commandlinetools.zip -``` -macOS: -```bash -curl https://dl.google.com/android/repository/commandlinetools-mac-11076708_latest.zip -o commandlinetools.zip -``` -2. Unzip. -```bash -unzip commandlinetools.zip -``` -3. Specify a root for Android SDK. For example, we can put it under `$DEV_HOME/sdk`. - -``` -mkdir -p $DEV_HOME/sdk -export ANDROID_HOME="$(realpath $DEV_HOME/sdk)" -# Install SDK 34 -./cmdline-tools/bin/sdkmanager --sdk_root="${ANDROID_HOME}" --install "platforms;android-34" -# Install NDK -./cmdline-tools/bin/sdkmanager --sdk_root="${ANDROID_HOME}" --install "ndk;26.3.11579264" -# The NDK root is then under `ndk/`. -export ANDROID_NDK="$ANDROID_HOME/ndk/26.3.11579264" -``` - -### (Optional) Android Studio Setup -If you want to use Android Studio and never set up Java/SDK/NDK before, or if -you use the newly installed ones, follow these steps to set Android Studio to use -them. - -Copy these output paths to be used by Android Studio -```bash -echo $ANDROID_HOME -echo $ANDROID_NDK -echo $JAVA_HOME -``` - -Open a project in Android Studio. In Project Structure (File -> Project -Structure, or `⌘;`) -> SDK Location, -* Set Android SDK Location to the path of $ANDROID_HOME -* Set Android NDK Location to the path of $ANDROID_NDK -* Set JDK location (Click Gradle Settings link) -> Gradle JDK -> Add JDK... 
to the path of $JAVA_HOME diff --git a/examples/demo-apps/android/LlamaDemo/app/.gitignore b/examples/demo-apps/android/LlamaDemo/app/.gitignore deleted file mode 100644 index 796b96d1c40..00000000000 --- a/examples/demo-apps/android/LlamaDemo/app/.gitignore +++ /dev/null @@ -1 +0,0 @@ -/build diff --git a/examples/demo-apps/android/LlamaDemo/app/build.gradle.kts b/examples/demo-apps/android/LlamaDemo/app/build.gradle.kts deleted file mode 100644 index beba2696c15..00000000000 --- a/examples/demo-apps/android/LlamaDemo/app/build.gradle.kts +++ /dev/null @@ -1,103 +0,0 @@ -/* - * Copyright (c) Meta Platforms, Inc. and affiliates. - * All rights reserved. - * - * This source code is licensed under the BSD-style license found in the - * LICENSE file in the root directory of this source tree. - */ - -plugins { - id("com.android.application") - id("org.jetbrains.kotlin.android") -} - -val qnnVersion: String? = project.findProperty("qnnVersion") as? String - -android { - namespace = "com.example.executorchllamademo" - compileSdk = 34 - - defaultConfig { - applicationId = "com.example.executorchllamademo" - minSdk = 28 - targetSdk = 33 - versionCode = 1 - versionName = "1.0" - - testInstrumentationRunner = "androidx.test.runner.AndroidJUnitRunner" - vectorDrawables { useSupportLibrary = true } - externalNativeBuild { cmake { cppFlags += "" } } - } - - buildTypes { - release { - isMinifyEnabled = false - proguardFiles(getDefaultProguardFile("proguard-android-optimize.txt"), "proguard-rules.pro") - } - } - compileOptions { - sourceCompatibility = JavaVersion.VERSION_1_8 - targetCompatibility = JavaVersion.VERSION_1_8 - } - kotlinOptions { jvmTarget = "1.8" } - buildFeatures { compose = true } - composeOptions { kotlinCompilerExtensionVersion = "1.4.3" } - packaging { resources { excludes += "/META-INF/{AL2.0,LGPL2.1}" } } -} - -dependencies { - implementation("androidx.core:core-ktx:1.9.0") - implementation("androidx.lifecycle:lifecycle-runtime-ktx:2.6.1") - 
implementation("androidx.activity:activity-compose:1.7.0") - implementation(platform("androidx.compose:compose-bom:2023.03.00")) - implementation("androidx.compose.ui:ui") - implementation("androidx.compose.ui:ui-graphics") - implementation("androidx.compose.ui:ui-tooling-preview") - implementation("androidx.compose.material3:material3") - implementation("androidx.appcompat:appcompat:1.6.1") - implementation("androidx.camera:camera-core:1.3.0-rc02") - implementation("androidx.constraintlayout:constraintlayout:2.2.0-alpha12") - implementation("com.facebook.fbjni:fbjni:0.7.0") - implementation("com.google.code.gson:gson:2.8.6") - implementation(files("libs/executorch.aar")) - implementation("com.google.android.material:material:1.12.0") - implementation("androidx.activity:activity:1.9.0") - implementation("org.json:json:20250107") - if (!qnnVersion.isNullOrEmpty()) { - implementation("com.qualcomm.qti:qnn-runtime:$qnnVersion") - } - testImplementation("junit:junit:4.13.2") - androidTestImplementation("androidx.test.ext:junit:1.1.5") - androidTestImplementation("androidx.test.espresso:espresso-core:3.5.1") - androidTestImplementation(platform("androidx.compose:compose-bom:2023.03.00")) - androidTestImplementation("androidx.compose.ui:ui-test-junit4") - debugImplementation("androidx.compose.ui:ui-tooling") - debugImplementation("androidx.compose.ui:ui-test-manifest") -} - -tasks.register("setup") { - doFirst { - exec { - commandLine("sh", "examples/demo-apps/android/LlamaDemo/setup.sh") - workingDir("../../../../../") - } - } -} - -tasks.register("setupQnn") { - doFirst { - exec { - commandLine("sh", "examples/demo-apps/android/LlamaDemo/setup-with-qnn.sh") - workingDir("../../../../../") - } - } -} - -tasks.register("download_prebuilt_lib") { - doFirst { - exec { - commandLine("sh", "examples/demo-apps/android/LlamaDemo/download_prebuilt_lib.sh") - workingDir("../../../../../") - } - } -} diff --git a/examples/demo-apps/android/LlamaDemo/app/proguard-rules.pro 
b/examples/demo-apps/android/LlamaDemo/app/proguard-rules.pro deleted file mode 100644 index 481bb434814..00000000000 --- a/examples/demo-apps/android/LlamaDemo/app/proguard-rules.pro +++ /dev/null @@ -1,21 +0,0 @@ -# Add project specific ProGuard rules here. -# You can control the set of applied configuration files using the -# proguardFiles setting in build.gradle. -# -# For more details, see -# http://developer.android.com/guide/developing/tools/proguard.html - -# If your project uses WebView with JS, uncomment the following -# and specify the fully qualified class name to the JavaScript interface -# class: -#-keepclassmembers class fqcn.of.javascript.interface.for.webview { -# public *; -#} - -# Uncomment this to preserve the line number information for -# debugging stack traces. -#-keepattributes SourceFile,LineNumberTable - -# If you keep the line number information, uncomment this to -# hide the original source file name. -#-renamesourcefileattribute SourceFile \ No newline at end of file diff --git a/examples/demo-apps/android/LlamaDemo/app/src/androidTest/java/com/example/executorchllamademo/PerfTest.java b/examples/demo-apps/android/LlamaDemo/app/src/androidTest/java/com/example/executorchllamademo/PerfTest.java deleted file mode 100644 index 32ec24a0df9..00000000000 --- a/examples/demo-apps/android/LlamaDemo/app/src/androidTest/java/com/example/executorchllamademo/PerfTest.java +++ /dev/null @@ -1,92 +0,0 @@ -/* - * Copyright (c) Meta Platforms, Inc. and affiliates. - * All rights reserved. - * - * This source code is licensed under the BSD-style license found in the - * LICENSE file in the root directory of this source tree. 
- */ - -package com.example.executorchllamademo; - -import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertFalse; - -import android.os.Bundle; -import androidx.test.ext.junit.runners.AndroidJUnit4; -import androidx.test.platform.app.InstrumentationRegistry; -import java.io.File; -import java.util.ArrayList; -import java.util.Arrays; -import java.util.List; -import org.json.JSONException; -import org.json.JSONObject; -import org.junit.Test; -import org.junit.runner.RunWith; -import org.pytorch.executorch.extension.llm.LlmCallback; -import org.pytorch.executorch.extension.llm.LlmModule; - -@RunWith(AndroidJUnit4.class) -public class PerfTest implements LlmCallback { - - private static final String RESOURCE_PATH = "/data/local/tmp/llama/"; - private static final String TOKENIZER_BIN = "tokenizer.bin"; - - private final List<String> results = new ArrayList<>(); - private final List<Float> tokensPerSecond = new ArrayList<>(); - - @Test - public void testTokensPerSecond() { - String tokenizerPath = RESOURCE_PATH + TOKENIZER_BIN; - // Find out the model name - File directory = new File(RESOURCE_PATH); - Arrays.stream(directory.listFiles()) - .filter(file -> file.getName().endsWith(".pte")) - .forEach( - model -> { - LlmModule mModule = new LlmModule(model.getPath(), tokenizerPath, 0.8f); - // Print the model name because there might be more than one of them - report("ModelName", model.getName()); - - int loadResult = mModule.load(); - // Check that the model can be loaded successfully - assertEquals(0, loadResult); - - // Run a testing prompt - mModule.generate("How do you do! 
I'm testing llama2 on mobile device", PerfTest.this); - assertFalse(tokensPerSecond.isEmpty()); - - final Float tps = tokensPerSecond.get(tokensPerSecond.size() - 1); - report("TPS", tps); - }); - } - - @Override - public void onResult(String result) { - results.add(result); - } - - @Override - public void onStats(String result) { - try { - JSONObject jsonObject = new JSONObject(result); - int numGeneratedTokens = jsonObject.getInt("generated_tokens"); - int inferenceEndMs = jsonObject.getInt("inference_end_ms"); - int promptEvalEndMs = jsonObject.getInt("prompt_eval_end_ms"); - float tps = (float) numGeneratedTokens / (inferenceEndMs - promptEvalEndMs) * 1000; - tokensPerSecond.add(tps); - } catch (JSONException e) { - } - } - - private void report(final String metric, final Float value) { - Bundle bundle = new Bundle(); - bundle.putFloat(metric, value); - InstrumentationRegistry.getInstrumentation().sendStatus(0, bundle); - } - - private void report(final String key, final String value) { - Bundle bundle = new Bundle(); - bundle.putString(key, value); - InstrumentationRegistry.getInstrumentation().sendStatus(0, bundle); - } -} diff --git a/examples/demo-apps/android/LlamaDemo/app/src/main/AndroidManifest.xml b/examples/demo-apps/android/LlamaDemo/app/src/main/AndroidManifest.xml deleted file mode 100644 index 7096a7d4e76..00000000000 --- a/examples/demo-apps/android/LlamaDemo/app/src/main/AndroidManifest.xml +++ /dev/null @@ -1,85 +0,0 @@ diff --git a/examples/demo-apps/android/LlamaDemo/app/src/main/BUCK b/examples/demo-apps/android/LlamaDemo/app/src/main/BUCK deleted file mode 100644 index a64e11d1306..00000000000 --- a/examples/demo-apps/android/LlamaDemo/app/src/main/BUCK +++ /dev/null @@ -1,67 +0,0 @@ -load("@fbcode_macros//build_defs:build_file_migration.bzl", "fbcode_target", "non_fbcode_target") -load("@fbsource//tools/build_defs/android:fb_android_binary.bzl", 
"fb_android_binary") -load("@fbsource//tools/build_defs/android:fb_android_library.bzl", "fb_android_library") -load("@fbsource//tools/build_defs/android:fb_android_resource.bzl", "fb_android_resource") - -oncall("executorch") - -non_fbcode_target(_kind = fb_android_resource, - name = "app_res", - package = "com.example.executorchllamademo", - res = "res", -) - -non_fbcode_target(_kind = fb_android_library, - name = "app_lib", - srcs = [ - "java/com/example/executorchllamademo/AppLog.java", - "java/com/example/executorchllamademo/BackendType.java", - "java/com/example/executorchllamademo/DemoSharedPreferences.java", - "java/com/example/executorchllamademo/ETImage.java", - "java/com/example/executorchllamademo/ETLogging.java", - "java/com/example/executorchllamademo/LlmBenchmarkRunner.java", - "java/com/example/executorchllamademo/LogsActivity.java", - "java/com/example/executorchllamademo/LogsAdapter.java", - "java/com/example/executorchllamademo/MainActivity.java", - "java/com/example/executorchllamademo/Message.java", - "java/com/example/executorchllamademo/MessageAdapter.java", - "java/com/example/executorchllamademo/MessageType.java", - "java/com/example/executorchllamademo/ModelRunner.java", - "java/com/example/executorchllamademo/ModelRunnerCallback.java", - "java/com/example/executorchllamademo/ModelType.java", - "java/com/example/executorchllamademo/ModelUtils.java", - "java/com/example/executorchllamademo/PromptFormat.java", - "java/com/example/executorchllamademo/SettingsActivity.java", - "java/com/example/executorchllamademo/SettingsFields.java", - ], - autoglob = False, - language = "JAVA", - deps = [ - ":app_res", - "//third-party/java/androidx/constraintlayout/constraintlayout:constraintlayout", - "//third-party/java/com/google/code/gson/gson:gson", - "//xplat/executorch/extension/android:executorch_llama", - ], -) - -non_fbcode_target(_kind = fb_android_binary, - name = "ExecuTorchLlamaDemo", - keystore = "//fbandroid/keystores:debug", - manifest = 
"AndroidManifest.xml", - manifest_entries = { - "min_sdk_version": 21, - "target_sdk_version": 34, - "version_code": "1", - "version_name": "1.0", - }, - package_type = "release", - skip_proguard = True, - deps = [ - ":app_lib", - ":app_res", - "//third-party/java/androidx/appcompat/appcompat:appcompat", - "//third-party/java/com/google/code/gson/gson:gson", - "//xplat/executorch/extension/android:executorch_llama", - "//xplat/executorch/extension/android/jni:executorch_llama_jni", - ], -) diff --git a/examples/demo-apps/android/LlamaDemo/app/src/main/java/com/example/executorchllamademo/AppLog.java b/examples/demo-apps/android/LlamaDemo/app/src/main/java/com/example/executorchllamademo/AppLog.java deleted file mode 100644 index 36d07419381..00000000000 --- a/examples/demo-apps/android/LlamaDemo/app/src/main/java/com/example/executorchllamademo/AppLog.java +++ /dev/null @@ -1,49 +0,0 @@ -/* - * Copyright (c) Meta Platforms, Inc. and affiliates. - * All rights reserved. - * - * This source code is licensed under the BSD-style license found in the - * LICENSE file in the root directory of this source tree. 
- */ - -package com.example.executorchllamademo; - -import java.text.SimpleDateFormat; -import java.util.Date; -import java.util.Locale; - -public class AppLog { - private final Long timestamp; - private final String message; - - public AppLog(String message) { - this.timestamp = getCurrentTimeStamp(); - this.message = message; - } - - public Long getTimestamp() { - return timestamp; - } - - public String getMessage() { - return message; - } - - public String getFormattedLog() { - return "[" + getFormattedTimeStamp() + "] " + message; - } - - private Long getCurrentTimeStamp() { - return System.currentTimeMillis(); - } - - private String getFormattedTimeStamp() { - return formatDate(timestamp); - } - - private String formatDate(long milliseconds) { - SimpleDateFormat formatter = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss", Locale.getDefault()); - Date date = new Date(milliseconds); - return formatter.format(date); - } -} diff --git a/examples/demo-apps/android/LlamaDemo/app/src/main/java/com/example/executorchllamademo/BackendType.java b/examples/demo-apps/android/LlamaDemo/app/src/main/java/com/example/executorchllamademo/BackendType.java deleted file mode 100644 index 7c84799795f..00000000000 --- a/examples/demo-apps/android/LlamaDemo/app/src/main/java/com/example/executorchllamademo/BackendType.java +++ /dev/null @@ -1,7 +0,0 @@ -package com.example.executorchllamademo; - -public enum BackendType { - XNNPACK, - QUALCOMM, - MEDIATEK -} diff --git a/examples/demo-apps/android/LlamaDemo/app/src/main/java/com/example/executorchllamademo/DemoSharedPreferences.java b/examples/demo-apps/android/LlamaDemo/app/src/main/java/com/example/executorchllamademo/DemoSharedPreferences.java deleted file mode 100644 index 99a94c00ebb..00000000000 --- a/examples/demo-apps/android/LlamaDemo/app/src/main/java/com/example/executorchllamademo/DemoSharedPreferences.java +++ /dev/null @@ -1,90 +0,0 @@ -/* - * Copyright (c) Meta Platforms, Inc. and affiliates. - * All rights reserved. 
- * - * This source code is licensed under the BSD-style license found in the - * LICENSE file in the root directory of this source tree. - */ - -package com.example.executorchllamademo; - -import android.content.Context; -import android.content.SharedPreferences; -import com.google.gson.Gson; -import com.google.gson.reflect.TypeToken; -import java.lang.reflect.Type; -import java.util.ArrayList; - -public class DemoSharedPreferences { - Context context; - SharedPreferences sharedPreferences; - - public DemoSharedPreferences(Context context) { - this.context = context; - this.sharedPreferences = getSharedPrefs(); - } - - private SharedPreferences getSharedPrefs() { - return context.getSharedPreferences( - context.getString(R.string.demo_pref_file_key), Context.MODE_PRIVATE); - } - - public String getSavedMessages() { - return sharedPreferences.getString(context.getString(R.string.saved_messages_json_key), ""); - } - - public void addMessages(MessageAdapter messageAdapter) { - SharedPreferences.Editor editor = sharedPreferences.edit(); - Gson gson = new Gson(); - String msgJSON = gson.toJson(messageAdapter.getSavedMessages()); - editor.putString(context.getString(R.string.saved_messages_json_key), msgJSON); - editor.apply(); - } - - public void removeExistingMessages() { - SharedPreferences.Editor editor = sharedPreferences.edit(); - editor.remove(context.getString(R.string.saved_messages_json_key)); - editor.apply(); - } - - public void addSettings(SettingsFields settingsFields) { - SharedPreferences.Editor editor = sharedPreferences.edit(); - Gson gson = new Gson(); - String settingsJSON = gson.toJson(settingsFields); - editor.putString(context.getString(R.string.settings_json_key), settingsJSON); - editor.apply(); - } - - public String getSettings() { - return sharedPreferences.getString(context.getString(R.string.settings_json_key), ""); - } - - public void saveLogs() { - SharedPreferences.Editor editor = sharedPreferences.edit(); - Gson gson = new Gson(); - 
String msgJSON = gson.toJson(ETLogging.getInstance().getLogs()); - editor.putString(context.getString(R.string.logs_json_key), msgJSON); - editor.apply(); - } - - public void removeExistingLogs() { - SharedPreferences.Editor editor = sharedPreferences.edit(); - editor.remove(context.getString(R.string.logs_json_key)); - editor.apply(); - } - - public ArrayList<AppLog> getSavedLogs() { - String logsJSONString = - sharedPreferences.getString(context.getString(R.string.logs_json_key), null); - if (logsJSONString == null || logsJSONString.isEmpty()) { - return new ArrayList<>(); - } - Gson gson = new Gson(); - Type type = new TypeToken<ArrayList<AppLog>>() {}.getType(); - ArrayList<AppLog> appLogs = gson.fromJson(logsJSONString, type); - if (appLogs == null) { - return new ArrayList<>(); - } - return appLogs; - } -} diff --git a/examples/demo-apps/android/LlamaDemo/app/src/main/java/com/example/executorchllamademo/ETImage.java b/examples/demo-apps/android/LlamaDemo/app/src/main/java/com/example/executorchllamademo/ETImage.java deleted file mode 100644 index e68c8472626..00000000000 --- a/examples/demo-apps/android/LlamaDemo/app/src/main/java/com/example/executorchllamademo/ETImage.java +++ /dev/null @@ -1,126 +0,0 @@ -/* - * Copyright (c) Meta Platforms, Inc. and affiliates. - * All rights reserved. - * - * This source code is licensed under the BSD-style license found in the - * LICENSE file in the root directory of this source tree. 
- */ - -package com.example.executorchllamademo; - -import android.content.ContentResolver; -import android.graphics.Bitmap; -import android.graphics.BitmapFactory; -import android.graphics.Color; -import android.net.Uri; -import androidx.annotation.Nullable; -import java.io.FileNotFoundException; -import java.io.InputStream; - -public class ETImage { - private int width; - private int height; - private final byte[] bytes; - private final Uri uri; - private final ContentResolver contentResolver; - - ETImage(ContentResolver contentResolver, Uri uri) { - this.contentResolver = contentResolver; - this.uri = uri; - bytes = getBytesFromImageURI(uri); - } - - public int getWidth() { - return width; - } - - public int getHeight() { - return height; - } - - public Uri getUri() { - return uri; - } - - public byte[] getBytes() { - return bytes; - } - - public int[] getInts() { - // We need to convert the byte array to an int array because - // the runner expects an int array as input. - int[] intArray = new int[bytes.length]; - for (int i = 0; i < bytes.length; i++) { - intArray[i] = (bytes[i] & 0xFF); - } - return intArray; - } - - private byte[] getBytesFromImageURI(Uri uri) { - try { - int RESIZED_IMAGE_WIDTH = 336; - Bitmap bitmap = resizeImage(uri, RESIZED_IMAGE_WIDTH); - - if (bitmap == null) { - ETLogging.getInstance().log("Unable to get bytes from Image URI. 
Bitmap is null"); - return new byte[0]; - } - - width = bitmap.getWidth(); - height = bitmap.getHeight(); - - byte[] rgbValues = new byte[width * height * 3]; - - for (int y = 0; y < height; y++) { - for (int x = 0; x < width; x++) { - // Get the color of the current pixel - int color = bitmap.getPixel(x, y); - - // Extract the RGB values from the color - int red = Color.red(color); - int green = Color.green(color); - int blue = Color.blue(color); - - // Store the RGB values in the byte array - rgbValues[y * width + x] = (byte) red; - rgbValues[(y * width + x) + height * width] = (byte) green; - rgbValues[(y * width + x) + 2 * height * width] = (byte) blue; - } - } - return rgbValues; - } catch (FileNotFoundException e) { - throw new RuntimeException(e); - } - } - - @Nullable - private Bitmap resizeImage(Uri uri, int maxLength) throws FileNotFoundException { - InputStream inputStream = contentResolver.openInputStream(uri); - if (inputStream == null) { - ETLogging.getInstance().log("Unable to resize image, input streams is null"); - return null; - } - Bitmap bitmap = BitmapFactory.decodeStream(inputStream); - if (bitmap == null) { - ETLogging.getInstance().log("Unable to resize image, bitmap during decode stream is null"); - return null; - } - - float aspectRatio; - int finalWidth, finalHeight; - - if (bitmap.getWidth() > bitmap.getHeight()) { - // width > height --> width = maxLength, height scale with aspect ratio - aspectRatio = bitmap.getWidth() / (float) bitmap.getHeight(); - finalWidth = maxLength; - finalHeight = Math.round(maxLength / aspectRatio); - } else { - // height >= width --> height = maxLength, width scale with aspect ratio - aspectRatio = bitmap.getHeight() / (float) bitmap.getWidth(); - finalHeight = maxLength; - finalWidth = Math.round(maxLength / aspectRatio); - } - - return Bitmap.createScaledBitmap(bitmap, finalWidth, finalHeight, false); - } -} diff --git 
a/examples/demo-apps/android/LlamaDemo/app/src/main/java/com/example/executorchllamademo/ETLogging.java b/examples/demo-apps/android/LlamaDemo/app/src/main/java/com/example/executorchllamademo/ETLogging.java deleted file mode 100644 index e595348945f..00000000000 --- a/examples/demo-apps/android/LlamaDemo/app/src/main/java/com/example/executorchllamademo/ETLogging.java +++ /dev/null @@ -1,54 +0,0 @@ -/* - * Copyright (c) Meta Platforms, Inc. and affiliates. - * All rights reserved. - * - * This source code is licensed under the BSD-style license found in the - * LICENSE file in the root directory of this source tree. - */ - -package com.example.executorchllamademo; - -import android.app.Application; -import android.util.Log; -import java.util.ArrayList; - -public class ETLogging extends Application { - private static ETLogging singleton; - - private ArrayList<AppLog> logs; - private DemoSharedPreferences mDemoSharedPreferences; - - @Override - public void onCreate() { - super.onCreate(); - singleton = this; - mDemoSharedPreferences = new DemoSharedPreferences(this.getApplicationContext()); - logs = mDemoSharedPreferences.getSavedLogs(); - if (logs == null) { // We don't have existing sharedPreference stored - logs = new ArrayList<>(); - } - } - - public static ETLogging getInstance() { - return singleton; - } - - public void log(String message) { - AppLog appLog = new AppLog(message); - logs.add(appLog); - Log.d("ETLogging", appLog.getMessage()); - } - - public ArrayList<AppLog> getLogs() { - return logs; - } - - public void clearLogs() { - logs.clear(); - mDemoSharedPreferences.removeExistingLogs(); - } - - public void saveLogs() { - mDemoSharedPreferences.saveLogs(); - } -} diff --git a/examples/demo-apps/android/LlamaDemo/app/src/main/java/com/example/executorchllamademo/LlmBenchmarkRunner.java b/examples/demo-apps/android/LlamaDemo/app/src/main/java/com/example/executorchllamademo/LlmBenchmarkRunner.java deleted file mode 100644 index 8c2d60252a0..00000000000 ---
a/examples/demo-apps/android/LlamaDemo/app/src/main/java/com/example/executorchllamademo/LlmBenchmarkRunner.java +++ /dev/null @@ -1,223 +0,0 @@ -/* - * Copyright (c) Meta Platforms, Inc. and affiliates. - * All rights reserved. - * - * This source code is licensed under the BSD-style license found in the - * LICENSE file in the root directory of this source tree. - */ - -package com.example.executorchllamademo; - -import android.app.Activity; -import android.app.ActivityManager; -import android.content.Intent; -import android.os.Build; -import android.os.Bundle; -import android.util.Log; -import android.widget.TextView; -import androidx.annotation.NonNull; -import com.google.gson.Gson; -import java.io.File; -import java.io.FileWriter; -import java.io.IOException; -import java.util.ArrayList; -import java.util.Arrays; -import java.util.List; -import java.util.regex.Matcher; -import java.util.regex.Pattern; - -public class LlmBenchmarkRunner extends Activity implements ModelRunnerCallback { - ModelRunner mModelRunner; - - String mPrompt; - TextView mTextView; - StatsDump mStatsDump; - - @Override - protected void onCreate(Bundle savedInstanceState) { - super.onCreate(savedInstanceState); - setContentView(R.layout.activity_benchmarking); - mTextView = findViewById(R.id.log_view); - - Intent intent = getIntent(); - - File modelDir = new File(intent.getStringExtra("model_dir")); - File model = - Arrays.stream(modelDir.listFiles()) - .filter(file -> file.getName().endsWith(".pte")) - .findFirst() - .get(); - String tokenizerPath = intent.getStringExtra("tokenizer_path"); - - float temperature = intent.getFloatExtra("temperature", 0.8f); - mPrompt = intent.getStringExtra("prompt"); - if (mPrompt == null) { - mPrompt = "The ultimate answer"; - } - - mStatsDump = new StatsDump(); - mStatsDump.modelName = model.getName().replace(".pte", ""); - mModelRunner = new ModelRunner(model.getPath(), tokenizerPath, temperature, this); - mStatsDump.loadStart = System.nanoTime(); - } - 
- @Override - public void onModelLoaded(int status) { - mStatsDump.loadEnd = System.nanoTime(); - mStatsDump.loadStatus = status; - if (status != 0) { - Log.e("LlmBenchmarkRunner", "Loaded failed: " + status); - onGenerationStopped(); - return; - } - mStatsDump.generateStart = System.nanoTime(); - mModelRunner.generate(mPrompt); - } - - @Override - public void onTokenGenerated(String token) { - runOnUiThread( - () -> { - mTextView.append(token); - }); - } - - @Override - public void onStats(String stats) { - mStatsDump.tokens = stats; - } - - @Override - public void onGenerationStopped() { - mStatsDump.generateEnd = System.nanoTime(); - runOnUiThread( - () -> { - mTextView.append(mStatsDump.toString()); - }); - - final BenchmarkMetric.BenchmarkModel benchmarkModel = - BenchmarkMetric.extractBackendAndQuantization(mStatsDump.modelName); - final List<BenchmarkMetric> results = new ArrayList<>(); - // The list of metrics we have atm includes: - // Load status - results.add(new BenchmarkMetric(benchmarkModel, "load_status", mStatsDump.loadStatus, 0)); - // Model load time - results.add( - new BenchmarkMetric( - benchmarkModel, - "model_load_time(ms)", - (mStatsDump.loadEnd - mStatsDump.loadStart) * 1e-6, - 0.0f)); - // LLM generate time - results.add( - new BenchmarkMetric( - benchmarkModel, - "generate_time(ms)", - (mStatsDump.generateEnd - mStatsDump.generateStart) * 1e-6, - 0.0f)); - // Token per second - results.add( - new BenchmarkMetric(benchmarkModel, "token_per_sec", extractTPS(mStatsDump.tokens), 0.0f)); - - try (FileWriter writer = new FileWriter(getFilesDir() + "/benchmark_results.json")) { - Gson gson = new Gson(); - writer.write(gson.toJson(results)); - } catch (IOException e) { - e.printStackTrace(); - } - } - - private double extractTPS(final String tokens) { - final Matcher m = Pattern.compile("\\d+\\.?\\d*").matcher(tokens); - if (m.find()) { - return Double.parseDouble(m.group()); - } else { - return 0.0f; - } - } -} - -class BenchmarkMetric { - public static class
BenchmarkModel { - // The model name, i.e. stories110M - String name; - String backend; - String quantization; - - public BenchmarkModel(final String name, final String backend, final String quantization) { - this.name = name; - this.backend = backend; - this.quantization = quantization; - } - } - - BenchmarkModel benchmarkModel; - - // The metric name, i.e. TPS - String metric; - - // The actual value and the option target value - double actualValue; - double targetValue; - - public static class DeviceInfo { - // Let's see which information we want to include here - final String device = Build.BRAND; - // The phone model and Android release version - final String arch = Build.MODEL; - final String os = "Android " + Build.VERSION.RELEASE; - final long totalMem = new ActivityManager.MemoryInfo().totalMem; - final long availMem = new ActivityManager.MemoryInfo().availMem; - } - - DeviceInfo deviceInfo = new DeviceInfo(); - - public BenchmarkMetric( - final BenchmarkModel benchmarkModel, - final String metric, - final double actualValue, - final double targetValue) { - this.benchmarkModel = benchmarkModel; - this.metric = metric; - this.actualValue = actualValue; - this.targetValue = targetValue; - } - - // TODO (huydhn): Figure out a way to extract the backend and quantization information from - // the .pte model itself instead of parsing its name - public static BenchmarkMetric.BenchmarkModel extractBackendAndQuantization(final String model) { - final Matcher m = - Pattern.compile("(?<name>\\w+)_(?<backend>[\\w\\+]+)_(?<quantization>\\w+)").matcher(model); - if (m.matches()) { - return new BenchmarkMetric.BenchmarkModel( - m.group("name"), m.group("backend"), m.group("quantization")); - } else { - return new BenchmarkMetric.BenchmarkModel(model, "", ""); - } - } -} - -class StatsDump { - int loadStatus; - long loadStart; - long loadEnd; - long generateStart; - long generateEnd; - String tokens; - String modelName; - - @NonNull - @Override - public String toString() { - return "loadStart: " - +
loadStart - + "\nloadEnd: " - + loadEnd - + "\ngenerateStart: " - + generateStart - + "\ngenerateEnd: " - + generateEnd - + "\n" - + tokens; - } -} diff --git a/examples/demo-apps/android/LlamaDemo/app/src/main/java/com/example/executorchllamademo/LogsActivity.java b/examples/demo-apps/android/LlamaDemo/app/src/main/java/com/example/executorchllamademo/LogsActivity.java deleted file mode 100644 index 7777b275e6e..00000000000 --- a/examples/demo-apps/android/LlamaDemo/app/src/main/java/com/example/executorchllamademo/LogsActivity.java +++ /dev/null @@ -1,92 +0,0 @@ -/* - * Copyright (c) Meta Platforms, Inc. and affiliates. - * All rights reserved. - * - * This source code is licensed under the BSD-style license found in the - * LICENSE file in the root directory of this source tree. - */ - -package com.example.executorchllamademo; - -import android.app.AlertDialog; -import android.content.DialogInterface; -import android.os.Build; -import android.os.Bundle; -import android.widget.ImageButton; -import android.widget.ListView; -import androidx.appcompat.app.AppCompatActivity; -import androidx.core.content.ContextCompat; -import androidx.core.graphics.Insets; -import androidx.core.view.ViewCompat; -import androidx.core.view.WindowInsetsCompat; - -public class LogsActivity extends AppCompatActivity { - - private LogsAdapter mLogsAdapter; - - @Override - protected void onCreate(Bundle savedInstanceState) { - super.onCreate(savedInstanceState); - setContentView(R.layout.activity_logs); - if (Build.VERSION.SDK_INT >= 21) { - getWindow().setStatusBarColor(ContextCompat.getColor(this, R.color.status_bar)); - getWindow().setNavigationBarColor(ContextCompat.getColor(this, R.color.nav_bar)); - } - ViewCompat.setOnApplyWindowInsetsListener( - requireViewById(R.id.main), - (v, insets) -> { - Insets systemBars = insets.getInsets(WindowInsetsCompat.Type.systemBars()); - v.setPadding(systemBars.left, systemBars.top, systemBars.right, systemBars.bottom); - return insets; - }); - - 
setupLogs(); - setupClearLogsButton(); - } - - @Override - public void onResume() { - super.onResume(); - mLogsAdapter.clear(); - mLogsAdapter.addAll(ETLogging.getInstance().getLogs()); - mLogsAdapter.notifyDataSetChanged(); - } - - private void setupLogs() { - ListView mLogsListView = requireViewById(R.id.logsListView); - mLogsAdapter = new LogsAdapter(this, R.layout.logs_message); - - mLogsListView.setAdapter(mLogsAdapter); - mLogsAdapter.addAll(ETLogging.getInstance().getLogs()); - mLogsAdapter.notifyDataSetChanged(); - } - - private void setupClearLogsButton() { - ImageButton clearLogsButton = requireViewById(R.id.clearLogsButton); - clearLogsButton.setOnClickListener( - view -> { - new AlertDialog.Builder(this) - .setTitle("Delete Logs History") - .setMessage("Do you really want to delete logs history?") - .setIcon(android.R.drawable.ic_dialog_alert) - .setPositiveButton( - android.R.string.yes, - new DialogInterface.OnClickListener() { - public void onClick(DialogInterface dialog, int whichButton) { - // Clear the messageAdapter and sharedPreference - ETLogging.getInstance().clearLogs(); - mLogsAdapter.clear(); - mLogsAdapter.notifyDataSetChanged(); - } - }) - .setNegativeButton(android.R.string.no, null) - .show(); - }); - } - - @Override - protected void onDestroy() { - super.onDestroy(); - ETLogging.getInstance().saveLogs(); - } -} diff --git a/examples/demo-apps/android/LlamaDemo/app/src/main/java/com/example/executorchllamademo/LogsAdapter.java b/examples/demo-apps/android/LlamaDemo/app/src/main/java/com/example/executorchllamademo/LogsAdapter.java deleted file mode 100644 index 76c6a1aa1b4..00000000000 --- a/examples/demo-apps/android/LlamaDemo/app/src/main/java/com/example/executorchllamademo/LogsAdapter.java +++ /dev/null @@ -1,45 +0,0 @@ -/* - * Copyright (c) Meta Platforms, Inc. and affiliates. - * All rights reserved. 
- * - * This source code is licensed under the BSD-style license found in the - * LICENSE file in the root directory of this source tree. - */ - -package com.example.executorchllamademo; - -import android.view.LayoutInflater; -import android.view.View; -import android.view.ViewGroup; -import android.widget.ArrayAdapter; -import android.widget.TextView; -import androidx.annotation.NonNull; -import java.util.Objects; - -public class LogsAdapter extends ArrayAdapter<AppLog> { - public LogsAdapter(android.content.Context context, int resource) { - super(context, resource); - } - - static class ViewHolder { - private TextView logTextView; - } - - @NonNull - @Override - public View getView(int position, View convertView, @NonNull ViewGroup parent) { - ViewHolder mViewHolder = null; - - String logMessage = Objects.requireNonNull(getItem(position)).getFormattedLog(); - - if (convertView == null || convertView.getTag() == null) { - mViewHolder = new ViewHolder(); - convertView = LayoutInflater.from(getContext()).inflate(R.layout.logs_message, parent, false); - mViewHolder.logTextView = convertView.requireViewById(R.id.logsTextView); - } else { - mViewHolder = (ViewHolder) convertView.getTag(); - } - mViewHolder.logTextView.setText(logMessage); - return convertView; - } -} diff --git a/examples/demo-apps/android/LlamaDemo/app/src/main/java/com/example/executorchllamademo/MainActivity.java b/examples/demo-apps/android/LlamaDemo/app/src/main/java/com/example/executorchllamademo/MainActivity.java deleted file mode 100644 index f995c5bc65a..00000000000 --- a/examples/demo-apps/android/LlamaDemo/app/src/main/java/com/example/executorchllamademo/MainActivity.java +++ /dev/null @@ -1,847 +0,0 @@ -/* - * Copyright (c) Meta Platforms, Inc. and affiliates. - * All rights reserved. - * - * This source code is licensed under the BSD-style license found in the - * LICENSE file in the root directory of this source tree.
- */ - -package com.example.executorchllamademo; - -import android.Manifest; -import android.app.ActivityManager; -import android.app.AlertDialog; -import android.content.ContentResolver; -import android.content.ContentValues; -import android.content.Intent; -import android.content.pm.PackageManager; -import android.net.Uri; -import android.os.Build; -import android.os.Bundle; -import android.os.Handler; -import android.os.Looper; -import android.os.Process; -import android.provider.MediaStore; -import android.system.ErrnoException; -import android.system.Os; -import android.util.Log; -import android.view.View; -import android.view.inputmethod.InputMethodManager; -import android.widget.EditText; -import android.widget.ImageButton; -import android.widget.ImageView; -import android.widget.LinearLayout; -import android.widget.ListView; -import android.widget.TextView; -import android.widget.Toast; -import androidx.activity.result.ActivityResultLauncher; -import androidx.activity.result.PickVisualMediaRequest; -import androidx.activity.result.contract.ActivityResultContracts; -import androidx.annotation.NonNull; -import androidx.appcompat.app.AppCompatActivity; -import androidx.constraintlayout.widget.ConstraintLayout; -import androidx.core.app.ActivityCompat; -import androidx.core.content.ContextCompat; -import androidx.core.content.res.ResourcesCompat; -import com.google.gson.Gson; -import com.google.gson.reflect.TypeToken; -import java.lang.reflect.Type; -import java.util.ArrayList; -import java.util.List; -import java.util.concurrent.Executor; -import java.util.concurrent.Executors; -import org.json.JSONException; -import org.json.JSONObject; -import org.pytorch.executorch.extension.llm.LlmCallback; -import org.pytorch.executorch.extension.llm.LlmModule; - -public class MainActivity extends AppCompatActivity implements Runnable, LlmCallback { - private EditText mEditTextMessage; - private ImageButton mThinkModeButton; - private ImageButton mSendButton; - private 
ImageButton mGalleryButton; - private ImageButton mCameraButton; - private ListView mMessagesView; - private MessageAdapter mMessageAdapter; - private LlmModule mModule = null; - private Message mResultMessage = null; - private ImageButton mSettingsButton; - private TextView mMemoryView; - private ActivityResultLauncher<PickVisualMediaRequest> mPickGallery; - private ActivityResultLauncher<Uri> mCameraRoll; - private List<Uri> mSelectedImageUri; - private ConstraintLayout mMediaPreviewConstraintLayout; - private LinearLayout mAddMediaLayout; - private static final int MAX_NUM_OF_IMAGES = 5; - private static final int REQUEST_IMAGE_CAPTURE = 1; - private Uri cameraImageUri; - private DemoSharedPreferences mDemoSharedPreferences; - private SettingsFields mCurrentSettingsFields; - private Handler mMemoryUpdateHandler; - private Runnable memoryUpdater; - private boolean mThinkMode = false; - private int promptID = 0; - private static final int CONVERSATION_HISTORY_MESSAGE_LOOKBACK = 2; - private Executor executor; - - @Override - public void onResult(String result) { - if (result.equals(PromptFormat.getStopToken(mCurrentSettingsFields.getModelType()))) { - return; - } - result = PromptFormat.replaceSpecialToken(mCurrentSettingsFields.getModelType(), result); - if (result.equals("\n\n") || result.equals("\n")) { - if (!mResultMessage.getText().isEmpty()) { - mResultMessage.appendText(result); - run(); - } - } else { - mResultMessage.appendText(result); - run(); - } - } - - @Override - public void onStats(String stats) { - runOnUiThread( - () -> { - if (mResultMessage != null) { - float tps = 0; - try { - JSONObject jsonObject = new JSONObject(stats); - int numGeneratedTokens = jsonObject.getInt("generated_tokens"); - int inferenceEndMs = jsonObject.getInt("inference_end_ms"); - int promptEvalEndMs = jsonObject.getInt("prompt_eval_end_ms"); - tps = (float) numGeneratedTokens / (inferenceEndMs - promptEvalEndMs) * 1000; - } catch (JSONException e) { - Log.e("LLM", "Error parsing JSON: " + e.getMessage());
- } - mResultMessage.setTokensPerSecond(tps); - mMessageAdapter.notifyDataSetChanged(); - } - }); - } - - private void setLocalModel(String modelPath, String tokenizerPath, float temperature) { - Message modelLoadingMessage = new Message("Loading model...", false, MessageType.SYSTEM, 0); - ETLogging.getInstance().log("Loading model " + modelPath + " with tokenizer " + tokenizerPath); - runOnUiThread( - () -> { - mSendButton.setEnabled(false); - mMessageAdapter.add(modelLoadingMessage); - mMessageAdapter.notifyDataSetChanged(); - }); - if (mModule != null) { - ETLogging.getInstance().log("Start deallocating existing module instance"); - mModule.resetNative(); - mModule = null; - ETLogging.getInstance().log("Completed deallocating existing module instance"); - } - long runStartTime = System.currentTimeMillis(); - mModule = - new LlmModule( - ModelUtils.getModelCategory( - mCurrentSettingsFields.getModelType(), mCurrentSettingsFields.getBackendType()), - modelPath, - tokenizerPath, - temperature); - int loadResult = mModule.load(); - long loadDuration = System.currentTimeMillis() - runStartTime; - String modelLoadError = ""; - String modelInfo = ""; - if (loadResult != 0) { - // TODO: Map the error code to a reason to let the user know why model loading failed - modelInfo = "*Model could not load (Error Code: " + loadResult + ")*" + "\n"; - loadDuration = 0; - AlertDialog.Builder builder = new AlertDialog.Builder(this); - builder.setTitle("Load failed: " + loadResult); - runOnUiThread( - () -> { - AlertDialog alert = builder.create(); - alert.show(); - }); - } else { - String[] segments = modelPath.split("/"); - String pteName = segments[segments.length - 1]; - segments = tokenizerPath.split("/"); - String tokenizerName = segments[segments.length - 1]; - modelInfo = - "Successfully loaded model. " - + pteName - + " and tokenizer " - + tokenizerName - + " in " - + (float) loadDuration / 1000 - + " sec." 
- + " You can send text or image for inference"; - - if (mCurrentSettingsFields.getModelType() == ModelType.LLAVA_1_5) { - ETLogging.getInstance().log("Llava start prefill prompt"); - mModule.resetContext(); - mModule.prefillPrompt(PromptFormat.getLlavaPresetPrompt()); - ETLogging.getInstance().log("Llava completes prefill prompt"); - } - } - - Message modelLoadedMessage = new Message(modelInfo, false, MessageType.SYSTEM, 0); - - String modelLoggingInfo = - modelLoadError - + "Model path: " - + modelPath - + "\nTokenizer path: " - + tokenizerPath - + "\nBackend: " - + mCurrentSettingsFields.getBackendType().toString() - + "\nModelType: " - + ModelUtils.getModelCategory( - mCurrentSettingsFields.getModelType(), mCurrentSettingsFields.getBackendType()) - + "\nTemperature: " - + temperature - + "\nModel loaded time: " - + loadDuration - + " ms"; - ETLogging.getInstance().log("Load complete. " + modelLoggingInfo); - - runOnUiThread( - () -> { - mSendButton.setEnabled(true); - mMessageAdapter.remove(modelLoadingMessage); - mMessageAdapter.add(modelLoadedMessage); - mMessageAdapter.notifyDataSetChanged(); - }); - } - - private void loadLocalModelAndParameters( - String modelFilePath, String tokenizerFilePath, float temperature) { - Runnable runnable = - new Runnable() { - @Override - public void run() { - setLocalModel(modelFilePath, tokenizerFilePath, temperature); - } - }; - new Thread(runnable).start(); - } - - private void populateExistingMessages(String existingMsgJSON) { - Gson gson = new Gson(); - Type type = new TypeToken<ArrayList<Message>>() {}.getType(); - ArrayList<Message> savedMessages = gson.fromJson(existingMsgJSON, type); - for (Message msg : savedMessages) { - mMessageAdapter.add(msg); - } - mMessageAdapter.notifyDataSetChanged(); - } - - private int setPromptID() { - - return mMessageAdapter.getMaxPromptID() + 1; - } - - @Override - protected void onCreate(Bundle savedInstanceState) { - super.onCreate(savedInstanceState); - setContentView(R.layout.activity_main); - - if
(Build.VERSION.SDK_INT >= 21) { - getWindow().setStatusBarColor(ContextCompat.getColor(this, R.color.status_bar)); - getWindow().setNavigationBarColor(ContextCompat.getColor(this, R.color.nav_bar)); - } - - try { - Os.setenv("ADSP_LIBRARY_PATH", getApplicationInfo().nativeLibraryDir, true); - Os.setenv("LD_LIBRARY_PATH", getApplicationInfo().nativeLibraryDir, true); - } catch (ErrnoException e) { - finish(); - } - - mThinkModeButton = requireViewById(R.id.thinkModeButton); - mEditTextMessage = requireViewById(R.id.editTextMessage); - mSendButton = requireViewById(R.id.sendButton); - mSendButton.setEnabled(false); - mMessagesView = requireViewById(R.id.messages_view); - mMessageAdapter = new MessageAdapter(this, R.layout.sent_message, new ArrayList<Message>()); - mMessagesView.setAdapter(mMessageAdapter); - mDemoSharedPreferences = new DemoSharedPreferences(this.getApplicationContext()); - String existingMsgJSON = mDemoSharedPreferences.getSavedMessages(); - if (!existingMsgJSON.isEmpty()) { - populateExistingMessages(existingMsgJSON); - promptID = setPromptID(); - } - mSettingsButton = requireViewById(R.id.settings); - mSettingsButton.setOnClickListener( - view -> { - Intent myIntent = new Intent(MainActivity.this, SettingsActivity.class); - MainActivity.this.startActivity(myIntent); - }); - - mThinkModeButton.setOnClickListener( - view -> { - if (mThinkMode) { - mThinkMode = false; - mThinkModeButton.setImageDrawable( - ResourcesCompat.getDrawable( - getResources(), R.drawable.baseline_lightbulb_24, null)); - } else { - mThinkMode = true; - mThinkModeButton.setImageDrawable( - ResourcesCompat.getDrawable(getResources(), R.drawable.blue_lightbulb_24, null)); - } - runOnUiThread( - () -> { - String thinkingModeText = mThinkMode ?
"on" : "off"; - mMessageAdapter.add( - new Message( - "Thinking mode is " + thinkingModeText, false, MessageType.SYSTEM, 0)); - mMessageAdapter.notifyDataSetChanged(); - }); - }); - - mCurrentSettingsFields = new SettingsFields(); - mMemoryUpdateHandler = new Handler(Looper.getMainLooper()); - onModelRunStopped(); - setupMediaButton(); - setupGalleryPicker(); - setupCameraRoll(); - startMemoryUpdate(); - setupShowLogsButton(); - executor = Executors.newSingleThreadExecutor(); - } - - @Override - protected void onPause() { - super.onPause(); - mDemoSharedPreferences.addMessages(mMessageAdapter); - } - - @Override - protected void onResume() { - super.onResume(); - // Check for if settings parameters have changed - Gson gson = new Gson(); - String settingsFieldsJSON = mDemoSharedPreferences.getSettings(); - if (!settingsFieldsJSON.isEmpty()) { - SettingsFields updatedSettingsFields = - gson.fromJson(settingsFieldsJSON, SettingsFields.class); - if (updatedSettingsFields == null) { - // Added this check, because gson.fromJson can return null - askUserToSelectModel(); - return; - } - boolean isUpdated = !mCurrentSettingsFields.equals(updatedSettingsFields); - boolean isLoadModel = updatedSettingsFields.getIsLoadModel(); - setBackendMode(updatedSettingsFields.getBackendType()); - if (isUpdated) { - if (isLoadModel) { - // If users change the model file, but not pressing loadModelButton, we won't load the new - // model - checkForUpdateAndReloadModel(updatedSettingsFields); - } else { - askUserToSelectModel(); - } - - checkForClearChatHistory(updatedSettingsFields); - // Update current to point to the latest - mCurrentSettingsFields = new SettingsFields(updatedSettingsFields); - } - } else { - askUserToSelectModel(); - } - } - - private void setBackendMode(BackendType backendType) { - if (backendType.equals(BackendType.XNNPACK) || backendType.equals(BackendType.QUALCOMM)) { - setXNNPACKMode(); - } else if (backendType.equals(BackendType.MEDIATEK)) { - setMediaTekMode(); - 
} - } - - private void setXNNPACKMode() { - requireViewById(R.id.addMediaButton).setVisibility(View.VISIBLE); - } - - private void setMediaTekMode() { - requireViewById(R.id.addMediaButton).setVisibility(View.GONE); - } - - private void checkForClearChatHistory(SettingsFields updatedSettingsFields) { - if (updatedSettingsFields.getIsClearChatHistory()) { - mMessageAdapter.clear(); - mMessageAdapter.notifyDataSetChanged(); - mDemoSharedPreferences.removeExistingMessages(); - // changing to false since chat history has been cleared. - updatedSettingsFields.saveIsClearChatHistory(false); - mDemoSharedPreferences.addSettings(updatedSettingsFields); - } - } - - private void checkForUpdateAndReloadModel(SettingsFields updatedSettingsFields) { - // TODO need to add 'load model' in settings and queue loading based on that - String modelPath = updatedSettingsFields.getModelFilePath(); - String tokenizerPath = updatedSettingsFields.getTokenizerFilePath(); - double temperature = updatedSettingsFields.getTemperature(); - if (!modelPath.isEmpty() && !tokenizerPath.isEmpty()) { - if (updatedSettingsFields.getIsLoadModel() - || !modelPath.equals(mCurrentSettingsFields.getModelFilePath()) - || !tokenizerPath.equals(mCurrentSettingsFields.getTokenizerFilePath()) - || temperature != mCurrentSettingsFields.getTemperature()) { - loadLocalModelAndParameters( - updatedSettingsFields.getModelFilePath(), - updatedSettingsFields.getTokenizerFilePath(), - (float) updatedSettingsFields.getTemperature()); - updatedSettingsFields.saveLoadModelAction(false); - mDemoSharedPreferences.addSettings(updatedSettingsFields); - } - } else { - askUserToSelectModel(); - } - } - - private void askUserToSelectModel() { - String askLoadModel = - "To get started, select your desired model and tokenizer " + "from the top right corner"; - Message askLoadModelMessage = new Message(askLoadModel, false, MessageType.SYSTEM, 0); - ETLogging.getInstance().log(askLoadModel); - runOnUiThread( - () -> { - 
mMessageAdapter.add(askLoadModelMessage); - mMessageAdapter.notifyDataSetChanged(); - }); - } - - private void setupShowLogsButton() { - ImageButton showLogsButton = requireViewById(R.id.showLogsButton); - showLogsButton.setOnClickListener( - view -> { - Intent myIntent = new Intent(MainActivity.this, LogsActivity.class); - MainActivity.this.startActivity(myIntent); - }); - } - - private void setupMediaButton() { - mAddMediaLayout = requireViewById(R.id.addMediaLayout); - mAddMediaLayout.setVisibility(View.GONE); // We hide this initially - - ImageButton addMediaButton = requireViewById(R.id.addMediaButton); - addMediaButton.setOnClickListener( - view -> { - mAddMediaLayout.setVisibility(View.VISIBLE); - }); - - mGalleryButton = requireViewById(R.id.galleryButton); - mGalleryButton.setOnClickListener( - view -> { - // Launch the photo picker and let the user choose only images. - mPickGallery.launch( - new PickVisualMediaRequest.Builder() - .setMediaType(ActivityResultContracts.PickVisualMedia.ImageOnly.INSTANCE) - .build()); - }); - mCameraButton = requireViewById(R.id.cameraButton); - mCameraButton.setOnClickListener( - view -> { - Log.d("CameraRoll", "Check permission"); - if (ContextCompat.checkSelfPermission(MainActivity.this, Manifest.permission.CAMERA) - != PackageManager.PERMISSION_GRANTED) { - ActivityCompat.requestPermissions( - MainActivity.this, - new String[] {Manifest.permission.CAMERA}, - REQUEST_IMAGE_CAPTURE); - } else { - launchCamera(); - } - }); - } - - private void setupCameraRoll() { - // Registers a camera roll activity launcher. 
- mCameraRoll = - registerForActivityResult( - new ActivityResultContracts.TakePicture(), - result -> { - if (result && cameraImageUri != null) { - Log.d("CameraRoll", "Photo saved to uri: " + cameraImageUri); - mAddMediaLayout.setVisibility(View.GONE); - List<Uri> uris = new ArrayList<>(); - uris.add(cameraImageUri); - showMediaPreview(uris); - } else { - // Delete the temp image file based on the url since the photo is not successfully - // taken - if (cameraImageUri != null) { - ContentResolver contentResolver = MainActivity.this.getContentResolver(); - contentResolver.delete(cameraImageUri, null, null); - Log.d("CameraRoll", "No photo taken. Delete temp uri"); - } - } - }); - mMediaPreviewConstraintLayout = requireViewById(R.id.mediaPreviewConstraintLayout); - ImageButton mediaPreviewCloseButton = requireViewById(R.id.mediaPreviewCloseButton); - mediaPreviewCloseButton.setOnClickListener( - view -> { - mMediaPreviewConstraintLayout.setVisibility(View.GONE); - mSelectedImageUri = null; - }); - - ImageButton addMoreImageButton = requireViewById(R.id.addMoreImageButton); - addMoreImageButton.setOnClickListener( - view -> { - Log.d("addMore", "clicked"); - mMediaPreviewConstraintLayout.setVisibility(View.GONE); - // Direct user to select type of input - mCameraButton.callOnClick(); - }); - } - - private String updateMemoryUsage() { - ActivityManager.MemoryInfo memoryInfo = new ActivityManager.MemoryInfo(); - ActivityManager activityManager = (ActivityManager) getSystemService(ACTIVITY_SERVICE); - if (activityManager == null) { - return "---"; - } - activityManager.getMemoryInfo(memoryInfo); - long totalMem = memoryInfo.totalMem / (1024 * 1024); - long availableMem = memoryInfo.availMem / (1024 * 1024); - long usedMem = totalMem - availableMem; - return usedMem + "MB"; - } - - private void startMemoryUpdate() { - mMemoryView = requireViewById(R.id.ram_usage_live); - memoryUpdater = - new Runnable() { - @Override - public void run() {
mMemoryView.setText(updateMemoryUsage()); - mMemoryUpdateHandler.postDelayed(this, 1000); - } - }; - mMemoryUpdateHandler.post(memoryUpdater); - } - - @Override - public void onRequestPermissionsResult( - int requestCode, @NonNull String[] permissions, @NonNull int[] grantResults) { - super.onRequestPermissionsResult(requestCode, permissions, grantResults); - if (requestCode == REQUEST_IMAGE_CAPTURE && grantResults.length != 0) { - if (grantResults[0] == PackageManager.PERMISSION_GRANTED) { - launchCamera(); - } else if (grantResults[0] == PackageManager.PERMISSION_DENIED) { - Log.d("CameraRoll", "Permission denied"); - } - } - } - - private void launchCamera() { - ContentValues values = new ContentValues(); - values.put(MediaStore.Images.Media.TITLE, "New Picture"); - values.put(MediaStore.Images.Media.DESCRIPTION, "From Camera"); - values.put(MediaStore.Images.Media.RELATIVE_PATH, "DCIM/Camera/"); - cameraImageUri = - MainActivity.this - .getContentResolver() - .insert(MediaStore.Images.Media.EXTERNAL_CONTENT_URI, values); - mCameraRoll.launch(cameraImageUri); - } - - private void setupGalleryPicker() { - // Registers a photo picker activity launcher in single-select mode. 
- mPickGallery = - registerForActivityResult( - new ActivityResultContracts.PickMultipleVisualMedia(MAX_NUM_OF_IMAGES), - uris -> { - if (!uris.isEmpty()) { - Log.d("PhotoPicker", "Selected URIs: " + uris); - mAddMediaLayout.setVisibility(View.GONE); - for (Uri uri : uris) { - MainActivity.this - .getContentResolver() - .takePersistableUriPermission(uri, Intent.FLAG_GRANT_READ_URI_PERMISSION); - } - showMediaPreview(uris); - } else { - Log.d("PhotoPicker", "No media selected"); - } - }); - - mMediaPreviewConstraintLayout = requireViewById(R.id.mediaPreviewConstraintLayout); - ImageButton mediaPreviewCloseButton = requireViewById(R.id.mediaPreviewCloseButton); - mediaPreviewCloseButton.setOnClickListener( - view -> { - mMediaPreviewConstraintLayout.setVisibility(View.GONE); - mSelectedImageUri = null; - }); - - ImageButton addMoreImageButton = requireViewById(R.id.addMoreImageButton); - addMoreImageButton.setOnClickListener( - view -> { - Log.d("addMore", "clicked"); - mMediaPreviewConstraintLayout.setVisibility(View.GONE); - mGalleryButton.callOnClick(); - }); - } - - private List<ETImage> getProcessedImagesForModel(List<Uri> uris) { - List<ETImage> imageList = new ArrayList<>(); - if (uris != null) { - uris.forEach( - (uri) -> { - imageList.add(new ETImage(this.getContentResolver(), uri)); - }); - } - return imageList; - } - - private void showMediaPreview(List<Uri> uris) { - if (mSelectedImageUri == null) { - mSelectedImageUri = uris; - } else { - mSelectedImageUri.addAll(uris); - } - - if (mSelectedImageUri.size() > MAX_NUM_OF_IMAGES) { - mSelectedImageUri = mSelectedImageUri.subList(0, MAX_NUM_OF_IMAGES); - Toast.makeText( - this, "Only max " + MAX_NUM_OF_IMAGES + " images are allowed", Toast.LENGTH_SHORT) - .show(); - } - Log.d("mSelectedImageUri", mSelectedImageUri.size() + " " + mSelectedImageUri); - - mMediaPreviewConstraintLayout.setVisibility(View.VISIBLE); - - List<ImageView> imageViews = new ArrayList<>(); - - // Pre-populate all the image views that are available from the layout (currently 
max 5) - imageViews.add(requireViewById(R.id.mediaPreviewImageView1)); - imageViews.add(requireViewById(R.id.mediaPreviewImageView2)); - imageViews.add(requireViewById(R.id.mediaPreviewImageView3)); - imageViews.add(requireViewById(R.id.mediaPreviewImageView4)); - imageViews.add(requireViewById(R.id.mediaPreviewImageView5)); - - // Hide all the image views (reset state) - for (int i = 0; i < imageViews.size(); i++) { - imageViews.get(i).setVisibility(View.GONE); - } - - // Only show/render those that have proper Image URIs - for (int i = 0; i < mSelectedImageUri.size(); i++) { - imageViews.get(i).setVisibility(View.VISIBLE); - imageViews.get(i).setImageURI(mSelectedImageUri.get(i)); - } - - // For Llava, we want to call prefill_image as soon as an image is selected - // Llava only supports 1 image for now - if (mCurrentSettingsFields.getModelType() == ModelType.LLAVA_1_5) { - List<ETImage> processedImageList = getProcessedImagesForModel(mSelectedImageUri); - if (!processedImageList.isEmpty()) { - mMessageAdapter.add( - new Message("Llava - Starting image Prefill.", false, MessageType.SYSTEM, 0)); - mMessageAdapter.notifyDataSetChanged(); - Runnable runnable = - () -> { - Process.setThreadPriority(Process.THREAD_PRIORITY_MORE_FAVORABLE); - ETLogging.getInstance().log("Starting runnable prefill image"); - ETImage img = processedImageList.get(0); - ETLogging.getInstance().log("Llava start prefill image"); - mModule.prefillImages( - img.getInts(), - img.getWidth(), - img.getHeight(), - ModelUtils.VISION_MODEL_IMAGE_CHANNELS); - }; - executor.execute(runnable); - } - } - } - - private void addSelectedImagesToChatThread(List<Uri> selectedImageUri) { - if (selectedImageUri == null) { - return; - } - mMediaPreviewConstraintLayout.setVisibility(View.GONE); - for (int i = 0; i < selectedImageUri.size(); i++) { - Uri imageURI = selectedImageUri.get(i); - Log.d("image uri ", "test " + imageURI.getPath()); - mMessageAdapter.add(new Message(imageURI.toString(), true, MessageType.IMAGE, 0)); - 
} - mMessageAdapter.notifyDataSetChanged(); - } - - private String getConversationHistory() { - String conversationHistory = ""; - - ArrayList<Message> conversations = - mMessageAdapter.getRecentSavedTextMessages(CONVERSATION_HISTORY_MESSAGE_LOOKBACK); - if (conversations.isEmpty()) { - return conversationHistory; - } - - int prevPromptID = conversations.get(0).getPromptID(); - String conversationFormat = - PromptFormat.getConversationFormat(mCurrentSettingsFields.getModelType()); - String format = conversationFormat; - for (int i = 0; i < conversations.size(); i++) { - Message conversation = conversations.get(i); - int currentPromptID = conversation.getPromptID(); - if (currentPromptID != prevPromptID) { - conversationHistory = conversationHistory + format; - format = conversationFormat; - prevPromptID = currentPromptID; - } - if (conversation.getIsSent()) { - format = - format - .replace(PromptFormat.USER_PLACEHOLDER, conversation.getText()) - .replace(PromptFormat.THINKING_MODE_PLACEHOLDER, ""); - } else { - format = format.replace(PromptFormat.ASSISTANT_PLACEHOLDER, conversation.getText()); - } - } - conversationHistory = conversationHistory + format; - - return conversationHistory; - } - - private String getTotalFormattedPrompt(String conversationHistory, String rawPrompt) { - if (conversationHistory.isEmpty()) { - return mCurrentSettingsFields.getFormattedSystemAndUserPrompt(rawPrompt, mThinkMode); - } - - return mCurrentSettingsFields.getFormattedSystemPrompt() - + conversationHistory - + mCurrentSettingsFields.getFormattedUserPrompt(rawPrompt, mThinkMode); - } - - private void onModelRunStarted() { - mSendButton.setClickable(false); - mSendButton.setImageResource(R.drawable.baseline_stop_24); - mSendButton.setOnClickListener( - view -> { - mModule.stop(); - }); - } - - private void onModelRunStopped() { - mSendButton.setClickable(true); - mSendButton.setImageResource(R.drawable.baseline_send_24); - mSendButton.setOnClickListener( - view -> { - try { - 
InputMethodManager imm = (InputMethodManager) getSystemService(INPUT_METHOD_SERVICE); - imm.hideSoftInputFromWindow(getCurrentFocus().getWindowToken(), 0); - } catch (Exception e) { - ETLogging.getInstance().log("Keyboard dismissal error: " + e.getMessage()); - } - addSelectedImagesToChatThread(mSelectedImageUri); - String finalPrompt; - String rawPrompt = mEditTextMessage.getText().toString(); - if (ModelUtils.getModelCategory( - mCurrentSettingsFields.getModelType(), mCurrentSettingsFields.getBackendType()) - == ModelUtils.VISION_MODEL) { - finalPrompt = - mCurrentSettingsFields.getFormattedSystemAndUserPrompt(rawPrompt, mThinkMode); - } else { - finalPrompt = getTotalFormattedPrompt(getConversationHistory(), rawPrompt); - } - // We store raw prompt into message adapter, because we don't want to show the extra - // tokens from system prompt - mMessageAdapter.add(new Message(rawPrompt, true, MessageType.TEXT, promptID)); - mMessageAdapter.notifyDataSetChanged(); - mEditTextMessage.setText(""); - mResultMessage = new Message("", false, MessageType.TEXT, promptID); - mMessageAdapter.add(mResultMessage); - // Scroll to bottom of the list - mMessagesView.smoothScrollToPosition(mMessageAdapter.getCount() - 1); - // After images are added to prompt and chat thread, we clear the imageURI list - // Note: This has to be done after imageURIs are no longer needed by LlmModule - mSelectedImageUri = null; - promptID++; - Runnable runnable = - new Runnable() { - @Override - public void run() { - Process.setThreadPriority(Process.THREAD_PRIORITY_MORE_FAVORABLE); - ETLogging.getInstance().log("starting runnable generate()"); - runOnUiThread( - new Runnable() { - @Override - public void run() { - onModelRunStarted(); - } - }); - long generateStartTime = System.currentTimeMillis(); - if (ModelUtils.getModelCategory( - mCurrentSettingsFields.getModelType(), - mCurrentSettingsFields.getBackendType()) - == ModelUtils.VISION_MODEL) { - mModule.generate( - finalPrompt, 
ModelUtils.VISION_MODEL_SEQ_LEN, MainActivity.this, false); - } else if (mCurrentSettingsFields.getModelType() == ModelType.LLAMA_GUARD_3) { - String llamaGuardPromptForClassification = - PromptFormat.getFormattedLlamaGuardPrompt(rawPrompt); - ETLogging.getInstance() - .log("Running inference.. prompt=" + llamaGuardPromptForClassification); - mModule.generate( - llamaGuardPromptForClassification, - llamaGuardPromptForClassification.length() + 64, - MainActivity.this, - false); - } else { - ETLogging.getInstance().log("Running inference.. prompt=" + finalPrompt); - mModule.generate( - finalPrompt, - (int) (finalPrompt.length() * 0.75) + 64, - MainActivity.this, - false); - } - - long generateDuration = System.currentTimeMillis() - generateStartTime; - mResultMessage.setTotalGenerationTime(generateDuration); - runOnUiThread( - new Runnable() { - @Override - public void run() { - onModelRunStopped(); - } - }); - ETLogging.getInstance().log("Inference completed"); - } - }; - executor.execute(runnable); - }); - mMessageAdapter.notifyDataSetChanged(); - } - - @Override - public void run() { - runOnUiThread( - new Runnable() { - @Override - public void run() { - mMessageAdapter.notifyDataSetChanged(); - } - }); - } - - @Override - public void onBackPressed() { - super.onBackPressed(); - if (mAddMediaLayout != null && mAddMediaLayout.getVisibility() == View.VISIBLE) { - mAddMediaLayout.setVisibility(View.GONE); - } else { - // Default behavior of back button - finish(); - } - } - - @Override - protected void onDestroy() { - super.onDestroy(); - mMemoryUpdateHandler.removeCallbacks(memoryUpdater); - // This is to cover the case where the app is shutdown when user is on MainActivity but - // never clicked on the logsActivity - ETLogging.getInstance().saveLogs(); - } -} diff --git a/examples/demo-apps/android/LlamaDemo/app/src/main/java/com/example/executorchllamademo/Message.java 
b/examples/demo-apps/android/LlamaDemo/app/src/main/java/com/example/executorchllamademo/Message.java deleted file mode 100644 index b2e5380e2a5..00000000000 --- a/examples/demo-apps/android/LlamaDemo/app/src/main/java/com/example/executorchllamademo/Message.java +++ /dev/null @@ -1,94 +0,0 @@ -/* - * Copyright (c) Meta Platforms, Inc. and affiliates. - * All rights reserved. - * - * This source code is licensed under the BSD-style license found in the - * LICENSE file in the root directory of this source tree. - */ - -package com.example.executorchllamademo; - -import java.text.SimpleDateFormat; -import java.util.Date; -import java.util.Locale; - -public class Message { - private String text; - private final boolean isSent; - private float tokensPerSecond; - private long totalGenerationTime; - private final long timestamp; - private final MessageType messageType; - private String imagePath; - private final int promptID; - - private static final String TIMESTAMP_FORMAT = "hh:mm a"; // example: 2:23 PM - - public Message(String text, boolean isSent, MessageType messageType, int promptID) { - this.isSent = isSent; - this.messageType = messageType; - this.promptID = promptID; - - if (messageType == MessageType.IMAGE) { - this.imagePath = text; - } else { - this.text = text; - } - - if (messageType != MessageType.SYSTEM) { - this.timestamp = System.currentTimeMillis(); - } else { - this.timestamp = (long) 0; - } - } - - public int getPromptID() { - return promptID; - } - - public MessageType getMessageType() { - return messageType; - } - - public String getImagePath() { - return imagePath; - } - - public String getText() { - return text; - } - - public void appendText(String text) { - this.text += text; - } - - public boolean getIsSent() { - return isSent; - } - - public void setTokensPerSecond(float tokensPerSecond) { - this.tokensPerSecond = tokensPerSecond; - } - - public void setTotalGenerationTime(long totalGenerationTime) { - this.totalGenerationTime = 
totalGenerationTime; - } - - public float getTokensPerSecond() { - return tokensPerSecond; - } - - public long getTotalGenerationTime() { - return totalGenerationTime; - } - - public long getTimestamp() { - return timestamp; - } - - public String getFormattedTimestamp() { - SimpleDateFormat formatter = new SimpleDateFormat(TIMESTAMP_FORMAT, Locale.getDefault()); - Date date = new Date(timestamp); - return formatter.format(date); - } -} diff --git a/examples/demo-apps/android/LlamaDemo/app/src/main/java/com/example/executorchllamademo/MessageAdapter.java b/examples/demo-apps/android/LlamaDemo/app/src/main/java/com/example/executorchllamademo/MessageAdapter.java deleted file mode 100644 index 31aaa9a1d5f..00000000000 --- a/examples/demo-apps/android/LlamaDemo/app/src/main/java/com/example/executorchllamademo/MessageAdapter.java +++ /dev/null @@ -1,135 +0,0 @@ -/* - * Copyright (c) Meta Platforms, Inc. and affiliates. - * All rights reserved. - * - * This source code is licensed under the BSD-style license found in the - * LICENSE file in the root directory of this source tree. 
- */ - -package com.example.executorchllamademo; - -import android.net.Uri; -import android.view.LayoutInflater; -import android.view.View; -import android.view.ViewGroup; -import android.widget.ArrayAdapter; -import android.widget.ImageView; -import android.widget.TextView; -import java.util.ArrayList; -import java.util.Collections; - -public class MessageAdapter extends ArrayAdapter<Message> { - - private final ArrayList<Message> savedMessages; - - public MessageAdapter( - android.content.Context context, int resource, ArrayList<Message> savedMessages) { - super(context, resource); - this.savedMessages = savedMessages; - } - - @Override - public View getView(int position, View convertView, ViewGroup parent) { - Message currentMessage = getItem(position); - int layoutIdForListItem; - - if (currentMessage.getMessageType() == MessageType.SYSTEM) { - layoutIdForListItem = R.layout.system_message; - } else { - layoutIdForListItem = - currentMessage.getIsSent() ? R.layout.sent_message : R.layout.received_message; - } - View listItemView = - LayoutInflater.from(getContext()).inflate(layoutIdForListItem, parent, false); - if (currentMessage.getMessageType() == MessageType.IMAGE) { - ImageView messageImageView = listItemView.requireViewById(R.id.message_image); - messageImageView.setImageURI(Uri.parse(currentMessage.getImagePath())); - TextView messageTextView = listItemView.requireViewById(R.id.message_text); - messageTextView.setVisibility(View.GONE); - } else { - TextView messageTextView = listItemView.requireViewById(R.id.message_text); - messageTextView.setText(currentMessage.getText()); - } - - String metrics = ""; - TextView tokensView; - if (currentMessage.getTokensPerSecond() > 0) { - metrics = String.format("%.2f", currentMessage.getTokensPerSecond()) + "t/s "; - } - - if (currentMessage.getTotalGenerationTime() > 0) { - metrics = metrics + (float) currentMessage.getTotalGenerationTime() / 1000 + "s "; - } - - if (currentMessage.getTokensPerSecond() > 0 || 
currentMessage.getTotalGenerationTime() > 0) { - tokensView = listItemView.requireViewById(R.id.generation_metrics); - tokensView.setText(metrics); - TextView separatorView = listItemView.requireViewById(R.id.bar); - separatorView.setVisibility(View.VISIBLE); - } - - if (currentMessage.getTimestamp() > 0) { - TextView timestampView = listItemView.requireViewById(R.id.timestamp); - timestampView.setText(currentMessage.getFormattedTimestamp()); - } - - return listItemView; - } - - @Override - public void add(Message msg) { - super.add(msg); - savedMessages.add(msg); - } - - @Override - public void clear() { - super.clear(); - savedMessages.clear(); - } - - public ArrayList<Message> getSavedMessages() { - return savedMessages; - } - - public ArrayList<Message> getRecentSavedTextMessages(int numOfLatestPromptMessages) { - ArrayList<Message> recentMessages = new ArrayList<>(); - int lastIndex = savedMessages.size() - 1; - // In most cases lastIndex >= 0. It is -1 when the user clears the chat - // history and then enters a prompt. - if (lastIndex >= 0) { - Message messageToAdd = savedMessages.get(lastIndex); - int oldPromptID = messageToAdd.getPromptID(); - - for (int i = 0; i < savedMessages.size(); i++) { - messageToAdd = savedMessages.get(lastIndex - i); - if (messageToAdd.getMessageType() != MessageType.SYSTEM) { - if (messageToAdd.getPromptID() != oldPromptID) { - numOfLatestPromptMessages--; - oldPromptID = messageToAdd.getPromptID(); - } - if (numOfLatestPromptMessages > 0) { - if (messageToAdd.getMessageType() == MessageType.TEXT) { - recentMessages.add(messageToAdd); - } - } else { - break; - } - } - } - // To place the order in [input1, output1, input2, output2...] 
- Collections.reverse(recentMessages); - } - - return recentMessages; - } - - public int getMaxPromptID() { - int maxPromptID = -1; - for (Message msg : savedMessages) { - - maxPromptID = Math.max(msg.getPromptID(), maxPromptID); - } - return maxPromptID; - } -} diff --git a/examples/demo-apps/android/LlamaDemo/app/src/main/java/com/example/executorchllamademo/MessageType.java b/examples/demo-apps/android/LlamaDemo/app/src/main/java/com/example/executorchllamademo/MessageType.java deleted file mode 100644 index 6042acb5726..00000000000 --- a/examples/demo-apps/android/LlamaDemo/app/src/main/java/com/example/executorchllamademo/MessageType.java +++ /dev/null @@ -1,15 +0,0 @@ -/* - * Copyright (c) Meta Platforms, Inc. and affiliates. - * All rights reserved. - * - * This source code is licensed under the BSD-style license found in the - * LICENSE file in the root directory of this source tree. - */ - -package com.example.executorchllamademo; - -public enum MessageType { - TEXT, - IMAGE, - SYSTEM -} diff --git a/examples/demo-apps/android/LlamaDemo/app/src/main/java/com/example/executorchllamademo/ModelRunner.java b/examples/demo-apps/android/LlamaDemo/app/src/main/java/com/example/executorchllamademo/ModelRunner.java deleted file mode 100644 index a1bc205c4ac..00000000000 --- a/examples/demo-apps/android/LlamaDemo/app/src/main/java/com/example/executorchllamademo/ModelRunner.java +++ /dev/null @@ -1,109 +0,0 @@ -/* - * Copyright (c) Meta Platforms, Inc. and affiliates. - * All rights reserved. - * - * This source code is licensed under the BSD-style license found in the - * LICENSE file in the root directory of this source tree. 
- */ - -package com.example.executorchllamademo; - -import android.os.Handler; -import android.os.HandlerThread; -import android.os.Looper; -import android.os.Message; -import androidx.annotation.NonNull; -import org.json.JSONException; -import org.json.JSONObject; -import org.pytorch.executorch.extension.llm.LlmCallback; -import org.pytorch.executorch.extension.llm.LlmModule; - -/** A helper class that encapsulates all model-running logic. */ -public class ModelRunner implements LlmCallback { - LlmModule mModule = null; - - String mModelFilePath = ""; - String mTokenizerFilePath = ""; - - ModelRunnerCallback mCallback = null; - - HandlerThread mHandlerThread = null; - Handler mHandler = null; - - /** - * Helper class that separates UI logic from model runner logic. Automatically handles - * generate() requests on a worker thread. - * - * @param modelFilePath path to the model file - * @param tokenizerFilePath path to the tokenizer file - * @param temperature sampling temperature passed to the LlmModule - * @param callback receiver for model load, token, and stats events - */ - ModelRunner( - String modelFilePath, - String tokenizerFilePath, - float temperature, - ModelRunnerCallback callback) { - mModelFilePath = modelFilePath; - mTokenizerFilePath = tokenizerFilePath; - mCallback = callback; - - mModule = new LlmModule(mModelFilePath, mTokenizerFilePath, temperature); - mHandlerThread = new HandlerThread("ModelRunner"); - mHandlerThread.start(); - mHandler = new ModelRunnerHandler(mHandlerThread.getLooper(), this); - - mHandler.sendEmptyMessage(ModelRunnerHandler.MESSAGE_LOAD_MODEL); - } - - int generate(String prompt) { - Message msg = Message.obtain(mHandler, ModelRunnerHandler.MESSAGE_GENERATE, prompt); - msg.sendToTarget(); - return 0; - } - - void stop() { - mModule.stop(); - } - - @Override - public void onResult(String result) { - mCallback.onTokenGenerated(result); - } - - @Override - public void onStats(String stats) { - float tps = 0; - try { - JSONObject jsonObject = new JSONObject(stats); - int numGeneratedTokens = jsonObject.getInt("generated_tokens"); - int inferenceEndMs = 
jsonObject.getInt("inference_end_ms"); - int promptEvalEndMs = jsonObject.getInt("prompt_eval_end_ms"); - tps = (float) numGeneratedTokens / (inferenceEndMs - promptEvalEndMs) * 1000; - } catch (JSONException e) { // Malformed stats JSON; leave tps at 0. - } - mCallback.onStats("tokens/second: " + tps); - } -} - -class ModelRunnerHandler extends Handler { - public static final int MESSAGE_LOAD_MODEL = 1; - public static final int MESSAGE_GENERATE = 2; - - private final ModelRunner mModelRunner; - - public ModelRunnerHandler(Looper looper, ModelRunner modelRunner) { - super(looper); - mModelRunner = modelRunner; - } - - @Override - public void handleMessage(@NonNull android.os.Message msg) { - if (msg.what == MESSAGE_LOAD_MODEL) { - int status = mModelRunner.mModule.load(); - mModelRunner.mCallback.onModelLoaded(status); - } else if (msg.what == MESSAGE_GENERATE) { - mModelRunner.mModule.generate((String) msg.obj, mModelRunner); - mModelRunner.mCallback.onGenerationStopped(); - } - } -} diff --git a/examples/demo-apps/android/LlamaDemo/app/src/main/java/com/example/executorchllamademo/ModelRunnerCallback.java b/examples/demo-apps/android/LlamaDemo/app/src/main/java/com/example/executorchllamademo/ModelRunnerCallback.java deleted file mode 100644 index 5e8b6f00e3d..00000000000 --- a/examples/demo-apps/android/LlamaDemo/app/src/main/java/com/example/executorchllamademo/ModelRunnerCallback.java +++ /dev/null @@ -1,24 +0,0 @@ -/* - * Copyright (c) Meta Platforms, Inc. and affiliates. - * All rights reserved. - * - * This source code is licensed under the BSD-style license found in the - * LICENSE file in the root directory of this source tree. - */ - -package com.example.executorchllamademo; - -/** - * A helper interface within the app for MainActivity and Benchmarking to handle callbacks from - * ModelRunner. 
- */ -public interface ModelRunnerCallback { - - void onModelLoaded(int status); - - void onTokenGenerated(String token); - - void onStats(String stats); - - void onGenerationStopped(); -} diff --git a/examples/demo-apps/android/LlamaDemo/app/src/main/java/com/example/executorchllamademo/ModelType.java b/examples/demo-apps/android/LlamaDemo/app/src/main/java/com/example/executorchllamademo/ModelType.java deleted file mode 100644 index 9f8132504ea..00000000000 --- a/examples/demo-apps/android/LlamaDemo/app/src/main/java/com/example/executorchllamademo/ModelType.java +++ /dev/null @@ -1,18 +0,0 @@ -/* - * Copyright (c) Meta Platforms, Inc. and affiliates. - * All rights reserved. - * - * This source code is licensed under the BSD-style license found in the - * LICENSE file in the root directory of this source tree. - */ - -package com.example.executorchllamademo; - -public enum ModelType { - LLAMA_3, - LLAMA_3_1, - LLAMA_3_2, - LLAVA_1_5, - LLAMA_GUARD_3, - QWEN_3, -} diff --git a/examples/demo-apps/android/LlamaDemo/app/src/main/java/com/example/executorchllamademo/ModelUtils.java b/examples/demo-apps/android/LlamaDemo/app/src/main/java/com/example/executorchllamademo/ModelUtils.java deleted file mode 100644 index cf7ab1756ce..00000000000 --- a/examples/demo-apps/android/LlamaDemo/app/src/main/java/com/example/executorchllamademo/ModelUtils.java +++ /dev/null @@ -1,47 +0,0 @@ -/* - * Copyright (c) Meta Platforms, Inc. and affiliates. - * All rights reserved. - * - * This source code is licensed under the BSD-style license found in the - * LICENSE file in the root directory of this source tree. 
- */ - -package com.example.executorchllamademo; - -public class ModelUtils { - // XNNPACK or QNN - static final int TEXT_MODEL = 1; - - // XNNPACK - static final int VISION_MODEL = 2; - static final int VISION_MODEL_IMAGE_CHANNELS = 3; - static final int VISION_MODEL_SEQ_LEN = 768; - static final int TEXT_MODEL_SEQ_LEN = 256; - - // MediaTek - static final int MEDIATEK_TEXT_MODEL = 3; - - // QNN static llama - static final int QNN_TEXT_MODEL = 4; - - public static int getModelCategory(ModelType modelType, BackendType backendType) { - if (backendType.equals(BackendType.XNNPACK)) { - switch (modelType) { - case LLAVA_1_5: - return VISION_MODEL; - case LLAMA_3: - case LLAMA_3_1: - case LLAMA_3_2: - case QWEN_3: - default: - return TEXT_MODEL; - } - } else if (backendType.equals(BackendType.MEDIATEK)) { - return MEDIATEK_TEXT_MODEL; - } else if (backendType.equals(BackendType.QUALCOMM)) { - return QNN_TEXT_MODEL; - } - - return TEXT_MODEL; // default - } -} diff --git a/examples/demo-apps/android/LlamaDemo/app/src/main/java/com/example/executorchllamademo/PromptFormat.java b/examples/demo-apps/android/LlamaDemo/app/src/main/java/com/example/executorchllamademo/PromptFormat.java deleted file mode 100644 index 524ad7cbf6d..00000000000 --- a/examples/demo-apps/android/LlamaDemo/app/src/main/java/com/example/executorchllamademo/PromptFormat.java +++ /dev/null @@ -1,162 +0,0 @@ -/* - * Copyright (c) Meta Platforms, Inc. and affiliates. - * All rights reserved. - * - * This source code is licensed under the BSD-style license found in the - * LICENSE file in the root directory of this source tree. 
- */ - -package com.example.executorchllamademo; - -public class PromptFormat { - - public static final String SYSTEM_PLACEHOLDER = "{{ system_prompt }}"; - public static final String USER_PLACEHOLDER = "{{ user_prompt }}"; - public static final String ASSISTANT_PLACEHOLDER = "{{ assistant_response }}"; - public static final String THINKING_MODE_PLACEHOLDER = "{{ thinking_mode }}"; - public static final String DEFAULT_SYSTEM_PROMPT = "Answer the questions in a few sentences"; - - public static String getSystemPromptTemplate(ModelType modelType) { - switch (modelType) { - case LLAMA_3: - case LLAMA_3_1: - case LLAMA_3_2: - return "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n" - + SYSTEM_PLACEHOLDER - + "<|eot_id|>"; - case LLAVA_1_5: - return "USER: "; - case QWEN_3: - return "<|im_start|>system\n" + "You are a helpful assistant.\n" + "<|im_end|>\n"; - default: - return SYSTEM_PLACEHOLDER; - } - } - - public static String getUserPromptTemplate(ModelType modelType, boolean thinkingMode) { - switch (modelType) { - case LLAMA_3: - case LLAMA_3_1: - case LLAMA_3_2: - case LLAMA_GUARD_3: - return "<|start_header_id|>user<|end_header_id|>\n" - + USER_PLACEHOLDER - + "<|eot_id|>" - + "<|start_header_id|>assistant<|end_header_id|>"; - - case QWEN_3: - return "<|im_start|>user\n" - + USER_PLACEHOLDER - + "\n<|im_end|>\n" - + "<|im_start|>assistant\n" - + THINKING_MODE_PLACEHOLDER; - case LLAVA_1_5: - default: - return USER_PLACEHOLDER; - } - } - - public static String getConversationFormat(ModelType modelType) { - switch (modelType) { - case LLAMA_3: - case LLAMA_3_1: - case LLAMA_3_2: - return getUserPromptTemplate(modelType, false) - + "\n" - + ASSISTANT_PLACEHOLDER - + "<|eot_id|>"; - case LLAVA_1_5: - return USER_PLACEHOLDER + " ASSISTANT:"; - case QWEN_3: - return getUserPromptTemplate(modelType, false) + "<|im_end|>\n"; - default: - return USER_PLACEHOLDER; - } - } - - public static String getStopToken(ModelType modelType) { - switch (modelType) { - 
case LLAMA_3: - case LLAMA_3_1: - case LLAMA_3_2: - case LLAMA_GUARD_3: - return "<|eot_id|>"; - case LLAVA_1_5: - return "</s>"; - case QWEN_3: - return "<|endoftext|>"; - default: - return ""; - } - } - - public static String getThinkingModeToken(ModelType modelType, boolean thinkingMode) { - switch (modelType) { - case QWEN_3: - return thinkingMode ? "" : "<think>\n\n</think>\n\n\n"; - default: - return ""; - } - } - - public static String replaceSpecialToken(ModelType modelType, String token) { - switch (modelType) { - case QWEN_3: - switch (token) { - case "<|im_end|>": - return ""; - case "<think>": - return "Thinking...\n"; - case "</think>": - return "\nDone thinking"; - } - default: - return token; - } - } - - public static String getLlavaPresetPrompt() { - return "A chat between a curious human and an artificial intelligence assistant. The assistant" - + " gives helpful, detailed, and polite answers to the human's questions. USER: "; - } - - public static String getFormattedLlamaGuardPrompt(String userPrompt) { - return getUserPromptTemplate(ModelType.LLAMA_GUARD_3, false) - .replace( - USER_PLACEHOLDER, getLlamaGuardPresetPrompt().replace(USER_PLACEHOLDER, userPrompt)); - } - - public static String getLlamaGuardPresetPrompt() { - return "Task: Check if there is unsafe content in 'User' messages in conversations" - + " according our safety policy with the below categories.\n" - + "\n" - + "<BEGIN UNSAFE CONTENT CATEGORIES>\n" - + "S1: Violent Crimes. \n" - + "S2: Non-Violent Crimes. \n" - + "S3: Sex Crimes. \n" - + "S4: Child Exploitation. \n" - + "S5: Defamation. \n" - + "S6: Specialized Advice. \n" - + "S7: Privacy. \n" - + "S8: Intellectual Property. \n" - + "S9: Indiscriminate Weapons. \n" - + "S10: Hate. \n" - + "S11: Self-Harm. \n" - + "S12: Sexual Content. \n" - + "S13: Elections. \n" - + "S14: Code Interpreter Abuse.\n" - + "<END UNSAFE CONTENT CATEGORIES>\n" - + "\n" - + "<BEGIN CONVERSATION>\n" - + "\n" - + "User: " - + USER_PLACEHOLDER - + "\n" - + "\n" - + "<END CONVERSATION>\n" - + "\n" - + "Provide your safety assessment for ONLY THE LAST User message in the above" - + " conversation:\n" - + " - First line must read 'safe' or 'unsafe'.\n" - + " - If unsafe, a second line must include a comma-separated list of violated" - + " categories."; - } -} diff --git a/examples/demo-apps/android/LlamaDemo/app/src/main/java/com/example/executorchllamademo/SettingsActivity.java b/examples/demo-apps/android/LlamaDemo/app/src/main/java/com/example/executorchllamademo/SettingsActivity.java deleted file mode 100644 index 0e388a5b0a4..00000000000 --- a/examples/demo-apps/android/LlamaDemo/app/src/main/java/com/example/executorchllamademo/SettingsActivity.java +++ /dev/null @@ -1,463 +0,0 @@ -/* - * Copyright (c) Meta Platforms, Inc. and affiliates. - * All rights reserved. - * - * This source code is licensed under the BSD-style license found in the - * LICENSE file in the root directory of this source tree. 
- */ - -package com.example.executorchllamademo; - -import android.app.AlertDialog; -import android.content.DialogInterface; -import android.os.Build; -import android.os.Bundle; -import android.text.Editable; -import android.text.TextWatcher; -import android.view.View; -import android.widget.Button; -import android.widget.EditText; -import android.widget.ImageButton; -import android.widget.TextView; -import androidx.appcompat.app.AppCompatActivity; -import androidx.core.content.ContextCompat; -import androidx.core.graphics.Insets; -import androidx.core.view.ViewCompat; -import androidx.core.view.WindowInsetsCompat; -import com.google.gson.Gson; -import java.io.File; -import java.util.ArrayList; -import java.util.Arrays; -import java.util.List; - -public class SettingsActivity extends AppCompatActivity { - - private String mModelFilePath = ""; - private String mTokenizerFilePath = ""; - private TextView mBackendTextView; - private TextView mModelTextView; - private TextView mTokenizerTextView; - private TextView mModelTypeTextView; - private EditText mSystemPromptEditText; - private EditText mUserPromptEditText; - private Button mLoadModelButton; - private double mSetTemperature; - private String mSystemPrompt; - private String mUserPrompt; - private BackendType mBackendType; - private ModelType mModelType; - public SettingsFields mSettingsFields; - - private DemoSharedPreferences mDemoSharedPreferences; - public static double TEMPERATURE_MIN_VALUE = 0.0; - - @Override - protected void onCreate(Bundle savedInstanceState) { - super.onCreate(savedInstanceState); - setContentView(R.layout.activity_settings); - if (Build.VERSION.SDK_INT >= 21) { - getWindow().setStatusBarColor(ContextCompat.getColor(this, R.color.status_bar)); - getWindow().setNavigationBarColor(ContextCompat.getColor(this, R.color.nav_bar)); - } - ViewCompat.setOnApplyWindowInsetsListener( - requireViewById(R.id.main), - (v, insets) -> { - Insets systemBars = 
insets.getInsets(WindowInsetsCompat.Type.systemBars()); - v.setPadding(systemBars.left, systemBars.top, systemBars.right, systemBars.bottom); - return insets; - }); - mDemoSharedPreferences = new DemoSharedPreferences(getBaseContext()); - mSettingsFields = new SettingsFields(); - setupSettings(); - } - - private void setupSettings() { - mBackendTextView = requireViewById(R.id.backendTextView); - mModelTextView = requireViewById(R.id.modelTextView); - mTokenizerTextView = requireViewById(R.id.tokenizerTextView); - mModelTypeTextView = requireViewById(R.id.modelTypeTextView); - ImageButton backendImageButton = requireViewById(R.id.backendImageButton); - ImageButton modelImageButton = requireViewById(R.id.modelImageButton); - ImageButton tokenizerImageButton = requireViewById(R.id.tokenizerImageButton); - ImageButton modelTypeImageButton = requireViewById(R.id.modelTypeImageButton); - mSystemPromptEditText = requireViewById(R.id.systemPromptText); - mUserPromptEditText = requireViewById(R.id.userPromptText); - loadSettings(); - - // TODO: The two setOnClickListeners will be removed after file path issue is resolved - backendImageButton.setOnClickListener( - view -> { - setupBackendSelectorDialog(); - }); - modelImageButton.setOnClickListener( - view -> { - setupModelSelectorDialog(); - }); - tokenizerImageButton.setOnClickListener( - view -> { - setupTokenizerSelectorDialog(); - }); - modelTypeImageButton.setOnClickListener( - view -> { - setupModelTypeSelectorDialog(); - }); - mModelFilePath = mSettingsFields.getModelFilePath(); - if (!mModelFilePath.isEmpty()) { - mModelTextView.setText(getFilenameFromPath(mModelFilePath)); - } - mTokenizerFilePath = mSettingsFields.getTokenizerFilePath(); - if (!mTokenizerFilePath.isEmpty()) { - mTokenizerTextView.setText(getFilenameFromPath(mTokenizerFilePath)); - } - mModelType = mSettingsFields.getModelType(); - ETLogging.getInstance().log("mModelType from settings " + mModelType); - if (mModelType != null) { - 
mModelTypeTextView.setText(mModelType.toString()); - } - mBackendType = mSettingsFields.getBackendType(); - ETLogging.getInstance().log("mBackendType from settings " + mBackendType); - if (mBackendType != null) { - mBackendTextView.setText(mBackendType.toString()); - setBackendSettingMode(); - } - - setupParameterSettings(); - setupPromptSettings(); - setupClearChatHistoryButton(); - setupLoadModelButton(); - } - - private void setupLoadModelButton() { - mLoadModelButton = requireViewById(R.id.loadModelButton); - mLoadModelButton.setEnabled(true); - mLoadModelButton.setOnClickListener( - view -> { - new AlertDialog.Builder(this) - .setTitle("Load Model") - .setMessage("Do you really want to load the new model?") - .setIcon(android.R.drawable.ic_dialog_alert) - .setPositiveButton( - android.R.string.yes, - new DialogInterface.OnClickListener() { - public void onClick(DialogInterface dialog, int whichButton) { - mSettingsFields.saveLoadModelAction(true); - mLoadModelButton.setEnabled(false); - onBackPressed(); - } - }) - .setNegativeButton(android.R.string.no, null) - .show(); - }); - } - - private void setupClearChatHistoryButton() { - Button clearChatButton = requireViewById(R.id.clearChatButton); - clearChatButton.setOnClickListener( - view -> { - new AlertDialog.Builder(this) - .setTitle("Delete Chat History") - .setMessage("Do you really want to delete chat history?") - .setIcon(android.R.drawable.ic_dialog_alert) - .setPositiveButton( - android.R.string.yes, - new DialogInterface.OnClickListener() { - public void onClick(DialogInterface dialog, int whichButton) { - mSettingsFields.saveIsClearChatHistory(true); - } - }) - .setNegativeButton(android.R.string.no, null) - .show(); - }); - } - - private void setupParameterSettings() { - setupTemperatureSettings(); - } - - private void setupTemperatureSettings() { - mSetTemperature = mSettingsFields.getTemperature(); - EditText temperatureEditText = requireViewById(R.id.temperatureEditText); - 
temperatureEditText.setText(String.valueOf(mSetTemperature)); - temperatureEditText.addTextChangedListener( - new TextWatcher() { - @Override - public void beforeTextChanged(CharSequence s, int start, int count, int after) {} - - @Override - public void onTextChanged(CharSequence s, int start, int before, int count) {} - - @Override - public void afterTextChanged(Editable s) { - mSetTemperature = Double.parseDouble(s.toString()); - // This is needed because temperature is changed together with model loading - // Once temperature is no longer in LlmModule constructor, we can remove this - mSettingsFields.saveLoadModelAction(true); - saveSettings(); - } - }); - } - - private void setupPromptSettings() { - setupSystemPromptSettings(); - setupUserPromptSettings(); - } - - private void setupSystemPromptSettings() { - mSystemPrompt = mSettingsFields.getSystemPrompt(); - mSystemPromptEditText.setText(mSystemPrompt); - mSystemPromptEditText.addTextChangedListener( - new TextWatcher() { - @Override - public void beforeTextChanged(CharSequence s, int start, int count, int after) {} - - @Override - public void onTextChanged(CharSequence s, int start, int before, int count) {} - - @Override - public void afterTextChanged(Editable s) { - mSystemPrompt = s.toString(); - } - }); - - ImageButton resetSystemPrompt = requireViewById(R.id.resetSystemPrompt); - resetSystemPrompt.setOnClickListener( - view -> { - new AlertDialog.Builder(this) - .setTitle("Reset System Prompt") - .setMessage("Do you really want to reset system prompt?") - .setIcon(android.R.drawable.ic_dialog_alert) - .setPositiveButton( - android.R.string.yes, - new DialogInterface.OnClickListener() { - public void onClick(DialogInterface dialog, int whichButton) { - // Clear the messageAdapter and sharedPreference - mSystemPromptEditText.setText(PromptFormat.DEFAULT_SYSTEM_PROMPT); - } - }) - .setNegativeButton(android.R.string.no, null) - .show(); - }); - } - - private void setupUserPromptSettings() { - mUserPrompt = 
mSettingsFields.getUserPrompt(); - mUserPromptEditText.setText(mUserPrompt); - mUserPromptEditText.addTextChangedListener( - new TextWatcher() { - @Override - public void beforeTextChanged(CharSequence s, int start, int count, int after) {} - - @Override - public void onTextChanged(CharSequence s, int start, int before, int count) {} - - @Override - public void afterTextChanged(Editable s) { - if (isValidUserPrompt(s.toString())) { - mUserPrompt = s.toString(); - } else { - showInvalidPromptDialog(); - } - } - }); - - ImageButton resetUserPrompt = requireViewById(R.id.resetUserPrompt); - resetUserPrompt.setOnClickListener( - view -> { - new AlertDialog.Builder(this) - .setTitle("Reset Prompt Template") - .setMessage("Do you really want to reset the prompt template?") - .setIcon(android.R.drawable.ic_dialog_alert) - .setPositiveButton( - android.R.string.yes, - new DialogInterface.OnClickListener() { - public void onClick(DialogInterface dialog, int whichButton) { - // Clear the messageAdapter and sharedPreference - mUserPromptEditText.setText( - PromptFormat.getUserPromptTemplate(mModelType, false)); - } - }) - .setNegativeButton(android.R.string.no, null) - .show(); - }); - } - - private boolean isValidUserPrompt(String userPrompt) { - return userPrompt.contains(PromptFormat.USER_PLACEHOLDER); - } - - private void showInvalidPromptDialog() { - new AlertDialog.Builder(this) - .setTitle("Invalid Prompt Format") - .setMessage( - "Prompt format must contain " - + PromptFormat.USER_PLACEHOLDER - + ". 
Do you want to reset prompt format?") - .setIcon(android.R.drawable.ic_dialog_alert) - .setPositiveButton( - android.R.string.yes, - (dialog, whichButton) -> { - mUserPromptEditText.setText(PromptFormat.getUserPromptTemplate(mModelType, false)); - }) - .setNegativeButton(android.R.string.no, null) - .show(); - } - - private void setupBackendSelectorDialog() { - // Convert enum to list - List backendTypesList = new ArrayList<>(); - for (BackendType backendType : BackendType.values()) { - backendTypesList.add(backendType.toString()); - } - // Alert dialog builder takes in arr of string instead of list - String[] backendTypes = backendTypesList.toArray(new String[0]); - AlertDialog.Builder backendTypeBuilder = new AlertDialog.Builder(this); - backendTypeBuilder.setTitle("Select backend type"); - backendTypeBuilder.setSingleChoiceItems( - backendTypes, - -1, - (dialog, item) -> { - mBackendTextView.setText(backendTypes[item]); - mBackendType = BackendType.valueOf(backendTypes[item]); - setBackendSettingMode(); - dialog.dismiss(); - }); - - backendTypeBuilder.create().show(); - } - - private void setupModelSelectorDialog() { - String[] pteFiles = listLocalFile("/data/local/tmp/llama/", new String[] {".pte"}); - AlertDialog.Builder modelPathBuilder = new AlertDialog.Builder(this); - modelPathBuilder.setTitle("Select model path"); - - modelPathBuilder.setSingleChoiceItems( - pteFiles, - -1, - (dialog, item) -> { - mModelFilePath = pteFiles[item]; - mModelTextView.setText(getFilenameFromPath(mModelFilePath)); - mLoadModelButton.setEnabled(true); - dialog.dismiss(); - }); - - modelPathBuilder.create().show(); - } - - private static boolean fileHasExtension(String file, String[] suffix) { - return Arrays.stream(suffix).anyMatch(entry -> file.endsWith(entry)); - } - - private static String[] listLocalFile(String path, String[] suffix) { - File directory = new File(path); - if (directory.exists() && directory.isDirectory()) { - File[] files = directory.listFiles((dir, name) -> 
(fileHasExtension(name, suffix))); - String[] result = new String[files.length]; - for (int i = 0; i < files.length; i++) { - if (files[i].isFile() && fileHasExtension(files[i].getName(), suffix)) { - result[i] = files[i].getAbsolutePath(); - } - } - return result; - } - return new String[] {}; - } - - private void setupModelTypeSelectorDialog() { - // Convert enum to list - List modelTypesList = new ArrayList<>(); - for (ModelType modelType : ModelType.values()) { - modelTypesList.add(modelType.toString()); - } - // Alert dialog builder takes in arr of string instead of list - String[] modelTypes = modelTypesList.toArray(new String[0]); - AlertDialog.Builder modelTypeBuilder = new AlertDialog.Builder(this); - modelTypeBuilder.setTitle("Select model type"); - modelTypeBuilder.setSingleChoiceItems( - modelTypes, - -1, - (dialog, item) -> { - mModelTypeTextView.setText(modelTypes[item]); - mModelType = ModelType.valueOf(modelTypes[item]); - mUserPromptEditText.setText(PromptFormat.getUserPromptTemplate(mModelType, false)); - dialog.dismiss(); - }); - - modelTypeBuilder.create().show(); - } - - private void setupTokenizerSelectorDialog() { - String[] tokenizerFiles = - listLocalFile("/data/local/tmp/llama/", new String[] {".bin", ".json", ".model"}); - AlertDialog.Builder tokenizerPathBuilder = new AlertDialog.Builder(this); - tokenizerPathBuilder.setTitle("Select tokenizer path"); - tokenizerPathBuilder.setSingleChoiceItems( - tokenizerFiles, - -1, - (dialog, item) -> { - mTokenizerFilePath = tokenizerFiles[item]; - mTokenizerTextView.setText(getFilenameFromPath(mTokenizerFilePath)); - mLoadModelButton.setEnabled(true); - dialog.dismiss(); - }); - - tokenizerPathBuilder.create().show(); - } - - private String getFilenameFromPath(String uriFilePath) { - String[] segments = uriFilePath.split("/"); - if (segments.length > 0) { - return segments[segments.length - 1]; // get last element (aka filename) - } - return ""; - } - - private void setBackendSettingMode() { - if 
(mBackendType.equals(BackendType.XNNPACK) || mBackendType.equals(BackendType.QUALCOMM)) { - setXNNPACKSettingMode(); - } else if (mBackendType.equals(BackendType.MEDIATEK)) { - setMediaTekSettingMode(); - } - } - - private void setXNNPACKSettingMode() { - requireViewById(R.id.modelLayout).setVisibility(View.VISIBLE); - requireViewById(R.id.tokenizerLayout).setVisibility(View.VISIBLE); - requireViewById(R.id.parametersView).setVisibility(View.VISIBLE); - requireViewById(R.id.temperatureLayout).setVisibility(View.VISIBLE); - mModelFilePath = ""; - mTokenizerFilePath = ""; - } - - private void setMediaTekSettingMode() { - requireViewById(R.id.modelLayout).setVisibility(View.GONE); - requireViewById(R.id.tokenizerLayout).setVisibility(View.GONE); - requireViewById(R.id.parametersView).setVisibility(View.GONE); - requireViewById(R.id.temperatureLayout).setVisibility(View.GONE); - mModelFilePath = "/in/mtk/llama/runner"; - mTokenizerFilePath = "/in/mtk/llama/runner"; - } - - private void loadSettings() { - Gson gson = new Gson(); - String settingsFieldsJSON = mDemoSharedPreferences.getSettings(); - if (!settingsFieldsJSON.isEmpty()) { - mSettingsFields = gson.fromJson(settingsFieldsJSON, SettingsFields.class); - } - } - - private void saveSettings() { - mSettingsFields.saveModelPath(mModelFilePath); - mSettingsFields.saveTokenizerPath(mTokenizerFilePath); - mSettingsFields.saveParameters(mSetTemperature); - mSettingsFields.savePrompts(mSystemPrompt, mUserPrompt); - mSettingsFields.saveModelType(mModelType); - mSettingsFields.saveBackendType(mBackendType); - mDemoSharedPreferences.addSettings(mSettingsFields); - } - - @Override - public void onBackPressed() { - super.onBackPressed(); - saveSettings(); - } -} diff --git a/examples/demo-apps/android/LlamaDemo/app/src/main/java/com/example/executorchllamademo/SettingsFields.java b/examples/demo-apps/android/LlamaDemo/app/src/main/java/com/example/executorchllamademo/SettingsFields.java deleted file mode 100644 index 
94036f43947..00000000000 --- a/examples/demo-apps/android/LlamaDemo/app/src/main/java/com/example/executorchllamademo/SettingsFields.java +++ /dev/null @@ -1,148 +0,0 @@ -/* - * Copyright (c) Meta Platforms, Inc. and affiliates. - * All rights reserved. - * - * This source code is licensed under the BSD-style license found in the - * LICENSE file in the root directory of this source tree. - */ - -package com.example.executorchllamademo; - -public class SettingsFields { - - public String getModelFilePath() { - return modelFilePath; - } - - public String getTokenizerFilePath() { - return tokenizerFilePath; - } - - public double getTemperature() { - return temperature; - } - - public String getSystemPrompt() { - return systemPrompt; - } - - public ModelType getModelType() { - return modelType; - } - - public BackendType getBackendType() { - return backendType; - } - - public String getUserPrompt() { - return userPrompt; - } - - public String getFormattedSystemAndUserPrompt(String prompt, boolean thinkingMode) { - return getFormattedSystemPrompt() + getFormattedUserPrompt(prompt, thinkingMode); - } - - public String getFormattedSystemPrompt() { - return PromptFormat.getSystemPromptTemplate(modelType) - .replace(PromptFormat.SYSTEM_PLACEHOLDER, systemPrompt); - } - - public String getFormattedUserPrompt(String prompt, boolean thinkingMode) { - return userPrompt - .replace(PromptFormat.USER_PLACEHOLDER, prompt) - .replace( - PromptFormat.THINKING_MODE_PLACEHOLDER, - PromptFormat.getThinkingModeToken(modelType, thinkingMode)); - } - - public boolean getIsClearChatHistory() { - return isClearChatHistory; - } - - public boolean getIsLoadModel() { - return isLoadModel; - } - - private String modelFilePath; - private String tokenizerFilePath; - private double temperature; - private String systemPrompt; - private String userPrompt; - private boolean isClearChatHistory; - private boolean isLoadModel; - private ModelType modelType; - private BackendType backendType; - - public 
SettingsFields() { - ModelType DEFAULT_MODEL = ModelType.LLAMA_3; - BackendType DEFAULT_BACKEND = BackendType.XNNPACK; - - modelFilePath = ""; - tokenizerFilePath = ""; - temperature = SettingsActivity.TEMPERATURE_MIN_VALUE; - systemPrompt = ""; - userPrompt = PromptFormat.getUserPromptTemplate(DEFAULT_MODEL, false); - isClearChatHistory = false; - isLoadModel = false; - modelType = DEFAULT_MODEL; - backendType = DEFAULT_BACKEND; - } - - public SettingsFields(SettingsFields settingsFields) { - this.modelFilePath = settingsFields.modelFilePath; - this.tokenizerFilePath = settingsFields.tokenizerFilePath; - this.temperature = settingsFields.temperature; - this.systemPrompt = settingsFields.getSystemPrompt(); - this.userPrompt = settingsFields.getUserPrompt(); - this.isClearChatHistory = settingsFields.getIsClearChatHistory(); - this.isLoadModel = settingsFields.getIsLoadModel(); - this.modelType = settingsFields.modelType; - this.backendType = settingsFields.backendType; - } - - public void saveModelPath(String modelFilePath) { - this.modelFilePath = modelFilePath; - } - - public void saveTokenizerPath(String tokenizerFilePath) { - this.tokenizerFilePath = tokenizerFilePath; - } - - public void saveModelType(ModelType modelType) { - this.modelType = modelType; - } - - public void saveBackendType(BackendType backendType) { - this.backendType = backendType; - } - - public void saveParameters(Double temperature) { - this.temperature = temperature; - } - - public void savePrompts(String systemPrompt, String userPrompt) { - this.systemPrompt = systemPrompt; - this.userPrompt = userPrompt; - } - - public void saveIsClearChatHistory(boolean needToClear) { - this.isClearChatHistory = needToClear; - } - - public void saveLoadModelAction(boolean shouldLoadModel) { - this.isLoadModel = shouldLoadModel; - } - - public boolean equals(SettingsFields anotherSettingsFields) { - if (this == anotherSettingsFields) return true; - return 
modelFilePath.equals(anotherSettingsFields.modelFilePath) - && tokenizerFilePath.equals(anotherSettingsFields.tokenizerFilePath) - && temperature == anotherSettingsFields.temperature - && systemPrompt.equals(anotherSettingsFields.systemPrompt) - && userPrompt.equals(anotherSettingsFields.userPrompt) - && isClearChatHistory == anotherSettingsFields.isClearChatHistory - && isLoadModel == anotherSettingsFields.isLoadModel - && modelType == anotherSettingsFields.modelType - && backendType == anotherSettingsFields.backendType; - } -} diff --git a/examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/banner_shape.xml b/examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/banner_shape.xml deleted file mode 100644 index 0868ffffa6f..00000000000 --- a/examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/banner_shape.xml +++ /dev/null @@ -1,5 +0,0 @@ - - - - \ No newline at end of file diff --git a/examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/baseline_add_24.xml b/examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/baseline_add_24.xml deleted file mode 100644 index 2ae27b8409e..00000000000 --- a/examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/baseline_add_24.xml +++ /dev/null @@ -1,5 +0,0 @@ - - - - - diff --git a/examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/baseline_add_photo_alternate_24.xml b/examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/baseline_add_photo_alternate_24.xml deleted file mode 100644 index 7077fedd483..00000000000 --- a/examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/baseline_add_photo_alternate_24.xml +++ /dev/null @@ -1,5 +0,0 @@ - - - - - diff --git a/examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/baseline_article_24.xml b/examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/baseline_article_24.xml deleted file mode 100644 index a6837b9c69f..00000000000 --- 
a/examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/baseline_article_24.xml +++ /dev/null @@ -1,6 +0,0 @@ - - - - - diff --git a/examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/baseline_close_24.xml b/examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/baseline_close_24.xml deleted file mode 100644 index fb902d4331b..00000000000 --- a/examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/baseline_close_24.xml +++ /dev/null @@ -1,6 +0,0 @@ - - - - - diff --git a/examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/baseline_delete_forever_24.xml b/examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/baseline_delete_forever_24.xml deleted file mode 100644 index 4680bc6629e..00000000000 --- a/examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/baseline_delete_forever_24.xml +++ /dev/null @@ -1,5 +0,0 @@ - - - - - diff --git a/examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/baseline_lightbulb_24.xml b/examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/baseline_lightbulb_24.xml deleted file mode 100644 index aa045396d28..00000000000 --- a/examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/baseline_lightbulb_24.xml +++ /dev/null @@ -1,5 +0,0 @@ - - - - - diff --git a/examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/baseline_restart_alt_24.xml b/examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/baseline_restart_alt_24.xml deleted file mode 100644 index 860470ab109..00000000000 --- a/examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/baseline_restart_alt_24.xml +++ /dev/null @@ -1,6 +0,0 @@ - - - - diff --git a/examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/baseline_send_24.xml b/examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/baseline_send_24.xml deleted file mode 100644 index 2de1f642089..00000000000 --- 
a/examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/baseline_send_24.xml +++ /dev/null @@ -1,6 +0,0 @@ - - - diff --git a/examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/baseline_settings_24.xml b/examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/baseline_settings_24.xml deleted file mode 100644 index c51d84b9f4f..00000000000 --- a/examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/baseline_settings_24.xml +++ /dev/null @@ -1,11 +0,0 @@ - - - diff --git a/examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/baseline_stop_24.xml b/examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/baseline_stop_24.xml deleted file mode 100644 index 832e2585954..00000000000 --- a/examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/baseline_stop_24.xml +++ /dev/null @@ -1,6 +0,0 @@ - - - - - diff --git a/examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/blue_lightbulb_24.xml b/examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/blue_lightbulb_24.xml deleted file mode 100644 index 585cd3b1892..00000000000 --- a/examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/blue_lightbulb_24.xml +++ /dev/null @@ -1,5 +0,0 @@ - - - - - diff --git a/examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/btn.xml b/examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/btn.xml deleted file mode 100644 index ceb3ac56c9e..00000000000 --- a/examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/btn.xml +++ /dev/null @@ -1,8 +0,0 @@ - - - - - - - \ No newline at end of file diff --git a/examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/chat_background.xml b/examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/chat_background.xml deleted file mode 100644 index eb8b9d1f1a9..00000000000 --- a/examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/chat_background.xml +++ /dev/null @@ -1,21 +0,0 @@ - - - - - - - - - - diff --git 
a/examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/custom_button_round.xml b/examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/custom_button_round.xml deleted file mode 100644 index 87c82d2a38d..00000000000 --- a/examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/custom_button_round.xml +++ /dev/null @@ -1,7 +0,0 @@ - - - - - - \ No newline at end of file diff --git a/examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/expand_circle_down.xml b/examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/expand_circle_down.xml deleted file mode 100644 index 0a7a71f0700..00000000000 --- a/examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/expand_circle_down.xml +++ /dev/null @@ -1,9 +0,0 @@ - - - diff --git a/examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/ic_launcher_background.xml b/examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/ic_launcher_background.xml deleted file mode 100644 index 07d5da9cbf1..00000000000 --- a/examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/ic_launcher_background.xml +++ /dev/null @@ -1,170 +0,0 @@ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - diff --git a/examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/ic_launcher_foreground.xml b/examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/ic_launcher_foreground.xml deleted file mode 100644 index 7706ab9e6d4..00000000000 --- a/examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/ic_launcher_foreground.xml +++ /dev/null @@ -1,30 +0,0 @@ - - - - - - - - - - - diff --git a/examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/input_text_shape.xml b/examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/input_text_shape.xml deleted file mode 100644 index 35c778a437d..00000000000 --- a/examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/input_text_shape.xml +++ /dev/null @@ -1,7 +0,0 @@ - - - - - - \ No 
newline at end of file diff --git a/examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/logo.png b/examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/logo.png deleted file mode 100644 index 60e3e5174e9bdec2caf09cd42a9232e1dff65530..0000000000000000000000000000000000000000 Binary files a/examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/logo.png and /dev/null differ
z?k7DhU%QvScK+ibjRn6*Nq>#EZm#GDT+$6{^2eQP;Jht7VLSh$n~^}Rxn_>PXvAWr z49F9By-43c%FG}w9W~H)>YkH(`5Ziqj^ey*wo7VGwA4szTcI}mwj3u?8Dc;qWjxP7L z?@5eVIaZg^f0^I=-tp7?mcmk?UIn)56IdBGcyl6g?qVd_>(1&vFUnFhuc{!FI2rW8YRR$+R-ld$Z3X>{tXk$>1^r#W$_aQsxP%sKRTmz!NL zoU}Sa_UrZvBriEMNW<@%)K0x~*x9{O=oR+CB4qi}Q^b-ag{mE}ZqM58@Zz#G@~cZ{ zVd#Cen$-Gj_oM%arf{-Bwk#aqC#@huAzF&SRd? zs_ZXB2iF5kM(Kd#xaMh#FE7L2t<~)W{`~LZ8%y=AHa)Nz0XZ`g8TPeO7X(a$I7ZGD z&THP7cWWzYjvCfwoLpd6O|Zd$rHG4(=z2jrgCKszWVTECOV=GaAQWz&WTNEa@-U^W ziWx-Y7Z;tPZooGeF1r0fE1r&9UwS}96C%}qdHF7V*#9FBkffiAARrdE&uKjF|MlPw zM{z0}wEi&j3rF;dOv+!VO>aHN~1#WFcv^slOUwT$Tay2Nv_xTVCe5_NWXo1~=SV5`=(+4X) zpC`0{vq<`|C|DXvOQxei{Zl+a>(~15ZXkV>_4eyU_a3ZR?60wux~MPK($_fYzu?1!YA|9nQ;IkfC4+RqJ@3Z?!7WDfB8y%MS95Cv?$8GUqWiv6o6 zAXZ>KSe2)-?``ep$kP2%-dd6KLQdX_))zDW_jYDn+#(ojTr##V-m<_c^sUTbznWE z8K{IHJ!|0HL6?B)(Zc!C>8t_6l;tG zSNi6~BV$Hx2F*n$NBfM_?Kcz)<TKCKx&Z35+`;R@38=$? zH_r;9szGQNnU3X)j&MaXpf16eML_Fb&rf;Ku|kXxZaUJtmGE0jca~X#jg;tdjDN4# zF?-skeeA~_w-EYGwmu_w6=J;^L_Nx|savN8-jthJ;zE>_d?}1rYpA1lAhX!CdLROO zFh;$_9=AO3Fz$|}8yY^OQXHZ&q-PPjs1FeNIg6PAuzK}j!{{M>nMy&lcF21xoM4lz5UbjA1W2PPM=9nEjbJN4)5ML^q_7-nQk zsXaIy-((mP7|WIM4Y) z-8<_ZJmSK7-Gv>+&hgwXlH(d!_E{el2h<~T9H7Y%2~0P>4ezk3R3MU39Oe~*!xHKx z4CF_a^QkN%C!@?6^)vo9n=?XIQCzG`X+a+y#}XH|epbXQh9z7YEib$-r(T)b{L_dtGp;rt-4H9s2M zA4yF2bk2P5Ot^N{2M>8K_nGm&_L{(-gPuF<}JGe!cS23monbfRzv+X%fSDzw~fmGbO>h zv%d}^z6##!#sd8AIH41J(x=5_kLwIDDMG;V$%uhR5nNNfaUWT^rWv#(B&FxvKDZ&K zD;Mu?EgC71=8N6^+5H}SRhh;j|0CxdXA0UM-sxhKUgDUCH=js|ad{rO`O+1bd`o0z zR}4B`hpzYV+`aDvwcG6x#G}8;9^*|(VIW(YAC`>r>cGAC^(I1Hlc!&7+}w|M>sw zp3PwF`%)$?WXqBeaZ9$chY*=$r^uf5wqz+qmLl6&LMnONOEJ=hvPLM3Eo8~QGr#$q z-|u(6|9sE6&YUxI&VBAP*L|PsdcCgK^Z9uF$B^s|+ythh)0mLE9e9pgZ(`##u^?uF z`1D`ER*Tm*XHC()J$HKuzwl?csOn3j7c1xrO+Rkh8 z{|mdHqC*bux$8&0dN}&+MZ(geZcn-Fr)z)%npY3v1z{J?AkLjEWqro~kz9!ES(i*D z{QcfZ}t8~U+B&{%3Cv| zQc3UTu*&5z(0v#FbUWV4)R1lQ0AHOPe-oJtvnx>hpsyH_$#|^s(VwLgt$I?0hl*@* zU&CdZsN)*^7U08U=CB<9_*WGZN6)=DW`LsXG@klq>}(Cc9|5IdAKeTCK8a>Y0*kc% 
zz+VwQ<1EBmRv(DJ#e_e9Oj!E2wS8ncYmfORJg1#8CME3}(~s}FgOI9S%0nY0op?V7Xc z)P70xR1RePh04i}F|Coiti;6~RNrSJQ34~6Os%OxoS8VCi@<>QqgopkJfvFdR6b99@zZbwhJoi^ySnv|M%a`OXCGK8Q2E?Nu_M29oYTS1AFt1*O7%$ z_h#R|XD%q6uiH`!?fYM;X4=daONAlRk1Aiw-VCyF1DL|VTIM0q?NHZt#g0NwZa-9y zx_!&l{?UB_UuJKKqx|IxXiAQ!#L`t)t9L9{7_TOn&3XR86!Ai11s!S{j5SKv+4Zd> z-(G}%p21L(GM`2rj*e@2OBeb1#Q7Q5&OOk&Lk>J4{BZbenk|*LsMv!6NxVjq?TGC} zCv&wWB29*tQQ>w7%V)_9x4DaCY2m}W|8NCaiBYdZcFSrL z2$Z{)kBM*{&KECii)&cn#((t@>Fl)LYHd2f)Od&TD9cM~`G}U&AUuVJgZVF+kq=ck z;Wi&Cs+{euz2D(hEY#tpal6(nr=y@EEwnD;=NVq0G3tmy)=$sW{mNN1Egimg%@@ji zyQpB(S9Nv6hvACCzh?#ACsY0l-+yNfkg=3Y#SEi)NpY!?x=&!U^{e0Pmt=4Jk`H@R z=+e!)0()?1fdA%F6t)zjL4Z2S(h^U`Q)Ify;GF4Vw3u##i1^ zdbHpV&H7BQ6bqC7bZ{o$e&BSvOpkkMgylwZC|5__kn>%n!up`4ZHvnNCBSK+ephAg z*WG_=3Bxgo3Y!wV44~TaAduGhF>LjY+QxL^sNKH&D$cT%mq!G(Dmhyu1Edb*m+1=q zdy+Rpx{nw+bry#_T!|?OnnN8lHnkoh2Uw{=Qr0H=Qo4)AL#y=ljo{&%^oH`~|lA(d!fA7R1(l zEz;b32=rMiydbKwPip@6>A896`=;vvxwH}G!!fu%5yj6mkmnH#sh#>tf2&d^C;Vj& zZN_1t>Um+Z6reooL(kI+egm72zO`Qb|Qof z_m~c{#DH@9pJy^56TlPqh6~jC{PtzIpN}+kzonn<(8I-1cPD0)#Gi`QU;c*X@WCDT zor|(Ik~loe(m$*?@?~#vTq+8Me8a$Vhfv8gSutL#43wY7D&;Rw?|`>mFh+d4sO#>6 z<@!-sh~wg|>bMHS+=1ysYLf?1S9gUNz;-F^)>IDubGi3ob?ij9cQQNQ&MU?{pIe;M z3f2GERbe9d0e2|`7jY2|C(2y=>MLiN9!K)!|}7haZc=LOAh61|J5>%~QuFMz9?zV59IOaY&UlMEL{H0o*S(eTe#QpM32ekew%Dm14G@ls-J5l z3R}1@0jx~Ng+ef25W9VY(Muj(?_1Pcd$z_j_yfC6_HR0Z;mPNxD4^XqrUrq=@bmdi zFli#{SB6l9*!bqJfhq1PQ7te8DX_8w8?`q9+ZhF@RWGJ z;JD9n>&tF;%Iirrhl)eWYUSbC-Z@-rqw7?DQVZbe1DiK$%g%EzdAPnsv9EZbysroj z(qcK@o?lvGn#6(nWk7AQ>0|)SOt~V%$9j4o5`F7Fg8FRe1ko|G->F(D!ieN+akQTpM zhI(lXWe?j-Xi|FtkTXJnNpkCTwA+jgzqM?Ah{KBU-mx6@d`sU%Nc7`)B*|-Me5N#g zx8%PicHtJFPgv5Z=@38jInwIZs+ITVnr1~R`a4vSY{T4-HpSHE^jmPEKz>G*Gu_*- z$sFG~u_q$`d3aVK15#R2y?!IZ?of4@+^t$3`!Fv}@Q!YXN>T&(9E4G9GAPKExSIjQ^lVA|n-hGkja zcV}B>(~zJyyC;r- zm(q^>@FCQJ^Zf^J8|c0-IblN?ncSI!#wR8BSC=cg|J6Lo$~l>c3wqZz$oldq250L5 zrToeXuanC*82qEPV*BW)(COVqX=^q>F;$@8P@vDzPW3c~7&A0!e-WK?tFX^~sbut) z?^&b6AC>Ma2MnD1jE#LDdB8Q9C8W}UztB3zGTZBZIz-$oi9M#CA5j0u!{Qj_5&ZtS 
z(XOPSP?ctzictp5oZG6lMFAc(KBk!9Nv{~aNlk3{>s)S>iN12Ffv!m+B-{bIL_`Fc z*c}qVFa2n4;jTR1R&-ONv91&6kzhf$i?>5T+7PE%jysi)*bADWVmbcrSd|{325r`$ z0;PwuI_`IcUkPz!T*M6d0?N~0N$81|2Y!6xsOjV=g|x@tF1)5c@-Z~Eeqwhs^iiR9 zh7Y7LaFE@ducc3~$^#6JIpI9A;EvplV?5km>wiOXth(Fp%bTOtdTiHJC$9Wgm;7JU zlo9y}&iQaQ#XxujZ;pigP_~3Yc9EI~??rYZ3shlyv_b+P6^jbU!q44^cAriQQ7Avld4JEZ#`uTgUS!K;O;Gi@#pe-MDuVW3wG`3Qtt)|`8r+4=!_NvJ+I z>V)DudiGc!kj37ArHH1y{fe)&V?ju*X(B;vqUq=qegs=k_2t?dxeVAiM8w`DWEg`W*vwWzQcsvM95j-Xjtd!B71} z?h<;w>e}MgH69j()1^!qN2z2-eD*@FuP|zC$I|`Ct&U);@Mz zLxW^}+AG4kPG%980j)=M!`;TRw>)(j*+2243{&a*|t&t)T*vp~_ zn-J8#HW57k~SWBbBf*E6gOZ3h$qqU@8r{TJ(f}c z@<)(zEk7L8PY&u<^Aeqq^LmwLgo@80==?>RnH+p%;<7R@`8sVg8N*)x0qAj{3JBDf z)`F$^5Oqfm+2EH}Wwr{(C%YmkLMVx9MS`R|v~5gy=}8w=vI3Lm5R<%a_{l@y%zGvF zegj35V(8v0oJ=ZYB1*TfVUdsl)s%h|lRJU-3?+z+3g5f<0A-oEdYBdBmwZjncNIXX zfg*dl9*Rc-mmq*Rw}}Vz`5h)y5CH5X=(@B?wwi=+lH?L2$Fp-YVY0iO#vf|qjHm2Z8jEUSUAi)CF#yQud)F4r64RFe|T;p<{XP zeCNGGhg^0uEnl2wW~U0>6a=IwoE}YBJ*3K`ohh_Hif-k_lf9SBp0C}>%EgH{vzBtA z0o`CPsiRdtX8{H4&3L=w!JeYaKx$TC0d=o3w3{1)(v-+jQ679|XTM=D2%B!xK!34h- zPH(R0ltI-G8+ANb!E7U)7r_d4%2hEwyJzk_p*o!ojaTJiWOI_(L{QVf$^ zpK}o{o|khur=2*OWAl`5YpJu5)u%IeUgI7&@Aq*J;H*L_^X|of&%jKCXs1faw``yZwf{Hi!WBp6{H3!eea_;=i%64eT=InE z4pZ{ea~uA(Bh2qaU*5TWY%ku|TkNCm$9f~;kROg}S+N(2n4>@1fy%~(JM0;s9EqV~ zjjC2e)W*2O*r_x+eoM^zy;7V^8<4Fz%=sHH@Fa3+Mkgq+`cNmotC^eb9nt-p>22<$ zoqs;x0VM?0JkD|?CI8`IUC4MtA=YGYn=>bTMgR3mJF5QU-K-9@%e*U(l{ zcz5MaNMs@74HFe^lTZCG)Ux4Uy+Cj2dp^Csq7-GmiJeJe_!|#Mcc0tofZL_<>Rg`h z+vJ#K-RPR{|Ef=(?hF34+iQP4yf~1iBHORpD>S%wRcyNX0y%sVqxMM!z4!P)HO&b+ zu8aKLM7v5y97Dk~%}L?DRxWGA2$`cPrrcjqN)wo17m{j5-5+-U>?%AYCepocar1FwN!J3#MOItT-r% z8C@t@U3Qg!Z}z=OlE3521DzYE!)|}H|GQ)Hx3A0rm`g5G2Fe^E@j7?%fKUG1%>a7# z@5A7%zv}gU{>lCO_n%~CPDARixuwiFb|siZ!m7YL7xtu4SGsp!jW@D0!CApa-19r$p??b=>@rW1C3Q z<~7-(^V^s%u36C)UV@dQgckD#Z$Iv2nB{TqCiFVyeJLj9m~Ut61i8mx(|5dJyL@{l%ekBhF@oKspbn;Ph;dEz{UWxoG)WQjWZbd%wY10`fdXkY`=1}f(fRq&)M7} zPIw!dUf{>l_{Get?#7Dk3uB1b4ia)B4FseY!8X<(4wfyT%#kNEi;QsP_DKS 
zC4IVtwaXw)2c4FMGrtRw=@_8Wse_L=Iq&b$d(Fqn+dBx%o&7(Jt&17mT|p{Vd;=#V ztG4vDn}}=@BwD$>R|c+9Rnt6xuMyFFfyFkSxY>9N7s+*Eek?2N0_Jyk=@biSqcdRf z=I5BL;tyRfV9kD$xj*tGlh>?rC!2|dWP~?$hVC;Kb(nr=Q@=eCENZ?qF&Dv0sXLD* zUS=jeuSkUIjbqJDsVL!vC7qD{E%$Q1TQiE$X54CFF!T3R19V!Lm*donNLS5=YW0}l z+hPCs#g(ZCmxs1>$Lw1L=(_XUw(Iou|0TM!xhCXn>S2LZMd~~SeWyqHm-?KvvM!2+ zS}#bEg{9qGOmK^vy5Vb5FgUjbdB(}0VjDU+OOv;aj{wrv?!^^|tdHLQqwxl3)g|#- zX} z9|JE8WUEFBjp+9{`ic)(36$%j#Nc58N}CoM&*6>5zaNMcCpd51QE-;yD*g6B?ZUzD zh5DZzN7~cn5qy)E>0 zW4a+RrV_Z}twNCvohzNJ8Q;Mj=l)gSF0^{O-77&NBiL?o@EArl$) z7d5KG{f=~G?THB;`1PP6n_b_?EQv~DHGt?+F$K!L-wCPIIOX_J>Q;k6aFWx zqe-WP{pNSLK$1R!!JpteeC?d-<1k&_#&2De(H5V@PJ5SbFndGjH~sXk+mLtB*W16v zCeV3v!t6yE!xvI)IEMlYS$>!q!j><;Z05c*+|i^dVrJ@YTutyCUWT?Pj?^a-z)0+>xN9dFv}=D z$=#Sf0yf+PNYCWW1Ej~k*L1y=4aA+)*BbO!yFWbrM-q%E}p8X7JTTve( zIo;#H&bGbz6*XzUX8TI-SP_d&_M0@zUINsy*I!}y6X@NBV}iV?LOfuR`4zL4ml(3O z{j*U95YQKR{htzjv-%>+& z{VqFt9dNgo4y@{gjc$gSDn_=n$4LI!)MolAi|DY^120D-gwJ*PvYb5WpHyout@UtO5>^l7-Yabr<`yvdIj$N#=@|OdTOo9W;R{(%Em-gEje{)nKiPsT zWbQ~aTr$TowTomBnw2UtnX_spa!!oI`cbI&kr>K&N|3SzWauOXs3uUKTejiQq)PSr zaGW&?vRZMDQAR8ODIEZB`biXqiqs3lupwE^3d*ip}p#<(-EF-lNS80d~;bWZdoI){^T<8mbVGKaAM3zg$| zSxqWG*lsM_oK``P{*;7^H%r_Z+I{ORkril`EJyVdMD9Pv=R(wul9o z2p36kvHwmtGn{5L$ivDqDI2nW@AuqQ>&-!h?vMi4WP7)+zWm~BPdNP;_q7sOZQK>N zP!ZJ`Vx!2%Je|B?aS?J4$t_nk=+UG#ZuxlvwugrgTk=tqN$%>c@Y3dHkp_2xW5YI!_r(weuW~!NY?S7gjn1*MSLFu;ZSeOkOybwFPV_og%dFc!!*Xo##m!g21~Z(hAZDy}H?7E*J%`R0#hNLmjJmR`p$k6@_e5iF>|~M*%ep_nET60dVlKHFzed~&SL2Q*#f<~ zRDDFiN#8Y>?<%6~^$|})T^U?;m+z`J^wQI<#V56$@u->2Ugby`kh~1Nv(>+IJtj0` z#7?DF2IahU5UMsKHXTDhTIu^v@X(sO-E~}_xDJtH%(&IT%B@UYA!aa@qse%wDBO?H zPps|fcX4KRy!dB9y2Go`MK3pLzbMrYA()k;zPFpRfY*ZcM6tCzM&@3U-e zo*#yZOMY?!N>BF#;+cru`-QW8_fr?=IIsgVN`iE>jK2h~;+y6pO(D=`%xoMF6+Tbp zf3-l(wLSk&VfSEa+(BM|EiSs5teI=+p`2|XlbbUr)%UlADY+OK4lf*i@vIY-tU?qW zs}C~0cX+3_p*M}2@)6vTZEOg+^XBbaa%V5|@$eO-+Wx*cEWddH(xX>z1t>89i!Oad z;zbwtGI&}Gz%T}4osG7ReO!ySN%$=wwsK-;m2)z?ycVU8#v1Qm=4~l?+ 
zPrg?DRvCKWSa>oaSr1s?^+=$;8^HH$kdAR>elb(z2G2V3%h++pjWyX6&42VOHqtS$ z;^AdW5u!hlc)5GY~FcIVFElqwL6jC4TJz_|2Rgw9p>)}7$OB~?yF)aCvtRed8 zcY+`sfNJ~fhNKyh97|QT1TmFIkc#Nkn>(*|jUQ;HJ_UXUeWDdIJdvN7{s@wfu0np} zM?4KbzYFQKE&vssrPwYfby>8gG#X72910Sl%rSL5#hD2x3mEELQEKOo;@4o^VR?Hn z`1gqmB!m9i5cTMdGN;5MYi!M=0ej!o)K+CJLlOs23;TOUWmAE%)Dv&E13u$@LQuWi zV5~${z*r~K38t&cF(hC0&9!KEW*fW1s52u>8*ig7|2{0kPwn=W&lEmoi27CvTmu@) zF~?{DS-aM!fdiYrK>wlEh!prLc%B=;-3H#L1E+9`|EN5(MUeW`L zU!rk^5r@IqK>=8JeeBO&pn|4cV7`8ho&WL?<*<`)PpdVE)ZvD4`l#nD^XaINury0F zsFlMANM#vfh28-K>kANvdq13`N?5Edmlj)9uB;dAn_C5Y|50e188ON9Rrku zghuWWb&=D%45t~0ZLgcRntJE!cF#>cR2U3{oLnL$$&guSLVwT&=ZFuL&DtEu_MWcM zT~t+w4fHo~_7eakZ%KeGVl5Kg9M%ASmHdZZcBH@cbc+>7N}Zkq8Hs)){T?U`Rz^m zbuqnCT^(@tSpbM-i5pk(6o@if0Utp^6eKxSBXjrfaoD=n#eOUtyuU{YGB^zkaXJRfUGH0Xzs5&qI`xisl1Cl) zj6oOm@91~h$?a9OipL4`WOB)}l8(^)gE?`EuU@tI;Zg=CsELUe&gfE3)@JIuF(nhU z()lOv<$=sFy5Aq_nqvokI}6v$Fv;%V26hz?=a>#8TbKrgI9SO?NZ$?}Igg*1{X~Zc zy0zvR(HsXntM&eL`XDD#BSJSCq_&UlFx6DK6+!H`8o}EnI_!<(W8uD2)RJJ7I!!Dn%L}`X(mqg2*Sqp^4SeARWM6< zzIU|Oc(qy16n#vr$mzrE{X!OKXEFFTp-VuVemdgT-*yY*DVfX;Bl2nCJ}J2 zNvv8Ebp6F&d?a%yG0t;2HC^Mx{k((Ijx@#h!L^*9KrsX3&?eupb&u}ZX<>L*L+CdM z=!_;-jpnP%>O)+|k#{2WO?kO?MpzN#BfJ(7X$+;32GhAuwOTBr4`a7i42&NfpQoSp*2gmAPpBYO&Oi0;%|iz|1X^Y%UupHtNhfaeY#e5Oo<{{>yN01udjb8mJ& z&=o_b9kYG`^uB8%OYf3en{KsS zjy$j=mC@c_*ncFlCUOYq36SUUwl_tH3g|9Xjj!3P)=y@>+UXDMJ?+snn@M>o^6rZd60maUm^@D04zVcZFX{ram)QA}p7Tup z1xDzXdh_=pd7%&WnkCu9{7>QRr~PJiqaK{N0_>F7O;(@W#LQQoHF-UKwmGaAQUvr# z@(l?pU`xIoS#8ZuXDjt%20!}R!FjmdIz?&%lo`gm%W&PI|8>Fn9k6XHV|2H{F%=xiA0PRPw6rls~%;tFWDFmWxF%iRZhMn@&QS$c)!(vFB+0PZ@S`|h#fNE)Dm}F}- zMVt5g)4Pap7*PKsJtQpnG~_YoTa7BJF$FHkX`?-EIIz>4(MAWe56EA4j2QPR~bnvyf>ZEGoira=#KvhI8-QUsWZYt`^jw8~dLp^KiXM|4w zwm<#es>{7%ovvW=9bxEkevg=gE_xlF|9?cVd*6!)$zh?ys_59?*nrC#<-4(K_evqV zz0;!A8FQo5Bjo0S^X*Qk4_hp6*^ryq`QR*Q_mCx;EC9YkwC?<)sn}yS&9l2uicV54 zQFToL?qn+F_tunkTs1VM=#vMRBmug3fAII^{VxI0D+}~)5%ag%tP1%qJi>bBo86?Q zFF_K}*CA+!G0o>0|-UyFLsVnn*5olsrOv$C1X2 
zjA)qS`YYvuAzDG`kkA-rrw`7+VGeq|9$5dLu0eP&feb-@!pYU%BvPjLkp54W>XJp}$I(9y-spaUAh0G(a90DAH= z*(la|BD5A;z=W!0MEub~E|LCL^Cc%pSs_4=A$RbxvNx2~XU}*WMSr@5yl2D(EDZNLQ0KqgdAK|M zq3l+8g5kY1z31ANBF+oTLq{vGH3eJPS?vE#rY6czQBG#AJ=48mhv9bQ*Xfm#=)>vI zc$mn4IP>#ey`M{764PnP>(4hlbk32lo1u7P%pt+0)4+nUXpyY`4V%~?cVRQk+?eUp z`0>$K5R@;$NVR)iKoM`nguE!4(Q-qic!bMjh^)*#Zq&Ctr%f_J-k^MP z9CnN0hZ_v_|Am4nhl%>_mQhSC~q~&wY;S01P1dHRuB^S zzU<)X?VY%Bgw%-NA}Dt~fKT)M*?D>|GV2~C0g!Y>U5Ax<68+C8?HzeHS(O)Jf%`a~ zR2@-tk+%KOa(*<&I0T)+*_UKa%kCt7&JW3Uy1{%%tn#bw~)iG5?lGT z;?-~dzG!0@9(^r3#aunSt*5LHm8v7M(l|yH91)MnFQ=`(VUICGE=|$$_NV^#oEkq#)A=V_M`MamBQp-F0?WUp9vEPt=9oUG>1|GA-goOhFQZ z0%ZmQwqgc4yU*~vTP$ewUb&wVTO$F)vn zxap2y$vudYZryzA`};^73$j2?zXRnkQZe51Ybwm}k)|g?fa7k&rq%wca&q!(tX=F` ztne@O{_rwh6A^T@EF386JyaoAq;i-kqK|Do!NiO~?+=^5v^OV1%6F|CMgwDw`IDh0vjKXvfeh8#Aik18lYU* z$lS0nu1jty>IKNPP#y4d$dJjWI_UvKA+6I57gct6K;6_Pc_4Hv+mnuOPSu?u8LT-?LovX(2mnUw= zCnA}3Pp6n*PB!?Y{b8t$iQb;FN|U{eSC+~I4nej1wlyx$jeamIRA1(REP|3EEvk0@ zlhdf#6W%r3EzjOE70WVZ6h;VQ{-d{Q$-6_lA=GOjNALJ&wZ0cLXp1?|JoKfq^Y+IrQaLky0U?z#Q|@qPcRm0H2S4bP*6&A3BK_5c1QlS0UszNhD$OyXXM$LU z8UrSsQ9t}iYA8}>ZDgq>xmGNwFBA6r(gu>5i^?vLhWNT>J($jO(#-b-rvAbXhfTDQ z^s(n$8J7`1LcLzG=52c4$BAyW;|8aZtZO-;Z}M=Z%lTG(gm31|CPk=-MwD~rr-(iO z{9j;(U}YR-s!Q1&c##5m7YUM7I9?up0j3&qDH^b}sFzGwLqnZW#Y0 zsn~#e`M$yxaoU{gNU6R+O6Q=)Fw5EL5WoZ02+G(fTToA%c*;*X*@sS4=9s*u{CCo=~34G88GhTBGbgVxlw3H2WBeC4jR&PLBsY$HcBCd;M}M<402zxZcj) zz)z*VzxGoF!_85Y6>*q$mimI2VvWSDt_@y*zU*DN`y-nJ^yUOIXt^Mhr6z6^;&W(u zRvp9-(nw|HZD{hcfH#$Z#QqajVv-FNNk;8s>u9$*$ay;li8U3MaNftbGPb}=5*>?E zTavmXVj0I-whrg4UY`r(d1j#lO|YQvkJ&kKnFDCLxs3X<3+>1flKiMWC^B$$}2z5Rq2M% z?(k9LapGJu<`DT`{p&xznkAr=5q*7<02ED;Uo86P^>rN*&}8Gw{1?cD0651+YodYB zG-OnADVR5s4m{U}c4y=Vv_`c$RbPJuLN88Jo@?k@aLb0tN7NdvT@AYjB~d^M(?}*h zSb`rFvX9rjkE&FLS3rX;B_xn2*$IT`omK1}81hpmfm&?3G+#67}#xVp!y*2*1_u z^-Y@k*b@0c09^P5$sd7!z3RRtKQT=67w}#oc#qUhpYr;dWhP46DFnn2Lt3dfq)fmC z)yO=H3=&jl0za6eG%|sA;5V!HR}7aOUfq#?g3AaWV?q}>2bro90>VRl>-D00+wmU; zri=CK1Iyzne*1l=F9qJFu-3E(hHm7C_K*F}KF}Jv?L}M 
z23~!CeUecoe6)FQ6|H4+N#?=5pY1DmmeyVz5r<#meU(zZj79}rA2G+Hk4Gm7>k!~} zqK}4k@>B&LoND8wdZFdiUr^YLELT3-ANbzTOldz1>htz1SP(DTyE>1mlZ=jjx8*SE z1=u7<4sdaHCD6Ve?E_t*+0F@5KIcjy+6hD^j0QKfwCoITFl?^AlARv^tlbT;B1`LU zWp$?_TI4idwMQ7I4mX)5F^r?`WldUKkDKg476*5u2KtxATlyykmkiE8h&a(O-FP(2 z{}0dD@l~9c2q%_Z7r+Zj z(MnkiH8e4e(R{^fwnK6s`d)}|A`$PZFqM{ca-vkzaOvf$e>6pIEdIF`j0*a8Q;s3@ z-D-0UYV0utJ}5PP{5R8(SNJK}2h7@x9LV3XilB{tt<(Nx>nCPfqjReKnBoN4A)sMTuSZ4=`m4zMJZ%il1P zxsHmoOibLwi)xyl$Q z_T4@fRARt+Miel`qkuJi+(1)gPd2t1TB8uT^8^fN;ALXJ3pvtTdsY9_1jI}K8+{tg zbR$3T)xV%={{Q|$|CAk%r+NZnoVVSn>ZRk%= - - - - diff --git a/examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/outline_camera_alt_48.xml b/examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/outline_camera_alt_48.xml deleted file mode 100644 index c7b4b2e4a1d..00000000000 --- a/examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/outline_camera_alt_48.xml +++ /dev/null @@ -1,5 +0,0 @@ - - - - - diff --git a/examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/outline_image_48.xml b/examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/outline_image_48.xml deleted file mode 100644 index a8bb4b2f646..00000000000 --- a/examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/outline_image_48.xml +++ /dev/null @@ -1,5 +0,0 @@ - - - - - diff --git a/examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/prompt_shape.xml b/examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/prompt_shape.xml deleted file mode 100644 index 5f81396e382..00000000000 --- a/examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/prompt_shape.xml +++ /dev/null @@ -1,6 +0,0 @@ - - - - - \ No newline at end of file diff --git a/examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/received_message.xml b/examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/received_message.xml deleted file mode 100644 index c2288b5bfce..00000000000 --- 
a/examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/received_message.xml +++ /dev/null @@ -1,6 +0,0 @@ - - - - - diff --git a/examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/sent_message.xml b/examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/sent_message.xml deleted file mode 100644 index e8d13ca4e12..00000000000 --- a/examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/sent_message.xml +++ /dev/null @@ -1,6 +0,0 @@ - - - - - diff --git a/examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/three_dots.xml b/examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/three_dots.xml deleted file mode 100644 index afbe22da808..00000000000 --- a/examples/demo-apps/android/LlamaDemo/app/src/main/res/drawable/three_dots.xml +++ /dev/null @@ -1,5 +0,0 @@ - - - diff --git a/examples/demo-apps/android/LlamaDemo/app/src/main/res/layout/activity_benchmarking.xml b/examples/demo-apps/android/LlamaDemo/app/src/main/res/layout/activity_benchmarking.xml deleted file mode 100644 index 6e48b5de8be..00000000000 --- a/examples/demo-apps/android/LlamaDemo/app/src/main/res/layout/activity_benchmarking.xml +++ /dev/null @@ -1,16 +0,0 @@ - - - - - - diff --git a/examples/demo-apps/android/LlamaDemo/app/src/main/res/layout/activity_logs.xml b/examples/demo-apps/android/LlamaDemo/app/src/main/res/layout/activity_logs.xml deleted file mode 100644 index b327a544f25..00000000000 --- a/examples/demo-apps/android/LlamaDemo/app/src/main/res/layout/activity_logs.xml +++ /dev/null @@ -1,55 +0,0 @@ - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/examples/demo-apps/android/LlamaDemo/app/src/main/res/layout/activity_main.xml b/examples/demo-apps/android/LlamaDemo/app/src/main/res/layout/activity_main.xml deleted file mode 100644 index 52bf533521a..00000000000 --- a/examples/demo-apps/android/LlamaDemo/app/src/main/res/layout/activity_main.xml +++ /dev/null @@ -1,241 +0,0 @@ - - - - - - - - - - - - - - - - - 
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - diff --git a/examples/demo-apps/android/LlamaDemo/app/src/main/res/layout/activity_settings.xml b/examples/demo-apps/android/LlamaDemo/app/src/main/res/layout/activity_settings.xml deleted file mode 100644 index 0ec551ae364..00000000000 --- a/examples/demo-apps/android/LlamaDemo/app/src/main/res/layout/activity_settings.xml +++ /dev/null @@ -1,338 +0,0 @@ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -