
Dijopaul add nnlib#1

Closed
dijopaul wants to merge 63 commits into cad-audio:main from dijopaul:dijopaul_add-nnlib

Conversation

@dijopaul
Collaborator

No description provided.

JakobDegen and others added 30 commits May 8, 2024 15:32
Summary: The most recent fbsource version was just released onto crates.io, so builds are failing until we update

Reviewed By: iguridi

Differential Revision: D57091891

fbshipit-source-id: b7a44252e42ad72f81378e50cd404818b454fcbf
…ch#3459)

Summary:
Pull Request resolved: pytorch#3459

Yet another smaller pair of ops.

Reviewed By: manuelcandales

Differential Revision: D56807402

fbshipit-source-id: 04a4a57df88cc1734243fd5c4ef20d1b7fc02a76
…ytorch#3455)

Summary:
Pull Request resolved: pytorch#3455

Continuing rollout of this technique.

Reviewed By: manuelcandales

Differential Revision: D56827786

fbshipit-source-id: ede23e0377b9d70e0f378c3a5342c3bc1c9cd09b
…orch#3458)

Summary:
Pull Request resolved: pytorch#3458

Yet another op that can benefit from compile-time type promotion.

Reviewed By: manuelcandales

Differential Revision: D56831293

fbshipit-source-id: ff79870512e3baaaaeb08a311cb5bf323ebfbe19
…3456)

Summary:
Pull Request resolved: pytorch#3456

Almost done with Tensor ops that can benefit from compile-time promotion!

Reviewed By: manuelcandales

Differential Revision: D56835200

fbshipit-source-id: af3fb1723fd2488c44287ada01764d7bcddd6728
Summary:
Pull Request resolved: pytorch#3457

IIUC, these ops need to support Half but don't. Noticed it as a difference from maximum.

Reviewed By: manuelcandales

Differential Revision: D56846242

fbshipit-source-id: 6b5f85ee77ac6078ae2e82ad1f1944c5d5104340
Summary:
Pull Request resolved: pytorch#3545

Because the current mix of camelCase and snake_case is inconsistent.
ghstack-source-id: 225583077
exported-using-ghexport
bypass-github-export-checks
bypass-github-pytorch-ci-checks
bypass-github-executorch-ci-checks

Reviewed By: SS-JIA

Differential Revision: D57080712

fbshipit-source-id: c3a6a9a5c533c0f4ac5c28f0a8a0e4333d29f5ba
Summary:
Pull Request resolved: pytorch#3552

We currently use `vk_api.h` for inclusion of third-party `vulkan-headers`.

To adhere to the same style, we rename as `vma_api.h` for inclusion of third-party `VulkanMemoryAllocator`. (This also opens the door to renaming our wrapper `MemoryAllocator` to `Allocator` in the next change.)
ghstack-source-id: 225636265
exported-using-ghexport
bypass-github-export-checks
bypass-github-pytorch-ci-checks
bypass-github-executorch-ci-checks

Reviewed By: copyrightly, SS-JIA

Differential Revision: D57126895

fbshipit-source-id: ee1a9ee3af799de33c9e1222f031754ce1da16f2
Summary:
Pull Request resolved: pytorch#3535

Ideally this shouldn't happen, but if we post-process the weights too aggressively it might. On Android, the runtime just segfaults without an error message if the token is outside of the range. After this change, the failure is clearer:
```
E 00:00:00.180911 executorch:bpe_tokenizer.cpp:155] token 18446744073709551615 is out side of vacab range 512
Aborted
```
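The actual fix lives in ExecuTorch's C++ tokenizer; the pattern it follows can be sketched in Python (the function and names here are illustrative, not the real `bpe_tokenizer.cpp` API):

```python
# Illustrative sketch: validate a token id against the vocabulary size
# before indexing, instead of letting an out-of-range index crash the
# process with no message.

def decode_token(vocab: list[str], token: int) -> str:
    """Return the string for `token`, failing loudly if out of range."""
    if token < 0 or token >= len(vocab):
        # Mirrors the spirit of the new runtime check: report the
        # offending token and the vocab size rather than segfaulting.
        raise ValueError(f"token {token} is outside of vocab range {len(vocab)}")
    return vocab[token]
```

Note that a huge unsigned value like `18446744073709551615` in the quoted log is `uint64_t(-1)`, i.e. a signed `-1` reinterpreted as unsigned, which is why an explicit range check catches it.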

Reviewed By: larryliu0820

Differential Revision: D57057026

fbshipit-source-id: 838260d60b75e7c392d7f496d7cdf6f81957f56c
…h#3560)

Summary:
This fixes problem with clone and view unit test if TOSA is not installed

Pull Request resolved: pytorch#3560

Reviewed By: mergennachin

Differential Revision: D57161708

Pulled By: digantdesai

fbshipit-source-id: e4b6733ef4da89bb894c5197a90912eaa7fe4b5c
Summary:
Action item for: pytorch#3561

There are torch nightlies where yaml files don't exist (pytorch/pytorch#124941) in certain wheels.

Pull Request resolved: pytorch#3562

Test Plan: `unzip -t torch-2.4.0.dev20240507+cu118-cp38-cp38-linux_x86_64.whl | grep yaml` and make sure yaml files exist.

Reviewed By: tarun292

Differential Revision: D57163422

Pulled By: mergennachin

fbshipit-source-id: ba087ccd31315932c3203d5d7e5ec0d7a19878b6
Summary:
Pull Request resolved: pytorch#3537

Fixing P1215895395

Reviewed By: tarun292

Differential Revision: D56325190

fbshipit-source-id: a0d6edf84fa783f11b31f3340a94851738cb50b1
Summary:
Pull Request resolved: pytorch#3453

Noticed this inconsistency with clamp.

Reviewed By: manuelcandales

Differential Revision: D56846313

fbshipit-source-id: 2fd891fd774101ad56c21cbea4984e2d9a7c9c20
…torch#3487)

Summary:
Pull Request resolved: pytorch#3487

Finally getting close to the end of compile-time promotion for Tensor ops!

Reviewed By: manuelcandales

Differential Revision: D56855548

fbshipit-source-id: ca93db620c88babbb8ae0c7dc7d6a569c3bd13d6
…me (pytorch#3532)

Summary:
Pull Request resolved: pytorch#3532

another in a long line of fixes.

Reviewed By: manuelcandales

Differential Revision: D56896048

fbshipit-source-id: c945f2bf4028944591332e9eda3ce8046d7cc049
…d time (pytorch#3533)

Summary:
Pull Request resolved: pytorch#3533

Yet another pair of ops.

Reviewed By: manuelcandales

Differential Revision: D57023819

fbshipit-source-id: b3ce993c6926d0e1e277278e8a5a4638429a4a1e
pytorch#3534)

Summary:
Pull Request resolved: pytorch#3534

Yet another optimized op.

Reviewed By: manuelcandales

Differential Revision: D57028967

fbshipit-source-id: a8203e8cca86beadf352630893e8822dd022c819
Summary:
Pull Request resolved: pytorch#3405

Updated all existing callsites to use the previous default value of False.

When extract_delegate_segments is set to False (previous behavior), the backend blob data is part of the flatbuffer-serialized program. This leads to higher memory consumption: backends may not need the input blob after initialization, but cannot free the memory because it is part of the flatbuffer.

When extract_delegate_segments is set to True, the backend blob data is extracted into separate segments. This way, each backend can choose to free the memory after initialization if it is no longer needed, reducing peak memory consumption. The con is an increased program size due to internal padding between the flatbuffer program and the extracted segments.
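The inline-vs-segment tradeoff can be sketched with a toy serializer (names and layout here are illustrative, not ExecuTorch's actual serialization format):

```python
# Toy sketch of the extract_delegate_segments tradeoff: with the flag off,
# backend blobs are embedded in the program image and cannot be freed
# independently; with it on, they become separate segments a backend may
# release after initialization.

def serialize(program: bytes, backend_blobs: list[bytes],
              extract_delegate_segments: bool) -> tuple[bytes, list[bytes]]:
    """Return (program_image, segments)."""
    if not extract_delegate_segments:
        # Blobs live inside the program image for its whole lifetime.
        return program + b"".join(backend_blobs), []
    # Blobs are standalone segments; each can be dropped once consumed.
    return program, list(backend_blobs)
```

With the flag off the program image is larger and pinned; with it on, peak memory drops at the cost of some extra file size (padding between program and segments in the real format).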

Reviewed By: JacobSzwejbka, cccclai, dbort, zonglinpengmeta

Differential Revision: D56712292

fbshipit-source-id: 0f29972357b3a8288f170ce4a00f0d7b043036e5
Summary:
Pull Request resolved: pytorch#3553

This change is a no-op and simply refactors existing classes.

I'm trying to learn what's going on within this `api/` folder.

`Resource.*` files can be split to be less intimidating to readers. I'm also thinking we can flatten the hierarchy more in future changes.
ghstack-source-id: 225755561
exported-using-ghexport
bypass-github-export-checks
bypass-github-pytorch-ci-checks
bypass-github-executorch-ci-checks

Reviewed By: SS-JIA

Differential Revision: D57126893

fbshipit-source-id: 0a5f729c70351a62647af70225d8288faa5af035
Summary:
Pull Request resolved: pytorch#3554

TSIA
ghstack-source-id: 225755563
exported-using-ghexport
bypass-github-export-checks
bypass-github-pytorch-ci-checks
bypass-github-executorch-ci-checks

Reviewed By: yipjustin

Differential Revision: D57126896

fbshipit-source-id: 2c4467b07730ca3371244fb95067a3b3b654163c
Summary:
Pull Request resolved: pytorch#3546

## Context
Pipeline creation involves the compilation of shader SPIR-V code into machine-specific code. This makes the application's first model-load via the `Program::load_method()` ET-API very slow, due to the creation of compute pipelines via the `vkCreateComputePipelines()` VK-API. To amortize that cost, Vulkan offers a [Compute Pipeline Cache API](https://docs.vulkan.org/guide/latest/pipeline_cache.html). Following [this Vulkan example](https://github.com/KhronosGroup/Vulkan-Samples/tree/main/samples/performance/pipeline_cache), we can (A) retrieve the compiled machine-specific code and save it to a file, and (B) load it from that file next time. For an internal model executing on a resource-constrained device, this improves model-load time from ~1200ms to ~500ms.

## This change
We implement both (A)+(B) ET-VK logic. Note that these changes are actually no-op unless you initialize the `pipeline_cache_file_path` manually. The expectation is for the client application to specify the file path of their pipeline cache data if they want to leverage this optimization. In a future ET-wide change, we will expose the file_path config parameter to the ET-API.
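The (A)+(B) save/restore pattern is the standard cache-warming idiom; sketched in Python rather than the Vulkan C API (`vkCreatePipelineCache` / `vkGetPipelineCacheData`), with a hypothetical `compile_pipelines` callback standing in for the expensive compilation:

```python
# Sketch of the pipeline-cache pattern: reuse a previously saved blob when
# the cache file exists, otherwise compile once and persist the result.
import os

def load_or_build_cache(path: str, compile_pipelines) -> bytes:
    if os.path.exists(path):            # (B) warm start: reuse prior blob
        with open(path, "rb") as f:
            return f.read()
    blob = compile_pipelines()          # slow first-run compilation
    with open(path, "wb") as f:         # (A) persist for the next launch
        f.write(blob)
    return blob
```

This matches the change's behavior of being a no-op unless `pipeline_cache_file_path` is set: without a path, there is nothing to load or save.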
ghstack-source-id: 225763792
bypass-github-export-checks
bypass-github-pytorch-ci-checks
bypass-github-executorch-ci-checks

Reviewed By: SS-JIA

Differential Revision: D57085276

fbshipit-source-id: 993dc55d5930c913884ad455f359b62afb75bf87
Summary:
Pull Request resolved: pytorch#3131

Ref: https://forums.developer.apple.com/forums/thread/105088

*If you’re going to record a single number, this footprint value is a good one to use. I don’t think we guarantee that it’ll align with the Xcode memory gauge, but it’s much more useful value than all the older stuff (like resident_size)*

Therefore revise it.

Reviewed By: shoumikhin

Differential Revision: D56290391

fbshipit-source-id: f3f408f2677c3788a25e46af0df0eccd23a21c2f
Summary:
Pull Request resolved: pytorch#3266

There are use cases where we might like to supply a separate ExecutorchBackendConfig for each method in the model. An example use case is where we might want to alloc inputs for one method and not alloc them for another. In order to support this, in this diff we add support for passing in a dictionary of configs to `to_executorch`.
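The per-method lookup this enables can be sketched as follows; `BackendConfig` and `config_for` are illustrative stand-ins, not the actual `to_executorch` / `ExecutorchBackendConfig` implementation:

```python
# Sketch of resolving a per-method config: callers may pass either a single
# config (applied to every method) or a {method_name: config} dict.
from dataclasses import dataclass

@dataclass
class BackendConfig:          # stand-in for ExecutorchBackendConfig
    alloc_inputs: bool = True

def config_for(method: str, configs) -> BackendConfig:
    """Pick the config for `method`, falling back to defaults for a dict miss."""
    if isinstance(configs, dict):
        return configs.get(method, BackendConfig())
    return configs
```

This supports the use case from the summary: one method can get `alloc_inputs=False` while the others keep the default.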

Reviewed By: JacobSzwejbka, cccclai

Differential Revision: D56499598

fbshipit-source-id: e02947597e4f898c7b4963b5922904f3b642a5e5
Summary:
Pull Request resolved: pytorch#3128

XNNPACK broke Backwards Compatibility by forcing pooling operations to reduce dims, and introducing a flag to allow these operation to keep the dims.

This is backwards-breaking because previously XNNPACK would not keep the dims if no flag was given; now a flag must be specified to keep the dims. While we initially proposed the inverse to maintain backwards compatibility, they have encountered breakages and have decided to commit to this breakage. As we are a downstream dependency and will have to accept these breakages ourselves, it is important that we break early, before this is used in any production code.

As a result, we break BC here by accepting XNNPACK's change.

```
git diff roll_back_commit > rollback.patch
cd fbsource/fbcode/xplat/third-party/XNNPACK/XNNPACK
git apply ../../../../../../rollback.patch
```

We have to update the change in ExecuTorch, as a result we change the XNNPACK dep we are pointing to the branch containing these changes here:
https://github.com/digantdesai/XNNPACK/commits/et_v21/

Reviewed By: digantdesai

Differential Revision: D56271242

fbshipit-source-id: f05a0c98cb3b8e0b52ded9480ae5fb3ac71bbc14
Summary:
Pull Request resolved: pytorch#3551

Adding a test for qc8 linear

Reviewed By: digantdesai

Differential Revision: D55941565

fbshipit-source-id: ecc870dbd879e00790a1052aaf3b4be748b02c94
Summary:
Pull Request resolved: pytorch#3565

If the `view_copy` op is a graph output, leave it as a view_copy for now since the output pointer may be modified at runtime when deploying on device.

Right now, the modified pointer would be ignored since the view_copy op will always point to its predecessor memory.

cc chrismthompson jcoriell fengwang

Reviewed By: JacobSzwejbka, metascroy

Differential Revision: D57132664

fbshipit-source-id: b97bd81166b728c306fae8b212aeb5e38348b391
Summary:
Pull Request resolved: pytorch#3520

I need an updated sysinfo for some Windows stuff; the previous version doesn't seem to work correctly (plus the documented API has changed a bit and the examples just don't build).

Reviewed By: JakobDegen

Differential Revision: D56913751

fbshipit-source-id: 30ba14269792ad46236929ba40071974dd1ce436
)

Summary:
Backends use a static initializer to register themselves. We have an established solution to forcing the Apple linker to load the object files containing said initializer, so let's use it for CoreML.

Pull Request resolved: pytorch#3556

Test Plan: Attempt to load a CoreML PTE from Python no longer fails with error about the backend not being registered

Reviewed By: mikekgfb

Differential Revision: D57136490

Pulled By: swolchok

fbshipit-source-id: 613d7f786fa47f34a94ee4eea7b2a81ef670a573
Summary:
Pull Request resolved: pytorch#3574

We will still merge two serialization, but for now just fix the missing imports.

Reviewed By: tugsbayasgalan

Differential Revision: D57219425

fbshipit-source-id: 21ca361c5041872e90fa779b6b75027c7e3585b8
Summary:
see [T188128067](https://www.internalfb.com/intern/tasks/?t=188128067)
and pytorch/torchchat#726

Pull Request resolved: pytorch#3526

Reviewed By: huydhn

Differential Revision: D57032452

Pulled By: lucylq

fbshipit-source-id: 9770d5fea83b551518e8b14579f4e40baef85195
mcr229 and others added 24 commits May 13, 2024 11:32
Summary:
Pull Request resolved: pytorch#3578

XNNPACK doesn't support max pooling with ceil mode, so we should not be partitioning these nodes where ceil mode is True

Resolving this issue:
pytorch#3567
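The constraint can be sketched as a partitioner-style predicate; the node representation here is a toy dict, not `torch.fx`, and the function name is hypothetical:

```python
# Illustrative filter: refuse to partition max_pool2d nodes that use
# ceil_mode=True, since XNNPACK cannot lower them.

def can_partition_max_pool2d(node: dict) -> bool:
    """XNNPACK-style check: only take max pooling without ceil mode."""
    if node.get("op") != "max_pool2d":
        return True  # this constraint only applies to max pooling
    return not node.get("ceil_mode", False)
```

Nodes the predicate rejects simply stay in the portable graph instead of being delegated, avoiding the runtime failure from pytorch#3567.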

Reviewed By: mergennachin, digantdesai

Differential Revision: D57228128

fbshipit-source-id: ee57a783d314d69ebef57f0e1707c0d038582a31
)

Summary:
Pull Request resolved: pytorch#3575

.

Reviewed By: mergennachin

Differential Revision: D57225246

fbshipit-source-id: c0145c4dcabd8aff8fbb38d51ff6e4ddcc87f48c
Summary:
Summary of changes:
- support for scatter slice
- enable index put

With whole model delegation, I am seeing following crash in llama2:
```
in _verify_exported_program_signature
    raise SpecViolationError(
torch._export.verifier.SpecViolationError: Buffer output getitem_1 does not point to a buffer that exists.
Dict of buffers that are mutated, in order: {'getitem_1': 'layers_0_attention_SDPA_kv_cache_k_cache', 'getitem': 'layers_0_attention_SDPA_kv_cache_v_cache', 'getitem_3': 'layers_1_attention_SDPA_kv_cache_k_cache', 'getitem_2': 'layers_1_attention_SDPA_kv_cache_v_cache', 'getitem_5': 'layers_2_attention_SDPA_kv_cache_k_cache', 'getitem_4': 'layers_2_attention_SDPA_kv_cache_v_cache', 'getitem_7': 'layers_3_attention_SDPA_kv_cache_k_cache', 'getitem_6': 'layers_3_attention_SDPA_kv_cache_v_cache', 'getitem_9': 'layers_4_attention_SDPA_kv_cache_k_cache', 'getitem_8': 'layers_4_attention_SDPA_kv_cache_v_cache'}
Buffer nodes available: []
```

Commands to lower llama2 to MPS:
- python -m examples.models.llama2.export_llama  -kv  --mps
- python3 -m examples.apple.mps.scripts.mps_example --model_name="llama2"

Pull Request resolved: pytorch#3399

Reviewed By: shoumikhin

Differential Revision: D57293487

Pulled By: cccclai

fbshipit-source-id: a7ea392dc3c14b3538416b492d512aec71a0524e
Summary:
Pull Request resolved: pytorch#3593

A followup to pytorch#2209 and pytorch#2228

Consistent with https://github.com/pytorch/executorch/blob/main/.ci/docker/requirements-ci.txt

Reviewed By: lucylq

Differential Revision: D57289472

fbshipit-source-id: e155f627fe537be85129c70da5fa91ae555b010a
Summary:
Pull Request resolved: pytorch#3496

This adds the sgd_optimizer header to executorch. would appreciate some thoughts on where to place this file.

Reviewed By: JacobSzwejbka

Differential Revision: D56888378

fbshipit-source-id: 17d6bb3975ae2d58aee911ee91a3ff07acbc6850
Summary:
Pull Request resolved: pytorch#3564

.

Reviewed By: mcr229

Differential Revision: D57172491

fbshipit-source-id: c7724130d973ca8e7df510e9d5eb95c329c4c2bd
Summary:
Pull Request resolved: pytorch#3603

Some headers are relying on transitive includes that may be missing when building for different platforms, so we have to include everything explicitly. Also, we need to use quotes over angle brackets for local header includes.

Reviewed By: kirklandsign

Differential Revision: D57340360

fbshipit-source-id: dedc9737314231be5255c06c3ad7c9a800b247b8
Summary:
Pull Request resolved: pytorch#3602

The previous implementation of ignoring `view_copy` on outputs was incorrect
in that it only checked `node.next` instead of all users of the node.
`node.next` just selects the next node in topological order, which may or
may not be the output if there is more than one output. In the case of more
than one output, the next node may not be related at all!

Check if any of the users of the node are an output instead.
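The difference between the two checks can be shown with a toy graph (the `Node` class is illustrative, not `torch.fx.Node`):

```python
# Toy sketch of the fix: decide whether a view_copy feeds a graph output by
# inspecting ALL of its users, not merely the next node in topological order.

class Node:
    def __init__(self, name: str, is_output: bool = False):
        self.name, self.is_output, self.users = name, is_output, []

def feeds_output(node: Node) -> bool:
    """Correct check: true if any user of the node is an output."""
    return any(u.is_output for u in node.users)
```

With more than one output, the topologically next node can be an unrelated op while a later user is the real output; checking `node.next` alone would miss (or falsely match) such cases.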

Reviewed By: metascroy, mcremon-meta

Differential Revision: D57299853

fbshipit-source-id: 6a373181f6bdd58444e0c859fce320d576b7f749
Summary:
Pull Request resolved: pytorch#3607

.

Reviewed By: kirklandsign

Differential Revision: D57348926

fbshipit-source-id: f867150138f2b8162ea51de245a980606022f018
Summary:
Pull Request resolved: pytorch#3594

As title.

Implementation is rather simple because the shaders just have to accumulate the `mat2` shader across the width dim rather than the height dim.

Reviewed By: yipjustin

Differential Revision: D57203869

fbshipit-source-id: 08932a75e66924a0dfb0816f8ccefa718a341dd8
…s ios ci issue (pytorch#3610)

Summary:
Pull Request resolved: pytorch#3610
as title

Reviewed By: shoumikhin

Differential Revision: D57353025

fbshipit-source-id: f45ffa81a1d877238cac3068bae9ecf3b365230f
Summary:
Add CMake rule for tests, and a script to invoke it.

Pull Request resolved: pytorch#3606

Reviewed By: larryliu0820

Differential Revision: D57343746

Pulled By: kirklandsign

fbshipit-source-id: 289a37fb97c7f80cab44aa2ba30b859d1d527e59
Summary:
Pull Request resolved: pytorch#3614
overriding_review_checks_triggers_an_audit_and_retroactive_review
Oncall Short Name: executorch

Differential Revision: D57368772

fbshipit-source-id: aea98235a4bf936bc462eca353d8f47463fd8a2e
Summary:
Pull Request resolved: pytorch#3576

^

bypass-github-export-checks

Reviewed By: bigfootjon

Differential Revision: D57225582

fbshipit-source-id: 30edd8ebe70468eee5c8586060e849a53c9828d9
Updating .gitmodules
Removing extra entry in .gitmodule
@dijopaul dijopaul marked this pull request as draft May 30, 2024 16:27
cad-audio pushed a commit that referenced this pull request Jun 28, 2024
Summary: Pull Request resolved: pytorch#3763

Reviewed By: itamaro, tarun292

Differential Revision: D51566750

fbshipit-source-id: 654c426d479833867e93083e9b55786abfc24a32
@dijopaul dijopaul closed this Jul 17, 2024
@dijopaul dijopaul deleted the dijopaul_add-nnlib branch July 26, 2024 12:06
cad-audio pushed a commit that referenced this pull request Jun 27, 2025
Differential Revision: D75911655

Pull Request resolved: pytorch#11344
zonglinpeng pushed a commit that referenced this pull request Sep 11, 2025
BNNS copy crashes the process when the dtypes differ
(pytorch#11714).

With the example in this PR
(pytorch#11714), we crash the
process on main. Here is the stack trace from LLDB:

```
Process 19234 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
    frame #0: 0x0000000190ac9388 libsystem_kernel.dylib`__pthread_kill + 8
libsystem_kernel.dylib`__pthread_kill:
->  0x190ac9388 <+8>:  b.lo   0x190ac93a8    ; <+40>
    0x190ac938c <+12>: pacibsp 
    0x190ac9390 <+16>: stp    x29, x30, [sp, #-0x10]!
    0x190ac9394 <+20>: mov    x29, sp
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
  * frame #0: 0x0000000190ac9388 libsystem_kernel.dylib`__pthread_kill + 8
    frame #1: 0x0000000190b0288c libsystem_pthread.dylib`pthread_kill + 296
    frame #2: 0x0000000190a0bc60 libsystem_c.dylib`abort + 124
    frame #3: 0x0000000190910174 libsystem_malloc.dylib`malloc_vreport + 892
    frame #4: 0x0000000190913c90 libsystem_malloc.dylib`malloc_report + 64
    frame #5: 0x000000019091821c libsystem_malloc.dylib`___BUG_IN_CLIENT_OF_LIBMALLOC_POINTER_BEING_FREED_WAS_NOT_ALLOCATED + 32
    frame #6: 0x000000019d2f4084 libBNNS.dylib`___lldb_unnamed_symbol1620 + 564
    frame #7: 0x000000019d2f5bac libBNNS.dylib`___lldb_unnamed_symbol1628 + 680
    frame #8: 0x000000019d69ce48 libBNNS.dylib`BNNSCopy + 616
    frame #9: 0x000000030c74d950 _portable_lib.cpython-310-darwin.so`(anonymous namespace)::copy_using_bnns(executorchcoreml::MultiArray const&, executorchcoreml::MultiArray&) + 188
    frame #10: 0x000000030c74cfdc _portable_lib.cpython-310-darwin.so`(anonymous namespace)::copy(executorchcoreml::MultiArray const&, executorchcoreml::MultiArray&, executorchcoreml::MultiArray::CopyOptions) + 72
    frame #11: 0x000000030c74ceec _portable_lib.cpython-310-darwin.so`executorchcoreml::MultiArray::copy(executorchcoreml::MultiArray&, executorchcoreml::MultiArray::CopyOptions) const + 148
    frame #12: 0x000000030c7488d4 _portable_lib.cpython-310-darwin.so`invocation function for block in (anonymous namespace)::copy(MLMultiArray*, executorchcoreml::MultiArray&) + 376
    frame #13: 0x000000030c748ac8 _portable_lib.cpython-310-darwin.so`invocation function for block in (anonymous namespace)::copy(MLMultiArray*, executorchcoreml::MultiArray&) + 52
    frame #14: 0x000000019ad33f4c CoreML`CoreML::MultiArrayBuffer::getBytesWithHandler(void (void const*, unsigned long) block_pointer) const + 340
    frame #15: 0x000000019ad34138 CoreML`-[MLMultiArray(ScopedBufferAccess) getBytesWithHandler:] + 152
    frame #16: 0x000000030c7485ec _portable_lib.cpython-310-darwin.so`(anonymous namespace)::copy(MLMultiArray*, executorchcoreml::MultiArray&) + 296
    frame #17: 0x000000030c744f68 _portable_lib.cpython-310-darwin.so`(anonymous namespace)::set_outputs(std::__1::vector<executorchcoreml::MultiArray, std::__1::allocator<executorchcoreml::MultiArray>>&, NSArray<MLMultiArray*>*) + 180
```


With this PR, the process succeeds.