Summary: The most recent fbsource version was just released onto crates.io, so builds are failing until we update Reviewed By: iguridi Differential Revision: D57091891 fbshipit-source-id: b7a44252e42ad72f81378e50cd404818b454fcbf
…ch#3459) Summary: Pull Request resolved: pytorch#3459 Yet another smaller pair of ops. Reviewed By: manuelcandales Differential Revision: D56807402 fbshipit-source-id: 04a4a57df88cc1734243fd5c4ef20d1b7fc02a76
…ytorch#3455) Summary: Pull Request resolved: pytorch#3455 Continuing rollout of this technique. Reviewed By: manuelcandales Differential Revision: D56827786 fbshipit-source-id: ede23e0377b9d70e0f378c3a5342c3bc1c9cd09b
…orch#3458) Summary: Pull Request resolved: pytorch#3458 Yet another op that can benefit from compile-time type promotion. Reviewed By: manuelcandales Differential Revision: D56831293 fbshipit-source-id: ff79870512e3baaaaeb08a311cb5bf323ebfbe19
…3456) Summary: Pull Request resolved: pytorch#3456 Almost done with Tensor ops that can benefit from compile-time promotion! Reviewed By: manuelcandales Differential Revision: D56835200 fbshipit-source-id: af3fb1723fd2488c44287ada01764d7bcddd6728
Summary: Pull Request resolved: pytorch#3457 IIUC, these ops need to support Half but don't. Noticed it as a difference from maximum. Reviewed By: manuelcandales Differential Revision: D56846242 fbshipit-source-id: 6b5f85ee77ac6078ae2e82ad1f1944c5d5104340
Summary: Pull Request resolved: pytorch#3545 Because the current mix of camelCase and snake_case is inconsistent. ghstack-source-id: 225583077 exported-using-ghexport bypass-github-export-checks bypass-github-pytorch-ci-checks bypass-github-executorch-ci-checks Reviewed By: SS-JIA Differential Revision: D57080712 fbshipit-source-id: c3a6a9a5c533c0f4ac5c28f0a8a0e4333d29f5ba
Summary: Pull Request resolved: pytorch#3552 We currently use `vk_api.h` for inclusion of third-party `vulkan-headers`. To adhere to the same style, we rename as `vma_api.h` for inclusion of third-party `VulkanMemoryAllocator`. (This also opens the door to renaming our wrapper `MemoryAllocator` to `Allocator` in the next change.) ghstack-source-id: 225636265 exported-using-ghexport bypass-github-export-checks bypass-github-pytorch-ci-checks bypass-github-executorch-ci-checks Reviewed By: copyrightly, SS-JIA Differential Revision: D57126895 fbshipit-source-id: ee1a9ee3af799de33c9e1222f031754ce1da16f2
Summary: Pull Request resolved: pytorch#3535

Ideally this shouldn't happen, but it can if we post-process the weights too aggressively. On Android, a token outside the vocabulary range currently segfaults directly, with no error message. After this change, the failure is clearer:

```
E 00:00:00.180911 executorch:bpe_tokenizer.cpp:155] token 18446744073709551615 is out side of vacab range 512
Aborted
```

Reviewed By: larryliu0820 Differential Revision: D57057026 fbshipit-source-id: 838260d60b75e7c392d7f496d7cdf6f81957f56c
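The fix above amounts to a bounds check before indexing the vocabulary. A minimal Python sketch of the idea (the real check lives in the C++ `bpe_tokenizer`; `decode_token` and its signature are hypothetical):

```python
def decode_token(vocab: list[str], token: int) -> str:
    # Hypothetical helper, not the actual ExecuTorch API: validate the
    # id before indexing. A u64 id that wrapped around to a huge value
    # (e.g. 18446744073709551615) is caught by the upper bound.
    if token < 0 or token >= len(vocab):
        raise ValueError(
            f"token {token} is outside of vocab range {len(vocab)}"
        )
    return vocab[token]
```

With the check in place, a bad token id produces a clear error instead of an out-of-bounds access.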
…h#3560) Summary: This fixes problem with clone and view unit test if TOSA is not installed Pull Request resolved: pytorch#3560 Reviewed By: mergennachin Differential Revision: D57161708 Pulled By: digantdesai fbshipit-source-id: e4b6733ef4da89bb894c5197a90912eaa7fe4b5c
Summary: Action item for: pytorch#3561 There are torch nightlies where yaml files don't exist (pytorch/pytorch#124941) in certain wheels. Pull Request resolved: pytorch#3562 Test Plan: `unzip -t torch-2.4.0.dev20240507+cu118-cp38-cp38-linux_x86_64.whl | grep yaml` and make sure yaml files exist. Reviewed By: tarun292 Differential Revision: D57163422 Pulled By: mergennachin fbshipit-source-id: ba087ccd31315932c3203d5d7e5ec0d7a19878b6
Summary: Pull Request resolved: pytorch#3537 Fixing P1215895395 Reviewed By: tarun292 Differential Revision: D56325190 fbshipit-source-id: a0d6edf84fa783f11b31f3340a94851738cb50b1
Summary: Pull Request resolved: pytorch#3453 Noticed this inconsistency with clamp. Reviewed By: manuelcandales Differential Revision: D56846313 fbshipit-source-id: 2fd891fd774101ad56c21cbea4984e2d9a7c9c20
…torch#3487) Summary: Pull Request resolved: pytorch#3487 Finally getting close to the end of compile-time promotion for Tensor ops! Reviewed By: manuelcandales Differential Revision: D56855548 fbshipit-source-id: ca93db620c88babbb8ae0c7dc7d6a569c3bd13d6
…me (pytorch#3532) Summary: Pull Request resolved: pytorch#3532 another in a long line of fixes. Reviewed By: manuelcandales Differential Revision: D56896048 fbshipit-source-id: c945f2bf4028944591332e9eda3ce8046d7cc049
…d time (pytorch#3533) Summary: Pull Request resolved: pytorch#3533 Yet another pair of ops. Reviewed By: manuelcandales Differential Revision: D57023819 fbshipit-source-id: b3ce993c6926d0e1e277278e8a5a4638429a4a1e
pytorch#3534) Summary: Pull Request resolved: pytorch#3534 Yet another optimized op. Reviewed By: manuelcandales Differential Revision: D57028967 fbshipit-source-id: a8203e8cca86beadf352630893e8822dd022c819
Summary: Pull Request resolved: pytorch#3405

Updated all existing callsites to use the previous default value of False.

When extract_delegate_segments is set to False (the previous behavior), the backend blob data is part of the flatbuffer-serialized program. This leads to higher memory consumption: backends may not need the input blob after initialization, but cannot free the memory because it is part of the flatbuffer.

When extract_delegate_segments is set to True, the backend blob data is extracted into separate segments. Each backend can then choose to free the memory after initialization if it is no longer needed, which reduces peak memory consumption. The downside is an increased program size due to internal padding between the flatbuffer program and the extracted segments.

Reviewed By: JacobSzwejbka, cccclai, dbort, zonglinpengmeta Differential Revision: D56712292 fbshipit-source-id: 0f29972357b3a8288f170ce4a00f0d7b043036e5
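The trade-off described above can be sketched with a toy serializer (the function and names here are illustrative, not the real ExecuTorch serialization code):

```python
def serialize_program(core: bytes, delegate_blobs: list[bytes],
                      extract_delegate_segments: bool):
    """Toy model of the flag, for illustration only.

    False: blobs are embedded in the flatbuffer program, so backends
    cannot free them while the program is alive.
    True: blobs become separate segments that a backend may release
    after initialization, reducing peak memory at the cost of a
    slightly larger file (padding between program and segments).
    """
    if extract_delegate_segments:
        return core, list(delegate_blobs)
    return core + b"".join(delegate_blobs), []
```

A backend given its blob as a standalone segment can drop the buffer once it has built its internal representation; an inlined blob stays resident for the lifetime of the program.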
Summary: Pull Request resolved: pytorch#3553 This change is a no-op and simply refactors existing classes. I'm trying to learn what's going on within this `api/` folder. `Resource.*` files can be split to be less intimidating to readers. I'm also thinking we can flatten the hierarchy more in future changes. ghstack-source-id: 225755561 exported-using-ghexport bypass-github-export-checks bypass-github-pytorch-ci-checks bypass-github-executorch-ci-checks Reviewed By: SS-JIA Differential Revision: D57126893 fbshipit-source-id: 0a5f729c70351a62647af70225d8288faa5af035
Summary: Pull Request resolved: pytorch#3554 TSIA ghstack-source-id: 225755563 exported-using-ghexport bypass-github-export-checks bypass-github-pytorch-ci-checks bypass-github-executorch-ci-checks Reviewed By: yipjustin Differential Revision: D57126896 fbshipit-source-id: 2c4467b07730ca3371244fb95067a3b3b654163c
Summary: Pull Request resolved: pytorch#3546

## Context

Pipeline creation involves the compilation of shader SPIR-V code into machine-specific code. This makes the application's first model load via the `Program::load_method()` ET-API very slow, due to the creation of compute pipelines via the `vkCreateComputePipelines()` VK-API. To amortize that cost, Vulkan offers a [Compute Pipeline Cache API](https://docs.vulkan.org/guide/latest/pipeline_cache.html). Following [this Vulkan example](https://github.com/KhronosGroup/Vulkan-Samples/tree/main/samples/performance/pipeline_cache), we can (A) retrieve the compiled machine-specific code and save it to a file, and (B) load it from that file next time. For an internal model executing on a resource-constrained device, this improves model-load time from ~1200ms to ~500ms.

## This change

We implement both the (A) and (B) ET-VK logic. Note that these changes are a no-op unless you initialize `pipeline_cache_file_path` manually. The expectation is for the client application to specify the file path of its pipeline cache data if it wants to leverage this optimization. In a future ET-wide change, we will expose the file-path config parameter through the ET-API.

ghstack-source-id: 225763792 bypass-github-export-checks bypass-github-pytorch-ci-checks bypass-github-executorch-ci-checks Reviewed By: SS-JIA Differential Revision: D57085276 fbshipit-source-id: 993dc55d5930c913884ad455f359b62afb75bf87
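The save/load flow (A)+(B) can be sketched in a few lines. This is a language-agnostic toy, not the real ET-VK code: `create_pipeline` stands in for `vkCreateComputePipelines` plus `vkGetPipelineCacheData`, and the file handling mirrors the pattern from the Vulkan sample linked above.

```python
import os

def create_pipeline(shader: bytes, cache_data=None):
    # Stand-in for pipeline creation: compilation is slow, but a cache
    # blob from a previous run lets the driver skip most of the work.
    compiled = cache_data if cache_data else b"machine-code:" + shader
    return compiled, compiled  # (pipeline, cache data to persist)

def load_method(shader: bytes, pipeline_cache_file_path: str) -> bytes:
    cache = None
    if os.path.exists(pipeline_cache_file_path):      # (B) reuse prior cache
        with open(pipeline_cache_file_path, "rb") as f:
            cache = f.read()
    pipeline, new_cache = create_pipeline(shader, cache)
    with open(pipeline_cache_file_path, "wb") as f:   # (A) save for next run
        f.write(new_cache)
    return pipeline
```

The first call pays the full compilation cost and writes the cache; subsequent calls (or app launches) feed the cached blob back in, which is where the ~1200ms-to-~500ms improvement comes from.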
Summary: Pull Request resolved: pytorch#3131 Ref: https://forums.developer.apple.com/forums/thread/105088 *If you’re going to record a single number, this footprint value is a good one to use. I don’t think we guarantee that it’ll align with the Xcode memory gauge, but it’s much more useful value than all the older stuff (like resident_size)* Therefore revise it. Reviewed By: shoumikhin Differential Revision: D56290391 fbshipit-source-id: f3f408f2677c3788a25e46af0df0eccd23a21c2f
Summary: Pull Request resolved: pytorch#3266 There are use cases where we might like to supply a separate ExecutorchBackendConfig for each method in the model. An example use case is where we might want to alloc inputs for one method and not alloc them for another. In order to support this, in this diff we add support for passing in a dictionary of configs to `to_executorch`. Reviewed By: JacobSzwejbka, cccclai Differential Revision: D56499598 fbshipit-source-id: e02947597e4f898c7b4963b5922904f3b642a5e5
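The per-method lookup described above can be sketched as follows. Names here (`BackendConfig`, `config_for_method`) are hypothetical stand-ins for `ExecutorchBackendConfig` and the internal dispatch in `to_executorch`:

```python
from dataclasses import dataclass

@dataclass
class BackendConfig:
    # Hypothetical stand-in for ExecutorchBackendConfig; the flag
    # mirrors the alloc-inputs example from the summary.
    alloc_input_buffers: bool = True

def config_for_method(method_name: str, config):
    """Accept either one config applied to every method, or a dict
    mapping method names to their own configs."""
    if isinstance(config, dict):
        return config[method_name]
    return config
```

So a caller could pass `{"forward": BackendConfig(alloc_input_buffers=False), "encode": BackendConfig()}` to give each method its own behavior while single-config callers keep working unchanged.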
Summary: Pull Request resolved: pytorch#3128

XNNPACK broke backwards compatibility by forcing pooling operations to reduce dims and introducing a flag that allows these operations to keep the dims. This is backwards-breaking because previously XNNPACK would keep the dims even when no flag was given; now a flag must be specified to keep them. While we initially proposed the inverse to maintain backwards compatibility, they have encountered breakages and have decided to commit to this break. As a downstream dependency we will have to accept the breakage ourselves, so it is important that we break early, before this is used in any production code. As a result, we accept XNNPACK's change here.

```
git diff roll_back_commit > rollback.patch
cd fbsource/fbcode/xplat/third-party/XNNPACK/XNNPACK
git apply ../../../../../../rollback.patch
```

We also have to update the corresponding change in ExecuTorch, so we point the XNNPACK dep at the branch containing these changes: https://github.com/digantdesai/XNNPACK/commits/et_v21/

Reviewed By: digantdesai Differential Revision: D56271242 fbshipit-source-id: f05a0c98cb3b8e0b52ded9480ae5fb3ac71bbc14
Summary: Pull Request resolved: pytorch#3551 Adding a test for qc8 linear Reviewed By: digantdesai Differential Revision: D55941565 fbshipit-source-id: ecc870dbd879e00790a1052aaf3b4be748b02c94
Summary: Pull Request resolved: pytorch#3565 If the `view_copy` op is a graph output, leave it as a view_copy for now since the output pointer may be modified at runtime when deploying on device. Right now, the modified pointer would be ignored since the view_copy op will always point to its predecessor memory. cc chrismthompson jcoriell fengwang Reviewed By: JacobSzwejbka, metascroy Differential Revision: D57132664 fbshipit-source-id: b97bd81166b728c306fae8b212aeb5e38348b391
Summary: Pull Request resolved: pytorch#3520 I need an updated sysinfo for some Windows stuff; the previous version doesn't seem to work correctly (plus the documented API has changed a bit, and the examples just don't build). Reviewed By: JakobDegen Differential Revision: D56913751 fbshipit-source-id: 30ba14269792ad46236929ba40071974dd1ce436
) Summary: Backends use a static initializer to register themselves. We have an established solution to forcing the Apple linker to load the object files containing said initializer, so let's use it for CoreML. Pull Request resolved: pytorch#3556 Test Plan: Attempt to load a CoreML PTE from Python no longer fails with error about the backend not being registered Reviewed By: mikekgfb Differential Revision: D57136490 Pulled By: swolchok fbshipit-source-id: 613d7f786fa47f34a94ee4eea7b2a81ef670a573
Summary: Pull Request resolved: pytorch#3574 We will still merge the two serialization schemes, but for now just fix the missing imports. Reviewed By: tugsbayasgalan Differential Revision: D57219425 fbshipit-source-id: 21ca361c5041872e90fa779b6b75027c7e3585b8
Summary: see [T188128067](https://www.internalfb.com/intern/tasks/?t=188128067) and pytorch/torchchat#726 Pull Request resolved: pytorch#3526 Reviewed By: huydhn Differential Revision: D57032452 Pulled By: lucylq fbshipit-source-id: 9770d5fea83b551518e8b14579f4e40baef85195
Summary: Pull Request resolved: pytorch#3578 XNNPACK doesn't support max pooling with ceil mode, so we should not partition these nodes when ceil mode is True. Resolves this issue: pytorch#3567 Reviewed By: mergennachin, digantdesai Differential Revision: D57228128 fbshipit-source-id: ee57a783d314d69ebef57f0e1707c0d038582a31
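The partitioner-side guard can be sketched as a simple predicate. This is an illustrative toy, not the real XNNPACK partitioner code; the function name and the kwargs shape are assumptions:

```python
def can_partition_max_pool2d(kwargs: dict) -> bool:
    # Toy partitioner predicate: refuse to delegate max_pool2d when
    # ceil_mode is set, since XNNPACK cannot lower it. The node then
    # stays on the portable CPU path instead of crashing the backend.
    return not kwargs.get("ceil_mode", False)
```

A node with `ceil_mode=True` is simply left unpartitioned, which is exactly the behavior the fix above installs.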
) Summary: Pull Request resolved: pytorch#3575 . Reviewed By: mergennachin Differential Revision: D57225246 fbshipit-source-id: c0145c4dcabd8aff8fbb38d51ff6e4ddcc87f48c
Summary:
Summary of changes:
- support for scatter slice
- enable index put
With whole-model delegation, I am seeing the following crash in llama2:
```
in _verify_exported_program_signature
raise SpecViolationError(
torch._export.verifier.SpecViolationError: Buffer output getitem_1 does not point to a buffer that exists.
Dict of buffers that are mutated, in order: {'getitem_1': 'layers_0_attention_SDPA_kv_cache_k_cache', 'getitem': 'layers_0_attention_SDPA_kv_cache_v_cache', 'getitem_3': 'layers_1_attention_SDPA_kv_cache_k_cache', 'getitem_2': 'layers_1_attention_SDPA_kv_cache_v_cache', 'getitem_5': 'layers_2_attention_SDPA_kv_cache_k_cache', 'getitem_4': 'layers_2_attention_SDPA_kv_cache_v_cache', 'getitem_7': 'layers_3_attention_SDPA_kv_cache_k_cache', 'getitem_6': 'layers_3_attention_SDPA_kv_cache_v_cache', 'getitem_9': 'layers_4_attention_SDPA_kv_cache_k_cache', 'getitem_8': 'layers_4_attention_SDPA_kv_cache_v_cache'}
Buffer nodes available: []
```
Commands to lower llama2 to MPS:
- python -m examples.models.llama2.export_llama -kv --mps
- python3 -m examples.apple.mps.scripts.mps_example --model_name="llama2"
Pull Request resolved: pytorch#3399
Reviewed By: shoumikhin
Differential Revision: D57293487
Pulled By: cccclai
fbshipit-source-id: a7ea392dc3c14b3538416b492d512aec71a0524e
Summary: Pull Request resolved: pytorch#3593 A followup to pytorch#2209 and pytorch#2228 Consistent with https://github.com/pytorch/executorch/blob/main/.ci/docker/requirements-ci.txt Reviewed By: lucylq Differential Revision: D57289472 fbshipit-source-id: e155f627fe537be85129c70da5fa91ae555b010a
Summary: Pull Request resolved: pytorch#3496 This adds the sgd_optimizer header to executorch. Would appreciate some thoughts on where to place this file. Reviewed By: JacobSzwejbka Differential Revision: D56888378 fbshipit-source-id: 17d6bb3975ae2d58aee911ee91a3ff07acbc6850
Summary: Pull Request resolved: pytorch#3564 . Reviewed By: mcr229 Differential Revision: D57172491 fbshipit-source-id: c7724130d973ca8e7df510e9d5eb95c329c4c2bd
Summary: Pull Request resolved: pytorch#3603 Some headers rely on transitive includes that may be missing when building for different platforms, so we have to include everything explicitly. Also, we need to use quotes rather than angle brackets for local header includes. Reviewed By: kirklandsign Differential Revision: D57340360 fbshipit-source-id: dedc9737314231be5255c06c3ad7c9a800b247b8
Summary: Pull Request resolved: pytorch#3602 The previous implementation of ignoring `view_copy` on outputs was incorrect in that it only checked `node.next` instead of all users of the node. `node.next` just selects the next node in topological order, which may or may not be the output if there is more than one output. In the case of more than one output, the next node may not be related at all! Check if any of the users of the node are an output instead. Reviewed By: metascroy, mcremon-meta Differential Revision: D57299853 fbshipit-source-id: 6a373181f6bdd58444e0c859fce320d576b7f749
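The bug and fix above can be shown with a tiny stand-in for an FX-style graph (a sketch with illustrative names, not the real pass code): `view_copy` feeds both an unrelated node and the output, and the unrelated node happens to come next in topological order.

```python
class Node:
    def __init__(self, name, inputs=()):
        self.name, self.users = name, []
        for inp in inputs:
            inp.users.append(self)  # record every consumer of the input

# view_copy feeds an unrelated add *and* the graph output; the add
# happens to be the next node in topological order.
view = Node("view_copy")
add = Node("add", inputs=[view])
out = Node("output", inputs=[view])
topo = [view, add, out]

# Buggy check: only inspects the next node in topological order,
# so it misses that view_copy is consumed by the output.
buggy_is_output = topo[topo.index(view) + 1].name == "output"

# Fixed check: any user of the node may be the output.
fixed_is_output = any(u.name == "output" for u in view.users)
```

Here `buggy_is_output` is False while `fixed_is_output` is True, which is precisely the multi-output case the previous implementation got wrong.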
Summary: Pull Request resolved: pytorch#3607 . Reviewed By: kirklandsign Differential Revision: D57348926 fbshipit-source-id: f867150138f2b8162ea51de245a980606022f018
Summary: Pull Request resolved: pytorch#3594 As title. Implementation is rather simple because the shaders just have to accumulate the `mat2` shader across the width dim rather than the height dim. Reviewed By: yipjustin Differential Revision: D57203869 fbshipit-source-id: 08932a75e66924a0dfb0816f8ccefa718a341dd8
…s ios ci issue (pytorch#3610) Summary: Pull Request resolved: pytorch#3610 as title Reviewed By: shoumikhin Differential Revision: D57353025 fbshipit-source-id: f45ffa81a1d877238cac3068bae9ecf3b365230f
Summary: Add CMake rule for tests, and a script to invoke it. Pull Request resolved: pytorch#3606 Reviewed By: larryliu0820 Differential Revision: D57343746 Pulled By: kirklandsign fbshipit-source-id: 289a37fb97c7f80cab44aa2ba30b859d1d527e59
Summary: Pull Request resolved: pytorch#3614 overriding_review_checks_triggers_an_audit_and_retroactive_review Oncall Short Name: executorch Differential Revision: D57368772 fbshipit-source-id: aea98235a4bf936bc462eca353d8f47463fd8a2e
Summary: Pull Request resolved: pytorch#3576 ^ bypass-github-export-checks Reviewed By: bigfootjon Differential Revision: D57225582 fbshipit-source-id: 30edd8ebe70468eee5c8586060e849a53c9828d9
Updating .gitmodules
Removing extra entry in .gitmodule
…utorch into dijopaul_add-nnlib
cad-audio pushed a commit that referenced this pull request on Jun 28, 2024
Summary: Pull Request resolved: pytorch#3763 Reviewed By: itamaro, tarun292 Differential Revision: D51566750 fbshipit-source-id: 654c426d479833867e93083e9b55786abfc24a32
cad-audio pushed a commit that referenced this pull request on Jun 27, 2025
Differential Revision: D75911655 Pull Request resolved: pytorch#11344
zonglinpeng pushed a commit that referenced this pull request on Sep 11, 2025
BNNS copy crashes the process when the dtypes differ (pytorch#11714). With the example in this PR (pytorch#11714), we crash the process on main. Here is the stack trace from LLDB:

```
Process 19234 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
    frame #0: 0x0000000190ac9388 libsystem_kernel.dylib`__pthread_kill + 8
libsystem_kernel.dylib`__pthread_kill:
->  0x190ac9388 <+8>:  b.lo   0x190ac93a8 ; <+40>
    0x190ac938c <+12>: pacibsp
    0x190ac9390 <+16>: stp    x29, x30, [sp, #-0x10]!
    0x190ac9394 <+20>: mov    x29, sp
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
  * frame #0: 0x0000000190ac9388 libsystem_kernel.dylib`__pthread_kill + 8
    frame #1: 0x0000000190b0288c libsystem_pthread.dylib`pthread_kill + 296
    frame #2: 0x0000000190a0bc60 libsystem_c.dylib`abort + 124
    frame #3: 0x0000000190910174 libsystem_malloc.dylib`malloc_vreport + 892
    frame #4: 0x0000000190913c90 libsystem_malloc.dylib`malloc_report + 64
    frame #5: 0x000000019091821c libsystem_malloc.dylib`___BUG_IN_CLIENT_OF_LIBMALLOC_POINTER_BEING_FREED_WAS_NOT_ALLOCATED + 32
    frame #6: 0x000000019d2f4084 libBNNS.dylib`___lldb_unnamed_symbol1620 + 564
    frame #7: 0x000000019d2f5bac libBNNS.dylib`___lldb_unnamed_symbol1628 + 680
    frame #8: 0x000000019d69ce48 libBNNS.dylib`BNNSCopy + 616
    frame #9: 0x000000030c74d950 _portable_lib.cpython-310-darwin.so`(anonymous namespace)::copy_using_bnns(executorchcoreml::MultiArray const&, executorchcoreml::MultiArray&) + 188
    frame #10: 0x000000030c74cfdc _portable_lib.cpython-310-darwin.so`(anonymous namespace)::copy(executorchcoreml::MultiArray const&, executorchcoreml::MultiArray&, executorchcoreml::MultiArray::CopyOptions) + 72
    frame #11: 0x000000030c74ceec _portable_lib.cpython-310-darwin.so`executorchcoreml::MultiArray::copy(executorchcoreml::MultiArray&, executorchcoreml::MultiArray::CopyOptions) const + 148
    frame #12: 0x000000030c7488d4 _portable_lib.cpython-310-darwin.so`invocation function for block in (anonymous namespace)::copy(MLMultiArray*, executorchcoreml::MultiArray&) + 376
    frame #13: 0x000000030c748ac8 _portable_lib.cpython-310-darwin.so`invocation function for block in (anonymous namespace)::copy(MLMultiArray*, executorchcoreml::MultiArray&) + 52
    frame #14: 0x000000019ad33f4c CoreML`CoreML::MultiArrayBuffer::getBytesWithHandler(void (void const*, unsigned long) block_pointer) const + 340
    frame #15: 0x000000019ad34138 CoreML`-[MLMultiArray(ScopedBufferAccess) getBytesWithHandler:] + 152
    frame #16: 0x000000030c7485ec _portable_lib.cpython-310-darwin.so`(anonymous namespace)::copy(MLMultiArray*, executorchcoreml::MultiArray&) + 296
    frame #17: 0x000000030c744f68 _portable_lib.cpython-310-darwin.so`(anonymous namespace)::set_outputs(std::__1::vector<executorchcoreml::MultiArray, std::__1::allocator<executorchcoreml::MultiArray>>&, NSArray<MLMultiArray*>*) + 180
```

With this PR, the process succeeds.