Metal backend: Implement the AOTI MPS shim #15022
Conversation
Stack from ghstack (oldest at bottom).

See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/15022. As of commit 6f6fd58 with merge base 6e0c9f6.
auto src_mtl_buffer = (id<MTLBuffer>)src_buffer;
auto dst_mtl_buffer = (id<MTLBuffer>)dst_buffer;

uint8_t* src_contents = static_cast<uint8_t*>([src_mtl_buffer contents]);
uint8_t* dst_contents = static_cast<uint8_t*>([dst_mtl_buffer contents]);

if (!src_contents || !dst_contents) {
  ET_LOG(Error, "aoti_torch_mps_copy_buffer: Failed to get buffer contents");
  return Error::Internal;
}

memcpy(dst_contents + dst_offset, src_contents + src_offset, data_size);
aoti_torch_mps_free and aoti_torch_mps_memcpy expect a contents pointer, but aoti_torch_mps_copy_buffer expects MTLBuffer objects. Is this intentional?
Shouldn't it be something like this?
auto src_it = ptr_to_mtl_buffer.find(src_buffer);
auto dst_it = ptr_to_mtl_buffer.find(dst_buffer);
if (src_it == ptr_to_mtl_buffer.end()) {
ET_LOG(Error, "aoti_torch_mps_copy_buffer: src_buffer %p not found", src_buffer);
return Error::InvalidArgument;
}
if (dst_it == ptr_to_mtl_buffer.end()) {
ET_LOG(Error, "aoti_torch_mps_copy_buffer: dst_buffer %p not found", dst_buffer);
return Error::InvalidArgument;
}
id<MTLBuffer> src_mtl_buffer = src_it->second;
id<MTLBuffer> dst_mtl_buffer = dst_it->second;
ETMetalStream* stream = getCurrentMetalStream();
stream->copy(src_mtl_buffer, dst_mtl_buffer, data_size, src_offset, dst_offset, SyncType::NONE);
It is intentional, to make things work in this first landing. It is a workaround: right now I can't create an ET tensor with a Metal buffer, I need the contents pointer. To avoid this workaround I think I need to make changes in executorch::extension::from_blob. I want to look into this later.
id<MTLBuffer> subBuffer = [device newBufferWithBytesNoCopy:buffer_pointer + constant_offset
                                                    length:data_size
                                                   options:MTLResourceCPUCacheModeWriteCombined | MTLResourceStorageModeShared
                                               deallocator:nil];

if (constant_offset != 0) {
  ptr_to_mtl_buffer[buffer_pointer + constant_offset] = subBuffer; // Map contents to buffer
}
Why do you need this at all? subBuffer doesn't seem to be used anywhere
This is another workaround: right now AOTI MPS creates a single buffer with all the constants (data tensors). This is one key difference between AOTI MPS and AOTI CUDA.
However, when I call ops implemented with MPSGraph that take in data tensors (right now convolution and sdpa), I need to call initWithMTLBuffer to create an MPSGraphTensor from the buffer. But initWithMTLBuffer doesn't let me pass an offset, so I need each data tensor in its own buffer.
  }
}

AOTITorchError aoti_torch_mps_get_kernel_function(

  }
}

AOTITorchError aoti_torch_mps_start_encoding(

// Pure C dispatch functions - array versions
AOTITorchError aoti_torch_mps_dispatch_array(
    AOTIMetalKernelFunctionHandle func,
    const uint64_t* length,
Validate length != nullptr
Includes:
- Shader library management
- Kernel function handling
- Command buffer execution
- Metal memory operations

ghstack-source-id: cba3048
ghstack-comment-id: 3392300374
Pull-Request: pytorch#15022