Add Metal backend core ETMetal runtime (#15020)
Conversation

Stack from ghstack (oldest at bottom):

🔗 Helpful links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/15020
Note: Links to docs will display an error until the docs builds have been completed. ✅ No failures as of commit d37e7ef with merge base 6e0c9f6. (This comment was automatically generated by Dr. CI and updates every 15 minutes.)
```objc
// Commit buffer and allow immediate reuse for better performance
[commandBuffer_ commit];
ET_LOG(Debug, "ETMetalStream::commitAndContinue: Committed buffer %p with continue", commandBuffer_);
```
Don't you need to release the buffer after commit?
```objc
[commandBuffer_ release];
commandBuffer_ = nil;
```
With commitAndContinue we commit and keep reusing the buffer until we flush, so we don't release it after commit.
```objc
if (cps_) [cps_ retain];
if (func_) [func_ retain];
```
Why do you need these retain lines? Aren't these already owned by the class?
Retaining ensures that the objects remain valid for the lifetime of the ETMetalKernelFunction instance, preventing them from being deallocated elsewhere. Without the retains, cps_ and func_ could become dangling pointers.
A similar pattern is followed in PyTorch here.
```objc
// Don't release encoder_ here - the stream owns it
// Only clean up our own references
if (cps_) {
  [cps_ release];
  cps_ = nil;
}
if (func_) {
  [func_ release];
  func_ = nil;
}
// ...
encoder_ = nil; // Clear reference without releasing
```
Just this?
```objc
cps_ = nil;
func_ = nil;
encoder_ = nil;
```
No, we need to release them, because we own them now.
The same pattern is followed in PyTorch here.
```objc
    resultsDictionary:results
  executionDescriptor:nil];
// ...
//synchronize(syncType);
```
Leftover from debugging; deleted.
```cpp
extern "C" {
#endif
// ...
// Memory management functions for Metal
```
Please add a docblock about the memory management aspects of this design, such as buffer lifecycle, thread safety, etc.
Buffer management is something I want to change. I think I want to have some kind of RAII instead of a global map. But I was hoping to do that after this first landing. Is it ok if I add documentation here later?
```cpp
// C++ only - expose the Metal buffer mapping
#ifdef __OBJC__
extern std::unordered_map<void*, MTLBuffer_t> ptr_to_mtl_buffer;
```
Do you need a lock for thread-safe access to ptr_to_mtl_buffer?
It currently works without one, because the operations are executed in sequence, and I don't see any CPU threading on the AOTI MPS side. But again, I want to replace this design with something more robust.
This commit introduces the foundational Metal backend runtime.

Key features:
- ETMetalStream for managing Metal devices, command queues, buffers, and synchronization.
- ETMetalShaderLibrary for compiling Metal shader source and caching pipeline states.
- ETMetalKernelFunction for kernel argument binding, dispatching, and synchronization with stream-managed encoders.
- Global buffer management and pointer tracking between host and Metal buffers.
- Global stream management utilities and synchronization helpers.

This provides the necessary runtime primitives for executing compute shaders and MPSGraph workloads.

ghstack-source-id: ea4fbb5
ghstack-comment-id: 3392300041
Pull-Request: pytorch#15020
```objc
@autoreleasepool {
  // Case 1: Device-to-device copy - use GPU blit encoder (most efficient)
  if (src_is_device && dst_is_device) {
```
Do we need this? Shouldn't we already have unified memory on all the M-series chips?
```cpp
// Global storage to keep shared_ptr alive while raw pointers are used
static std::unordered_map<ETMetalKernelFunction*, std::shared_ptr<ETMetalKernelFunction>> function_storage;
```
Huh. Can you explain why we need this? Is the raw pointer mapping to its own shared_ptr version?
We need this to keep the shared_ptr alive while the raw pointer is still in use.
```cpp
// =======================
// ETMetalStream - Metal command buffer and synchronization management
```
Noob question: how much of https://github.com/pytorch/executorch/blob/main/backends/apple/mps/runtime/MPSStream.h can be reused here?
They are all adaptations of PyTorch's MPSStream.h.
I think with enough time (not right now) we should refactor PyTorch's MPS classes to be ATen-agnostic, so that we can reuse them.
So, basically what I am saying is: I don't think we should reuse code from the ET MPS backend; instead, we should refactor PyTorch's code and reuse that.