Add conversion to fp16 #7

Merged

georgepaw merged 3 commits into main from dev/georgep/f32tof16 on Dec 18, 2023
Conversation

@georgepaw

Adapt the class which stores the intermediate tracing results to insert casts which will convert the model to fp16.
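The cast-insertion idea can be sketched in plain Python. This is a hedged illustration only, not the PR's actual implementation: `insert_fp16_casts` and the `traced_ops` list of `(name, fn)` pairs are hypothetical stand-ins for the intermediate-tracing-results class mentioned above.

```python
import numpy as np

def insert_fp16_casts(traced_ops):
    """Wrap traced ops so inputs are cast to fp16, intermediates stay
    in fp16, and the final output is cast back to fp32, preserving the
    model's fp32 signature. `traced_ops` is a hypothetical list of
    (name, fn) pairs standing in for the traced intermediate results."""
    def run(x):
        x = x.astype(np.float16)              # cast the input down
        for _, fn in traced_ops:
            x = fn(x).astype(np.float16)      # keep intermediates in fp16
        return x.astype(np.float32)           # restore the fp32 signature
    return run

# Usage: a toy two-op "model"
ops = [("scale", lambda x: x * 2.0), ("shift", lambda x: x + 1.0)]
model_fp16 = insert_fp16_casts(ops)
out = model_fp16(np.ones(4, dtype=np.float32))
```

The caller still sees fp32 in and fp32 out; only the internals run in half precision.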

@georgepaw
Author

@DenisVieriu97 I've not adapted the tests in test_mps - we currently document them as a way of exporting the models.

If we parameterize the tests, then the users will no longer be able to do that.

What kind of testing do you think we should do? Perhaps we can do some (ugly!) patching?

@georgepaw georgepaw force-pushed the dev/georgep/f32tof16 branch 3 times, most recently from 0cc41a5 to 54731c4 Compare December 12, 2023 14:42
@DenisVieriu97
Owner

DenisVieriu97 commented Dec 12, 2023

> @DenisVieriu97 I've not adapted the tests in test_mps - we currently document them as a way of exporting the models.
>
> If we parameterize the tests, then the users will no longer be able to do that.
>
> What kind of testing do you think we should do? Perhaps we can do some (ugly!) patching?

@georgepaw we could use it as a command line parameter (commented in the code) - nvm, I see you are already doing that.
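The command-line toggle discussed here could look like the following `argparse` sketch. It is illustrative only: the `--fp16` flag name and its default are assumptions, not the repository's actual interface.

```python
import argparse

def parse_args(argv=None):
    """Hypothetical export-script arguments: fp16 conversion is on by
    default, and users can opt out with --no-fp16 while keeping the
    documented export flow in test_mps intact."""
    parser = argparse.ArgumentParser(description="Export an example model")
    parser.add_argument(
        "--fp16",
        action=argparse.BooleanOptionalAction,  # provides --fp16 / --no-fp16
        default=True,
        help="Convert the exported model from fp32 to fp16",
    )
    return parser.parse_args(argv)

# Usage: opting out of the conversion
args = parse_args(["--no-fp16"])
```

`BooleanOptionalAction` (Python 3.9+) gives both the positive and negative flag forms for free, so the default-on behavior stays easy to override.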

@georgepaw georgepaw force-pushed the dev/georgep/f32tof16 branch 5 times, most recently from 35a3b96 to 7825bcd Compare December 14, 2023 12:13
@georgepaw georgepaw force-pushed the dev/georgep/f32tof16 branch from a2661be to a480991 Compare December 15, 2023 16:12
@georgepaw georgepaw force-pushed the dev/georgep/f32tof16 branch from a480991 to 2f3dda4 Compare December 18, 2023 14:01
Owner

@DenisVieriu97 left a comment
Looks good to me

@georgepaw georgepaw merged commit aa5480c into main Dec 18, 2023
@georgepaw georgepaw deleted the dev/georgep/f32tof16 branch December 18, 2023 18:55
DenisVieriu97 pushed a commit that referenced this pull request Jan 19, 2024
Add an option (enabled by default) to convert the model from FP32 to FP16 (and preserve the input/output signature).
facebook-github-bot referenced this pull request in pytorch/executorch Jan 23, 2024
… (iOS15+, macOS12+) (#1655)

Summary:
This PR changes the MPS Backend runtime to support **iOS15+/macOS12+** (the previous runtime was limited to iOS17/macOS14 only). Additionally, this PR contains changes such as support for both lifted and unlifted graphs, support for the torch.export API, optimizations for FP16, and faster model loading during runtime (more information in the summary).

**Summary of changes:**
- Add support for running the models in FP16 (https://github.com/DenisVieriu97/executorch/pull/7 georgepaw)
- Replace the previous MPS runtime from ExecuTorch, which relied on iOS 17 / macOS Sonoma APIs for serialization of the MPSGraphExecutable. Instead of creating the MPSGraph nodes and serializing them during AOT, create the corresponding entries for the EdgeIR nodes in the FlatBuffer, and parse them at runtime to construct the graph. This removes any dependency on iOS 17 / macOS 14.0 APIs.
- Add support for a node visitor pattern:
  - Each node visitor class visits an op and serializes its data into MPSTensors and MPSNodes, which are appended to the FlatBuffer.
  - The entries from the FlatBuffer are parsed at runtime, and from them the MPSGraph is constructed (for more info see the `MPSGraphBuilder` class and the corresponding ops in the `operators/` folder).
  - This method removes the additional reads and writes of the MPSGraphExecutable to disk (once during AOT and once during runtime).
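The visitor/builder split described above can be sketched as follows. This is a hedged Python illustration: `NodeVisitor`, `serialize`, and `build` are stand-in names, and plain dicts stand in for the FlatBuffer MPSNode records.

```python
class NodeVisitor:
    """Each registered visitor handles one op kind and serializes it
    into a plain dict entry (standing in for a FlatBuffer MPSNode)."""
    registry = {}

    @classmethod
    def register(cls, op):
        def wrap(fn):
            cls.registry[op] = fn
            return fn
        return wrap

@NodeVisitor.register("add")
def visit_add(node):
    return {"op": "add", "inputs": node["inputs"], "output": node["output"]}

def serialize(graph):
    # AOT side: every IR node becomes a serialized entry via its visitor.
    return [NodeVisitor.registry[n["op"]](n) for n in graph]

def build(entries, inputs):
    # Runtime side: replay the serialized entries to reconstruct and run
    # the graph, so no compiled executable is ever written to disk.
    env = dict(inputs)
    for e in entries:
        if e["op"] == "add":
            a, b = (env[i] for i in e["inputs"])
            env[e["output"]] = a + b
    return env

# Usage: round-trip a one-node graph through serialization and rebuild
graph = [{"op": "add", "inputs": ["x", "y"], "output": "z"}]
env = build(serialize(graph), {"x": 2, "y": 3})
```

The key property is that only data (the entries) crosses the AOT/runtime boundary; the executable graph is reconstructed fresh at load time.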

**Models summary:**

| Model | FP16 | FP32 |
| :---: | :---: | :---: |
| mul | ✅ | ✅ |
| add_mul | ✅ | ✅ |
| linear | ✅ | ✅ |
| edsr | ✅ | ✅ |
| Mobilebert | ❌ | ✅ |
| mv2 | ✅ | ✅ |
| mv3 | ✅ | ✅ |
| vit | ✅ | ✅ |
| w2l | ✅ | ✅ |
| ic3 | ✅ | ✅ |
| ic4 | ✅ | ✅ |
| resnet18 | ✅ | ✅ |
| resnet50 | ✅ | ✅ |
| Llama2 | ❌ | ✅ |
| emformer_join | ✅ | ✅ |
| emformer_predict | ❌ | ❌ |
| emformer_transcribe | ✅ | ✅ |
| dl3 | ❌ | ❌ |

Pull Request resolved: #1655

Reviewed By: cccclai

Differential Revision: D52929916

Pulled By: shoumikhin

fbshipit-source-id: 8bd2ed124311744ebe19fc17eb0ff508621f974a
manuelcandales pushed a commit that referenced this pull request Aug 27, 2025
BNNS copy crashes the process when the dtypes differ
(pytorch#11714).

With the example from that issue, the process crashes on main. Here is
the stack trace from LLDB:

```
Process 19234 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
    frame #0: 0x0000000190ac9388 libsystem_kernel.dylib`__pthread_kill + 8
libsystem_kernel.dylib`__pthread_kill:
->  0x190ac9388 <+8>:  b.lo   0x190ac93a8    ; <+40>
    0x190ac938c <+12>: pacibsp 
    0x190ac9390 <+16>: stp    x29, x30, [sp, #-0x10]!
    0x190ac9394 <+20>: mov    x29, sp
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
  * frame #0: 0x0000000190ac9388 libsystem_kernel.dylib`__pthread_kill + 8
    frame #1: 0x0000000190b0288c libsystem_pthread.dylib`pthread_kill + 296
    frame #2: 0x0000000190a0bc60 libsystem_c.dylib`abort + 124
    frame #3: 0x0000000190910174 libsystem_malloc.dylib`malloc_vreport + 892
    frame #4: 0x0000000190913c90 libsystem_malloc.dylib`malloc_report + 64
    frame #5: 0x000000019091821c libsystem_malloc.dylib`___BUG_IN_CLIENT_OF_LIBMALLOC_POINTER_BEING_FREED_WAS_NOT_ALLOCATED + 32
    frame #6: 0x000000019d2f4084 libBNNS.dylib`___lldb_unnamed_symbol1620 + 564
    frame #7: 0x000000019d2f5bac libBNNS.dylib`___lldb_unnamed_symbol1628 + 680
    frame #8: 0x000000019d69ce48 libBNNS.dylib`BNNSCopy + 616
    frame #9: 0x000000030c74d950 _portable_lib.cpython-310-darwin.so`(anonymous namespace)::copy_using_bnns(executorchcoreml::MultiArray const&, executorchcoreml::MultiArray&) + 188
    frame #10: 0x000000030c74cfdc _portable_lib.cpython-310-darwin.so`(anonymous namespace)::copy(executorchcoreml::MultiArray const&, executorchcoreml::MultiArray&, executorchcoreml::MultiArray::CopyOptions) + 72
    frame #11: 0x000000030c74ceec _portable_lib.cpython-310-darwin.so`executorchcoreml::MultiArray::copy(executorchcoreml::MultiArray&, executorchcoreml::MultiArray::CopyOptions) const + 148
    frame #12: 0x000000030c7488d4 _portable_lib.cpython-310-darwin.so`invocation function for block in (anonymous namespace)::copy(MLMultiArray*, executorchcoreml::MultiArray&) + 376
    frame #13: 0x000000030c748ac8 _portable_lib.cpython-310-darwin.so`invocation function for block in (anonymous namespace)::copy(MLMultiArray*, executorchcoreml::MultiArray&) + 52
    frame #14: 0x000000019ad33f4c CoreML`CoreML::MultiArrayBuffer::getBytesWithHandler(void (void const*, unsigned long) block_pointer) const + 340
    frame #15: 0x000000019ad34138 CoreML`-[MLMultiArray(ScopedBufferAccess) getBytesWithHandler:] + 152
    frame #16: 0x000000030c7485ec _portable_lib.cpython-310-darwin.so`(anonymous namespace)::copy(MLMultiArray*, executorchcoreml::MultiArray&) + 296
    frame #17: 0x000000030c744f68 _portable_lib.cpython-310-darwin.so`(anonymous namespace)::set_outputs(std::__1::vector<executorchcoreml::MultiArray, std::__1::allocator<executorchcoreml::MultiArray>>&, NSArray<MLMultiArray*>*) + 180
```


With this PR, the process succeeds.
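The intent of the fix can be sketched as a dtype guard in front of the fast copy path. This is a hedged Python illustration using NumPy; `safe_copy` is a hypothetical helper, not the actual `executorchcoreml` C++ code.

```python
import numpy as np

def safe_copy(src, dst):
    """BNNSCopy-style fast paths are only safe when source and
    destination dtypes match, so fall back to an explicit converting
    copy when they differ instead of crashing."""
    if src.dtype == dst.dtype:
        np.copyto(dst, src)                     # same dtype: raw copy is fine
    else:
        np.copyto(dst, src.astype(dst.dtype))   # differing dtype: convert first

# Usage: fp32 source into an fp16 destination (the crashing case above)
src = np.array([1.5, 2.5], dtype=np.float32)
dst = np.empty(2, dtype=np.float16)
safe_copy(src, dst)
```

The values 1.5 and 2.5 are exactly representable in fp16, so the converting path is lossless here; in general the conversion may round.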