[MPS] Add support for slice_scatter; enable index_put #3399
DenisVieriu97 wants to merge 1 commit into pytorch:main from
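For context on the two ops named in the title: `slice_scatter` writes a source tensor into a slice of the input and returns the result. A minimal pure-Python emulation of the 1-D case (function name is illustrative, not from the PR):

```python
def slice_scatter_1d(inp, src, start=0, end=None, step=1):
    """Emulate 1-D slice_scatter: copy `inp`, then overwrite
    the slice [start:end:step] with the elements of `src`."""
    out = list(inp)
    out[start:end:step] = src
    return out

# Writing [1, 2] into positions 1..2 of a zero vector:
print(slice_scatter_1d([0, 0, 0, 0, 0], [1, 2], start=1, end=3))  # [0, 1, 2, 0, 0]
```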
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/3399
Note: Links to docs will display an error until the docs builds have been completed.
❌ 2 New Failures as of commit ae4940c with merge base 87d828a. The following jobs have failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
I checked out this PR and ran it, but still can't repro...
@cccclai, could you please run:
Inline review context (diff fragment):

```python
return n
return dim
def get_exapnded_index(self, idx, shape, dim):
```
I see the Metal kernel compilation path is not enabled. Is there a reason the indexing ops require Metal kernels, and is there a plan to enable the Metal kernel path? I'm asking because I'm thinking about hooking up an int4 mm kernel using the Metal kernel flow. I have the Metal kernel ready and am trying to figure out how to inject it into the graph builder and so on.
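For reference, the semantics of the other op this PR enables, `index_put`, can be emulated in pure Python for the 1-D case (function name is illustrative; real `index_put` also takes an `accumulate` flag, mirrored here):

```python
def index_put_1d(inp, indices, values, accumulate=False):
    """Emulate 1-D index_put: write values[k] at position indices[k];
    with accumulate=True, add instead of overwrite."""
    out = list(inp)
    for i, v in zip(indices, values):
        out[i] = out[i] + v if accumulate else v
    return out

# Overwrite positions 0 and 2:
print(index_put_1d([0, 0, 0], [0, 2], [5, 7]))  # [5, 0, 7]
```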
I have it working with this patch. The root cause is that we're tagging the mutable buffers. If MPS doesn't support buffer mutation, this line is good enough for tagging the constants, and it will exclude the mutable buffers.
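A minimal sketch of the exclusion described above, assuming the two mappings mirror the `inputs_to_buffers` and `buffers_to_mutate` fields of a `torch.export` graph signature (the function name and plain-dict interface are hypothetical, for illustration only):

```python
def is_delegatable_constant(node_name, inputs_to_buffers, buffers_to_mutate):
    """Tag a placeholder as a delegatable constant only if it maps to a
    buffer AND that buffer is never mutated by the program's outputs."""
    mutated = set(buffers_to_mutate.values())  # buffers written back at output
    buf = inputs_to_buffers.get(node_name)     # buffer this input maps to, if any
    return buf is not None and buf not in mutated

inputs_to_buffers = {"b_k_cache": "k_cache", "b_weight": "weight"}
buffers_to_mutate = {"getitem": "k_cache"}  # KV cache is mutated each step
print(is_delegatable_constant("b_weight", inputs_to_buffers, buffers_to_mutate))   # True
print(is_delegatable_constant("b_k_cache", inputs_to_buffers, buffers_to_mutate))  # False
```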
Summary:
Pull Request resolved: #3503

Test with #3399 and this command passes:
```
python -m examples.models.llama2.export_llama -kv --mps
```
Without this diff, it will error out:
```
in _verify_exported_program_signature
    raise SpecViolationError(
torch._export.verifier.SpecViolationError: Buffer output getitem_1 does not point to a buffer that exists.
Dict of buffers that are mutated, in order: {'getitem_1': 'layers_0_attention_SDPA_kv_cache_k_cache', 'getitem': 'layers_0_attention_SDPA_kv_cache_v_cache', 'getitem_3': 'layers_1_attention_SDPA_kv_cache_k_cache', 'getitem_2': 'layers_1_attention_SDPA_kv_cache_v_cache', 'getitem_5': 'layers_2_attention_SDPA_kv_cache_k_cache', 'getitem_4': 'layers_2_attention_SDPA_kv_cache_v_cache', 'getitem_7': 'layers_3_attention_SDPA_kv_cache_k_cache', 'getitem_6': 'layers_3_attention_SDPA_kv_cache_v_cache', 'getitem_9': 'layers_4_attention_SDPA_kv_cache_k_cache', 'getitem_8': 'layers_4_attention_SDPA_kv_cache_v_cache'}
Buffer nodes available: []
```
The root cause is that `is_parameter` tags all data, including mutable buffers.

Reviewed By: larryliu0820
Differential Revision: D56941763
fbshipit-source-id: a0ed8e00f453bea345f3fdba2c5b30e0241eda8d
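The failing check above can be read as: every buffer named in the mutation dict must also appear among the program's buffer nodes, and with the mutable KV caches tagged away, "Buffer nodes available" is empty. A simplified sketch of that verifier logic (names chosen to echo the traceback; not the actual verifier code):

```python
class SpecViolationError(Exception):
    """Stand-in for torch._export.verifier.SpecViolationError."""

def verify_mutated_buffers(buffers_to_mutate, buffer_nodes):
    """Raise if any mutated-buffer output refers to a buffer that is not
    present among the program's available buffer nodes."""
    for out_name, buf in buffers_to_mutate.items():
        if buf not in buffer_nodes:
            raise SpecViolationError(
                f"Buffer output {out_name} does not point to a buffer that exists."
            )

# Passes when the mutated buffer is still a known buffer node:
verify_mutated_buffers({"getitem_1": "k_cache"}, ["k_cache", "v_cache"])
# Raises when the buffer list is empty, as in the traceback above.
```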
@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Force-pushed from c852bf7 to ae4940c
Summary of changes:
With whole-model delegation, I am seeing the following crash in llama2:
Commands to lower llama2 to MPS: