Adopt runtime::FunctionRef in thread_parallel_interface.h and thread_parallel.h #10442
…parallel.h

This saves a *ton* of size by no longer requiring callers to create std::function.

Test Plan:
bash test/build_optimized_size_test.sh && size test/size_test_all_optimized_ops

before:
```
ExecuTorch with no ops binary size, unstripped:
-rwxr-xr-x 1 swolchok staff 153928 Apr 24 13:11 cmake-out/test/size_test
ExecuTorch with portable ops binary size, unstripped:
-rwxr-xr-x 1 swolchok staff 2150960 Apr 24 13:11 cmake-out/test/size_test_all_ops
ExecuTorch with optimized ops binary size, unstripped:
-rwxr-xr-x 1 swolchok staff 11182136 Apr 24 13:11 cmake-out/test/size_test_all_optimized_ops
__TEXT   __DATA  __OBJC  others      dec         hex
5341184  98304   0       4300800000  4306239488  100ac0000
```

after:
```
ExecuTorch with no ops binary size, unstripped:
-rwxr-xr-x 1 swolchok staff 153928 Apr 24 13:07 cmake-out/test/size_test
ExecuTorch with portable ops binary size, unstripped:
-rwxr-xr-x 1 swolchok staff 2150960 Apr 24 13:07 cmake-out/test/size_test_all_ops
ExecuTorch with optimized ops binary size, unstripped:
-rwxr-xr-x 1 swolchok staff 5927336 Apr 24 13:07 cmake-out/test/size_test_all_optimized_ops
__TEXT   __DATA  __OBJC  others      dec         hex
4505600  98304   0       4296376320  4300980224  1005bc000
```

`__TEXT` size improvement is 835584 bytes, or 15%.

ghstack-source-id: 2482ad7
ghstack-comment-id: 2828755250
Pull-Request-resolved: #10442
digantdesai left a comment:
Curious, how did you find this optimization?
@digantdesai I was working on reapplying #9842 and #9841 (reverted due to size regression) and iteratively running bloaty on the output of `bash test/build_optimized_size_test.sh`. There was a clear, large size increase from std::function, and after struggling with how to avoid it, I found my way to function_ref.
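For readers unfamiliar with the function_ref idea, here is a minimal sketch of why it can be cheaper than std::function. It is not ExecuTorch's actual `runtime::FunctionRef`; the names `FunctionRefSketch`, `parallel_for_sketch`, and `sum_range_sketch` are hypothetical, and the "parallel" loop runs chunks serially just to keep the example self-contained. The point is that the callee stores only a pointer to the caller's callable plus one small trampoline function, so no owning, copyable wrapper has to be instantiated for each lambda type at each call site.

```cpp
#include <algorithm>
#include <cstdint>
#include <type_traits>
#include <utility>

// Hypothetical, simplified non-owning callable reference.
// It supports lambdas and functors, not plain function references.
template <typename Signature>
class FunctionRefSketch;

template <typename Ret, typename... Args>
class FunctionRefSketch<Ret(Args...)> {
 public:
  // Bind to any callable object without copying or allocating.
  // The callable must outlive this reference.
  template <
      typename Callable,
      typename = std::enable_if_t<
          !std::is_same_v<std::decay_t<Callable>, FunctionRefSketch>>>
  FunctionRefSketch(Callable&& callable)
      : callable_(const_cast<void*>(static_cast<const void*>(&callable))),
        trampoline_(&invoke<std::remove_reference_t<Callable>>) {}

  Ret operator()(Args... args) const {
    return trampoline_(callable_, std::forward<Args>(args)...);
  }

 private:
  // One small function per callable type; it restores the erased type.
  template <typename Callable>
  static Ret invoke(void* callable, Args... args) {
    return (*static_cast<Callable*>(callable))(std::forward<Args>(args)...);
  }

  void* callable_;
  Ret (*trampoline_)(void*, Args...);
};

// Hypothetical stand-in for a parallel_for-style entry point.
// It walks the range in grain_size chunks on the calling thread.
inline bool parallel_for_sketch(
    int64_t begin,
    int64_t end,
    int64_t grain_size,
    FunctionRefSketch<void(int64_t, int64_t)> body) {
  if (begin < 0 || end < begin || grain_size <= 0) {
    return false;
  }
  for (int64_t start = begin; start < end; start += grain_size) {
    body(start, std::min(start + grain_size, end));
  }
  return true;
}

// Example caller: the lambda is passed by reference, never wrapped
// in std::function. The temporary lives until the call returns.
inline int64_t sum_range_sketch(int64_t n) {
  int64_t total = 0;
  parallel_for_sketch(0, n, 4, [&](int64_t chunk_begin, int64_t chunk_end) {
    for (int64_t i = chunk_begin; i < chunk_end; ++i) {
      total += i;
    }
  });
  return total;
}
```

The trade-off is lifetime: a reference like this must not be stored past the call, which is acceptable for an API that invokes the callback before returning.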
Unit tests look good despite two apparent flakes. Will merge after rebase.