Reapply "Add vectorized_math.h (#11204)", "Add optimized_portable_kernels test (#11205)", and "Add vectorization in elementwise_util (#9432)" by swolchok · Pull Request #11604 · pytorch/executorch

swolchok · 2025-06-12T16:34:59Z

Stack from ghstack (oldest at bottom):

-> Reapply "Add vectorized_math.h (#11204)", "Add optimized_portable_kernels test (#11205)", and "Add vectorization in elementwise_util (#9432)" #11604
Define ET_HAS_EXCEPTIONS macro #11603

Stack was reverted due to internal CI failures. Reapplying as an exported internal diff so that we make sure to catch any more of those.

New fixes:

- straightforward op_sub build fixes
- s/EXPECT_EQ/EXPECT_FLOAT_EQ/ in vectorized_math_test
- define ET_USE_PYTORCH_HEADERS to detect whether exceptions are enabled, and use `#if` instead of `#ifdef` to check the macro so that we don't use PyTorch headers if exceptions are disabled. (Otherwise, we might have problems with e.g. TORCH_CHECK.)
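The `#if` versus `#ifdef` distinction in the last fix can be illustrated with a minimal sketch. `DEMO_USE_PYTORCH_HEADERS` and the two `constexpr` flags are hypothetical names chosen for illustration; this is not the actual ExecuTorch header, only the general preprocessor behavior it relies on.

```cpp
#include <cassert>

// Hypothetical stand-in for a feature macro that is defined, but to 0 (off).
#define DEMO_USE_PYTORCH_HEADERS 0

// #ifdef only asks whether the name is defined, so a value of 0 still passes.
#ifdef DEMO_USE_PYTORCH_HEADERS
constexpr bool kEnabledPerIfdef = true;
#else
constexpr bool kEnabledPerIfdef = false;
#endif

// #if evaluates the value, so 0 disables the guarded code as intended.
#if DEMO_USE_PYTORCH_HEADERS
constexpr bool kEnabledPerIf = true;
#else
constexpr bool kEnabledPerIf = false;
#endif

// Runtime check of the two outcomes above.
inline void check_guards() {
  assert(kEnabledPerIfdef);  // #ifdef took the "enabled" branch
  assert(!kEnabledPerIf);    // #if took the "disabled" branch
}
```

A macro that is always defined to either 0 or 1 therefore has to be tested with `#if`; `#ifdef` would pull in the guarded headers in both configurations.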

Original summary for #11204:
Set of math functions that work on both scalars and at::vec::Vectorized,
to be used in #9432.
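As a rough sketch of what "work on both scalars and vectors" can look like, the overload set below accepts either a `float` or a toy 4-lane vector. `Vec4`, `demo_rsqrt`, and `demo_fourth_root_inv` are hypothetical names for illustration; `Vec4` merely stands in for `at::vec::Vectorized<float>`, and none of this is taken from vectorized_math.h itself.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Toy 4-lane vector; a hypothetical stand-in for at::vec::Vectorized<float>.
struct Vec4 {
  std::array<float, 4> lanes;
};

// Scalar overload: reciprocal square root of one value.
inline float demo_rsqrt(float x) {
  return 1.0f / std::sqrt(x);
}

// Vector overload: the same operation applied to every lane.
inline Vec4 demo_rsqrt(Vec4 v) {
  for (float& lane : v.lanes) {
    lane = demo_rsqrt(lane);
  }
  return v;
}

// Generic code written once; overload resolution picks the scalar or the
// vector version of demo_rsqrt depending on T.
template <typename T>
T demo_fourth_root_inv(T x) {
  // rsqrt(rsqrt(x)) == x^(1/4)
  return demo_rsqrt(demo_rsqrt(x));
}
```

The point of the design is that a caller such as `demo_fourth_root_inv` does not need to know which kind of argument it was given.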

Original summary for #11205:
Make sure we test the optimized versions of portable kernels even if
they are shadowed by optimized implementations. Intended to support
#9432.

Original summary for #9432:

This is a first cut at #9241. In this PR I've vectorized a small
initial set of ops: atan2, clamp, fmod_Scalar, maximum, minimum, mul,
pow, and sigmoid. In addition, the following ops should have gotten
vectorized automatically because they already used generic lambdas: add,
div, rsub, sub. I've left covering ops that use the unary_ufunc_*
utilities in pattern.h for a follow-up push, because pattern.h and
elementwise_util need some work before we can migrate pattern.h's
utilities to be backed by elementwise_util.
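To see why an op that already uses a generic lambda can pick up vectorization without changes, consider the sketch below. `Vec4`, `apply_binary`, and `demo_mul` are hypothetical names for illustration (with `Vec4` standing in for `at::vec::Vectorized<float>`); the real elementwise_util is more involved, so treat this as an assumption-laden outline of the idea only.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kLanes = 4;

// Toy stand-in for at::vec::Vectorized<float>.
struct Vec4 {
  std::array<float, kLanes> lanes{};
  static Vec4 load(const float* p) {
    Vec4 v;
    for (std::size_t l = 0; l < kLanes; ++l) v.lanes[l] = p[l];
    return v;
  }
  void store(float* p) const {
    for (std::size_t l = 0; l < kLanes; ++l) p[l] = lanes[l];
  }
};

inline Vec4 operator*(const Vec4& a, const Vec4& b) {
  Vec4 out;
  for (std::size_t l = 0; l < kLanes; ++l) out.lanes[l] = a.lanes[l] * b.lanes[l];
  return out;
}

// Hypothetical elementwise helper: full blocks of kLanes elements go through
// Vec4, and the leftover elements go through the scalar call of the same op.
template <typename Op>
std::vector<float> apply_binary(
    const Op& op, const std::vector<float>& a, const std::vector<float>& b) {
  assert(a.size() == b.size());
  std::vector<float> out(a.size());
  std::size_t i = 0;
  for (; i + kLanes <= a.size(); i += kLanes) {
    op(Vec4::load(&a[i]), Vec4::load(&b[i])).store(&out[i]);
  }
  for (; i < a.size(); ++i) {
    out[i] = op(a[i], b[i]);
  }
  return out;
}

// Because the lambda takes `auto` parameters, the same body compiles for
// both float and Vec4 arguments.
inline std::vector<float> demo_mul(
    const std::vector<float>& a, const std::vector<float>& b) {
  return apply_binary([](auto x, auto y) { return x * y; }, a, b);
}
```

A lambda declared with concrete `float` parameters would only satisfy the scalar loop, which is why such ops need explicit porting.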

This PR adds an interesting testing problem: in theory, all operators
might need test cases long enough to tickle vectorization, because we
might accidentally vectorize ops unexpectedly and break their lambdas
due to anticipated differences in semantics. I address this issue by
using Vectorized for the scalar prologue/epilogue in debug mode (we run
tests in both debug and release) so that we can detect broken lambdas. I
additionally intentionally introduced a bug in the vectorized path in
elementwise_util and manually verified that we saw test failures for
each vectorized op called out above.
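The debug-mode trick can be sketched as follows. Everything here is hypothetical and simplified: `Vec4` stands in for `at::vec::Vectorized<float>`, the `kVectorizeTail` template flag stands in for "debug build", broadcasting each leftover element into a vector is just one assumed way to route the tail through the vector path, and `BuggyNegate` simulates a lambda whose vector behavior diverges from its scalar behavior.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kLanes = 4;

// Toy stand-in for at::vec::Vectorized<float>.
struct Vec4 {
  std::array<float, kLanes> lanes{};
};

// Simulated broken op: the vector overload disagrees with the scalar one.
struct BuggyNegate {
  float operator()(float x) const { return -x; }
  Vec4 operator()(Vec4 v) const { return v; }  // bug: forgets to negate
};

// kVectorizeTail == false models a release build (scalar leftovers);
// kVectorizeTail == true models a debug build (leftovers use the vector op).
template <bool kVectorizeTail, typename Op>
std::vector<float> apply_unary(const Op& op, const std::vector<float>& in) {
  std::vector<float> out(in.size());
  std::size_t i = 0;
  // Main loop: full blocks of kLanes elements use the vector overload.
  for (; i + kLanes <= in.size(); i += kLanes) {
    Vec4 v;
    for (std::size_t l = 0; l < kLanes; ++l) v.lanes[l] = in[i + l];
    v = op(v);
    for (std::size_t l = 0; l < kLanes; ++l) out[i + l] = v.lanes[l];
  }
  // Leftover elements that do not fill a whole vector.
  for (; i < in.size(); ++i) {
    if (kVectorizeTail) {
      Vec4 v;
      v.lanes.fill(in[i]);  // broadcast the element into every lane
      out[i] = op(v).lanes[0];
    } else {
      out[i] = op(in[i]);
    }
  }
  return out;
}
```

With a three-element input (shorter than one vector), the release-style path never touches the vector overload and the bug stays hidden, while the debug-style path returns the wrong value and an existing short test case would fail.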

Differential Revision: D76467389


pytorch-bot · 2025-06-12T16:35:03Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/11604

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (2 Unrelated Failures)

As of commit bc7c8f8 with merge base d660bde:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

pull / unittest / linux / linux-job (gh) (trunk failure)
/pytorch/executorch/backends/vulkan/runtime/api/containers/Tensor.cpp:651:17: error: no matching constructor for initialization of 'vkcompute::api::vTensor::TextureLimits'
pull / unittest-editable / linux / linux-job (gh) (trunk failure)
/pytorch/executorch/backends/vulkan/runtime/api/containers/Tensor.cpp:651:17: error: no matching constructor for initialization of 'vkcompute::api::vTensor::TextureLimits'

This comment was automatically generated by Dr. CI and updates every 15 minutes.


facebook-github-bot · 2025-06-12T16:35:17Z

This pull request was exported from Phabricator. Differential Revision: D76467389



facebook-github-bot · 2025-06-12T17:16:30Z

This pull request was exported from Phabricator. Differential Revision: D76467389

jathu · 2025-06-13T17:49:50Z

kernels/portable/CMakeLists.txt

+  )
  install(
-    TARGETS optimized_portable_kernels
+    TARGETS optimized_portable_kernels optimized_portable_ops_lib


Is optimized_portable_ops_lib mutually exclusive with portable_ops_lib? If so, should we build only one of them, depending on EXECUTORCH_BUILD_KERNELS_OPTIMIZED?

cc @larryliu0820



facebook-github-bot · 2025-06-13T20:19:47Z

This pull request was exported from Phabricator. Differential Revision: D76467389

swolchok requested review from JacobSzwejbka, jathu, kirklandsign, larryliu0820, lucylq and manuelcandales as code owners June 12, 2025 16:35

swolchok mentioned this pull request Jun 12, 2025

Define ET_HAS_EXCEPTIONS macro #11603

Merged

facebook-github-bot added the CLA Signed label Jun 12, 2025

facebook-github-bot added the fb-exported label Jun 12, 2025

JacobSzwejbka approved these changes Jun 12, 2025


swolchok added the release notes: ops & kernels label Jun 12, 2025

jathu reviewed Jun 13, 2025


facebook-github-bot merged commit 47bdf41 into gh/swolchok/457/base Jun 14, 2025
95 of 99 checks passed

facebook-github-bot deleted the gh/swolchok/457/head branch June 14, 2025 00:49

facebook-github-bot temporarily deployed to cherry-pick-bot June 14, 2025 00:49 — with GitHub Actions Inactive

pytorchbot mentioned this pull request Jun 14, 2025

Reapply "Add vectorized_math.h (#11204)", "Add optimized_portable_kernels test (#11205)", and "Add vectorization in elementwise_util (#9432)" #11665

Closed

facebook-github-bot merged 3 commits into gh/swolchok/457/base from gh/swolchok/457/head
