Reapply "Add vectorized_math.h (#11204)", "Add optimized_portable_kernels test (#11205)", and "Add vectorization in elementwise_util (#9432)" #11682
Summary: To support passing ET_USE_PYTORCH_HEADERS only when exceptions are enabled. Differential Revision: D76470039
…ble_kernels test (pytorch#11205)", and "Add vectorization in elementwise_util (pytorch#9432)"

Summary:
Stack was reverted due to internal CI failures. Reapplying as an exported internal diff so that we make sure to catch any more of those.

New fixes:
- straightforward op_sub build fixes
- s/EXPECT_EQ/EXPECT_FLOAT_EQ/ in vectorized_math_test
- define ET_USE_PYTORCH_HEADERS to detect whether exceptions are enabled, and use `#if defined(...) && ...` instead of `#ifdef` to check the macro so that we don't use PyTorch headers if exceptions are disabled. (otherwise, we might have problems with e.g. TORCH_CHECK)

Original summary for pytorch#11204:
Set of math functions that work on both scalars and at::vec::Vectorized, to be used in pytorch#9432.

Original summary for pytorch#11205:
Make sure we test the optimized versions of portable kernels even if they are shadowed by optimized implementations. Intended to support pytorch#9432.

Original summary for pytorch#9432:
This is a first cut at pytorch#9241. In this PR I've vectorized a small initial set of ops: atan2, clamp, fmod_Scalar, maximum, minimum, mul, pow, and sigmoid. In addition, the following ops should have gotten vectorized automatically because they already used generic lambdas: add, div, rsub, sub. I've left covering ops that use the `unary_ufunc_*` utilities in [pattern.h](https://github.com/pytorch/executorch/blob/main/kernels/portable/cpu/pattern/pattern.h) for a follow-up push, because pattern.h and elementwise_util need some work before we can migrate pattern.h's utilities to be backed by elementwise_util.

This PR adds an interesting testing problem: in theory, *all* operators might need test cases long enough to tickle vectorization, because we might accidentally vectorize ops unexpectedly and break their lambdas due to anticipated differences in semantics.
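As a rough illustration of why short inputs may never reach a vectorized path, here is a minimal sketch. `Vec4`, `max_of`, `maximum_op`, and `apply_binary` are hypothetical names invented for this example; `Vec4` is a toy 4-lane stand-in for `at::vec::Vectorized<float>`, and the loop is an assumption about the general shape of such a utility, not the actual elementwise_util code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical 4-lane stand-in for at::vec::Vectorized<float>; not the real type.
struct Vec4 {
  std::array<float, 4> lanes;
  static Vec4 load(const float* p) { return {{p[0], p[1], p[2], p[3]}}; }
  void store(float* p) const {
    for (std::size_t i = 0; i < 4; ++i) p[i] = lanes[i];
  }
};

// Overloaded helper so one generic lambda can serve scalars and vectors alike.
inline float max_of(float a, float b) { return a > b ? a : b; }
inline Vec4 max_of(const Vec4& a, const Vec4& b) {
  Vec4 r{};
  for (std::size_t i = 0; i < 4; ++i) r.lanes[i] = max_of(a.lanes[i], b.lanes[i]);
  return r;
}

// Generic lambda: instantiated once for float and once for Vec4.
const auto maximum_op = [](auto x, auto y) { return max_of(x, y); };

// Applies op elementwise. Full blocks of 4 use the Vec4 instantiation and the
// remainder uses the float instantiation. Returns how many elements went
// through the Vec4 branch.
template <typename Op>
std::size_t apply_binary(const Op& op, const float* a, const float* b,
                         float* out, std::size_t n) {
  assert(n == 0 || (a != nullptr && b != nullptr && out != nullptr));
  std::size_t i = 0;
  for (; i + 4 <= n; i += 4) {
    op(Vec4::load(a + i), Vec4::load(b + i)).store(out + i);
  }
  const std::size_t vectorized = i;
  for (; i < n; ++i) out[i] = op(a[i], b[i]);  // scalar tail
  return vectorized;
}
```

In this sketch, an input of 6 elements sends four through the `Vec4` branch and two through the scalar tail, while an input of 3 elements never touches the `Vec4` branch, so a lambda that misbehaves only on vectors would pass a short test unnoticed.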
I address this issue by using Vectorized for the scalar prologue/epilogue in debug mode (we run tests in both debug and release) so that we can detect broken lambdas. I additionally intentionally introduced a bug in the vectorized path in elementwise_util and manually verified that we saw test failures for each vectorized op called out above.

Differential Revision: D76467389

***

fix ET_USE_PYTORCH_HEADERS
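The difference between `#ifdef` and `#if defined(...) && ...` for a macro like ET_USE_PYTORCH_HEADERS can be sketched as follows. This assumes, purely for illustration, that the build defines the macro to `0` when exceptions are disabled; the `kIfdefSaysEnabled` and `kValueCheckSaysEnabled` constants are hypothetical names that exist only in this example.

```cpp
// Assumption for this sketch: the macro is defined, but with the value 0.
#define ET_USE_PYTORCH_HEADERS 0

// #ifdef only asks whether the macro exists, so a value of 0 still passes.
#ifdef ET_USE_PYTORCH_HEADERS
constexpr bool kIfdefSaysEnabled = true;
#else
constexpr bool kIfdefSaysEnabled = false;
#endif

// Checking both existence and value treats "defined to 0" as disabled.
#if defined(ET_USE_PYTORCH_HEADERS) && ET_USE_PYTORCH_HEADERS
constexpr bool kValueCheckSaysEnabled = true;
#else
constexpr bool kValueCheckSaysEnabled = false;
#endif

static_assert(kIfdefSaysEnabled, "#ifdef is satisfied by a macro defined to 0");
static_assert(!kValueCheckSaysEnabled, "the value check rejects a macro defined to 0");
```

Under that assumption, code guarded by plain `#ifdef` would still pull in the PyTorch headers, whereas the combined check skips them.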
Dr. CI: see artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/11682
❌ 2 new failures as of commit 735f214 with merge base 56392aa.
This pull request was exported from Phabricator. Differential Revision: D76467389
Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as