perf: Optimize scalar performance for cot #19888
ColumnarValue::Scalar(_) => {
panic!("Expected an array value")
}
}
There are no tests for the scalar input/output (the fast path).
Also, it would be good to add tests for inputs such as NULL, 0.0 and std::f64::consts::PI.
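The edge cases listed above can be pinned down without DataFusion at all. This is a minimal sketch under stated assumptions: compute_cot64 here is a hypothetical stand-in that computes cot(x) as 1.0 / tan(x), and a SQL NULL is modeled as None; neither the helper bodies nor the Option representation are taken from the PR.

```rust
/// Hypothetical stand-in for the PR's compute_cot64: cot(x) = 1 / tan(x).
fn compute_cot64(x: f64) -> f64 {
    1.0 / x.tan()
}

/// Scalar path modeled on Option<f64> (assumption): a NULL input (None) stays NULL.
fn cot_scalar(value: Option<f64>) -> Option<f64> {
    value.map(compute_cot64)
}

#[cfg(test)]
mod tests {
    use super::*;
    use std::f64::consts::{FRAC_PI_4, PI};

    #[test]
    fn null_input_returns_null() {
        assert_eq!(cot_scalar(None), None);
    }

    #[test]
    fn zero_gives_signed_infinity() {
        // tan(+0.0) == +0.0 and tan(-0.0) == -0.0, so 1 / tan(x) is +/- infinity.
        assert_eq!(cot_scalar(Some(0.0)), Some(f64::INFINITY));
        assert_eq!(cot_scalar(Some(-0.0)), Some(f64::NEG_INFINITY));
    }

    #[test]
    fn pi_gives_large_finite_value() {
        // PI is not exactly representable, so tan(PI) is tiny but nonzero
        // and cot(PI) is a huge finite number rather than infinity.
        let v = cot_scalar(Some(PI)).unwrap();
        assert!(v.is_finite() && v.abs() > 1e15);
    }

    #[test]
    fn quarter_pi_is_one() {
        // cot(PI/4) is 1 up to floating-point rounding.
        assert!((cot_scalar(Some(FRAC_PI_4)).unwrap() - 1.0).abs() < 1e-12);
    }
}
```

The PI case is the interesting one: a test that expects infinity there would fail, because the input is only the closest f64 to π.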
The existing sqllogictests should already cover the functionality. Aren't the changes just an optimization?
There is no .slt test for the non-Spark cot function:
❯ rg cot datafusion/sqllogictest/
datafusion/sqllogictest/test_files/spark/math/cot.slt
24:## Original Query: SELECT cot(1);
27:#SELECT cot(1::int);
datafusion/sqllogictest/test_files/aggregates_topk.slt
203:('y', 'apricot'),
datafusion/sqllogictest/test_files/imdb.slt
850: (24, 'Ridley Scott', NULL, NULL, 'm', NULL, NULL, NULL, NULL),
Or maybe datafusion/sqllogictest/test_files/spark/math/cot.slt is not really for Spark, because I see no cot in https://github.com/apache/datafusion/blob/main/datafusion/spark/src/function/math/mod.rs
Anyway, https://github.com/apache/datafusion/blob/main/datafusion/sqllogictest/test_files/spark/math/cot.slt contains only commented-out code, so there are no SLT tests for cot.
Added unit tests for these. Thanks for the feedback.
.unary::<_, Float32Type>(|x: f32| compute_cot32(x)),
) as ArrayRef),
other => exec_err!("Unsupported data type {other:?} for function cot"),
let return_type = args.return_type().clone();
This variable is used only once, so it could be moved inside the if scalar.is_null() { ... } block to avoid the clone when it is not needed.
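The suggestion can be sketched with simplified, hypothetical stand-ins (the Args, DataType and ScalarValue below are not DataFusion's real types): the return type is cloned only on the NULL branch, where it is needed to build a typed NULL, and the non-NULL fast path never touches it.

```rust
/// Hypothetical, simplified stand-ins for DataFusion's DataType and ScalarValue.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Float32,
    Float64,
}

#[derive(Debug, Clone, PartialEq)]
enum ScalarValue {
    Float32(Option<f32>),
    Float64(Option<f64>),
}

impl ScalarValue {
    fn is_null(&self) -> bool {
        matches!(self, ScalarValue::Float32(None) | ScalarValue::Float64(None))
    }

    /// Builds a typed NULL; it takes the type by value, which is why a clone is needed.
    fn null_of(data_type: DataType) -> ScalarValue {
        match data_type {
            DataType::Float32 => ScalarValue::Float32(None),
            DataType::Float64 => ScalarValue::Float64(None),
        }
    }
}

/// Hypothetical stand-in for the function's invocation arguments.
struct Args {
    return_type: DataType,
}

impl Args {
    fn return_type(&self) -> &DataType {
        &self.return_type
    }
}

fn cot_scalar(args: &Args, scalar: &ScalarValue) -> ScalarValue {
    if scalar.is_null() {
        // Clone only on this branch: the non-NULL paths never need the return type.
        let return_type = args.return_type().clone();
        return ScalarValue::null_of(return_type);
    }
    match scalar {
        ScalarValue::Float32(Some(x)) => ScalarValue::Float32(Some(1.0 / x.tan())),
        ScalarValue::Float64(Some(x)) => ScalarValue::Float64(Some(1.0 / x.tan())),
        // Both NULL variants returned early above.
        _ => unreachable!("NULL inputs are handled by the early return"),
    }
}
```

For a cheap-to-clone type the saving is small, but on a hot scalar path every avoided allocation or copy counts toward the measured nanoseconds.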
.unary::<_, Float32Type>(compute_cot32),
))),
other => {
internal_err!("Unexpected data type {other:?} for function cot")
Is it intentional to use internal_err!() instead of exec_err!() (old line 116)?
Reaching the other => branch would mean that the type coercion/signature code has a bug. That should never happen in normal execution, hence internal_err.
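The distinction can be sketched with a hypothetical error type (CotError below is an assumption, not DataFusion's DataFusionError): an execution error describes input a user can legitimately trigger at runtime, while an internal error flags a broken invariant, such as a type the function signature should already have rejected during coercion.

```rust
/// Hypothetical error type mirroring the exec_err!/internal_err! split;
/// it is not DataFusion's DataFusionError.
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum CotError {
    /// Something a user can trigger at runtime with valid SQL.
    Execution(String),
    /// A bug: an invariant established upstream (e.g. by type coercion) was violated.
    Internal(String),
}

/// Hypothetical subset of input types.
#[allow(dead_code)]
#[derive(Debug)]
enum DataType {
    Float32,
    Float64,
    Utf8,
}

/// The signature is assumed to coerce every input to Float32 or Float64,
/// so reaching the `other` arm means the coercion step is buggy.
fn check_cot_input(data_type: &DataType) -> Result<(), CotError> {
    match data_type {
        DataType::Float32 | DataType::Float64 => Ok(()),
        other => Err(CotError::Internal(format!(
            "Unexpected data type {other:?} for function cot"
        ))),
    }
}
```

Classifying the unreachable arm as internal keeps user-facing execution errors meaningful: a report of an internal error points at a planner or coercion bug rather than at the query.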
.invoke_with_args(ScalarFunctionArgs {
args: scalar_f32_args.clone(),
arg_fields: scalar_f32_arg_fields.clone(),
number_rows: 1,
Suggested change:
- number_rows: 1,
+ number_rows: size,
Currently, the input is the same for every value of size. Maybe number_rows could be used to make the runs differ a bit?
The benchmark loop already varies size for the array benchmarks. For the scalar case, the point is to measure single-value performance regardless of the batch size.
In that case, there is no need for the scalar benchmark to be inside the for size in [1024, 4096, 8192] { ... } loop. Currently it runs the same logic with the same configuration three times (once per size).
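The restructuring suggested above can be sketched without Criterion: the array benchmarks stay inside the size loop, while the scalar benchmarks are registered once, after it. The bench_plan helper and the array benchmark IDs are hypothetical (only cot_f64_scalar and cot_f32_scalar appear in the PR's results table); a real Criterion benchmark would call c.bench_function once per ID instead of collecting names.

```rust
/// Returns the benchmark IDs in registration order (hypothetical names for the array cases).
fn bench_plan() -> Vec<String> {
    let mut ids = Vec::new();
    for size in [1024, 4096, 8192] {
        // Array inputs depend on the batch size, so they belong inside the loop.
        ids.push(format!("cot f32 array: {size}"));
        ids.push(format!("cot f64 array: {size}"));
    }
    // A scalar input is a single value regardless of batch size, so it is
    // benchmarked once instead of three identical times.
    ids.push("cot_f32_scalar".to_string());
    ids.push("cot_f64_scalar".to_string());
    ids
}
```

Hoisting the scalar case out of the loop also removes three identically named measurements that would otherwise have to be told apart by size in the report.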
.invoke_with_args(ScalarFunctionArgs {
args: scalar_f64_args.clone(),
arg_fields: scalar_f64_arg_fields.clone(),
number_rows: 1,
Suggested change:
- number_rows: 1,
+ number_rows: size,
Thanks @kumarUjjawal & @martin-g |
## Which issue does this PR close?

- Part of apache/datafusion-comet#2986.

## Rationale for this change

The cot function currently converts scalar inputs to arrays before processing, even for single scalar values. This adds unnecessary overhead from array allocation and conversion. Adding a scalar fast path avoids this overhead.

## What changes are included in this PR?

- Added a scalar fast path
- Added a benchmark
- Updated tests

| Type | Before | After | Speedup |
|------|--------|-------|---------|
| **cot_f64_scalar** | 229 ns | 67 ns | **3.4x** |
| **cot_f32_scalar** | 247 ns | 59 ns | **4.2x** |

## Are these changes tested?

## Are there any user-facing changes?