perf: Optimize round scalar performance#19831
Conversation
| return Ok(ColumnarValue::Scalar(ScalarValue::Float64(None))); | ||
| } | ||
| _ => { | ||
| // Fall through to array path for non-Int32 decimal places |
There was a problem hiding this comment.
Is this arm possible in normal execution?
There was a problem hiding this comment.
No, it's unreachable because decimal_places is coerced to Int32 by the signature.
There was a problem hiding this comment.
If it is unreachable in normal execution then it should return an internal error to indicate so
| ScalarValue::Float32(None) => { | ||
| return Ok(ColumnarValue::Scalar(ScalarValue::Float32(None))); | ||
| } | ||
| // For decimals and other types: fall through to array path |
There was a problem hiding this comment.
Is there anything stopping us from supporting decimals as well?
There was a problem hiding this comment.
The round_decimal function requires the scale parameter from the decimal type (e.g., Decimal128(precision, scale)), which makes extracting and processing scalars more involved than for floats.
For float scalars, we just call round_float(*v, dp) directly. For decimals, we'd need to:
- Extract the native value from ScalarValue::Decimal128 (or other decimal variants)
- Get the scale from the type
- Call round_decimal(value, scale, dp)
- Reconstruct the ScalarValue with precision/scale
If you're comfortable with these changes in this PR for the decimal then I can introduce these as well. What do you think?
There was a problem hiding this comment.
We should do it in this PR
| let dp = match dp_scalar { | ||
| ScalarValue::Int32(Some(dp)) => *dp, | ||
| ScalarValue::Int32(None) => { | ||
| // Return type depends on input type, but for null dp we return null |
There was a problem hiding this comment.
This fails to take into account decimals
| ScalarValue::Float32(None) => { | ||
| return Ok(ColumnarValue::Scalar(ScalarValue::Float32(None))); | ||
| } | ||
| // For decimals and other types: fall through to array path |
There was a problem hiding this comment.
We should do it in this PR
| return Ok(ColumnarValue::Scalar(ScalarValue::Float64(None))); | ||
| } | ||
| _ => { | ||
| // Fall through to array path for non-Int32 decimal places |
There was a problem hiding this comment.
If it is unreachable in normal execution then it should return an internal error to indicate so
martin-g
left a comment
There was a problem hiding this comment.
I see only unit tests with (Float32/64) arrays in this file.
It would be good to add some tests for the new fast path. Or they are already tested by some .slt ?
| &default_decimal_places | ||
| }; | ||
|
|
||
| // Scalar fast path for float and decimal types - avoid array conversion overhead |
There was a problem hiding this comment.
if let (ColumnarValue::Scalar(value_scalar), ColumnarValue::Scalar(dp_scalar)) =
(&args.args[0], decimal_places)
{
if value_scalar.is_null() || dp_scalar.is_null() {
return ColumnarValue::Scalar(ScalarValue::Null)
.cast_to(args.return_type(), None);
}
let dp = if let ScalarValue::Int32(Some(dp)) = dp_scalar {
*dp
} else {
return internal_err!(
"Unexpected datatype for decimal_places: {}",
dp_scalar.data_type()
);
};
match value_scalar {
ScalarValue::Float32(Some(v)) => {
let rounded = round_float(*v, dp)?;
Ok(ColumnarValue::Scalar(ScalarValue::from(rounded)))
}
ScalarValue::Float64(Some(v)) => {
let rounded = round_float(*v, dp)?;
Ok(ColumnarValue::Scalar(ScalarValue::from(rounded)))
}
ScalarValue::Decimal128(Some(v), precision, scale) => {
let rounded = round_decimal(*v, *scale, dp)?;
let scalar =
ScalarValue::Decimal128(Some(rounded), *precision, *scale);
Ok(ColumnarValue::Scalar(scalar))
}
ScalarValue::Decimal256(Some(v), precision, scale) => {
let rounded = round_decimal(*v, *scale, dp)?;
let scalar =
ScalarValue::Decimal256(Some(rounded), *precision, *scale);
Ok(ColumnarValue::Scalar(scalar))
}
ScalarValue::Decimal64(Some(v), precision, scale) => {
let rounded = round_decimal(*v, *scale, dp)?;
let scalar =
ScalarValue::Decimal64(Some(rounded), *precision, *scale);
Ok(ColumnarValue::Scalar(scalar))
}
ScalarValue::Decimal32(Some(v), precision, scale) => {
let rounded = round_decimal(*v, *scale, dp)?;
let scalar =
ScalarValue::Decimal32(Some(rounded), *precision, *scale);
Ok(ColumnarValue::Scalar(scalar))
}
_ => {
internal_err!(
"Unexpected datatype for value: {}",
value_scalar.data_type()
)
}
}
} else {
round_columnar(&args.args[0], decimal_places, args.number_rows)
}Cleaner way of doing this
- Using
internal_errwhich are more appropriate here thanexec_err - Collapse null handling using
ScalarValue::is_nullandColumnarValue::cast_to - Don't need to map the error of
round_floatandround_decimalbecause using?does that for us
There was a problem hiding this comment.
Thanks, this look much better.
I will add more unit tests. |
| } | ||
|
|
||
| #[test] | ||
| fn test_round_scalar_f64() { |
There was a problem hiding this comment.
Removed the scalar unit tests, these are already tested in the SLTs.
|
Thanks @kumarUjjawal & @martin-g |
## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes apache#123` indicates that this PR will close issue apache#123. --> - Part of apache/datafusion-comet#2986 ## Rationale for this change The round function currently converts scalar inputs to arrays before processing, even when both value and decimal_places are scalar values. This adds unnecessary overhead for constant folding scenarios like <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> ## What changes are included in this PR? - Add scalar fast path in RoundFunc::invoke_with_args for Float64 and Float32 inputs - Directly compute the result when both inputs are scalars, avoiding array conversion overhead - Add benchmark <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> ## Are these changes tested? Yes Type | Before | After | Speedup -- | -- | -- | -- round_f64_scalar | 570 ns | 195 ns | 2.9x round_f32_scalar | 564 ns | 192 ns | 2.9x <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> ## Are there any user-facing changes? No <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. -->
Which issue does this PR close?
Rationale for this change
The round function currently converts scalar inputs to arrays before processing, even when both value and decimal_places are scalar values. This adds unnecessary overhead for constant folding scenarios like
What changes are included in this PR?
Are these changes tested?
Yes
Are there any user-facing changes?
No