Skip to content

Conversation

@js8544
Copy link
Contributor

@js8544 js8544 commented Jun 10, 2023

Rationale for this change

Implement cumulative prod, max and min compute functions

What changes are included in this PR?

  1. Add implementations, docs and tests for the three functions.
  2. Refactor CumulativeSumOptions to CumulativeOptions for reusability.
  3. Fix a bug where GenericFromScalar(GenericToScalar(std::nullopt)) != std::nullopt.
  4. Remove an unnecessary Cast with the default start value.
  5. Add tests to check behavior with NaN.

I'll explain some of the changes in comments.

Are these changes tested?

Yes, in vector_accumulative_ops_test.cc and test_compute.py

Are there any user-facing changes?

No. The data members of CumulativeSumOptions are changed, but the member functions behave as before. And std::optional also can be constructed directly from T. So users should not feel any difference.

@github-actions
Copy link

⚠️ GitHub issue #32190 has been automatically assigned in GitHub to PR creator.

@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Jun 10, 2023
Copy link
Member

@bkietz bkietz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this!

This looks good, a few minor comments:

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jorisvandenbossche Does this need an alias?

Suggested change
CumulativeOptions,
CumulativeOptions,
CumulativeOptions as CumulativeSumOptions,

Copy link
Member

@jorisvandenbossche jorisvandenbossche Jun 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For backwards compatibility, that's probably useful, yes (also on the C++ side this was done).

(although we should maybe actually add it as its own class and deprecate it, so we can remove the alias at some point. But this can also be done as a follow-up later)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow up issue here #36240

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting committer review Awaiting committer review labels Jun 20, 2023
@js8544 js8544 force-pushed the jinshang/cumulative_functions branch from f4de821 to 4bd7ca2 Compare June 21, 2023 16:17
Co-authored-by: Benjamin Kietzman <bengilgit@gmail.com>
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Jun 21, 2023
js8544 and others added 2 commits June 22, 2023 00:23
Co-authored-by: Benjamin Kietzman <bengilgit@gmail.com>
@js8544 js8544 requested a review from bkietz June 21, 2023 16:35
Comment on lines +689 to +690
template <typename Value>
static constexpr Value value{std::numeric_limits<Value>::min()};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These cumulative ops aren't currently instantiated for decimal types anyway: https://github.com/apache/arrow/blob/44edc27/cpp/src/arrow/compute/kernels/codegen_internal.h#L1070-L1100

... if you feel like filing a follow up issue for them, it should be fairly straightforward to specify these as specializations:

Suggested change
template <typename Value>
static constexpr Value value{std::numeric_limits<Value>::min()};
template <typename Value>
static constexpr Value value{std::numeric_limits<Value>::min()};
template <>
static constexpr Decimal128 value<Decimal128> = Decimal128::GetMinSentinel();

Multiplication will require more effort since we can't construct 1 without knowing the scale.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll remove them for this pr. Will send another pr for decimals.

@github-actions github-actions bot removed the awaiting change review Awaiting change review label Jun 21, 2023
@github-actions github-actions bot added awaiting changes Awaiting changes awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Jun 21, 2023
@js8544 js8544 requested a review from bkietz June 22, 2023 00:13
@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Jun 22, 2023
Copy link
Member

@bkietz bkietz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@github-actions github-actions bot added awaiting merge Awaiting merge and removed awaiting changes Awaiting changes labels Jun 22, 2023
@bkietz
Copy link
Member

bkietz commented Jun 22, 2023

CI failure is a backpressure test flake-out in AsOfJoin, unrelated.

js8544 and others added 2 commits June 22, 2023 20:32
Co-authored-by: Benjamin Kietzman <bengilgit@gmail.com>
@icexelloss
Copy link
Contributor

Thanks @bkietz - created #36248 to track the Asof Join backpresure test issue

@conbench-apache-arrow
Copy link

Conbench analyzed the 6 benchmark runs on commit e3eb5898.

There were 6 benchmark results indicating a performance regression:

The full Conbench report has more details.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[C++] Implement cumulative product, max, and min compute functions

4 participants