GH-15141: [C++] fix for unstable test due to unstable sort #15142

westonpace · 2022-12-31T20:47:55Z

The sorting done by orderby is not stable. This means, given the input:

a	b
1	false
1	true

the test could have generated either [false, true] or [true, false] for the b column. We likely did not encounter this before 498b645 because the entire thing was run serially (even though there was a parallel option, it was not set up correctly).

Now that things are properly running in parallel, the results are non-deterministic. We could remove the b column, but I feel it is a better stress test to have at least one payload column. So I changed the test to compare only the key array and not the payload array.
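To make the key-only comparison concrete, here is a minimal sketch using only the C++ standard library. It is not the Arrow test code: `Row`, `SortByKey`, `KeyColumn`, and `KeysMatch` are hypothetical names chosen for illustration, and `std::sort` stands in for any sort that gives no guarantee about the order of equal keys.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical row type: `a` is the sort key, `b` is a payload column.
struct Row {
  int a;
  bool b;
};

// Sort by key only. std::sort is not guaranteed to be stable, so rows with
// equal keys may end up in any relative order.
std::vector<Row> SortByKey(std::vector<Row> rows) {
  std::sort(rows.begin(), rows.end(),
            [](const Row& l, const Row& r) { return l.a < r.a; });
  return rows;
}

// Extract only the key column, dropping the payload.
std::vector<int> KeyColumn(const std::vector<Row>& rows) {
  std::vector<int> keys;
  keys.reserve(rows.size());
  for (const Row& row : rows) keys.push_back(row.a);
  return keys;
}

// Compare two results on the key column only; payload order is ignored.
bool KeysMatch(const std::vector<Row>& actual,
               const std::vector<Row>& expected) {
  return KeyColumn(actual) == KeyColumn(expected);
}

// Small self-check: the same rows fed in two different orders agree on keys,
// regardless of how ties were broken in the payload column.
inline void DemoKeyOnlyComparison() {
  std::vector<Row> first = {Row{1, false}, Row{1, true}};
  std::vector<Row> second = {Row{1, true}, Row{1, false}};
  assert(KeysMatch(SortByKey(first), SortByKey(second)));
}
```

The payload column still flows through the sort, so it continues to exercise the code path; it is simply left out of the equality check.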

Closes: [CI] arrow-compute: ExecPlanExecution.StressSourceOrderBy may failed #15141

github-actions · 2022-12-31T20:48:17Z

Closes: [CI] arrow-compute: ExecPlanExecution.StressSourceOrderBy may failed #15141

github-actions · 2022-12-31T20:48:19Z

⚠️ GitHub issue #15141 has been automatically assigned in GitHub to PR creator.

github-actions · 2022-12-31T20:48:20Z

⚠️ GitHub issue #15141 has no components, please add labels for components.

westonpace · 2022-12-31T20:54:34Z

@vibhatha would you be interested in providing a review / sanity check?

vibhatha · 2022-12-31T23:36:12Z

> @vibhatha would you be interested in providing a review / sanity check?

Sure @westonpace, I will.

vibhatha

@westonpace this makes sense to me. It’s not possible to guarantee the outcome of this.

Just a question: is there a sort option that gives precedence to the row index to decide which row comes first when we have a tie like this?

westonpace · 2023-01-01T01:27:06Z

> Just a question, is there a sort option which gives precedence to the index of the row and decide which comes first, when we have a tie like this?

That's called a "stable sort". The underlying sort kernel (SortIndices) is stable. However, if the plan is run in parallel, then there is no guarantee the batches will accumulate in the same order. So even if the sort kernel is stable, the sort node is not.

Once we add proper ordering we can add a stable option to the sort node which resequences the data before sorting so that the sort node can remain stable.
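The distinction between a stable kernel and an unstable node can be sketched with the standard library alone. This is an illustration under assumptions, not Arrow's implementation: `Batch`, its `index` field, `SortAccumulated`, and the `resequence` flag are hypothetical, and `std::stable_sort` stands in for the stable kernel.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Row {
  int key;
  bool payload;
};

// Hypothetical batch: `index` records the batch's position in the source.
struct Batch {
  int index;
  std::vector<Row> rows;
};

// Concatenate batches in the order given (their arrival order), then apply a
// stable sort by key. If `resequence` is true, batches are first put back
// into source order, which is the idea behind a "stable" option on the node.
std::vector<Row> SortAccumulated(std::vector<Batch> batches, bool resequence) {
  if (resequence) {
    std::stable_sort(
        batches.begin(), batches.end(),
        [](const Batch& l, const Batch& r) { return l.index < r.index; });
  }
  std::vector<Row> all;
  for (const Batch& batch : batches) {
    all.insert(all.end(), batch.rows.begin(), batch.rows.end());
  }
  // Stable: ties keep the order in which rows were accumulated.
  std::stable_sort(all.begin(), all.end(),
                   [](const Row& l, const Row& r) { return l.key < r.key; });
  return all;
}

// Extract the payload column so two results can be compared.
std::vector<bool> Payloads(const std::vector<Row>& rows) {
  std::vector<bool> out;
  for (const Row& row : rows) out.push_back(row.payload);
  return out;
}

// Small self-check: two arrival orders of the same two batches.
inline void DemoArrivalOrder() {
  Batch b0{0, {Row{1, false}}};
  Batch b1{1, {Row{1, true}}};
  // Without resequencing, tie order follows arrival order, so results differ.
  assert(Payloads(SortAccumulated({b0, b1}, false)) !=
         Payloads(SortAccumulated({b1, b0}, false)));
  // With resequencing, both arrival orders give the same result.
  assert(Payloads(SortAccumulated({b0, b1}, true)) ==
         Payloads(SortAccumulated({b1, b0}, true)));
}
```

With one tied row per batch, swapping the arrival order swaps the payload order even though the sort itself is stable; restoring source order first removes that dependence.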

However, now that I write this, I realize it might be best to only apply my change when testing the parallel case, and to use the old comparison in the non-parallel case.

vibhatha · 2023-01-01T01:32:43Z

> However, now that I write this, I realize it might be best to only apply my change when testing the parallel case, and to use the old comparison in the non-parallel case.

Yes, I also think it’s better that way.

westonpace · 2023-01-01T05:57:05Z

> Yes, I also think it’s better that way.

Hmm, I tried this, but it turns out not to be so simple. I'm going to proceed with this as it is for now. We can worry about a full comparison later, when we add a stable sort. I'll add a new issue requesting that. I'll merge this so it doesn't bother CI.

ursabot · 2023-01-01T08:42:08Z

Benchmark runs are scheduled for baseline = db6c59d and contender = 5a57e6d. 5a57e6d is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Failed ⬇️0.56% ⬆️1.11%] test-mac-arm
[Finished ⬇️1.79% ⬆️0.0%] ursa-i9-9960x
[Failed ⬇️0.0% ⬆️0.0%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] 5a57e6dd ec2-t3-xlarge-us-east-2
[Failed] 5a57e6dd test-mac-arm
[Finished] 5a57e6dd ursa-i9-9960x
[Failed] 5a57e6dd ursa-thinkcentre-m75q
[Finished] db6c59d1 ec2-t3-xlarge-us-east-2
[Failed] db6c59d1 test-mac-arm
[Finished] db6c59d1 ursa-i9-9960x
[Failed] db6c59d1 ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

ursabot · 2023-01-01T08:42:25Z

['Python', 'R'] benchmarks have high level of regressions.
ursa-i9-9960x

…che#15142)

The sorting done by orderby is not stable. This means, given the input:

a | b
--- | ---
1 | false
1 | true

the test could have generated both `[false, true]` and `[true, false]` for the `b` column. We likely did not encounter this before apache@498b645 because the entire thing was run serially (even though there was a `parallel` option it was not setup correctly).

Now that things are properly running parallel the results are non-deterministic. We could remove the `b` column but I feel it is a better stress test to have at least one payload column. So I changed the test to only compare the key array and not the payload array.

* Closes: apache#15141

Authored-by: Weston Pace <weston.pace@gmail.com>
Signed-off-by: Weston Pace <weston.pace@gmail.com>

fix for unstable test due to unstable sort

5b73efb

github-actions bot added the Component: C++ label Dec 31, 2022

vibhatha approved these changes Jan 1, 2023

View reviewed changes

westonpace merged commit 5a57e6d into apache:master Jan 1, 2023
