Skip to content

Conversation

@mustafasrepo
Copy link
Contributor

Which issue does this PR close?

Closes #.

Rationale for this change

While working on another PR, I realized that some of the test results are not deterministic. This PR makes these test results deterministic.

What changes are included in this PR?

Are these changes tested?

Existing tests should work

Are there any user-facing changes?

@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Dec 13, 2023
Copy link
Contributor

@metesynnada metesynnada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@jayzhan211
Copy link
Contributor

jayzhan211 commented Dec 13, 2023

Quite curious, if the test is not deterministic, why we did not fail on this test before?

Oh, I think probably because of set datafusion.execution.target_partitions = 1;

Maybe we can remove unneccessary set datafusion.execution.target_partitions = 1; before test, so we know what test need specific setting easily.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me -- the output order some of these queries likely depends both on the threading (e.g test_threads) as well as defails of how hash values are computed (and thus what the order of groups is), etc

# order by the third
query TI
SELECT DISTINCT ON (c1) c1, c2 FROM aggregate_test_100 ORDER BY c1, c3;
SELECT DISTINCT ON (c1) c1, c2 FROM aggregate_test_100 ORDER BY c1, c3, c9;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think changing the order by changes the results because DISTINCT ON basically picks the first values for each of the distinct values (of c1) in this case

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed

@mustafasrepo
Copy link
Contributor Author

Quite curious, if the test is not deterministic, why we did not fail on this test before?

Oh, I think probably because of set datafusion.execution.target_partitions = 1;

Maybe we can remove unneccessary set datafusion.execution.target_partitions = 1; before test, so we know what test need specific setting easily.

Yes, because setting is not random. Results are deterministicat each run. However, also in terms of query specifications some results are not certain. For instance for the query

SELECT DISTINCT ON (c1) c1, c2 FROM aggregate_test_100 ORDER BY c1, c3;

Both

a 5
b 4
c 2
d 1
e 3

and

a 4
b 4
c 2
d 1
e 3

are valid. However for query

SELECT DISTINCT ON (c1) c1, c2 FROM aggregate_test_100 ORDER BY c1, c3, c9;

only valid result is

a 4
b 4
c 2
d 1
e 3

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants