start refactoring process by setting up base + init by logan-keede · Pull Request #14306 · apache/datafusion

logan-keede · 2025-01-26T13:22:41Z

Which issue does this PR close?

part of Sort out tests in aggregate.slt #13723

Rationale for this change

refer to #14301

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

logan-keede · 2025-01-26T13:23:28Z

cc @Rachelint

Rachelint · 2025-01-26T15:58:26Z

It seems the tests will be executed twice. How about we just leave the one complete test file?

Because we will only move test cases incrementally after this PR, it seems we can ensure no cases are lost this way:

move cases from complete_aggregate.slt to function1.slt
get the diff between the current and the moved complete_aggregate.slt
compare the diff with function1.slt

It would also be great if we could make this an automatic process.

Rachelint · 2025-01-26T16:04:29Z

datafusion/sqllogictest/test_files/aggregate/complete_aggregate.slt

How about we name it old_aggregate.slt or old_testcases.slt?
We can also add a README to explain the background, like the one in string.
(I can help, and it is not required for merging.)

It seems the tests will be executed twice. How about we just leave the one complete test file?

Because we will only move test cases incrementally after this PR, it seems we can ensure no cases are lost this way:

move cases from complete_aggregate.slt to function1.slt

get the diff between the current and the moved complete_aggregate.slt

compare the diff with function1.slt

It would also be great if we could make this an automatic process.

If I understood you correctly,
my fear is that we might lose track of what we have already moved.
We might not be able to verify that the sum of all functions_*.slt files equals old_aggregate.slt, but with base_aggregate.slt we know that if something is present in it, it is not present in any of the function files.
Besides, I don't see the tests running twice as a bad thing; it is an extra layer of security at the cost of ~5 sec of CI time (even on 1 thread).
I could maintain this extra file on my local system, but that would bind this issue to me. It will be easier for anyone to contribute to splitting this file if we keep both.

I definitely agree with making a README file; I was considering it myself.

I agree that the most important thing is to ensure no tests are lost, and I think it is OK to execute the tests twice temporarily.

But I think it would be a bit strange if we always had to execute the tests twice later on.

If we choose to keep and run all existing cases, how about this?

Just don't modify the old cases in aggregate.slt; only rename it to old_aggregate.slt.

And we ensure no new cases are added to old_aggregate.slt anymore, and guide contributors to add cases in the new way (create a file for the function and add cases to it).

🤔 Alternatively, maybe the following? It may actually make sense to keep a complete file to ensure no cases are lost.

Keep complete_aggregate.slt but make sure it is not executed.

Perform extract-and-subtract on base_aggregate.slt.
For example, we extract the min/max cases from it and subtract them from base_aggregate.slt,
which gives us min_max.slt and the reduced base_aggregate.slt.

Implement a simple program/script to check that min_max.slt + base_aggregate.slt = complete_aggregate.slt.

It seems that not only aggregate and string but also some other test files are too large; maybe the script can be reused when sorting them out?
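
A minimal sketch of the "min_max.slt + base_aggregate.slt = complete_aggregate.slt" check described above. It assumes that .slt records are separated by blank lines and that comment-only blocks (for example a license header repeated in every file) can be ignored. The function names are hypothetical, and this is not the script that was later added in this PR.

```python
from collections import Counter
import re


def records(text: str) -> Counter:
    """Count the blank-line-separated records in the text of one .slt file.

    Assumption: a blank line always ends a record. Blocks made up only of
    '#' comment lines are skipped so that a header repeated in every split
    file does not break the comparison.
    """
    counted = Counter()
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [line.rstrip() for line in block.splitlines()]
        if not lines:
            continue
        if all(line.lstrip().startswith("#") for line in lines):
            continue
        counted["\n".join(lines)] += 1
    return counted


def is_complete_split(complete: str, parts: list[str]) -> bool:
    """True if the parts together hold exactly the records of `complete`."""
    total = Counter()
    for part in parts:
        total += records(part)
    return total == records(complete)


def missing_records(complete: str, parts: list[str]) -> list[str]:
    """Records of `complete` that no part accounts for (lost test cases)."""
    total = Counter()
    for part in parts:
        total += records(part)
    return sorted((records(complete) - total).elements())
```

Comparing multisets of whole records rather than raw lines means the order of cases inside each file does not matter, while a duplicated or dropped case still changes the counts and is reported.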

Keep complete_aggregate.slt but make sure it is not executed.

Perform extract-and-subtract on base_aggregate.slt.

This approach looks good to me.

Implement a simple program/script to check that min_max.slt + base_aggregate.slt = complete_aggregate.slt.

This looks fun to me. I will work on it, though it might take some time, so can we merge this portion in a separate PR?

Yes, it would be nice to do it in follow-up PRs.

Should I open a new issue for this or just a PR?

Should I open a new issue for this or just a PR?

I think either is OK.

Maybe open a sub-issue of #13723?

And state there what we want to do in the later refactor?

datafusion/sqllogictest/test_files/aggregate/README.md

Rachelint

Thanks again @logan-keede, it looks good to me as a start to the refactoring!

fix: typo Co-authored-by: kamille <3144148605@qq.com>

Rachelint

Oh, sorry... I thought of some situations:

Contributors don't notice the README and add new tests to base_aggregate.slt
Reviewers don't notice the README either, and approve and merge the PR
Finally, base_aggregate.slt diverges from the archived complete_aggregate.slt

And it may be painful to resolve such conflicts if this happens frequently.

Maybe we should include the check in this PR before merging. When new cases are found in base_aggregate.slt, we throw an error and block the PR in CI.

Sorry again...
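
A rough sketch of the kind of guard proposed here, assuming records are separated by blank lines. It flags any record in base_aggregate.slt that the archived complete_aggregate.slt cannot account for, and returns a non-zero status that a CI step could use to fail the build. The function names are hypothetical; the thread does not describe how the real check was implemented.

```python
from collections import Counter
from pathlib import Path
import sys


def split_records(text: str) -> list[str]:
    """Split .slt text into records, assuming a blank line ends a record."""
    found, current = [], []
    for line in text.splitlines():
        if line.strip():
            current.append(line.rstrip())
        elif current:
            found.append("\n".join(current))
            current = []
    if current:
        found.append("\n".join(current))
    return found


def unexpected_records(base_text: str, archive_text: str) -> list[str]:
    """Records in the base file that are not covered by the archive."""
    extra = Counter(split_records(base_text)) - Counter(split_records(archive_text))
    return sorted(extra.elements())


def check(base_path: str, archive_path: str) -> int:
    """Return 0 if the base file only holds archived records, otherwise 1."""
    extra = unexpected_records(
        Path(base_path).read_text(), Path(archive_path).read_text()
    )
    for record in extra:
        print(f"record in {base_path} missing from {archive_path}:", file=sys.stderr)
        print(record + "\n", file=sys.stderr)
    return 1 if extra else 0
```

A CI step could call check with the two file paths and pass the result to sys.exit, so that a pull request adding cases to base_aggregate.slt fails before it is merged.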

logan-keede · 2025-01-27T20:28:44Z

Oh, sorry... I thought of some situations:

Contributors don't notice the README and add new tests to base_aggregate.slt

Reviewers don't notice the README either, and approve and merge the PR

Finally, base_aggregate.slt diverges from the archived complete_aggregate.slt

And it may be painful to resolve such conflicts if this happens frequently.

Maybe we should include the check in this PR before merging. When new cases are found in base_aggregate.slt, we throw an error and block the PR in CI.

Sorry again...

No problem, I have added the diff function.
I am still figuring out how to add this to CI.
Thanks for your patience.

logan-keede · 2025-01-30T04:21:35Z

@Rachelint I have added the test to CI.
Please review it whenever you can find some time.
Thanks

Rachelint · 2025-01-30T13:13:43Z

Thanks @logan-keede, I will review it in the next few days.

Changes addressed.

logan-keede · 2025-02-09T19:36:45Z

@Rachelint this is just a reminder. Please disregard if this isn't needed.

Rachelint · 2025-02-16T09:38:51Z

@Rachelint this is just a reminder. Please disregard if this isn't needed.

Sorry, I am back and will review today.

github-actions · 2025-04-18T02:04:57Z

Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days.

start refactoring process by setting up base + init

cda0145

github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Jan 26, 2025

fix: forgotten init file

3254063

Rachelint reviewed Jan 26, 2025

Readme + archival of old/complete aggregate

c0150d4

logan-keede requested a review from Rachelint January 27, 2025 15:20

logan-keede added 2 commits January 27, 2025 22:31

fix: empty forgotten README

61d1560

fix: prettier formatting

d5f0ed8

Rachelint reviewed Jan 27, 2025

datafusion/sqllogictest/test_files/aggregate/README.md

Rachelint approved these changes Jan 27, 2025

logan-keede and others added 2 commits January 27, 2025 23:38

Update datafusion/sqllogictest/test_files/aggregate/README.md

2499af0

fix: typo Co-authored-by: kamille <3144148605@qq.com>

diff functionality

6bb002b

Rachelint self-requested a review January 27, 2025 20:08

Rachelint previously requested changes Jan 27, 2025

Merge branch 'aggregate_refactor_properly' of https://github.com/loga…

3017cf1

…n-keede/datafusion into diff_for_sqllogictests

checks to make sure there are no changes in base_aggregate and comple…

5959740

…te_aggregate

logan-keede requested a review from Rachelint January 27, 2025 21:30

fix: Clippy

0aaa778

improve code quality

5ac5596

github-actions bot added the Stale PR has not had any activity for some time label Apr 18, 2025

github-actions bot closed this May 1, 2025
