Add tpch test cases with data. #6435
Force-pushed from 25b3ef9 to 4fb501f.
jackwener
left a comment
Thanks @liurenjie1024 .
cc @alamb
comphead
left a comment
I'm not sure we need all those inserts.
IMHO it's better to use the existing approach with sqllogictest, like:
CREATE EXTERNAL TABLE aggregate_test_100_by_sql (
c1 VARCHAR NOT NULL,
c2 TINYINT NOT NULL,
c3 SMALLINT NOT NULL,
c4 SMALLINT,
c5 INT,
c6 BIGINT NOT NULL,
c7 SMALLINT NOT NULL,
c8 INT NOT NULL,
c9 BIGINT UNSIGNED NOT NULL,
c10 VARCHAR NOT NULL,
c11 FLOAT NOT NULL,
c12 DOUBLE NOT NULL,
c13 VARCHAR NOT NULL
)
STORED AS CSV
WITH HEADER ROW
LOCATION '../../testing/data/csv/aggregate_test_100.csv';

SELECT avg(c12) FROM aggregate_test_100;
Agree with you, it's easier to maintain.
alamb
left a comment
Thank you @liurenjie1024 . This is going to be awesome.
I was looking at what gets run on master for the tpch queries.
My reading is that the CI actually runs the tpch data generator and generates the SF1 data (about 1GB in total size).
Thus in terms of this test I suggest:
- remove all the data from the files
- use the CREATE EXTERNAL TABLE command, as suggested by @comphead and @jackwener, to create tables that point at the data generated by the tpch data generator
- figure out some way to run the tpch tests conditionally in CI (maybe an environment variable or a flag 🤔) so they still pass even when the data generator hasn't been run
What do you think?
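The conditional-run idea above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `RUN_TPCH_TESTS` variable, the `should_run_tpch` helper, and the `benchmarks/data` location are hypothetical names, not something this PR defines.

```shell
# Decide whether the tpch tests should run.
# ASSUMPTION: RUN_TPCH_TESTS is a hypothetical opt-in environment variable.
# ASSUMPTION: generated data shows up as *.tbl files in the given directory.
should_run_tpch() {
    tpch_data_dir=$1

    # Opt-in switch: run only when explicitly enabled.
    if [ "${RUN_TPCH_TESTS:-false}" != "true" ]; then
        return 1
    fi

    # Skip when the data generator has not produced any table files.
    for tpch_file in "$tpch_data_dir"/*.tbl; do
        if [ -e "$tpch_file" ]; then
            return 0
        fi
    done
    return 1
}

# Usage: the directory below is an assumed location for the generated data.
if should_run_tpch "benchmarks/data"; then
    echo "tpch data found, running tpch tests"
else
    echo "tpch tests skipped"
fi
```

With a check like this the rest of the suite still passes on machines where the generator was never run, because the tpch files are skipped instead of failing on missing data.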
In terms of conditionally running the tests, perhaps we can special case tests that start with
Oh, I didn't notice that we already have tpch verification in CI, so another round of tpch verification would not be necessary. The main goal of this PR is to move the tpch tests in benchmark into sqllogictest to make them easier to maintain, so I will remove the unnecessary changes to avoid duplicating that verification.
Force-pushed from 0bfc53e to 142a93a.
I think this is ready for review. cc @comphead @alamb @jackwener
alamb
left a comment
Thank you @liurenjie1024 -- I think this looks great.
One improvement I would like to look into is pre-creating the tpch data (perhaps we can create it in a docker image and then just copy it down). I think this would greatly increase the CI speed.
./dbgen -f -s 1
mv *.tbl ../benchmarks/data
mv ./answers/* ../benchmarks/data/answers/
./dbgen -f -s 0.1
I looked into this a little more, and note that we are running this CI check at SF 0.1 (rather than SF1). It seems like a good idea, and it means it takes only 8 seconds to generate the data 👍
https://github.com/apache/arrow-datafusion/actions/runs/5108217213/jobs/9181887855?pr=6435
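For reference, the generate-then-move steps quoted above can be wrapped so that the scale factor is a single parameter. This is a rough sketch, not the CI script itself: `generate_tpch_data` and its argument order are hypothetical, and the optional fourth argument exists only so the generator command can be swapped out. The `-f` and `-s` flags are the ones shown in the quoted lines.

```shell
# Generate TPC-H tables at a given scale factor and move them to a data dir.
# ASSUMPTION: helper name and arguments are illustrative only.
#   $1 = directory containing dbgen (dbgen writes *.tbl into its working dir)
#   $2 = scale factor, e.g. 0.1 or 1
#   $3 = destination directory for the *.tbl files
#   $4 = generator command (optional, defaults to ./dbgen)
generate_tpch_data() {
    gen_dir=$1
    gen_scale=$2
    gen_dest=$3
    gen_cmd=${4:-./dbgen}

    mkdir -p "$gen_dest" || return 1

    # Run the generator inside its own directory, in a subshell so the
    # caller's working directory is left unchanged.
    (cd "$gen_dir" && "$gen_cmd" -f -s "$gen_scale") || return 1

    # Move the generated tables next to the tests that read them.
    mv "$gen_dir"/*.tbl "$gen_dest"/
}

# Usage (paths are assumptions):
#   generate_tpch_data ./tpch-dbgen 0.1 ./benchmarks/data
```

Keeping the scale factor as an argument makes it easy to use 0.1 in CI and a larger value for local benchmarking.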
I was able to run these tests locally following the directions 👍
Which issue does this PR close?
Closes #6405
Rationale for this change
Move tpch verification/plan test into sqllogictest.
What changes are included in this PR?
Are these changes tested?
Yes, tested with
cargo test -p datafusion --test sqllogictests
Are there any user-facing changes?
No.