Make a data driven SQL testing tool (so we can reuse duckdb test suite, example)

**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
I would like to ensure that DataFusion gets the correct answers for SQL queries (especially in tricky corner cases like the one described in https://github.com/apache/arrow-datafusion/issues/4211)

From experience, both in DataFusion and in prior jobs, the effort required to maintain tests (both to add new tests as well as update existing tests) is substantial. Making it easier to add new tests and maintain existing ones will help us keep up velocity. 

Right now, we have two sql integration style tests:

1. the `integration` test from @Jimexist 🦾  https://github.com/apache/arrow-datafusion/tree/master/integration-tests: Runs a limited number of queries against data in both postgres and datafusion and compares the results
* The `sql_integration` test https://github.com/apache/arrow-datafusion/tree/master/datafusion/core/tests/sql: test setup, execution, and verification is written in rust. 

The challenge with sql_integration test is that to add new tests or update existing ones, we need to change rust code and recompile, which takes a *loong* time

Likewise, the integration test requires that the results are exactly the same as postgres which is not possible in all cases (like when testing for unsigned types, which postgres doesn't support, or testing some DataFusion specific thing)



**Describe the solution you'd like**
I would like some sort of data driven test to replace sql_integration

You can see this style of test in duckdb:  https://github.com/duckdb/duckdb/blob/master/test/sql/join/empty_joins.test

My ideal solution would be to implement a runner (ideally the same as [SQLLogicTests](https://duckdb.org/dev/testing#sqllogictests) from DuckDB)
2. Using the same data file format as duckdb (will mean we could reuse their tests without much modification)
3. Start porting as many of the tests in sql_integration over to this new format as possible)

I implemented a impler version of this approach in https://github.com/influxdata/influxdb_iox/blob/main/query_tests/README.md which runs sql queries from a file and compares the result to known output.  I think the duckdb way is superior

**Describe alternatives you've considered**
Leave things the same

**Additional context**
Add any other context or screenshots about the feature request here.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Make a data driven SQL testing tool (so we can reuse duckdb test suite, example) #4248

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Make a data driven SQL testing tool (so we can reuse duckdb test suite, example) #4248

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions