Skip to content

Conversation

@gabotechs
Copy link
Contributor

Which issue does this PR close?

It probably does not fully closes it but it partially addresses:

Rationale for this change

Being able to execute the full sqllogictest suite adding a Substrait roundtrip. This means:

  • Parsing the test statement
  • Building a logical plan out of it
  • Converting the logical plan to Substrait
  • Converting the Substrait plan back to logical
  • Execute it

What changes are included in this PR?

Adds a new DataFusionSubstraitRoundTrip sqllogictest runner implementation that will do a Substrait roundtrip where it makes sense, and a new --substrait-round-trip option to the sqllogictest CLI.

Example:

Runs the full aggregate.slt file doing a Substrait roundtrip on each individual statement

cargo test --test sqllogictests -- --substrait-round-trip aggregate 

Runs an individual failing test reproducing one specific issue with the current Substrait conversions

cargo test --test sqllogictests -- --substrait-round-trip aggregate:1522

This is still not enforced in the CI, and I would say the expectations are that it's just another tool to catch existing bugs in the current Substrait conversion code and better communicate them so that contributors have a clear path forward for contributing fixes.

Are these changes tested?

N/A

Are there any user-facing changes?

No

@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label May 24, 2025
@gabotechs gabotechs force-pushed the add-substrait-rountrip-option-in-sqllogictests branch 4 times, most recently from dd41d5b to 2cc9244 Compare May 30, 2025 08:56
@gabotechs gabotechs force-pushed the add-substrait-rountrip-option-in-sqllogictests branch from 2cc9244 to e481a15 Compare May 30, 2025 09:16
@gabotechs
Copy link
Contributor Author

This one is good for a review! cc @alamb @Blizzara

@Blizzara
Copy link
Contributor

Blizzara commented Jun 2, 2025

Code looks fine to me but I'm not familiar with the sqllogictests code so will leave it to Andrew to review properly.

I really like this additional testing for Substrait, thanks!

Copy link
Contributor

@2010YOUY01 2010YOUY01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I suggest adding a CI test that exercises this option with a passing .slt file (or create a minimal one). Otherwise this LGTM.

@github-actions github-actions bot added the development-process Related to development process of DataFusion label Jun 4, 2025
@gabotechs
Copy link
Contributor Author

I suggest adding a CI test that exercises this option with a passing .slt file

Something like this 1a05ba2?

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great @gabotechs -- thank you

I also tried it locally with tests I know fail and got an error as expected:

cargo test --test sqllogictests -- --substrait-round-trip agg

...

10. query failed: DataFusion error: This feature is not implemented: Producing a row from empty relation is unsupported
[SQL] with data as (
  select 1 as f,       4 as b
  union all
  select null as f,   99 as b
  union all
  select 2 as f,       5 as b
  union all
  select 98 as f,   null as b
  union all
  select 3 as f,       6 as b
  union all
  select null as f, null as b
)
select corr(f, b)
from data
at /Users/andrewlamb/Software/datafusion2/datafusion/sqllogictest/test_files/aggregate.slt:539

POSTGRES_PORT: ${{ job.services.postgres.ports[5432] }}

sqllogictest-substrait:
name: "Run sqllogictest in Substrait round-trip mode"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

@alamb
Copy link
Contributor

alamb commented Jun 5, 2025

Thank you for the review @2010YOUY01

@alamb alamb merged commit eeee6b0 into apache:main Jun 5, 2025
30 checks passed
kosiew pushed a commit to kosiew/datafusion that referenced this pull request Jun 9, 2025
* Add substrait roundtrip option in sqllogictests

* Fix doc link and missing license header

* Add README.md entry for the Substrait round-trip mode

* Link tracking issue in README.md

* Use clap's `conflicts_with` instead of manually checking flag compatibility

* Add sqllogictest-substrait job to the CI

* Revert committed formatting changes to README.md
Comment on lines +148 to +154
(_, _, true) => {
run_test_file_substrait_round_trip(
test_file,
validator,
m_clone,
m_style_clone,
filters.as_ref(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't ignore postgres_runner and complete arguments.
Unsupported arg combinations should be an error

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think clap should be taking care of that

#[clap(
long,
conflicts_with = "complete",
conflicts_with = "postgres_runner",
help = "Before executing each query, convert its logical plan to Substrait and from Substrait back to its logical plan"
)]
substrait_round_trip: bool,

As suggested in #16183 (comment).

If you try to run incompatible commands you would get something like this:

error: the argument '--substrait-round-trip' cannot be used with '--complete'

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point and i haven't checked that

in the past i've observed some param combinations being ignored...
it's best if the clap does that (prints best error messages), but IMO the code further down should not rely on clap validation (think: args struct could come from some other source) and thus should re-validate.

anyway, that's a nit, since we have the clap validation

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

development-process Related to development process of DataFusion sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants