Skip to content

Conversation

@tz70s
Copy link
Contributor

@tz70s tz70s commented May 3, 2023

Which issue does this PR close?

Closes #6194.

Rationale for this change

Implemented Display trait for FileScanConfig to be consistently applied in all Exec plan block formatting process.

Only added those in-used fields.

Highlight:

  • Add an extra materialization cost of project method while calling fmt_as without introducing cyclic dependency, should be manageable as explain is not critical.
  • The fields are fixed (file_groups, projection, limit, output_ordering) to make the output deterministic and consistent, e.g. previously we print limit=None rather than omitting it, keep this behavior. The drawback of this approach is each time we add a new field (even not appearing in plan) will make massive change on test output, but even we omit the field many not fully solve the problem for extending fields.

What changes are included in this PR?

Implemented Display trait for FileScanConfig to be consistently applied in all Exec plan block formatting process.

Are these changes tested?

Yes

Are there any user-facing changes?

Yes (explain output)

@github-actions github-actions bot added core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) labels May 3, 2023
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @tz70s -- this looks great.

I had some small suggestions, but overall really nice work 👌 Thank you so much

"AggregateExec: mode=Final, gby=[], aggr=[COUNT(2)]",
"AggregateExec: mode=Partial, gby=[], aggr=[COUNT(1)]",
"ParquetExec: limit=None, partitions={1 group: [[x]]}, projection=[a, b, c]",
"ParquetExec: file_groups={1 group: [[x]]}, projection=[a, b, c], limit=None, output_ordering=[]",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to keep the existing behavior and only print out limit, projection and output_ordering if they were non empty?

So like in this case it would be

file_groups={1 group: [[x]]}, projection=[a, b, c]",

Because limit is None and output_ordering is empty?

The rationale is to keep the display easier to read by keeping the output concise

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup just now I second that a compact output will be better.

Made the change accordingly!

super::FileGroupsDisplay(&self.base_config.file_groups),
self.base_config.limit,
)
write!(f, "AvroExec: {}", self.base_config())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️ that really looks much nicer

}
}

fn make_output_ordering_string(ordering: &[PhysicalSortExpr]) -> String {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you could avoid the extra copy / materialization by doing something like

struct DisplayableOrdering( &[PhysicalSortExpr]);

impl Display for DisplayableOrdering {
    fn fmt(&self, f: &mut Formatter) -> FmtResult {
       // use self.0 here and call write! to f
    } 
}

Alternately you could probably use join here instead well: https://doc.rust-lang.org/std/primitive.slice.html#method.join
https://doc.rust-lang.org/std/primitive.slice.html#method.join

Copy link
Contributor Author

@tz70s tz70s May 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks I made the change with a wrapper.

However, I kept the imperative style instead of using join as I think it will do the actual copy, though I don't think performance matters that much for such path.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree -- sounds good

TableScan: aggregate_test_100_with_order projection=[c1]
physical_plan
GlobalLimitExec: skip=0, fetch=10
CsvExec: file_groups={1 group: [[WORKSPACE_ROOT/testing/data/csv/aggregate_test_100.csv]]}, projection=[c1], limit=None, output_ordering=[c1@0 ASC NULLS LAST], has_header=true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

@tz70s tz70s force-pushed the add-output-ordering branch from c6c12c6 to f8d0bd3 Compare May 4, 2023 03:39
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @tz70s -- this looks really nice

}
}

impl Display for FileScanConfig {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks great -- thank you

@alamb
Copy link
Contributor

alamb commented May 4, 2023

I took the liberty of merging this branch from main, fixing clippy and updating some remaining sqllogictests to get CI to pass

@alamb
Copy link
Contributor

alamb commented May 4, 2023

Thanks again @tz70s

@alamb alamb merged commit 6be75ff into apache:main May 4, 2023
@tz70s
Copy link
Contributor Author

tz70s commented May 4, 2023

Thanks @alamb for the review & helps!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Core DataFusion crate sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Explain plan does not always show ordering

2 participants