feat(sql): Postgres-style EXPLAIN (...) option list #21768
Draft
adriangb wants to merge 3 commits into apache:main
Extends DataFusion's `EXPLAIN` to accept a Postgres-style parenthesized option list alongside the existing keyword form, on dialects that enable it (the default `GenericDialect`, `PostgreSqlDialect`, `DuckDbDialect`, etc.). This surfaces the metric-category and verbosity knobs introduced in PR apache#21160 (currently only reachable via `SET`) directly in the statement, matching Postgres's one-liner ergonomics:

    EXPLAIN (ANALYZE, VERBOSE, METRICS 'rows,bytes', LEVEL dev) SELECT ...

Options recognized: `ANALYZE`, `VERBOSE`, `FORMAT`, `METRICS`, `LEVEL`, `TIMING`, `SUMMARY`, `COSTS`. Statement-level values override the corresponding session config. Postgres-only options that DataFusion does not model (`BUFFERS`, `WAL`, `SETTINGS`, `GENERIC_PLAN`, `MEMORY`) return a clear unsupported-option error rather than silently accepting them. The legacy keyword form (`EXPLAIN ANALYZE VERBOSE FORMAT tree ...`) is unchanged.

Parser delegates to sqlparser's `parse_utility_options()` under the dialect gate; a new `ExplainStatementOptions` struct in `datafusion-common` normalizes both forms into a single representation that flows through `explain_to_plan` into the `Analyze` / `Explain` logical plan nodes. `handle_analyze` / `handle_explain` in the physical planner prefer statement-level overrides over session config before constructing `AnalyzeExec` / `ExplainExec`.

Proto serialization of the new fields is left as a follow-up (TODO comments in `datafusion/proto/src/logical_plan/mod.rs`); fields default to `None` on the other side, matching prior behavior.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
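The override and rejection rules described in this commit can be sketched in a few lines of standalone Rust. This is an illustration only, not the code in the PR: `StatementOverrides`, `apply_option`, `effective_level`, and the error wording are hypothetical names invented here, and only three of the recognized options are modeled.

```rust
/// Verbosity level; mirrors the `summary` / `dev` values of the LEVEL option.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Level {
    Summary,
    Dev,
}

/// Hypothetical, trimmed-down stand-in for `ExplainStatementOptions`.
/// `None` means "fall back to session config".
#[derive(Debug, Default, PartialEq)]
struct StatementOverrides {
    analyze: bool,
    verbose: bool,
    level: Option<Level>,
}

/// Apply one `NAME [arg]` entry from the option list.
/// Boolean arguments (e.g. `ANALYZE FALSE`) are ignored in this sketch.
fn apply_option(
    opts: &mut StatementOverrides,
    name: &str,
    arg: Option<&str>,
) -> Result<(), String> {
    match name.to_ascii_uppercase().as_str() {
        "ANALYZE" => opts.analyze = true,
        "VERBOSE" => opts.verbose = true,
        "LEVEL" => {
            let level = match arg.map(str::to_ascii_lowercase).as_deref() {
                Some("summary") => Level::Summary,
                Some("dev") => Level::Dev,
                other => return Err(format!("invalid LEVEL argument: {other:?}")),
            };
            opts.level = Some(level);
        }
        // Postgres-only options are rejected rather than silently accepted.
        "BUFFERS" | "WAL" | "SETTINGS" | "GENERIC_PLAN" | "MEMORY" => {
            return Err(format!("EXPLAIN option {name} is not supported"));
        }
        other => return Err(format!("unknown EXPLAIN option {other}")),
    }
    Ok(())
}

/// A statement-level value wins; otherwise the session config value is used.
fn effective_level(statement: Option<Level>, session: Level) -> Level {
    statement.unwrap_or(session)
}

fn main() {
    let mut opts = StatementOverrides::default();
    apply_option(&mut opts, "analyze", None).unwrap();
    apply_option(&mut opts, "LEVEL", Some("dev")).unwrap();
    assert!(opts.analyze && !opts.verbose);
    assert_eq!(effective_level(opts.level, Level::Summary), Level::Dev);
    assert_eq!(effective_level(None, Level::Summary), Level::Summary);
    assert!(apply_option(&mut opts, "BUFFERS", None).is_err());
}
```

Keeping each override as an `Option` lets a single `unwrap_or` express the precedence rule, and a statement that sets nothing behaves exactly as if only the session config existed.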
DataFusion has long accepted parentheses wrapping the query after `EXPLAIN` (e.g. `EXPLAIN (SELECT ...)` and `EXPLAIN (q1 EXCEPT q2) UNION ALL (q3 EXCEPT q4)`). The initial cut of the Postgres-style option-list parser treated every leading `(` as an option list, breaking those cases. Disambiguate by peeking one token past the `(`: if it starts a query (`SELECT`, `WITH`, `VALUES`, `TABLE`, `INSERT`, `UPDATE`, `DELETE`, `MERGE`, or another `(`), fall through to the legacy parser.

Adds `explain_paren_grouping_query_is_not_mistaken_for_options` to cover the regression set that CI surfaced (`references.slt`, `union.slt`).

Also reformats `docs/source/user-guide/explain-usage.md` to satisfy `prettier 2.7.1` (column-width alignment only).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
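The one-token lookahead described in this commit could look roughly like the sketch below. It works on raw text with a toy tokenizer, whereas the real parser peeks at sqlparser tokens. `token_starts_query` here is a hypothetical stand-in that only borrows the name of the private helper, and `looks_like_option_list` is invented for illustration.

```rust
/// Hypothetical stand-in for the private `token_starts_query` helper:
/// true when `word` can begin a query (or is another opening paren).
fn token_starts_query(word: &str) -> bool {
    const QUERY_STARTERS: [&str; 8] = [
        "SELECT", "WITH", "VALUES", "TABLE", "INSERT", "UPDATE", "DELETE", "MERGE",
    ];
    word == "(" || QUERY_STARTERS.iter().any(|k| k.eq_ignore_ascii_case(word))
}

/// Decide whether the text following `EXPLAIN` opens an option list.
/// Toy tokenization: real code would peek at parser tokens instead.
fn looks_like_option_list(after_explain: &str) -> bool {
    let rest = after_explain.trim_start();
    let Some(inner) = rest.strip_prefix('(') else {
        return false; // no leading paren: legacy keyword form
    };
    let inner = inner.trim_start();
    // First token after `(`: a nested `(` or a run of identifier characters.
    let first = if inner.starts_with('(') {
        "("
    } else {
        let end = inner
            .find(|c: char| !(c.is_ascii_alphanumeric() || c == '_'))
            .unwrap_or(inner.len());
        &inner[..end]
    };
    !token_starts_query(first)
}

fn main() {
    assert!(looks_like_option_list("(ANALYZE, VERBOSE) SELECT 1"));
    assert!(!looks_like_option_list("(SELECT 1)"));
    assert!(!looks_like_option_list("((SELECT 1) UNION ALL (SELECT 2))"));
    assert!(!looks_like_option_list("ANALYZE SELECT 1"));
}
```

A single token of lookahead is enough here because none of the recognized option names is also a keyword that can start a query.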
`parse_explain` is public and its doc linked to the private `token_starts_query` helper, which makes rustdoc error out. Drop the link; the prose already conveys the disambiguation logic.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Force-pushed from 8303099 to e7db7bb.
Which issue does this PR close?
(Follow-up to #21160, which introduced per-category metric filtering via session config. This PR lets users reach those knobs inline from the EXPLAIN statement.)
Rationale for this change
#21160 added metric categories (`Rows`, `Bytes`, `Timing`, `Uncategorized`) and a verbosity level (`Summary`, `Dev`) to DataFusion's metrics, exposed today only via session config:

- `datafusion.explain.analyze_categories`
- `datafusion.explain.analyze_level`

Users have to `SET` these out-of-band before running `EXPLAIN ANALYZE`, which is awkward for ad-hoc debugging. Postgres solves this with its parenthesized option list:

    EXPLAIN (ANALYZE, BUFFERS, VERBOSE, SETTINGS, WAL) SELECT ... ;

This PR adds the same ergonomics to DataFusion, mapping option names to DataFusion's existing semantics rather than Postgres's buffer/WAL model.
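To make the category knob concrete, here is a minimal sketch of parsing a `METRICS` argument, which this PR describes as `'all'`, `'none'`, or a comma-separated list of `rows`, `bytes`, `timing`, `uncategorized`. The PR itself reuses `ExplainAnalyzeCategories::from_str` for this. The `Category` enum, `parse_categories`, the `Option<BTreeSet<_>>` representation, and the error text below are all assumptions made for illustration.

```rust
use std::collections::BTreeSet;

/// Metric categories named in the PR description.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Category {
    Rows,
    Bytes,
    Timing,
    Uncategorized,
}

/// Hypothetical parser for the METRICS argument.
/// `Ok(None)` means "all" (no filtering); `Ok(Some(set))` keeps only `set`.
fn parse_categories(s: &str) -> Result<Option<BTreeSet<Category>>, String> {
    let s = s.trim();
    if s.eq_ignore_ascii_case("all") {
        return Ok(None);
    }
    if s.eq_ignore_ascii_case("none") {
        return Ok(Some(BTreeSet::new()));
    }
    s.split(',')
        .map(|part| match part.trim().to_ascii_lowercase().as_str() {
            "rows" => Ok(Category::Rows),
            "bytes" => Ok(Category::Bytes),
            "timing" => Ok(Category::Timing),
            "uncategorized" => Ok(Category::Uncategorized),
            other => Err(format!("unknown metric category '{other}'")),
        })
        .collect::<Result<BTreeSet<_>, _>>()
        .map(Some)
}

fn main() {
    let parsed = parse_categories("rows,bytes").unwrap();
    let expected: BTreeSet<Category> = [Category::Rows, Category::Bytes].into_iter().collect();
    assert_eq!(parsed, Some(expected));
    assert_eq!(parse_categories("all").unwrap(), None);
    assert!(parse_categories("rows,latency").is_err());
}
```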
What changes are included in this PR?
Parser. On dialects whose `supports_explain_with_utility_options()` returns true (the default `GenericDialect`, `PostgreSqlDialect`, `DuckDbDialect`, etc.), `DFParser::parse_explain` delegates to sqlparser's `pub fn parse_utility_options()` and feeds the result through a new `ExplainStatementOptions::from_utility_options`. The legacy keyword form (`EXPLAIN ANALYZE VERBOSE FORMAT tree ...`) is unchanged.

Normalized option type. A new `ExplainStatementOptions` in `datafusion-common` captures the knobs parsed from either form. Argument parsing reuses existing `ExplainFormat::from_str`, `ExplainAnalyzeCategories::from_str`, and `MetricType::from_str`.

Options accepted:

| Option | Value or effect |
| --- | --- |
| `ANALYZE` | `ANALYZE` |
| `VERBOSE` | `VERBOSE` |
| `FORMAT` | `indent` / `tree` / `pgjson` / `graphviz` |
| `METRICS` | `'all'`, `'none'`, or comma-separated `rows,bytes,timing,uncategorized` |
| `LEVEL` | `summary` or `dev` |
| `TIMING` | `timing` category |
| `SUMMARY` | TRUE → `summary`, FALSE → `dev` |
| `COSTS` | `show_statistics` override (not valid with `ANALYZE`) |

Postgres-only options (`BUFFERS`, `WAL`, `SETTINGS`, `GENERIC_PLAN`, `MEMORY`) return a helpful unsupported-option error.

Logical plan. `Analyze` gains `analyze_level: Option<MetricType>` and `analyze_categories: Option<ExplainAnalyzeCategories>`. `Explain` gains `show_statistics: Option<bool>`. `None` means "fall back to session config", so existing callers are unchanged.

Physical planner. `handle_analyze` and `handle_explain` prefer statement-level overrides over session config before constructing `AnalyzeExec` / `ExplainExec`. `AnalyzeExec` itself needs no change because it already accepts the filters from #21160.

Proto (follow-up, see TODOs in `datafusion/proto/src/logical_plan/mod.rs`): the new override fields are not yet serialized. They default to `None` on the remote side, matching pre-PR behavior; round-trip tests still pass.

Are these changes tested?
Yes:

- `datafusion/sql/src/parser.rs`: tests cover the legacy keyword form on the PostgreSQL dialect, each option form (bare, `= val`, `ON`/`OFF`, quoted), unknown-option errors, dialect gating (the parenthesized form is rejected under a dialect that doesn't enable it), and the error path for unsupported Postgres-only options.
- `datafusion/core/tests/sql/explain_analyze.rs`: `explain_analyze_paren_metrics_filtering`, `explain_analyze_paren_level_overrides_session_config`, `explain_analyze_paren_metrics_overrides_session_config`, `explain_paren_buffers_rejected`.
- `datafusion/sqllogictest/test_files/explain.slt`: cases covering the parenthesized form, round-trip with the legacy form, and each error path.

Ran `cargo fmt --all` and `cargo clippy --all-targets --all-features -- -D warnings` (clean). Two pre-existing test failures on `main` (the `test_display_pg_json` snapshot and a `pgjson` SLT case at `explain.slt:642`) are unrelated to this change, as verified by running them against a clean checkout of the same base commit.

Are there any user-facing changes?
Yes: new syntax. User-facing docs updated at `docs/source/user-guide/explain-usage.md` with a new section describing the option list and the dialect gate. No breaking changes: the legacy keyword form continues to work exactly as before.

🤖 Generated with Claude Code