-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Add LogicalPlan::SubqueryAlias #2172
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
d5d19a1
0ca7b69
332444b
fccd8d0
8c60e6a
8d2d964
575f2b9
2716f69
1f043b8
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -21,7 +21,7 @@ | |
| use crate::error::{DataFusionError, Result}; | ||
| use crate::execution::context::ExecutionProps; | ||
| use crate::logical_plan::plan::{ | ||
| Aggregate, Analyze, Join, Projection, TableScan, Window, | ||
| Aggregate, Analyze, Join, Projection, SubqueryAlias, TableScan, Window, | ||
| }; | ||
| use crate::logical_plan::{ | ||
| build_join_schema, Column, DFField, DFSchema, DFSchemaRef, LogicalPlan, | ||
|
|
@@ -432,6 +432,34 @@ fn optimize_plan( | |
| alias: alias.clone(), | ||
| })) | ||
| } | ||
| LogicalPlan::SubqueryAlias(SubqueryAlias { input, alias, .. }) => { | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. is it worth a test for projection pushdown specifically?
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good point. Test added. |
||
| match input.as_ref() { | ||
| LogicalPlan::TableScan(TableScan { table_name, .. }) => { | ||
| let new_required_columns = new_required_columns | ||
| .iter() | ||
| .map(|c| match &c.relation { | ||
| Some(q) if q == alias => Column { | ||
| relation: Some(table_name.clone()), | ||
| name: c.name.clone(), | ||
| }, | ||
| _ => c.clone(), | ||
| }) | ||
| .collect(); | ||
| let new_inputs = vec![optimize_plan( | ||
| _optimizer, | ||
| input, | ||
| &new_required_columns, | ||
| has_projection, | ||
| _execution_props, | ||
| )?]; | ||
| let expr = vec![]; | ||
| utils::from_plan(plan, &expr, &new_inputs) | ||
| } | ||
| _ => Err(DataFusionError::Plan( | ||
| "SubqueryAlias should only wrap TableScan".to_string(), | ||
| )), | ||
| } | ||
| } | ||
| // all other nodes: Add any additional columns used by | ||
| // expressions in this node to the list of required columns | ||
| LogicalPlan::Limit(_) | ||
|
|
@@ -515,6 +543,24 @@ mod tests { | |
| Ok(()) | ||
| } | ||
|
|
||
| #[test] | ||
| fn aggregate_group_by_with_table_alias() -> Result<()> { | ||
| let table_scan = test_table_scan()?; | ||
|
|
||
| let plan = LogicalPlanBuilder::from(table_scan) | ||
| .alias("a")? | ||
| .aggregate(vec![col("c")], vec![max(col("b"))])? | ||
| .build()?; | ||
|
|
||
| let expected = "Aggregate: groupBy=[[#a.c]], aggr=[[MAX(#a.b)]]\ | ||
| \n SubqueryAlias: a\ | ||
| \n TableScan: test projection=Some([1, 2])"; | ||
|
|
||
| assert_optimized_plan_eq(&plan, expected); | ||
|
|
||
| Ok(()) | ||
| } | ||
|
|
||
| #[test] | ||
| fn aggregate_no_group_by_with_filter() -> Result<()> { | ||
| let table_scan = test_table_scan()?; | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -644,16 +644,15 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> { | |
| self.schema_provider.get_table_provider(name.try_into()?), | ||
| ) { | ||
| (Some(cte_plan), _) => Ok(cte_plan.clone()), | ||
| (_, Some(provider)) => LogicalPlanBuilder::scan( | ||
| // take alias into account to support `JOIN table1 as table2` | ||
| alias | ||
| .as_ref() | ||
| .map(|a| a.name.value.as_str()) | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 👍 |
||
| .unwrap_or(&table_name), | ||
| provider, | ||
| None, | ||
| )? | ||
| .build(), | ||
| (_, Some(provider)) => { | ||
| let scan = | ||
| LogicalPlanBuilder::scan(&table_name, provider, None); | ||
| let scan = match alias { | ||
| Some(ref name) => scan?.alias(name.name.value.as_str()), | ||
| _ => scan, | ||
| }; | ||
| scan?.build() | ||
| } | ||
| (None, None) => Err(DataFusionError::Plan(format!( | ||
| "Table or CTE with name '{}' not found", | ||
| name | ||
|
|
@@ -2492,7 +2491,8 @@ mod tests { | |
| FROM lineitem l (a, b, c)"; | ||
| let expected = "Projection: #l.a, #l.b, #l.c\ | ||
| \n Projection: #l.l_item_id AS a, #l.l_description AS b, #l.price AS c, alias=l\ | ||
| \n TableScan: l projection=None"; | ||
| \n SubqueryAlias: l\ | ||
| \n TableScan: lineitem projection=None"; | ||
| quick_test(sql, expected); | ||
| } | ||
|
|
||
|
|
@@ -3458,7 +3458,8 @@ mod tests { | |
| let expected = "Projection: #person.first_name, #person.id\ | ||
| \n Inner Join: Using #person.id = #person2.id\ | ||
| \n TableScan: person projection=None\ | ||
| \n TableScan: person2 projection=None"; | ||
| \n SubqueryAlias: person2\ | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. the appearance of
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I originally named this
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. me neither. Whatever you prefer is fine with me
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ok, let's leave it as is for now since we will make use of it for subqueries (soon, hopefully). I filed follow-on issues for use this approach for projection and union: |
||
| \n TableScan: person projection=None"; | ||
| quick_test(sql, expected); | ||
| } | ||
|
|
||
|
|
@@ -3471,7 +3472,8 @@ mod tests { | |
| let expected = "Projection: #lineitem.l_item_id, #lineitem.l_description, #lineitem.price, #lineitem2.l_description, #lineitem2.price\ | ||
| \n Inner Join: Using #lineitem.l_item_id = #lineitem2.l_item_id\ | ||
| \n TableScan: lineitem projection=None\ | ||
| \n TableScan: lineitem2 projection=None"; | ||
| \n SubqueryAlias: lineitem2\ | ||
| \n TableScan: lineitem projection=None"; | ||
| quick_test(sql, expected); | ||
| } | ||
|
|
||
|
|
@@ -4067,6 +4069,18 @@ mod tests { | |
| quick_test(sql, expected); | ||
| } | ||
|
|
||
| #[test] | ||
| fn join_with_aliases() { | ||
| let sql = "select peeps.id, folks.first_name from person as peeps join person as folks on peeps.id = folks.id"; | ||
| let expected = "Projection: #peeps.id, #folks.first_name\ | ||
| \n Inner Join: #peeps.id = #folks.id\ | ||
| \n SubqueryAlias: peeps\ | ||
| \n TableScan: person projection=None\ | ||
| \n SubqueryAlias: folks\ | ||
| \n TableScan: person projection=None"; | ||
| quick_test(sql, expected); | ||
| } | ||
|
|
||
| #[test] | ||
| fn cte_use_same_name_multiple_times() { | ||
| let sql = "with a as (select * from person), a as (select * from orders) select * from a;"; | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another way that this same aliasing can be represented is a
ProjectionnodeSo if the input has schema with columns
a,b,cThen to implement
alias("foo")you could build aLogicalPlanNodethat wasProject([a foo.a, b as foo.b, c as foo.c])There may be some good reason to introduce a new type of LogicalPlanNode too that I don't understand, but I wanted to point out this alternative
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, and I see you don't want to do that :) in the description
I wonder if you could use the
DFSchema::relationname for this purpose?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I need something to very specifically say that a table is being used as an alias. An easier change might be just to add an additional field to the
TableScanto record the original table name. I think I will put up a separate PR for that approach.So to explain why I need this. I want to use DataFusion as a SQL parser and planner but I want to execute the query in a different engine. I can provide a
TableProviderso that DataFuision can get the schema for tablepersonand I get a logical plan. When I go to translate that plan to a physical plan, it refers to a table calledpeeps(the alias) and I have no way to know the actual table name.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the Spark plan for this use case:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I prefer the approach taken in this PR, as it could handle subqueries rather than just table scans (as suggested by the name).
I think this is a cleaner, more generic approach.