
@jackwener
Member

Which issue does this PR close?

Closes #2045.

Rationale for this change

Related: #1315

What changes are included in this PR?

Eliminate the DISTINCT inside a max/min aggregate directly, instead of rewriting it into a two-level aggregate.

The goal is now to rewrite the plan from

| initial_logical_plan | Projection: #Max(DISTINCT test.c1)                     
|                      |   Aggregate: groupBy=[[]], aggr=[[Max(DISTINCT #test.c1)]]                      
|                      |     TableScan: test projection=None

to

| initial_logical_plan | Projection: #Max(DISTINCT test.c1)
|                      |   Projection: #Max(#test.c1) AS Max(DISTINCT test.c1)                   
|                      |     Aggregate: groupBy=[[]], aggr=[[Max(#test.c1)]]                
|                      |       TableScan: test projection=None

instead of

| logical_plan  | Projection: #Max(DISTINCT test.c1)          
|               |   Projection: #Max(alias1) AS Max(DISTINCT test.c1)     
|               |     Aggregate: groupBy=[[]], aggr=[[Max(#alias1)]]        
|               |       Aggregate: groupBy=[[#test.c1 AS alias1]], aggr=[[]]                        
|               |         TableScan: test projection=Some([0])
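Why dropping DISTINCT is safe for max/min can be sketched in a few lines. This is only an illustration, not the optimizer code from this PR: it assumes an integer column for simplicity, and the function names (`max_plain`, `max_distinct`, `sum_plain`, `sum_distinct`) are made up for the example.

```rust
use std::collections::BTreeSet;

/// MAX over the raw column: one aggregation pass, no deduplication.
/// This corresponds to the single `Aggregate: aggr=[[Max(#test.c1)]]` node.
fn max_plain(values: &[i64]) -> Option<i64> {
    values.iter().copied().max()
}

/// MAX(DISTINCT ...) evaluated the two-level way: deduplicate first
/// (like the inner `groupBy=[[#test.c1 AS alias1]]` aggregate),
/// then take the max of the distinct values.
fn max_distinct(values: &[i64]) -> Option<i64> {
    let deduped: BTreeSet<i64> = values.iter().copied().collect();
    deduped.into_iter().max()
}

/// SUM over the raw column, shown only as a counterexample.
fn sum_plain(values: &[i64]) -> i64 {
    values.iter().sum()
}

/// SUM(DISTINCT ...): here the deduplication step does change the result,
/// so the same shortcut would not be valid for SUM.
fn sum_distinct(values: &[i64]) -> i64 {
    let deduped: BTreeSet<i64> = values.iter().copied().collect();
    deduped.into_iter().sum()
}
```

Duplicates never change the largest or smallest value, so `max_plain` and `max_distinct` agree on every input, while the `sum_*` pair differs as soon as a value repeats. That is the reason the elimination is limited to max/min.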

Are there any user-facing changes?

None

@jackwener
Member Author

@Dandandan @alamb @ic4y @houqp PTAL, Thanks! ❤

@jackwener changed the title from "optimizer: eliminate max/min distinct" to "Eliminate max/min distinct" on Mar 25, 2022
@jackwener
Member Author

I found that draft PRs also trigger CI.

@jackwener
Member Author

I found that the problem occurs when ExprSchema data_type() is called: it returns an error.

This is because index_of_column_by_name can't resolve test.c1 from max(test.c1).

@jackwener
Member Author

I now rewrite the logical plan and print each node to the terminal with println!.

The output looks correct to me, but there is still an error.

Projection: #MAX(#test.c1) AS MAX(DISTINCT test.c1) 
DFSchema { fields: [DFField { qualifier: None, field: Field { name: "MAX(DISTINCT test.c1)", data_type: Float32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None } }], metadata: {} }
Expr[MAX(#test.c1) AS MAX(DISTINCT test.c1)]

	Aggregate: groupBy=[[]], aggr=[[MAX(#test.c1)]]
	DFSchema { fields: [DFField { qualifier: None, field: Field { name: "MAX(test.c1)", data_type: Float32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None } }], metadata: {} }
	Expr[MAX(#test.c1)]

		TableScan: test projection=Some([0])
		DFSchema { fields: [DFField { qualifier: Some("test"), field: Field { name: "c1", data_type: Float32, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None } }], metadata: {} }
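The schemas printed above hint at how a name-based column lookup could fail once the aggregate has run. The sketch below is a toy model only, not DataFusion's implementation: `ToyField` and `toy_index_of` are hypothetical stand-ins, and the two schema helpers just copy the qualifier and name values from the dump.

```rust
/// Toy stand-in for a schema field: an optional table qualifier plus a name.
#[derive(Debug, Clone, PartialEq)]
struct ToyField {
    qualifier: Option<String>,
    name: String,
}

/// Toy lookup by (qualifier, name). Hypothetical: it only mimics the idea of
/// a name-based index lookup and is not the real index_of_column_by_name.
fn toy_index_of(
    fields: &[ToyField],
    qualifier: Option<&str>,
    name: &str,
) -> Result<usize, String> {
    fields
        .iter()
        .position(|f| f.qualifier.as_deref() == qualifier && f.name == name)
        .ok_or_else(|| format!("no field matching {:?}.{}", qualifier, name))
}

/// Schema of `TableScan: test`, as printed: qualifier "test", name "c1".
fn table_scan_schema() -> Vec<ToyField> {
    vec![ToyField {
        qualifier: Some("test".to_string()),
        name: "c1".to_string(),
    }]
}

/// Schema of the Aggregate node, as printed: no qualifier, name "MAX(test.c1)".
fn aggregate_schema() -> Vec<ToyField> {
    vec![ToyField {
        qualifier: None,
        name: "MAX(test.c1)".to_string(),
    }]
}
```

In this model, looking up `test.c1` succeeds against the table scan schema but fails against the aggregate schema, whose only field is the unqualified `MAX(test.c1)`. If the inner column of the aggregate expression is resolved against the wrong schema, the lookup would fail in the same way.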

@jackwener force-pushed the eliminate_agg_distinct branch from 8e29fb6 to 1e55fcb on May 18, 2022 14:09
@jackwener closed this on May 30, 2022

Development

Successfully merging this pull request may close these issues.

[Optimizer] Eliminate the distinct

3 participants