Skip to content

Conversation

@yyy1000
Copy link
Contributor

@yyy1000 yyy1000 commented May 7, 2024

Which issue does this PR close?

Closes #10376 .

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the logical-expr Logical plan and expressions label May 7, 2024
@yyy1000 yyy1000 changed the title enable user defined display_name Enable user defined display_name for ScalarUDF May 7, 2024
@jayzhan211
Copy link
Contributor

Do you mind adding display_name for get_field that shows what I expect in the example given in #10376?

@yyy1000
Copy link
Contributor Author

yyy1000 commented May 8, 2024

Do you mind adding display_name for get_field that shows what I expect in the example given in #10376?

Cool, will do that.

@yyy1000 yyy1000 marked this pull request as draft May 8, 2024 02:49
@github-actions github-actions bot added the core Core DataFusion crate label May 8, 2024
@yyy1000 yyy1000 marked this pull request as ready for review May 8, 2024 03:49
@yyy1000
Copy link
Contributor Author

yyy1000 commented May 8, 2024

It's good now. @jayzhan211 You can review it when you are available. :)
In this example

async fn test_udaf_returning_struct_subquery() {
let TestContext { ctx, test_state: _ } = TestContext::new();
let sql = "select sq.first['value'], sq.first['time'] from (SELECT first(value, time) as first from t) as sq";
let expected = [
"+-----------------+----------------------------+",
"| sq.first[value] | sq.first[time] |",
"+-----------------+----------------------------+",
"| 2.0 | 1970-01-01T00:00:00.000002 |",
"+-----------------+----------------------------+",
];
assert_batches_eq!(expected, &execute(&ctx, sql).await.unwrap());
}
, after apply_function_rewrites, the logical plan would not create an alias because the display_name of get_field is the same as sq.first[value]

In this PR:
"| initial_logical_plan | Projection: (sq.first)[value], (sq.first)[time] |",
"| | SubqueryAlias: sq |",
"| | Projection: first(t.value,t.time) AS first |",
"| | Aggregate: groupBy=[[]], aggr=[[first(t.value, t.time)]] |",
"| | TableScan: t |",
"| logical_plan after apply_function_rewrites | Projection: get_field(sq.first, Utf8("value")), get_field(sq.first, Utf8("time")) |",
"| | SubqueryAlias: sq |",
"| | Projection: first(t.value,t.time) AS first |",
"| | Aggregate: groupBy=[[]], aggr=[[first(t.value, t.time)]] |",
"| | TableScan: t |",

Before this PR:
"| initial_logical_plan | Projection: (sq.first)[value], (sq.first)[time] |",
"| | SubqueryAlias: sq |",
"| | Projection: first(t.value,t.time) AS first |",
"| | Aggregate: groupBy=[[]], aggr=[[first(t.value, t.time)]] |",
"| | TableScan: t |",
"| logical_plan after apply_function_rewrites | Projection: get_field(sq.first, Utf8("value")) AS sq.first[value], get_field(sq.first, Utf8("time")) AS sq.first[time] |",
"| | SubqueryAlias: sq |",
"| | Projection: first(t.value,t.time) AS first |",
"| | Aggregate: groupBy=[[]], aggr=[[first(t.value, t.time)]] |",
"| | TableScan: t |",
"| logical_plan after inline_table_scan | SAME TEXT AS ABOVE


Ok(format!("{}[{}]", args[0].display_name()?, name))
}

Copy link
Contributor

@jayzhan211 jayzhan211 May 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cli in this PR

DataFusion CLI v38.0.0
> select struct('a');
+-------------------+
| struct(Utf8("a")) |
+-------------------+
| {c0: a}           |
+-------------------+
1 row(s) fetched. 
Elapsed 0.023 seconds.

> select get_field(struct('a'), 'c0');
+-----------------------+
| struct(Utf8("a"))[c0] |
+-----------------------+
| a                     |
+-----------------------+
1 row(s) fetched. 
Elapsed 0.007 seconds.

main branch:

DataFusion CLI v38.0.0
> select struct('a');
+-------------------+
| struct(Utf8("a")) |
+-------------------+
| {c0: a}           |
+-------------------+
1 row(s) fetched. 
Elapsed 0.014 seconds.

> select get_field(struct('a'), 'c0');
+-----------------------------------------+
| get_field(struct(Utf8("a")),Utf8("c0")) |
+-----------------------------------------+
| a                                       |
+-----------------------------------------+
1 row(s) fetched. 
Elapsed 0.004 seconds.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this example has better view. 🙌

Copy link
Contributor

@jayzhan211 jayzhan211 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@alamb alamb merged commit 7df085e into apache:main May 8, 2024
@alamb
Copy link
Contributor

alamb commented May 8, 2024

Thanks @yyy1000 and @jayzhan211

@yyy1000 yyy1000 deleted the display_name branch May 8, 2024 17:11
findepi pushed a commit to findepi/datafusion that referenced this pull request Jul 16, 2024
* enable user defined display_name

* add display_name to get_field

* add physical name
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Core DataFusion crate logical-expr Logical plan and expressions

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support user defined display for UDF

3 participants