-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-35636][SQL] Lambda keys should not be referenced outside of the lambda function #32773
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Karen Feng <karen.feng@databricks.com>
|
Kubernetes integration test starting |
|
Kubernetes integration test status success |
|
Test build #139299 has finished for PR 32773 at commit
|
viirya
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch!
…e lambda function
Sets `references` for `NamedLambdaVariable` and `LambdaFunction`.
| Expression | NamedLambdaVariable | LambdaFunction |
| --- | --- | --- |
| References before | None | All function references |
| References after | self.toAttribute | Function references minus arguments' references |
In `NestedColumnAliasing`, this means that `ExtractValue(ExtractValue(attr, lv: NamedLambdaVariable), ...)` now references both `attr` and `lv`, rather than just `attr`. As a result, it will not be included in the nested column references.
Before, lambda key was referenced outside of lambda function.
Before:
```
Project [transform(keys#0, lambdafunction(_extract_v1#0, lambda key#0, false)) AS a#0]
+- 'Join Cross
:- Project [kvs#0[lambda key#0].v1 AS _extract_v1#0]
: +- LocalRelation <empty>, [kvs#0]
+- LocalRelation <empty>, [keys#0]
```
After:
```
Project [transform(keys#418, lambdafunction(kvs#417[lambda key#420].v1, lambda key#420, false)) AS a#419]
+- Join Cross
:- LocalRelation <empty>, [kvs#417]
+- LocalRelation <empty>, [keys#418]
```
Before:
```
Project [transform(keys#0, lambdafunction(kvs#0[lambda key#0].v1, lambda key#0, false)) AS a#0]
+- GlobalLimit 5
+- LocalLimit 5
+- Project [keys#0, _extract_v1#0 AS _extract_v1#0]
+- GlobalLimit 5
+- LocalLimit 5
+- Project [kvs#0[lambda key#0].v1 AS _extract_v1#0, keys#0]
+- LocalRelation <empty>, [kvs#0, keys#0]
```
After:
```
Project [transform(keys#428, lambdafunction(kvs#427[lambda key#430].v1, lambda key#430, false)) AS a#429]
+- GlobalLimit 5
+- LocalLimit 5
+- Project [keys#428, kvs#427]
+- GlobalLimit 5
+- LocalLimit 5
+- LocalRelation <empty>, [kvs#427, keys#428]
```
No
Scala unit tests for the examples above
Closes #32773 from karenfeng/SPARK-35636.
Authored-by: Karen Feng <karen.feng@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit 53a758b)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
|
Merged to master and branch-3.1. |
|
Reverted in branch-3.1 due to broken tests. |
…nside a nested struct ### What changes were proposed in this pull request? Skip alias the `ExtractValue` whose children contains `NamedLambdaVariable`. ### Why are the changes needed? Since #32773, the `NamedLambdaVariable` can produce the references, however it cause the rule `NestedColumnAliasing` alias the `ExtractValue` which contains `NamedLambdaVariable`. It fails since we can not match a `NamedLambdaVariable` to an actual attribute. Talk more: During `NamedLambdaVariable#replaceWithAliases`, it uses the references of nestedField to match the output attributes of grandchildren. However `NamedLambdaVariable` is created at analyzer as a virtual attribute, and it is not resolved from the output of children. So we can not get any attribute when use the references of `NamedLambdaVariable` to match the grandchildren's output. ### Does this PR introduce _any_ user-facing change? yes, bug fix ### How was this patch tested? Add new test Closes #35170 from ulysses-you/SPARK-37855. Authored-by: ulysses-you <ulyssesyou18@gmail.com> Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com>
…ray inside a nested struct This is a backport of #35170 for branch-3.2. ### What changes were proposed in this pull request? Skip alias the `ExtractValue` whose children contains `NamedLambdaVariable`. ### Why are the changes needed? Since #32773, the `NamedLambdaVariable` can produce the references, however it cause the rule `NestedColumnAliasing` alias the `ExtractValue` which contains `NamedLambdaVariable`. It fails since we can not match a `NamedLambdaVariable` to an actual attribute. Talk more: During `NamedLambdaVariable#replaceWithAliases`, it uses the references of nestedField to match the output attributes of grandchildren. However `NamedLambdaVariable` is created at analyzer as a virtual attribute, and it is not resolved from the output of children. So we can not get any attribute when use the references of `NamedLambdaVariable` to match the grandchildren's output. ### Does this PR introduce _any_ user-facing change? yes, bug fix ### How was this patch tested? Add new test Closes #35175 from ulysses-you/SPARK-37855-branch-3.2. Authored-by: ulysses-you <ulyssesyou18@gmail.com> Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com>
…nside a nested struct ### What changes were proposed in this pull request? Skip alias the `ExtractValue` whose children contains `NamedLambdaVariable`. ### Why are the changes needed? Since apache#32773, the `NamedLambdaVariable` can produce the references, however it cause the rule `NestedColumnAliasing` alias the `ExtractValue` which contains `NamedLambdaVariable`. It fails since we can not match a `NamedLambdaVariable` to an actual attribute. Talk more: During `NamedLambdaVariable#replaceWithAliases`, it uses the references of nestedField to match the output attributes of grandchildren. However `NamedLambdaVariable` is created at analyzer as a virtual attribute, and it is not resolved from the output of children. So we can not get any attribute when use the references of `NamedLambdaVariable` to match the grandchildren's output. ### Does this PR introduce _any_ user-facing change? yes, bug fix ### How was this patch tested? Add new test Closes apache#35170 from ulysses-you/SPARK-37855. Authored-by: ulysses-you <ulyssesyou18@gmail.com> Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com>
…ray inside a nested struct This is a backport of apache#35170 for branch-3.2. ### What changes were proposed in this pull request? Skip alias the `ExtractValue` whose children contains `NamedLambdaVariable`. ### Why are the changes needed? Since apache#32773, the `NamedLambdaVariable` can produce the references, however it cause the rule `NestedColumnAliasing` alias the `ExtractValue` which contains `NamedLambdaVariable`. It fails since we can not match a `NamedLambdaVariable` to an actual attribute. Talk more: During `NamedLambdaVariable#replaceWithAliases`, it uses the references of nestedField to match the output attributes of grandchildren. However `NamedLambdaVariable` is created at analyzer as a virtual attribute, and it is not resolved from the output of children. So we can not get any attribute when use the references of `NamedLambdaVariable` to match the grandchildren's output. ### Does this PR introduce _any_ user-facing change? yes, bug fix ### How was this patch tested? Add new test Closes apache#35175 from ulysses-you/SPARK-37855-branch-3.2. Authored-by: ulysses-you <ulyssesyou18@gmail.com> Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com>
…ray inside a nested struct This is a backport of apache#35170 for branch-3.2. ### What changes were proposed in this pull request? Skip alias the `ExtractValue` whose children contains `NamedLambdaVariable`. ### Why are the changes needed? Since apache#32773, the `NamedLambdaVariable` can produce the references, however it cause the rule `NestedColumnAliasing` alias the `ExtractValue` which contains `NamedLambdaVariable`. It fails since we can not match a `NamedLambdaVariable` to an actual attribute. Talk more: During `NamedLambdaVariable#replaceWithAliases`, it uses the references of nestedField to match the output attributes of grandchildren. However `NamedLambdaVariable` is created at analyzer as a virtual attribute, and it is not resolved from the output of children. So we can not get any attribute when use the references of `NamedLambdaVariable` to match the grandchildren's output. ### Does this PR introduce _any_ user-facing change? yes, bug fix ### How was this patch tested? Add new test Closes apache#35175 from ulysses-you/SPARK-37855-branch-3.2. Authored-by: ulysses-you <ulyssesyou18@gmail.com> Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com>
…ray inside a nested struct This is a backport of apache#35170 for branch-3.2. ### What changes were proposed in this pull request? Skip alias the `ExtractValue` whose children contains `NamedLambdaVariable`. ### Why are the changes needed? Since apache#32773, the `NamedLambdaVariable` can produce the references, however it cause the rule `NestedColumnAliasing` alias the `ExtractValue` which contains `NamedLambdaVariable`. It fails since we can not match a `NamedLambdaVariable` to an actual attribute. Talk more: During `NamedLambdaVariable#replaceWithAliases`, it uses the references of nestedField to match the output attributes of grandchildren. However `NamedLambdaVariable` is created at analyzer as a virtual attribute, and it is not resolved from the output of children. So we can not get any attribute when use the references of `NamedLambdaVariable` to match the grandchildren's output. ### Does this PR introduce _any_ user-facing change? yes, bug fix ### How was this patch tested? Add new test Closes apache#35175 from ulysses-you/SPARK-37855-branch-3.2. Authored-by: ulysses-you <ulyssesyou18@gmail.com> Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com> (cherry picked from commit a58b8a8) Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
What changes were proposed in this pull request?
Sets
referencesforNamedLambdaVariableandLambdaFunction.In
NestedColumnAliasing, this means thatExtractValue(ExtractValue(attr, lv: NamedLambdaVariable), ...)now references bothattrandlv, rather than justattr. As a result, it will not be included in the nested column references.Why are the changes needed?
Before, lambda key was referenced outside of lambda function.
Example 1
Before:
After:
Example 2
Before:
After:
Does this PR introduce any user-facing change?
No
How was this patch tested?
Scala unit tests for the examples above