fix: Issue 5188 eliminate_join_marks has multiple issues #5189
georgesittas merged 12 commits into tobymao:main from
Appreciate the effort @snovik75, we'll take a look soon!
I will push some minor changes later today.
I think I am good now. Added an extra test and cleaned up a bit.
Finally got the formatting right. As a possible suggestion: it would be better to set the line length to 100 for ruff in pyproject.toml instead of in pre-commit.
Did you run |
Nope, I did not know about it. It seems somewhat redundant, to be honest: if I have ruff and use an IDE that supports it, like VS Code, it will re-format the file to the default settings, which I think is a line length of 80. Having it set in pyproject.toml would be a cleaner setup. But OK, it is a side point.
We don't make any assumptions about the environment you're working in, so the |
I think I found an edge-case bug...
It likely existed in the old codebase as well. It should be fixed now, and there is an extra test case for it.
georgesittas left a comment
Great work @snovik75. I've only a few minor comments and a couple of suggestions. Thanks for taking this on!
    continue
# knockout: we do not support left correlation (see point 2)
assert not any(
    c.args.get("join_mark") for e in query.expressions for c in e.find_all(exp.Column)
I wonder if we could leverage Scope.is_correlated_subquery to detect the correlation here. Have you played around with it in this context to see how it behaves?
This is where I got lost: while external_columns is calculated correctly, can_be_correlated did not seem to lead anywhere because it was just a parameter to the Scope constructor, and I did not find any example of how to use it with e.g. traverse_scope.
@property
def is_correlated_subquery(self):
    """Determine if this scope is a correlated subquery"""
    return bool(self.can_be_correlated and self.external_columns)
The can_be_correlated attribute gets set at branch time:
...
can_be_correlated=self.can_be_correlated
or scope_type in (ScopeType.SUBQUERY, ScopeType.UDTF),
So it should be true in your case, given that the scope is a SUBQUERY, if I'm not mistaken.
self.can_be_correlated
I tried this already and it is either False or even None
So with this diff I see all tests passing:
- assert not any(
- c.args.get("join_mark") for e in query.expressions for c in e.find_all(exp.Column)
- ), "Correlated queries are not supported"
+ assert not scope.is_correlated_subquery, "Correlated queries are not supported"
+ # assert not any(
+ # c.args.get("join_mark") for e in query.expressions for c in e.find_all(exp.Column)
+ # ), "Correlated queries are not supported"
Are you saying you tried it and it failed, or something else?
You are right, it works fine. You can also remove the reversed from the traverse_scope.
When I tried it, I used a select without a correlated subquery. Scope generation is still a little bit over my head :)
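The behaviour discussed in this thread can be illustrated with a small toy model. This is only a sketch, not sqlglot's actual Scope class: ToyScope, its branch method, and the "table.column" string convention are hypothetical stand-ins. It mirrors just the two pieces quoted above, the is_correlated_subquery property and the branch-time rule that sets can_be_correlated, and shows why a subquery without outer references reports False.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import List, Set


class ScopeType(Enum):
    ROOT = auto()
    SUBQUERY = auto()
    DERIVED_TABLE = auto()
    UDTF = auto()


@dataclass
class ToyScope:
    """Hypothetical stand-in for a scope; models only the fields discussed above."""

    sources: Set[str]      # tables selectable inside this scope
    columns: List[str]     # columns written as "table.column" (toy convention)
    can_be_correlated: bool = False

    @property
    def external_columns(self) -> List[str]:
        # Columns whose table is not one of this scope's own sources
        return [c for c in self.columns if c.split(".")[0] not in self.sources]

    @property
    def is_correlated_subquery(self) -> bool:
        # Same expression as the property quoted in the thread
        return bool(self.can_be_correlated and self.external_columns)

    def branch(self, scope_type: ScopeType, sources: Set[str], columns: List[str]) -> "ToyScope":
        # Same rule as the branch-time snippet quoted in the thread
        return ToyScope(
            sources=sources,
            columns=columns,
            can_be_correlated=self.can_be_correlated
            or scope_type in (ScopeType.SUBQUERY, ScopeType.UDTF),
        )


root = ToyScope(sources={"t"}, columns=["t.id"])
# Subquery that references the outer table t: correlated
correlated = root.branch(ScopeType.SUBQUERY, {"u"}, ["u.id", "t.id"])
# Subquery that only touches its own source u: not correlated
uncorrelated = root.branch(ScopeType.SUBQUERY, {"u"}, ["u.id"])

print(correlated.is_correlated_subquery, uncorrelated.is_correlated_subquery)
```

In this model, a subquery scope always gets can_be_correlated set, but the property is only true when there is at least one external column, which matches the observation that testing with a select that has no correlated subquery yields a falsy result.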
georgesittas left a comment
I'll get this in as is and see if it makes sense to use is_correlated_subquery.
becomes a broken CASE statement
changes OR to AND
tbh I am not really happy with the code. It looks a bit ugly for my taste. Looking forward to suggestions.
I will have a fresh look tomorrow.
Fixes #5188