Skip to content

fix: Issue 5188 eliminate_join_marks has multiple issues#5189

Merged
georgesittas merged 12 commits intotobymao:mainfrom
snovik75:issue_5188
Jun 10, 2025
Merged

fix: Issue 5188 eliminate_join_marks has multiple issues#5189
georgesittas merged 12 commits intotobymao:mainfrom
snovik75:issue_5188

Conversation

@snovik75
Copy link
Contributor

@snovik75 snovik75 commented Jun 8, 2025

select t1.a, t2.b
from t1, t2
where t1.id = case when t2.id (+) = "n/a" then null else t2.id (+) end

becomes a broken CASE statement

SELECT
  t1.a,
  t2.b
FROM t1
LEFT JOIN t2
  ON t2.id = "n/a" AND t1.id = CASE WHEN  THEN NULL ELSE t2.id END
select t1.a, t2.b
from t1, t2
where t1.id = t2.id1 (+) or t1.id = t2.id2 (+)

changes OR to AND

SELECT
  t1.a,
  t2.b
FROM t1
LEFT JOIN t2
  ON t1.id = t2.id1 AND t1.id = t2.id2

tbh I am not really happy with the code. looks a bit ugly for my taste. looking forward to suggestions

I will have a fresh look tomorrow

Fixes #5188

@georgesittas
Copy link
Collaborator

Appreciate the effort @snovik75, we'll take a look soon!

@snovik75
Copy link
Contributor Author

snovik75 commented Jun 9, 2025

I will push some minor changes later today

@snovik75
Copy link
Contributor Author

Appreciate the effort @snovik75, we'll take a look soon!

I think I am good now. Added an extra test and cleaned up a bit

@snovik75
Copy link
Contributor Author

finally got the formatting right.

as a possible suggestion - better have line length set at 100 for ruff in pyproject.toml instead of pre-commit

@georgesittas
Copy link
Collaborator

Did you run make style? This should be automated enough already for dev work.

@snovik75
Copy link
Contributor Author

Did you run make style? This should be automated enough already for dev work.

nope. I did not know about it. it somewhat seems redundant to be honest. because if I have ruff and use an IDE which supports it like vsc, it will re-format the file to default settings which I think is line length 80. having it set in pyproject will be a clean setup. but ok, it is a side point

@georgesittas
Copy link
Collaborator

We don't make any assumptions about the environment you're working in, so the Makefile rules are the source of truth. We have a section for this in the README, but I don't see it mentioned in the contribution guide. Will go ahead and include it.

@snovik75
Copy link
Contributor Author

I think I found a side case bug...

@snovik75
Copy link
Contributor Author

likely it existed in the old codebase as well. now it should be good also with an extra test case

Copy link
Collaborator

@georgesittas georgesittas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work @snovik75. I've only a few minor comments and a couple of suggestions. Thanks for taking this on!

continue
# knockout: we do not support left correlation (see point 2)
assert not any(
c.args.get("join_mark") for e in query.expressions for c in e.find_all(exp.Column)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we could leverage Scope.is_correlated_subquery to detect the correlation here. Have you played around with it in this context to see how it behaves?

Copy link
Contributor Author

@snovik75 snovik75 Jun 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is where I got lost because while external_columns are correctly calculated, the can_be_correlated did not seem to lead anywhere because it was just a parameter into the Scope constructor. and I did not find any example on how to use it with e.g. traverse_scope

    @property
    def is_correlated_subquery(self):
        """Determine if this scope is a correlated subquery"""
        return bool(self.can_be_correlated and self.external_columns)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The can_be_correlated attribute gets set at branch time:

            ...
            can_be_correlated=self.can_be_correlated
            or scope_type in (ScopeType.SUBQUERY, ScopeType.UDTF),

So it should be true in your case, given that the scope is a SUBQUERY if I'm not mistaken.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self.can_be_correlated

I tried this already and it is either False or even None

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So with this diff I see all tests passing:

-        assert not any(
-            c.args.get("join_mark") for e in query.expressions for c in e.find_all(exp.Column)
-        ), "Correlated queries are not supported"
+        assert not scope.is_correlated_subquery, "Correlated queries are not supported"
+        # assert not any(
+        #     c.args.get("join_mark") for e in query.expressions for c in e.find_all(exp.Column)
+        # ), "Correlated queries are not supported"

Are you saying you tried it and it failed, or something else?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are right, it works fine. you can also remove the reversed from the traverse_scope.

when I tried it I used a select without a correlated subquery. scope generation is a little bit over my head still :)

@snovik75 snovik75 requested a review from georgesittas June 10, 2025 10:50
Copy link
Collaborator

@georgesittas georgesittas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll get this in as is and see if it makes sense to use is_correlated_subquery.

@georgesittas georgesittas merged commit d0eeb26 into tobymao:main Jun 10, 2025
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

eliminate_join_marks has multiple issues

2 participants