Conversation

@adriangb
Contributor

Summary

  • Fix "Uncomparable values" error when push_down_filter optimizer tries to compare incompatible types during predicate simplification
  • Handle the case where scalar values with incompatible types (e.g., TimestampMicrosecond vs Time64Nanosecond) cannot be compared
  • Catch comparison errors in find_most_restrictive_predicate and return None to indicate predicate simplification cannot proceed
  • Add regression test for the issue
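To illustrate the "catch the comparison error and return None" idea from the summary, here is a minimal, self-contained sketch. It does not use DataFusion: `Scalar`, its `try_cmp`, and `most_restrictive_upper_bound` are hypothetical stand-ins that only model two scalar types, and the real `find_most_restrictive_predicate` may differ in signature and detail.

```rust
use std::cmp::Ordering;

// Hypothetical stand-in for a typed scalar value; only two variants are modeled.
#[derive(Debug, Clone, PartialEq)]
enum Scalar {
    TimestampMicrosecond(i64),
    Time64Nanosecond(i64),
}

impl Scalar {
    // Compare two scalars, failing when the variants differ. This imitates
    // the "Uncomparable values" error described in the summary.
    fn try_cmp(&self, other: &Scalar) -> Result<Ordering, String> {
        match (self, other) {
            (Scalar::TimestampMicrosecond(a), Scalar::TimestampMicrosecond(b)) => Ok(a.cmp(b)),
            (Scalar::Time64Nanosecond(a), Scalar::Time64Nanosecond(b)) => Ok(a.cmp(b)),
            _ => Err(format!("Uncomparable values: {:?}, {:?}", self, other)),
        }
    }
}

// Return the smallest upper bound, or None when the slice is empty or when
// any pair of values cannot be compared (meaning: do not simplify).
fn most_restrictive_upper_bound(bounds: &[Scalar]) -> Option<&Scalar> {
    let mut best = bounds.first()?;
    for candidate in &bounds[1..] {
        // let-else: bail out instead of propagating the comparison error.
        let Ok(ordering) = candidate.try_cmp(best) else {
            return None;
        };
        if ordering == Ordering::Less {
            best = candidate;
        }
    }
    Some(best)
}
```

With two `TimestampMicrosecond` bounds the smaller one is returned; mixing a `TimestampMicrosecond` with a `Time64Nanosecond` yields `None`, so the caller would leave the predicates untouched.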

Test plan

  • Added regression test that reproduces the original issue and verifies the fix
  • Test passes with the fix and would fail without it
  • Existing tests continue to pass

Fixes #17512

🤖 Generated with Claude Code

Handle the case where scalar values with incompatible types (e.g.,
TimestampMicrosecond vs Time64Nanosecond) cannot be compared during
predicate simplification. The fix catches the comparison error in
find_most_restrictive_predicate and returns None to indicate we can't
simplify when types are incompatible, preventing the "Uncomparable values"
error.

Fixes apache#17512

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
@github-actions bot added the optimizer (Optimizer rules) and sqllogictest (SQL Logic Tests (.slt)) labels Sep 11, 2025
@adriangb adriangb requested a review from findepi September 11, 2025 12:04
Comment on lines 209 to 210
let Ok(comparison) = scalar.try_cmp(current_best) else {
    // Can't compare - types are incompatible, so we can't simplify
Member

Are predicates: &[Expr] supposed to be predicates on a single column, or arbitrary predicates in the query?
In the first case, they must all have the same type, so they are comparable and the error should be propagated. If they happen to be incomparable, the plan may have degenerated (e.g., the two sides of a comparison have different types).

If the predicates are not on one column, the comparison is meaningless and should not be attempted.

Contributor Author

If this is wrong, can we add another test that fails with this change?

Member

good idea, please do

this function is used inside fn simplify_column_predicates(predicates: Vec<Expr>) -> Result<Vec<Expr>>
returning Ok(None) is always safe from a correctness perspective, i.e. the worst case is that we don't simplify something that we could have simplified. this also leads us to a test we could add

SELECT ... FROM ... WHERE a < 5 AND CAST(a AS varchar) < 'abc' AND a < 6

this should be simplified to a < 5 AND CAST(a AS varchar) < 'abc'. I.e. the a ? const comparisons can be simplified using find_most_restrictive_predicate. The f(a) ? const comparisons cannot.

current code in main fails (bad)
current code in PR does not fail (better), but the root cause (comparing unrelated values) is not fixed (bad)

probably a better fix would be to delete the line here

// Handle cases where the column might be wrapped in a cast or other operation
Expr::Cast(Cast { expr, .. }) => extract_column_from_expr(expr),
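
A rough sketch of what the proposed behavior would look like on the example query above, assuming a heavily simplified expression model. `Expr`, `Literal`, `LessThan`, and `simplify_upper_bounds` are hypothetical types and helpers invented for illustration, not DataFusion's API; only the idea that `extract_column_from_expr` no longer looks through a cast comes from the comment.

```rust
// Hypothetical, heavily simplified expression type (not DataFusion's `Expr`).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Cast(Box<Expr>, String), // inner expression and target type name
}

#[derive(Debug, Clone, PartialEq)]
enum Literal {
    Int(i64),
    Utf8(String),
}

// A predicate of the form `<left> < <bound>`.
#[derive(Debug, Clone, PartialEq)]
struct LessThan {
    left: Expr,
    bound: Literal,
}

// Only a bare column reference counts; a cast is deliberately NOT unwrapped.
fn extract_column_from_expr(expr: &Expr) -> Option<&str> {
    match expr {
        Expr::Column(name) => Some(name.as_str()),
        Expr::Cast(..) => None,
    }
}

// Collapse all `column < int` predicates into the tightest one and pass
// every other predicate through unchanged.
fn simplify_upper_bounds(column: &str, predicates: Vec<LessThan>) -> Vec<LessThan> {
    let mut best: Option<i64> = None;
    let mut kept = Vec::new();
    for p in predicates {
        let direct = match (extract_column_from_expr(&p.left), &p.bound) {
            (Some(name), Literal::Int(v)) if name == column => Some(*v),
            _ => None,
        };
        match direct {
            Some(v) if best.map_or(true, |b| v < b) => best = Some(v),
            Some(_) => {} // looser than the current best, so it is redundant
            None => kept.push(p),
        }
    }
    let mut out = Vec::new();
    if let Some(v) = best {
        out.push(LessThan { left: Expr::Column(column.to_string()), bound: Literal::Int(v) });
    }
    out.extend(kept);
    out
}
```

Under these assumptions, the three predicates a < 5, CAST(a AS varchar) < 'abc', and a < 6 reduce to a < 5 followed by the untouched cast predicate, and no cross-type comparison is ever attempted.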

Comment on lines +418 to +420
WHERE start_timestamp <= '2025-01-01T00:00:00Z'::timestamptz
) AS t
WHERE t.start_timestamp::time < '00:00:01'::time;
Member

It looks like one predicate is on start_timestamp and the other is on start_timestamp::time.
From the predicate pushdown perspective, the latter is useless.
From the find_most_restrictive_predicate perspective, the predicates c < x1 and f(c) < x2 are incomparable. They need to take the f into account. The optimizer that does that is called "unwrap cast in comparison" AFAICT. The find_most_restrictive_predicate should operate only on predicates comparing column c directly, ignoring those which compare f(c).
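
A small worked example of why a bound on c and a bound on f(c) cannot be ranked against each other when f is a timestamp-to-time cast. This is only a sketch: timestamps are modeled as microseconds since the Unix epoch in UTC, and `time_of_day_micros` is a hypothetical stand-in for the `::time` cast, not the engine's implementation.

```rust
// Microseconds in one day. Timestamps are modeled as microseconds since the
// Unix epoch in UTC (an assumption made for this sketch).
const MICROS_PER_DAY: i64 = 86_400_000_000;

// Hypothetical stand-in for `ts::time`: microseconds since midnight.
fn time_of_day_micros(ts_micros: i64) -> i64 {
    ts_micros.rem_euclid(MICROS_PER_DAY)
}

// True when ordering by timestamp and ordering by time of day disagree.
fn orderings_disagree(a: i64, b: i64) -> bool {
    (a < b) != (time_of_day_micros(a) < time_of_day_micros(b))
}
```

For instance, 23:59:59 on one day (86_399_000_000) is an earlier timestamp than half a second past the following midnight (86_400_500_000), yet its time of day is later. The cast is not monotonic, so neither predicate can be declared "more restrictive" than the other.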

Contributor

> They need to take the f into account. The optimizer that does that is called "unwrap cast in comparison" AFAICT.

Yes I agree (assuming f is a cast expression)

> The find_most_restrictive_predicate should operate only on predicates comparing column c directly, ignoring those which compare f(c).

That is my understanding of what this PR does. I am not sure if you are just confirming this change or if you are proposing / suggesting something more

Member

> > The find_most_restrictive_predicate should operate only on predicates comparing column c directly, ignoring those which compare f(c).
>
> That is my understanding of what this PR does. I am not sure if you are just confirming this change or if you are proposing / suggesting something more

At the time of writing it was a proposal.
Now that the proposed change has been applied, it can be read as a confirmation.

Contributor

makes sense -- thank you

@adriangb adriangb requested a review from findepi September 11, 2025 15:37
@adriangb
Contributor Author

@findepi thank you so much for reviewing, I've addressed your comments and added more unit-style tests

@findepi
Member

findepi commented Sep 12, 2025

cc @alamb @ozankabak

Contributor

@alamb alamb left a comment

Makes sense to me -- thank you @adriangb and @findepi

It seems like this came in via 969ed5e / #16362 from @xudong963 (FYI)

fn extract_column_from_expr(expr: &Expr) -> Option<Column> {
    match expr {
        Expr::Column(col) => Some(col.clone()),
        // Handle cases where the column might be wrapped in a cast or other operation
Contributor

👍

@findepi findepi merged commit fdc54b7 into apache:main Sep 12, 2025
28 checks passed
samueleresca pushed a commit to samueleresca/datafusion that referenced this pull request Sep 12, 2025
…er (apache#17521)

* Fix predicate simplification for incompatible types in push_down_filter

Handle the case where scalar values with incompatible types (e.g.,
TimestampMicrosecond vs Time64Nanosecond) cannot be compared during
predicate simplification. The fix catches the comparison error in
find_most_restrictive_predicate and returns None to indicate we can't
simplify when types are incompatible, preventing the "Uncomparable values"
error.

Fixes apache#17512

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* minimize diff

* use proposed fix, add test

* revert other changes

---------

Co-authored-by: Claude <noreply@anthropic.com>
Labels

optimizer Optimizer rules sqllogictest SQL Logic Tests (.slt)

Development

Successfully merging this pull request may close these issues.

Logical optimizer pushdown_filters rule fails with relatively simple query

3 participants