@suxiaogang223 suxiaogang223 commented Feb 8, 2025

What problem does this PR solve?

Related PR: #43255

Problem Summary:
Example:

CREATE TABLE table_a (
    id INT,
    age INT
) STORED AS ORC;

INSERT INTO table_a VALUES
(1, null),
(2, 18),
(3, null),
(4, 25);

CREATE TABLE table_b (
    id INT,
    age INT
) STORED AS ORC;

INSERT INTO table_b VALUES
(1, null),
(2, null),
(3, 1000000),
(4, 100);

Then run the following query:

select * from table_a inner join table_b on table_a.age <=> table_b.age and table_b.id in (1,3);

When this query runs, the backend generates a runtime filter on the table_a side of the join, equivalent to a condition such as WHERE table_a.age IN (NULL, 1000000). Because <=> is a null-aware comparison operator, the IN predicate must also be null-aware: a NULL in table_a.age has to match the NULL in the value list. However, the ORC predicate pushdown API does not support null-aware IN predicates. Before this PR, the pushdown logic simply ignored the NULL value, so every row of table_a whose age is NULL was filtered out during the scan and the query returned an empty result set.
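
The difference between the two IN semantics can be modeled in a few lines of Python. This is only an illustrative sketch, not Doris or ORC code: the function names `in_ignoring_nulls` and `in_null_aware` are hypothetical, and Python's `None` stands in for SQL NULL.

```python
# Illustrative model only. None stands in for SQL NULL.
# The function names are hypothetical and are not Doris or ORC APIs.

def in_ignoring_nulls(value, candidates):
    """IN as it behaves when pushed down without null-awareness:
    NULL candidates are dropped and a NULL value never matches."""
    non_null = [c for c in candidates if c is not None]
    return value is not None and value in non_null


def in_null_aware(value, candidates):
    """Null-aware IN, matching the <=> semantics:
    a NULL value matches a NULL candidate."""
    if value is None:
        return any(c is None for c in candidates)
    return any(c is not None and c == value for c in candidates)


# age column of table_a, and the runtime filter values built from table_b
table_a_ages = [None, 18, None, 25]
runtime_filter_values = [None, 1000000]

kept_ignoring_nulls = [a for a in table_a_ages
                       if in_ignoring_nulls(a, runtime_filter_values)]
kept_null_aware = [a for a in table_a_ages
                   if in_null_aware(a, runtime_filter_values)]

print(kept_ignoring_nulls)  # [] : every row is filtered out
print(kept_null_aware)      # [None, None] : the two NULL rows survive
```

In this model, the filter that ignores NULL keeps no row of table_a, which is why the join had nothing left to match.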

To fix this bug, predicates with null-aware comparisons are no longer pushed down to ORC, so the query now returns the correct result:

+------+------+------+------+
| id   | age  | id   | age  |
+------+------+------+------+
|    1 | NULL |    1 | NULL |
|    3 | NULL |    1 | NULL |
+------+------+------+------+
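
The fix can be pictured as a guard in front of the pushdown step, followed by the join itself. The sketch below is a simplified Python model under stated assumptions: `InPredicate`, `can_push_down`, `scan`, and `null_safe_eq` are hypothetical names chosen for illustration, not the actual Doris backend code, and `None` stands in for NULL.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InPredicate:
    """Hypothetical stand-in for a runtime-filter IN predicate."""
    column: str
    values: tuple
    null_aware: bool


def can_push_down(pred):
    # Assumption mirroring the fix: null-aware predicates are kept
    # out of the file reader, because it cannot express them.
    return not pred.null_aware


def scan(rows, pred):
    """Apply pred inside the reader only when pushdown is allowed."""
    if not can_push_down(pred):
        return list(rows)  # no pushdown: the reader returns every row
    non_null = {v for v in pred.values if v is not None}
    return [r for r in rows if r[pred.column] in non_null]


def null_safe_eq(x, y):
    """Model of <=>: NULL equals NULL, NULL never equals a value."""
    if x is None or y is None:
        return x is None and y is None
    return x == y


table_a = [{"id": 1, "age": None}, {"id": 2, "age": 18},
           {"id": 3, "age": None}, {"id": 4, "age": 25}]
table_b = [{"id": 1, "age": None}, {"id": 2, "age": None},
           {"id": 3, "age": 1000000}, {"id": 4, "age": 100}]

# Build side: table_b rows with id in (1, 3) give the filter values.
build = [b for b in table_b if b["id"] in (1, 3)]
values = tuple(b["age"] for b in build)

# Old behavior in this model: the predicate is pushed down and NULL is lost.
rows_before_fix = scan(table_a, InPredicate("age", values, null_aware=False))

# New behavior in this model: the null-aware predicate is not pushed down.
rows_after_fix = scan(table_a, InPredicate("age", values, null_aware=True))

result = [(a["id"], a["age"], b["id"], b["age"])
          for a in rows_after_fix for b in build
          if null_safe_eq(a["age"], b["age"])]

print(rows_before_fix)  # []
print(result)           # [(1, None, 1, None), (3, None, 1, None)]
```

Skipping pushdown trades some scan-time filtering for correctness: in this model the reader returns all rows, and the null-safe comparison in the join decides which ones match.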

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@suxiaogang223
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 31293 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit d57a83aad6ed5a5cce8b5dc976632edc70d17352, data reload: false

------ Round 1 ----------------------------------
q1	17589	5210	5037	5037
q2	2055	300	166	166
q3	10418	1223	730	730
q4	10286	1009	549	549
q5	8215	2393	2299	2299
q6	185	166	130	130
q7	879	737	584	584
q8	9284	1279	1084	1084
q9	4883	4820	4577	4577
q10	6857	2283	1878	1878
q11	467	275	248	248
q12	339	357	217	217
q13	17782	3680	3052	3052
q14	228	240	204	204
q15	509	472	460	460
q16	624	608	590	590
q17	561	857	342	342
q18	6544	6244	6188	6188
q19	1524	946	537	537
q20	318	319	187	187
q21	2895	2177	1930	1930
q22	376	330	304	304
Total cold run time: 102818 ms
Total hot run time: 31293 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5088	5078	5089	5078
q2	234	324	230	230
q3	2187	2714	2316	2316
q4	1466	1793	1338	1338
q5	4200	4109	4154	4109
q6	204	162	124	124
q7	1867	1823	1634	1634
q8	2590	2641	2527	2527
q9	7217	7140	7123	7123
q10	3014	3200	2794	2794
q11	589	534	498	498
q12	680	772	616	616
q13	3554	3769	3314	3314
q14	288	295	284	284
q15	527	464	463	463
q16	650	702	623	623
q17	1144	1570	1355	1355
q18	7529	7259	7224	7224
q19	796	817	840	817
q20	1996	2010	1870	1870
q21	5338	5066	4856	4856
q22	669	606	527	527
Total cold run time: 51827 ms
Total hot run time: 49720 ms

@doris-robot

TPC-DS: Total hot run time: 190442 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit d57a83aad6ed5a5cce8b5dc976632edc70d17352, data reload: false

query1	1302	967	930	930
query2	6229	1831	1832	1831
query3	10970	4520	4400	4400
query4	55249	25752	22879	22879
query5	4966	567	508	508
query6	336	201	199	199
query7	4914	505	286	286
query8	285	232	214	214
query9	5467	2546	2573	2546
query10	389	297	260	260
query11	15063	15229	14821	14821
query12	166	109	107	107
query13	1065	521	400	400
query14	10171	6856	6855	6855
query15	199	203	197	197
query16	7101	652	506	506
query17	1078	745	608	608
query18	1536	442	314	314
query19	203	203	175	175
query20	133	132	126	126
query21	238	122	103	103
query22	4348	4559	4425	4425
query23	33790	33423	33322	33322
query24	5646	2422	2439	2422
query25	452	475	396	396
query26	769	275	154	154
query27	2103	506	383	383
query28	2743	2413	2419	2413
query29	601	579	440	440
query30	217	189	158	158
query31	901	858	790	790
query32	81	64	58	58
query33	459	358	310	310
query34	769	885	507	507
query35	793	848	740	740
query36	948	975	898	898
query37	126	95	78	78
query38	4244	4510	4238	4238
query39	1475	1427	1410	1410
query40	204	111	101	101
query41	52	47	48	47
query42	115	110	107	107
query43	497	493	477	477
query44	1292	804	804	804
query45	193	178	172	172
query46	869	1069	644	644
query47	1876	1926	1802	1802
query48	380	427	323	323
query49	696	507	421	421
query50	702	785	419	419
query51	4323	4315	4193	4193
query52	102	101	91	91
query53	222	259	200	200
query54	484	478	429	429
query55	82	79	81	79
query56	293	278	274	274
query57	1163	1210	1145	1145
query58	260	239	264	239
query59	2682	2887	2734	2734
query60	287	278	268	268
query61	118	119	113	113
query62	724	734	702	702
query63	238	188	188	188
query64	2037	1055	699	699
query65	3456	3225	3324	3225
query66	821	395	296	296
query67	15758	15576	15385	15385
query68	5573	768	501	501
query69	516	295	272	272
query70	1158	1081	1122	1081
query71	464	299	263	263
query72	6452	3669	3778	3669
query73	1355	753	347	347
query74	8956	9027	9089	9027
query75	3183	3126	2715	2715
query76	3818	1171	749	749
query77	532	366	288	288
query78	10098	10096	9259	9259
query79	2652	812	609	609
query80	655	525	455	455
query81	499	279	245	245
query82	442	157	121	121
query83	179	177	163	163
query84	290	98	80	80
query85	768	347	299	299
query86	358	304	309	304
query87	4614	4644	4514	4514
query88	3578	2194	2185	2185
query89	408	388	287	287
query90	1746	198	190	190
query91	138	138	110	110
query92	74	60	55	55
query93	2481	999	574	574
query94	673	414	312	312
query95	345	269	252	252
query96	480	550	267	267
query97	2840	2854	2776	2776
query98	243	202	207	202
query99	1292	1392	1287	1287
Total cold run time: 294311 ms
Total hot run time: 190442 ms

@doris-robot

ClickBench: Total hot run time: 30.18 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit d57a83aad6ed5a5cce8b5dc976632edc70d17352, data reload: false

query1	0.04	0.03	0.03
query2	0.06	0.03	0.03
query3	0.23	0.07	0.06
query4	1.62	0.10	0.10
query5	0.41	0.40	0.38
query6	1.17	0.64	0.67
query7	0.03	0.02	0.01
query8	0.04	0.03	0.03
query9	0.59	0.52	0.52
query10	0.57	0.57	0.56
query11	0.15	0.10	0.11
query12	0.15	0.12	0.11
query13	0.62	0.59	0.62
query14	2.68	2.69	2.70
query15	0.93	0.87	0.86
query16	0.38	0.38	0.39
query17	1.05	1.05	1.02
query18	0.21	0.20	0.20
query19	1.93	1.77	1.94
query20	0.02	0.00	0.01
query21	15.36	0.87	0.54
query22	0.75	1.18	0.65
query23	14.96	1.32	0.60
query24	7.39	1.68	0.54
query25	0.50	0.20	0.06
query26	0.65	0.17	0.13
query27	0.05	0.05	0.05
query28	8.83	0.86	0.44
query29	12.56	4.00	3.26
query30	0.24	0.09	0.07
query31	2.82	0.61	0.39
query32	3.22	0.56	0.46
query33	2.99	3.05	3.06
query34	16.20	5.12	4.52
query35	4.51	4.55	4.53
query36	0.68	0.52	0.49
query37	0.10	0.06	0.06
query38	0.04	0.04	0.04
query39	0.03	0.03	0.02
query40	0.18	0.14	0.13
query41	0.08	0.03	0.03
query42	0.04	0.02	0.02
query43	0.03	0.02	0.03
Total cold run time: 105.09 s
Total hot run time: 30.18 s

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 42.65% (11173/26196)
Line Coverage: 32.66% (93984/287734)
Region Coverage: 31.82% (48189/151446)
Branch Coverage: 27.74% (24334/87720)
Coverage Report: http://coverage.selectdb-in.cc/coverage/d57a83aad6ed5a5cce8b5dc976632edc70d17352_d57a83aad6ed5a5cce8b5dc976632edc70d17352/report/index.html

@morningman morningman left a comment

LGTM

github-actions bot commented Feb 8, 2025

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Feb 8, 2025
github-actions bot commented Feb 8, 2025

PR approved by anyone and no changes requested.

@dataroaring dataroaring left a comment

LGTM

@morningman morningman merged commit 853e99b into apache:master Feb 8, 2025
23 of 26 checks passed
github-actions bot pushed a commit that referenced this pull request Feb 8, 2025
@suxiaogang223 suxiaogang223 deleted the fix_null_aware branch February 11, 2025 03:08
morningman pushed a commit that referenced this pull request Feb 17, 2025
lzyy2024 pushed a commit to lzyy2024/doris that referenced this pull request Feb 21, 2025
koarz pushed a commit to koarz/doris that referenced this pull request Jun 4, 2025
suxiaogang223 added a commit to suxiaogang223/doris that referenced this pull request Jun 24, 2025
suxiaogang223 added a commit to suxiaogang223/doris that referenced this pull request Jun 25, 2025
morningman pushed a commit to suxiaogang223/doris that referenced this pull request Jun 25, 2025
suxiaogang223 added a commit to suxiaogang223/doris that referenced this pull request Jun 26, 2025
morrySnow pushed a commit that referenced this pull request Jun 27, 2025
GoGoWen pushed a commit to GoGoWen/incubator-doris that referenced this pull request Sep 26, 2025

6 participants