Description
Search before asking
- I had searched in the issues and found no similar issues.
Version
2.0.3
What's Wrong?
When I insert data into sink_table from source_table, the START_CALL_TIME field is of DATE type in sink_table and of DATETIME type in source_table. With the Nereids planner, this type mismatch leads to data loss on insert; with the old planner the result is correct.
This is my SQL:
INSERT INTO sink_table SELECT *
FROM (
SELECT DISTINCT acct_month, call_date, device_number, OPPOSE_NUMBER, org_trm_id
, START_CALL_TIME, call_duration
FROM source_table
WHERE acct_month = SUBSTR("20240131", 1, 6) AND call_date = SUBSTR("20240131", -2)
UNION
SELECT DISTINCT acct_month, call_date, OPPOSE_NUMBER AS device_number, device_number AS OPPOSE_NUMBER, org_trm_id
, START_CALL_TIME, call_duration
FROM source_table
WHERE acct_month = SUBSTR("20240131", 1, 6) AND call_date = SUBSTR("20240131", -2)
) alias_17125550933410;
The old planner inserts the correct result of 27837957 rows, but the Nereids planner inserts only 27531867 rows.
When START_CALL_TIME is changed to DATETIME type in sink_table, the Nereids planner also produces the correct result.
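One possible explanation, offered only as a hypothesis and not confirmed from the plan, is that the narrowing from DATETIME to DATE is applied before the DISTINCT/UNION deduplication instead of after it. The sketch below uses made-up rows (the names `rows`, `dedup_then_cast`, and `cast_then_dedup` are illustrative, not Doris internals) to show how the order of those two steps changes the row count:

```python
from datetime import datetime

# Hypothetical rows: (device_number, START_CALL_TIME).
# The first two differ only in the time of day.
rows = [
    ("A", datetime(2024, 1, 31, 6, 0, 0)),
    ("A", datetime(2024, 1, 31, 7, 0, 0)),
    ("B", datetime(2024, 1, 31, 6, 0, 0)),
]


def dedup_then_cast(rows):
    """Deduplicate on the DATETIME values, then narrow to DATE for the sink."""
    distinct = set(rows)
    return [(dev, ts.date()) for dev, ts in distinct]


def cast_then_dedup(rows):
    """Narrow to DATE first, then deduplicate: rows that differ only by time merge."""
    return list({(dev, ts.date()) for dev, ts in rows})


print(len(dedup_then_cast(rows)))  # 3
print(len(cast_then_dedup(rows)))  # 2
```

If this is what happens, it would be consistent with the observation that the counts match once the sink column is DATETIME, since no narrowing is needed in that case.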
What You Expected?
Is there something wrong in the Nereids optimizer?
How to Reproduce?
Because this problem occurs in a production environment, we can mock the tables to get a similar plan:
Data in test.csv:
10000,2017-10-01,北京,20,0,2017-10-01 06:00:00,20,10,10
10000,2017-10-01,北京,20,0,2017-10-01 07:00:00,15,2,2
10001,2017-10-01,北京,30,1,2017-10-01 17:05:45,2,22,22
10002,2017-10-02,上海,20,1,2017-10-02 12:59:12,200,5,5
10003,2017-10-02,广州,32,0,2017-10-02 11:20:00,30,11,11
10004,2017-10-01,深圳,35,0,2017-10-01 10:00:15,100,3,3
10004,2017-10-03,深圳,35,0,2017-10-03 10:20:22,11,6,6
Load test.csv into source_table:
curl --location-trusted -u root: -T test.csv -H "column_separator:," http://127.0.0.1:8030/api/demo/source_table/_stream_load
Insert into sink_table from source_table:
insert into sink_table
select * from (
select DISTINCT user_id, date, city, age, sex,last_visit_date,cost,max_dwell_time,min_dwell_time from source_table where city='北京'
union
select DISTINCT user_id, date, city, age, sex,last_visit_date,cost,max_dwell_time,min_dwell_time from source_table where city='深圳'
) t;
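As a cross-check for the mock data, the following sketch computes how many rows the query above should produce from the seven rows of test.csv, so the count in sink_table can be compared under each planner. It also computes the count if the last_visit_date column were truncated to a date before deduplication, which is only a guess at the failure mode. The helper names are illustrative and the CSV is embedded to keep the sketch self-contained.

```python
import csv
import io

# The seven rows of test.csv from this issue.
TEST_CSV = """\
10000,2017-10-01,北京,20,0,2017-10-01 06:00:00,20,10,10
10000,2017-10-01,北京,20,0,2017-10-01 07:00:00,15,2,2
10001,2017-10-01,北京,30,1,2017-10-01 17:05:45,2,22,22
10002,2017-10-02,上海,20,1,2017-10-02 12:59:12,200,5,5
10003,2017-10-02,广州,32,0,2017-10-02 11:20:00,30,11,11
10004,2017-10-01,深圳,35,0,2017-10-01 10:00:15,100,3,3
10004,2017-10-03,深圳,35,0,2017-10-03 10:20:22,11,6,6
"""


def expected_rows(csv_text, cities):
    """Emulate the UNION of the two SELECT DISTINCT branches filtered by city."""
    result = set()
    for row in csv.reader(io.StringIO(csv_text)):
        if row[2] in cities:
            result.add(tuple(row))
    return result


def narrowed(rows, col=5):
    """Truncate the datetime string in column `col` to its date part, then deduplicate."""
    return {r[:col] + (r[col][:10],) + r[col + 1:] for r in rows}


rows = expected_rows(TEST_CSV, {"北京", "深圳"})
print(len(rows))            # 5
print(len(narrowed(rows)))  # 5
```

Both counts are 5 for this sample, because no two selected rows differ only in the time part of last_visit_date. The mock tables therefore reproduce the plan shape; showing an actual count difference under this assumption would need extra rows that are identical except for the time of day.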
Anything Else?
No response
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct








