Skip to content

[Bug] data is lost when inserting into using nereids planner when type is inconsistent(DATETIME TO DATE) #34626

@yx-keith

Description

@yx-keith

Search before asking

  • I had searched in the issues and found no similar issues.

Version

2.0.3

What's Wrong?

when I insert data into sink_table from source_table, START_CALL_TIME filed is of DATE type in sink_table and is of DATETYPE type in source_table, the inconsistent types lead to data loss when inserting data using nereids planner, but it is ok using old planner。

this is my sql:
INSERT INTO sink_table SELECT *
FROM (
SELECT DISTINCT acct_month, call_date, device_number, OPPOSE_NUMBER, org_trm_id
, START_CALL_TIME, call_duration
FROM source_table
WHERE acct_month = SUBSTR("20240131", 1, 6) AND call_date = SUBSTR("20240131", -2)
UNION
SELECT DISTINCT acct_month, call_date, OPPOSE_NUMBER AS device_number, device_number AS OPPOSE_NUMBER, org_trm_id
, START_CALL_TIME, call_duration
FROM source_table
WHERE acct_month = SUBSTR("20240131", 1, 6) AND call_date = SUBSTR("20240131", -2)
) alias_17125550933410;

correct result is 27837957 rows in old planner, but the wrong result is 27531867 rows in nereids planner.
but when change START_CALL_TIME to DATETIME type in sink_table, nereids planner is right.

What You Expected?

is there something wrong in nereids optimizer?

How to Reproduce?

Because this problem occurs in a production environment, we can mock table to get similiar plan:

sink_table:
image

source_table:
image

data in test.csv:
10000,2017-10-01,北京,20,0,2017-10-01 06:00:00,20,10,10
10000,2017-10-01,北京,20,0,2017-10-01 07:00:00,15,2,2
10001,2017-10-01,北京,30,1,2017-10-01 17:05:45,2,22,22
10002,2017-10-02,上海,20,1,2017-10-02 12:59:12,200,5,5
10003,2017-10-02,广州,32,0,2017-10-02 11:20:00,30,11,11
10004,2017-10-01,深圳,35,0,2017-10-01 10:00:15,100,3,3
10004,2017-10-03,深圳,35,0,2017-10-03 10:20:22,11,6,6

load test.csv to source_table:
curl --location-trusted -u root: -T test.csv -H "column_separator:," http://127.0.0.1:8030/api/demo/source_table/_stream_load

insert into sink_table from source_table:
insert into sink_table
select * from (
select DISTINCT user_id, date, city, age, sex,last_visit_date,cost,max_dwell_time,min_dwell_time from source_table where city='北京'
union
select DISTINCT user_id, date, city, age, sex,last_visit_date,cost,max_dwell_time,min_dwell_time from source_table where city='深圳'
) t;

nereids plan:
image
image

old planner:
image
image
image
image
image

Anything Else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions