Skip to content

[Bug] where条件带in的子查询过滤出现少数据问题 #29443

@guanghuan96

Description

@guanghuan96

Search before asking

  • I had searched in the issues and found no similar issues.

Version

2.0.3

What's Wrong?

完整sql语句如下:

select 
      count(distinct employee_no) as employee_no   
from retail_dim.dim_ehr_org_staff_position_all_1229 
where employee_no in (
    select distinct employee_no  
    from retail_dim.dim_ehr_org_staff_position_all_1229
    where office_start_date = '2023-12-01'
) 

-- 问题描述:
where条件里的子查询查出的结果集条数为:2012条
select count(distinct employee_no)
from retail_dim.dim_ehr_org_staff_position_all_1229
where office_start_date = '2023-12-01'

但是外面整个语句查询数量却少了一半,只有918条
select
count(distinct employee_no)
from retail_dim.dim_ehr_org_staff_position_all_1229
where employee_no in (
select distinct employee_no
from retail_dim.dim_ehr_org_staff_position_all_1229
where office_start_date = '2023-12-01'
)

CREATE TABLE `dim_ehr_org_staff_position_all_1229` (
  `employee_no` char(20) NULL COMMENT '员工工号',
  `organ_new_no` char(20) NULL COMMENT '组织编码',
  `position_no` char(10) NULL COMMENT '岗位编号',
  `office_start_date` date NULL COMMENT '任职开始日期',
  `office_status` tinyint(4) NULL COMMENT '在本组织任职状态(1:在职,0:失效)',
  `employee_name` char(50) NULL COMMENT '员工姓名',
  `parent_employee_no` char(10) NULL COMMENT '上级领导编码',
  `parent_employee_name` varchar(50) NULL COMMENT '上级领导姓名',
  `unit_type` char(20) NULL COMMENT '组织类型',
  `organ_new_name` varchar(200) NULL COMMENT '组织名称',
  `position_name` char(50) NULL COMMENT '岗位名称',
  `position_type` tinyint(4) NULL COMMENT '岗位类型(01:管理线,02:陈列线)',
  `brand_unit_no` char(20) NULL COMMENT '品牌部编码',
  `region_no` char(10) NULL COMMENT '小区编码',
  `region_name` char(30) NULL COMMENT '小区名称',
  `office_end_date` date NULL COMMENT '任职结束日期',
  `update_time` datetime NULL COMMENT '更新时间'
) ENGINE=OLAP
UNIQUE KEY(`employee_no`, `organ_new_no`, `position_no`, `office_start_date`, `office_status`)
COMMENT '员工任职拉链表'
DISTRIBUTED BY HASH(`employee_no`) BUCKETS 1
PROPERTIES (
"replication_allocation" = "tag.location.default: 3",
"is_being_synced" = "false",
"storage_format" = "V2",
"light_schema_change" = "true",
"disable_auto_compaction" = "false",
"enable_single_replica_compaction" = "false"
);

-- 执行计划如下:
PLAN FRAGMENT 0
OUTPUT EXPRS:
employee_no[#46]
PARTITION: UNPARTITIONED

HAS_COLO_PLAN_NODE: false

VRESULT SINK

809:VAGGREGATE (merge finalize)
| output: count(employee_no[#45])[#46]
| group by:
| cardinality=1
|
806:VEXCHANGE
offset: 0

PLAN FRAGMENT 1

PARTITION: HASH_PARTITIONED: employee_no[#21]

HAS_COLO_PLAN_NODE: true

STREAM DATA SINK
EXCHANGE ID: 806
UNPARTITIONED

803:VAGGREGATE (update serialize)
| output: partial_count(employee_no[#44])[#45]
| group by:
| cardinality=1
|
800:VAGGREGATE (merge serialize)
| group by: employee_no[#43]
| cardinality=42,546
|
797:VAGGREGATE (update serialize)
| group by: employee_no[#42]
| cardinality=85,092
|
794:VHASH JOIN
| join op: LEFT SEMI JOIN(COLOCATE[])[]
| equal join conjunct: employee_no[#40] = employee_no[#20]
| runtime filters: RF000[in_or_bloom] <- employee_no#20
| cardinality=170,185
| vec output tuple id: 6
| vIntermediate tuple ids: 5
| hash output slot ids: 40
|
|----791:VAGGREGATE (update finalize)
| | group by: employee_no[#19]
| | cardinality=21,273
| |
| 781:VOlapScanNode
| TABLE: default_cluster:retail_dim.dim_ehr_org_staff_position_all_1229(dim_ehr_org_staff_position_all_1229), PREAGGREGATION: OFF. Reason: DORIS_DELETE_SIGN is used as conjuncts.
| PREDICATES: office_start_date[#3] = '2023-12-01' AND DORIS_DELETE_SIGN[#17] = 0
| partitions=1/1 (dim_ehr_org_staff_position_all_1229), tablets=1/1, tabletList=4534320
| cardinality=170185, avgRowSize=121.58043, numNodes=3
| pushAggOp=NONE
| projections: employee_no[#0]
| project output tuple id: 1
|
774:VOlapScanNode
TABLE: default_cluster:retail_dim.dim_ehr_org_staff_position_all_1229(dim_ehr_org_staff_position_all_1229), PREAGGREGATION: OFF. Reason: DORIS_DELETE_SIGN is used as conjuncts.
PREDICATES: DORIS_DELETE_SIGN[#38] = 0
runtime filters: RF000[in_or_bloom] -> employee_no[#21]
partitions=1/1 (dim_ehr_org_staff_position_all_1229), tablets=1/1, tabletList=4534320
cardinality=170185, avgRowSize=121.58043, numNodes=3
pushAggOp=NONE
projections: employee_no[#21]
project output tuple id: 4

What You Expected?

期望得到正确的结果:
整个sql查询出来的结果应是2012条

How to Reproduce?

No response

Anything Else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions