Seeing WARN messages indicating native execution is disabled #180

@sagarlakshmipathy

Describe the bug

While running Comet with OSS Spark, I noticed WARN messages on some queries indicating that Comet native execution is disabled. I'd like to understand why that happens.

Here's the execution log:

====================================================================================================
RUNNING: Query # 15 (round 1) (1 statements)
----------------------------------------------------------------------------------------------------
24/03/09 23:16:27 WARN QueryPlanSerde: Comet native execution is disabled due to: unsupported Spark expression: 'might_contain(Subquery subquery#8915, [id=#74608], xxhash64(cs_sold_date_sk#277, 42))' of class 'org.apache.spark.sql.catalyst.expressions.BloomFilterMightContain
24/03/09 23:16:27 WARN QueryPlanSerde: Comet native execution is disabled due to: unsupported Spark expression: 'might_contain(Subquery subquery#8915, [id=#74608], xxhash64(cs_sold_date_sk#277, 42))' of class 'org.apache.spark.sql.catalyst.expressions.BloomFilterMightContain
24/03/09 23:16:27 WARN DAGScheduler: Broadcasting large task binary with size 1047.8 KiB
24/03/09 23:16:33 WARN DAGScheduler: Broadcasting large task binary with size 1096.7 KiB
24/03/09 23:16:33 WARN DAGScheduler: Broadcasting large task binary with size 1143.9 KiB
24/03/09 23:16:35 WARN DAGScheduler: Broadcasting large task binary with size 1131.6 KiB
Time taken: 8596 ms
----------------------------------------------------------------------------------------------------
FINISHED: Query # 15 (round 1)
====================================================================================================

Here's the query itself:

--TPC-DS Q15
select  ca_zip
       ,sum(cs_sales_price)
 from catalog_sales
     ,customer
     ,customer_address
     ,date_dim
 where cs_bill_customer_sk = c_customer_sk
 	and c_current_addr_sk = ca_address_sk 
 	and ( substr(ca_zip,1,5) in ('85669', '86197','88274','83405','86475',
                                   '85392', '85460', '80348', '81792')
 	      or ca_state in ('CA','WA','GA')
 	      or cs_sales_price > 500)
 	and cs_sold_date_sk = d_date_sk
 	and d_qoy = 2 and d_year = 2002
 group by ca_zip
 order by ca_zip
 limit 100;

Despite the warnings, I could see that the queries ran faster.

Steps to reproduce

  1. Run the TPC-DS queries with Comet enabled; running query 15 alone should be enough.

Apologies for the minimal steps. Fortunately, that's all that's needed.

Expected behavior

No WARN messages

Additional context

This only happened for some queries. For example, Q46 ran without any issues.

====================================================================================================
RUNNING: Query # 46 (round 1) (1 statements)
----------------------------------------------------------------------------------------------------
Time taken: 18658 ms
----------------------------------------------------------------------------------------------------
FINISHED: Query # 46 (round 1)
====================================================================================================

--TPC-DS Q46
select  c_last_name
       ,c_first_name
       ,ca_city
       ,bought_city
       ,ss_ticket_number
       ,amt,profit 
 from
   (select ss_ticket_number
          ,ss_customer_sk
          ,ca_city bought_city
          ,sum(ss_coupon_amt) amt
          ,sum(ss_net_profit) profit
    from store_sales,date_dim,store,household_demographics,customer_address 
    where store_sales.ss_sold_date_sk = date_dim.d_date_sk
    and store_sales.ss_store_sk = store.s_store_sk  
    and store_sales.ss_hdemo_sk = household_demographics.hd_demo_sk
    and store_sales.ss_addr_sk = customer_address.ca_address_sk
    and (household_demographics.hd_dep_count = 3 or
         household_demographics.hd_vehicle_count= 1)
    and date_dim.d_dow in (6,0)
    and date_dim.d_year in (1999,1999+1,1999+2) 
    and store.s_city in ('Midway','Fairview','Fairview','Midway','Fairview') 
    group by ss_ticket_number,ss_customer_sk,ss_addr_sk,ca_city) dn,customer,customer_address current_addr
    where ss_customer_sk = c_customer_sk
      and customer.c_current_addr_sk = current_addr.ca_address_sk
      and current_addr.ca_city <> bought_city
  order by c_last_name
          ,c_first_name
          ,ca_city
          ,bought_city
          ,ss_ticket_number
  limit 100;
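
To find out which queries hit this fallback across a full run, the runner output can be scanned for the QueryPlanSerde warning. Below is a minimal sketch, assuming the log format shown above (a `RUNNING: Query # N` banner followed by that query's log lines). The function `summarize_fallbacks` and both regexes are hypothetical helpers written for this illustration; they are not part of Comet or Spark.

```python
import re
from collections import defaultdict

# Matches the runner banner, e.g. "RUNNING: Query # 15 (round 1) (1 statements)".
QUERY_RE = re.compile(r"RUNNING: Query # (\d+)")

# Matches the Comet fallback warning and captures the expression class name.
# The log line has no closing quote after the class name, so none is required.
FALLBACK_RE = re.compile(
    r"WARN QueryPlanSerde: Comet native execution is disabled due to: "
    r"unsupported Spark expression: .* of class '([\w.$]+)"
)


def summarize_fallbacks(log_text):
    """Return {query number: sorted list of unsupported expression classes}."""
    current = None
    found = defaultdict(set)
    for line in log_text.splitlines():
        banner = QUERY_RE.search(line)
        if banner:
            current = int(banner.group(1))
            found[current]  # register the query even if it logs no warnings
            continue
        warning = FALLBACK_RE.search(line)
        if warning and current is not None:
            found[current].add(warning.group(1))
    return {query: sorted(classes) for query, classes in found.items()}


# Sample input assembled from the log lines quoted in this report.
SAMPLE = """\
RUNNING: Query # 15 (round 1) (1 statements)
24/03/09 23:16:27 WARN QueryPlanSerde: Comet native execution is disabled due to: unsupported Spark expression: 'might_contain(Subquery subquery#8915, [id=#74608], xxhash64(cs_sold_date_sk#277, 42))' of class 'org.apache.spark.sql.catalyst.expressions.BloomFilterMightContain
24/03/09 23:16:27 WARN DAGScheduler: Broadcasting large task binary with size 1047.8 KiB
FINISHED: Query # 15 (round 1)
RUNNING: Query # 46 (round 1) (1 statements)
FINISHED: Query # 46 (round 1)
"""

if __name__ == "__main__":
    for query, classes in summarize_fallbacks(SAMPLE).items():
        print(query, classes if classes else "no fallback warnings")
```

Repeated warnings for the same expression class are collapsed per query, so the duplicate line in the Q15 log is counted once.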
