10 changes: 5 additions & 5 deletions docs/source/contributor-guide/development.md
@@ -108,15 +108,15 @@ Note that the output files get written to `$SPARK_HOME`.
The tests can be run with:

```sh
-export SPARK_HOME=`pwd` COMET_PARQUET_SCAN_IMPL=native_comet
+export SPARK_HOME=`pwd`
./mvnw -pl spark -Dsuites="org.apache.spark.sql.comet.CometTPCDSV1_4_PlanStabilitySuite" -Pspark-3.4 -nsu test
./mvnw -pl spark -Dsuites="org.apache.spark.sql.comet.CometTPCDSV1_4_PlanStabilitySuite" -Pspark-3.5 -nsu test
./mvnw -pl spark -Dsuites="org.apache.spark.sql.comet.CometTPCDSV1_4_PlanStabilitySuite" -Pspark-4.0 -nsu test
```

and
```sh
-export SPARK_HOME=`pwd` COMET_PARQUET_SCAN_IMPL=native_comet
+export SPARK_HOME=`pwd`
./mvnw -pl spark -Dsuites="org.apache.spark.sql.comet.CometTPCDSV2_7_PlanStabilitySuite" -Pspark-3.4 -nsu test
./mvnw -pl spark -Dsuites="org.apache.spark.sql.comet.CometTPCDSV2_7_PlanStabilitySuite" -Pspark-3.5 -nsu test
./mvnw -pl spark -Dsuites="org.apache.spark.sql.comet.CometTPCDSV2_7_PlanStabilitySuite" -Pspark-4.0 -nsu test
@@ -126,16 +126,16 @@ If your pull request changes the query plans generated by Comet, you should regenerate the golden files.
To regenerate the golden files, you can run the following commands.

```sh
-export SPARK_HOME=`pwd` COMET_PARQUET_SCAN_IMPL=native_comet
+export SPARK_HOME=`pwd`
SPARK_GENERATE_GOLDEN_FILES=1 ./mvnw -pl spark -Dsuites="org.apache.spark.sql.comet.CometTPCDSV1_4_PlanStabilitySuite" -Pspark-3.4 -nsu test
SPARK_GENERATE_GOLDEN_FILES=1 ./mvnw -pl spark -Dsuites="org.apache.spark.sql.comet.CometTPCDSV1_4_PlanStabilitySuite" -Pspark-3.5 -nsu test
SPARK_GENERATE_GOLDEN_FILES=1 ./mvnw -pl spark -Dsuites="org.apache.spark.sql.comet.CometTPCDSV1_4_PlanStabilitySuite" -Pspark-4.0 -nsu test
```

and
```sh
-export SPARK_HOME=`pwd` COMET_PARQUET_SCAN_IMPL=native_comet
-SPARK_GENERATE_GOLDEN_FILES=1 ./mvnw -pl spark -Dsuites="org.apache.spark.sql.comet.CometTPCDSV2_7_PlanStabilitySuite" -nsu test
+export SPARK_HOME=`pwd`
+SPARK_GENERATE_GOLDEN_FILES=1 ./mvnw -pl spark -Dsuites="org.apache.spark.sql.comet.CometTPCDSV2_7_PlanStabilitySuite" -Pspark-3.4 -nsu test
SPARK_GENERATE_GOLDEN_FILES=1 ./mvnw -pl spark -Dsuites="org.apache.spark.sql.comet.CometTPCDSV2_7_PlanStabilitySuite" -Pspark-3.5 -nsu test
SPARK_GENERATE_GOLDEN_FILES=1 ./mvnw -pl spark -Dsuites="org.apache.spark.sql.comet.CometTPCDSV2_7_PlanStabilitySuite" -Pspark-4.0 -nsu test
```
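The per-profile invocations above differ only in the `-P` profile flag, so they can be driven from a single loop. A minimal bash sketch — the `build_cmd` helper and the `REGENERATE` variable are illustrative names, not part of the Comet build:

```shell
#!/usr/bin/env bash
# Build the Maven invocation for one Spark profile. The sketch prints each
# command instead of running it, so it is side-effect free; swap printf for
# eval (or pipe the output to sh) to actually execute the tests.
suite="org.apache.spark.sql.comet.CometTPCDSV2_7_PlanStabilitySuite"

build_cmd() {
  local profile="$1" prefix=""
  if [ "${REGENERATE:-0}" = "1" ]; then
    prefix="SPARK_GENERATE_GOLDEN_FILES=1 "   # regenerate golden files
  fi
  printf '%s./mvnw -pl spark -Dsuites="%s" -P%s -nsu test\n' \
    "$prefix" "$suite" "$profile"
}

export SPARK_HOME="$(pwd)"
for profile in spark-3.4 spark-3.5 spark-4.0; do
  build_cmd "$profile"
done
```

Setting `REGENERATE=1` before the loop prepends `SPARK_GENERATE_GOLDEN_FILES=1` to each command, covering both the verification and regeneration workflows shown above.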
@@ -14,11 +14,11 @@
: : : +- CometProject (8)
: : : +- CometBroadcastHashJoin (7)
: : : :- CometFilter (2)
-: : : : +- CometScan [native_comet] parquet spark_catalog.default.store_returns (1)
+: : : : +- CometScan [native_iceberg_compat] parquet spark_catalog.default.store_returns (1)
: : : +- CometBroadcastExchange (6)
: : : +- CometProject (5)
: : : +- CometFilter (4)
-: : : +- CometScan [native_comet] parquet spark_catalog.default.date_dim (3)
+: : : +- CometScan [native_iceberg_compat] parquet spark_catalog.default.date_dim (3)
: : +- CometBroadcastExchange (25)
: : +- CometFilter (24)
: : +- CometHashAggregate (23)
@@ -30,19 +30,19 @@
: : +- CometProject (17)
: : +- CometBroadcastHashJoin (16)
: : :- CometFilter (14)
-: : : +- CometScan [native_comet] parquet spark_catalog.default.store_returns (13)
+: : : +- CometScan [native_iceberg_compat] parquet spark_catalog.default.store_returns (13)
: : +- ReusedExchange (15)
: +- CometBroadcastExchange (31)
: +- CometProject (30)
: +- CometFilter (29)
-: +- CometScan [native_comet] parquet spark_catalog.default.store (28)
+: +- CometScan [native_iceberg_compat] parquet spark_catalog.default.store (28)
+- CometBroadcastExchange (37)
+- CometProject (36)
+- CometFilter (35)
-+- CometScan [native_comet] parquet spark_catalog.default.customer (34)
++- CometScan [native_iceberg_compat] parquet spark_catalog.default.customer (34)


-(1) CometScan [native_comet] parquet spark_catalog.default.store_returns
+(1) CometScan [native_iceberg_compat] parquet spark_catalog.default.store_returns
Output [4]: [sr_customer_sk#1, sr_store_sk#2, sr_return_amt#3, sr_returned_date_sk#4]
Batched: true
Location: InMemoryFileIndex []
@@ -54,7 +54,7 @@ ReadSchema: struct<sr_customer_sk:int,sr_store_sk:int,sr_return_amt:decimal(7,2)
Input [4]: [sr_customer_sk#1, sr_store_sk#2, sr_return_amt#3, sr_returned_date_sk#4]
Condition : (isnotnull(sr_store_sk#2) AND isnotnull(sr_customer_sk#1))

-(3) CometScan [native_comet] parquet spark_catalog.default.date_dim
+(3) CometScan [native_iceberg_compat] parquet spark_catalog.default.date_dim
Output [2]: [d_date_sk#6, d_year#7]
Batched: true
Location [not included in comparison]/{warehouse_dir}/date_dim]
@@ -100,7 +100,7 @@ Functions [1]: [sum(UnscaledValue(sr_return_amt#3))]
Input [3]: [ctr_customer_sk#9, ctr_store_sk#10, ctr_total_return#11]
Condition : isnotnull(ctr_total_return#11)

-(13) CometScan [native_comet] parquet spark_catalog.default.store_returns
+(13) CometScan [native_iceberg_compat] parquet spark_catalog.default.store_returns
Output [4]: [sr_customer_sk#12, sr_store_sk#13, sr_return_amt#14, sr_returned_date_sk#15]
Batched: true
Location: InMemoryFileIndex []
@@ -169,7 +169,7 @@ Arguments: [ctr_store_sk#10], [ctr_store_sk#19], Inner, (cast(ctr_total_return#1
Input [5]: [ctr_customer_sk#9, ctr_store_sk#10, ctr_total_return#11, (avg(ctr_total_return) * 1.2)#23, ctr_store_sk#19]
Arguments: [ctr_customer_sk#9, ctr_store_sk#10], [ctr_customer_sk#9, ctr_store_sk#10]

-(28) CometScan [native_comet] parquet spark_catalog.default.store
+(28) CometScan [native_iceberg_compat] parquet spark_catalog.default.store
Output [2]: [s_store_sk#24, s_state#25]
Batched: true
Location [not included in comparison]/{warehouse_dir}/store]
@@ -197,7 +197,7 @@ Arguments: [ctr_store_sk#10], [s_store_sk#24], Inner, BuildRight
Input [3]: [ctr_customer_sk#9, ctr_store_sk#10, s_store_sk#24]
Arguments: [ctr_customer_sk#9], [ctr_customer_sk#9]

-(34) CometScan [native_comet] parquet spark_catalog.default.customer
+(34) CometScan [native_iceberg_compat] parquet spark_catalog.default.customer
Output [2]: [c_customer_sk#26, c_customer_id#27]
Batched: true
Location [not included in comparison]/{warehouse_dir}/customer]
@@ -239,10 +239,10 @@ BroadcastExchange (46)
+- * CometColumnarToRow (45)
+- CometProject (44)
+- CometFilter (43)
-+- CometScan [native_comet] parquet spark_catalog.default.date_dim (42)
++- CometScan [native_iceberg_compat] parquet spark_catalog.default.date_dim (42)


-(42) CometScan [native_comet] parquet spark_catalog.default.date_dim
+(42) CometScan [native_iceberg_compat] parquet spark_catalog.default.date_dim
Output [2]: [d_date_sk#6, d_year#7]
Batched: true
Location [not included in comparison]/{warehouse_dir}/date_dim]
@@ -15,19 +15,19 @@ WholeStageCodegen (1)
CometProject [sr_customer_sk,sr_store_sk,sr_return_amt]
CometBroadcastHashJoin [sr_customer_sk,sr_store_sk,sr_return_amt,sr_returned_date_sk,d_date_sk]
CometFilter [sr_customer_sk,sr_store_sk,sr_return_amt,sr_returned_date_sk]
-CometScan [native_comet] parquet spark_catalog.default.store_returns [sr_customer_sk,sr_store_sk,sr_return_amt,sr_returned_date_sk]
+CometScan [native_iceberg_compat] parquet spark_catalog.default.store_returns [sr_customer_sk,sr_store_sk,sr_return_amt,sr_returned_date_sk]
SubqueryBroadcast [d_date_sk] #1
BroadcastExchange #2
WholeStageCodegen (1)
CometColumnarToRow
InputAdapter
CometProject [d_date_sk]
CometFilter [d_date_sk,d_year]
-CometScan [native_comet] parquet spark_catalog.default.date_dim [d_date_sk,d_year]
+CometScan [native_iceberg_compat] parquet spark_catalog.default.date_dim [d_date_sk,d_year]
CometBroadcastExchange [d_date_sk] #3
CometProject [d_date_sk]
CometFilter [d_date_sk,d_year]
-CometScan [native_comet] parquet spark_catalog.default.date_dim [d_date_sk,d_year]
+CometScan [native_iceberg_compat] parquet spark_catalog.default.date_dim [d_date_sk,d_year]
CometBroadcastExchange [(avg(ctr_total_return) * 1.2),ctr_store_sk] #4
CometFilter [(avg(ctr_total_return) * 1.2),ctr_store_sk]
CometHashAggregate [(avg(ctr_total_return) * 1.2),ctr_store_sk,sum,count,avg(ctr_total_return)]
@@ -39,14 +39,14 @@ WholeStageCodegen (1)
CometProject [sr_customer_sk,sr_store_sk,sr_return_amt]
CometBroadcastHashJoin [sr_customer_sk,sr_store_sk,sr_return_amt,sr_returned_date_sk,d_date_sk]
CometFilter [sr_customer_sk,sr_store_sk,sr_return_amt,sr_returned_date_sk]
-CometScan [native_comet] parquet spark_catalog.default.store_returns [sr_customer_sk,sr_store_sk,sr_return_amt,sr_returned_date_sk]
+CometScan [native_iceberg_compat] parquet spark_catalog.default.store_returns [sr_customer_sk,sr_store_sk,sr_return_amt,sr_returned_date_sk]
ReusedSubquery [d_date_sk] #1
ReusedExchange [d_date_sk] #3
CometBroadcastExchange [s_store_sk] #7
CometProject [s_store_sk]
CometFilter [s_store_sk,s_state]
-CometScan [native_comet] parquet spark_catalog.default.store [s_store_sk,s_state]
+CometScan [native_iceberg_compat] parquet spark_catalog.default.store [s_store_sk,s_state]
CometBroadcastExchange [c_customer_sk,c_customer_id] #8
CometProject [c_customer_id] [c_customer_sk,c_customer_id]
CometFilter [c_customer_sk,c_customer_id]
-CometScan [native_comet] parquet spark_catalog.default.customer [c_customer_sk,c_customer_id]
+CometScan [native_iceberg_compat] parquet spark_catalog.default.customer [c_customer_sk,c_customer_id]
@@ -15,40 +15,40 @@ TakeOrderedAndProject (47)
: : : :- * CometColumnarToRow (12)
: : : : +- CometBroadcastHashJoin (11)
: : : : :- CometFilter (2)
-: : : : : +- CometScan [native_comet] parquet spark_catalog.default.customer (1)
+: : : : : +- CometScan [native_iceberg_compat] parquet spark_catalog.default.customer (1)
: : : : +- CometBroadcastExchange (10)
: : : : +- CometProject (9)
: : : : +- CometBroadcastHashJoin (8)
-: : : : :- CometScan [native_comet] parquet spark_catalog.default.store_sales (3)
+: : : : :- CometScan [native_iceberg_compat] parquet spark_catalog.default.store_sales (3)
: : : : +- CometBroadcastExchange (7)
: : : : +- CometProject (6)
: : : : +- CometFilter (5)
-: : : : +- CometScan [native_comet] parquet spark_catalog.default.date_dim (4)
+: : : : +- CometScan [native_iceberg_compat] parquet spark_catalog.default.date_dim (4)
: : : +- BroadcastExchange (18)
: : : +- * CometColumnarToRow (17)
: : : +- CometProject (16)
: : : +- CometBroadcastHashJoin (15)
-: : : :- CometScan [native_comet] parquet spark_catalog.default.web_sales (13)
+: : : :- CometScan [native_iceberg_compat] parquet spark_catalog.default.web_sales (13)
: : : +- ReusedExchange (14)
: : +- BroadcastExchange (25)
: : +- * CometColumnarToRow (24)
: : +- CometProject (23)
: : +- CometBroadcastHashJoin (22)
-: : :- CometScan [native_comet] parquet spark_catalog.default.catalog_sales (20)
+: : :- CometScan [native_iceberg_compat] parquet spark_catalog.default.catalog_sales (20)
: : +- ReusedExchange (21)
: +- BroadcastExchange (33)
: +- * CometColumnarToRow (32)
: +- CometProject (31)
: +- CometFilter (30)
-: +- CometScan [native_comet] parquet spark_catalog.default.customer_address (29)
+: +- CometScan [native_iceberg_compat] parquet spark_catalog.default.customer_address (29)
+- BroadcastExchange (40)
+- * CometColumnarToRow (39)
+- CometProject (38)
+- CometFilter (37)
-+- CometScan [native_comet] parquet spark_catalog.default.customer_demographics (36)
++- CometScan [native_iceberg_compat] parquet spark_catalog.default.customer_demographics (36)


-(1) CometScan [native_comet] parquet spark_catalog.default.customer
+(1) CometScan [native_iceberg_compat] parquet spark_catalog.default.customer
Output [3]: [c_customer_sk#3, c_current_cdemo_sk#4, c_current_addr_sk#5]
Batched: true
Location [not included in comparison]/{warehouse_dir}/customer]
@@ -59,14 +59,14 @@ ReadSchema: struct<c_customer_sk:int,c_current_cdemo_sk:int,c_current_addr_sk:in
Input [3]: [c_customer_sk#3, c_current_cdemo_sk#4, c_current_addr_sk#5]
Condition : (isnotnull(c_current_addr_sk#5) AND isnotnull(c_current_cdemo_sk#4))

-(3) CometScan [native_comet] parquet spark_catalog.default.store_sales
+(3) CometScan [native_iceberg_compat] parquet spark_catalog.default.store_sales
Output [2]: [ss_customer_sk#6, ss_sold_date_sk#7]
Batched: true
Location: InMemoryFileIndex []
PartitionFilters: [isnotnull(ss_sold_date_sk#7), dynamicpruningexpression(ss_sold_date_sk#7 IN dynamicpruning#8)]
ReadSchema: struct<ss_customer_sk:int>

-(4) CometScan [native_comet] parquet spark_catalog.default.date_dim
+(4) CometScan [native_iceberg_compat] parquet spark_catalog.default.date_dim
Output [3]: [d_date_sk#9, d_year#10, d_moy#11]
Batched: true
Location [not included in comparison]/{warehouse_dir}/date_dim]
@@ -106,7 +106,7 @@ Arguments: [c_customer_sk#3], [ss_customer_sk#6], LeftSemi, BuildRight
(12) CometColumnarToRow [codegen id : 5]
Input [3]: [c_customer_sk#3, c_current_cdemo_sk#4, c_current_addr_sk#5]

-(13) CometScan [native_comet] parquet spark_catalog.default.web_sales
+(13) CometScan [native_iceberg_compat] parquet spark_catalog.default.web_sales
Output [2]: [ws_bill_customer_sk#12, ws_sold_date_sk#13]
Batched: true
Location: InMemoryFileIndex []
@@ -138,7 +138,7 @@ Right keys [1]: [ws_bill_customer_sk#12]
Join type: ExistenceJoin(exists#2)
Join condition: None

-(20) CometScan [native_comet] parquet spark_catalog.default.catalog_sales
+(20) CometScan [native_iceberg_compat] parquet spark_catalog.default.catalog_sales
Output [2]: [cs_ship_customer_sk#16, cs_sold_date_sk#17]
Batched: true
Location: InMemoryFileIndex []
@@ -178,7 +178,7 @@ Condition : (exists#2 OR exists#1)
Output [2]: [c_current_cdemo_sk#4, c_current_addr_sk#5]
Input [5]: [c_customer_sk#3, c_current_cdemo_sk#4, c_current_addr_sk#5, exists#2, exists#1]

-(29) CometScan [native_comet] parquet spark_catalog.default.customer_address
+(29) CometScan [native_iceberg_compat] parquet spark_catalog.default.customer_address
Output [2]: [ca_address_sk#20, ca_county#21]
Batched: true
Location [not included in comparison]/{warehouse_dir}/customer_address]
@@ -210,7 +210,7 @@ Join condition: None
Output [1]: [c_current_cdemo_sk#4]
Input [3]: [c_current_cdemo_sk#4, c_current_addr_sk#5, ca_address_sk#20]

-(36) CometScan [native_comet] parquet spark_catalog.default.customer_demographics
+(36) CometScan [native_iceberg_compat] parquet spark_catalog.default.customer_demographics
Output [9]: [cd_demo_sk#22, cd_gender#23, cd_marital_status#24, cd_education_status#25, cd_purchase_estimate#26, cd_credit_rating#27, cd_dep_count#28, cd_dep_employed_count#29, cd_dep_college_count#30]
Batched: true
Location [not included in comparison]/{warehouse_dir}/customer_demographics]
@@ -274,10 +274,10 @@ BroadcastExchange (52)
+- * CometColumnarToRow (51)
+- CometProject (50)
+- CometFilter (49)
-+- CometScan [native_comet] parquet spark_catalog.default.date_dim (48)
++- CometScan [native_iceberg_compat] parquet spark_catalog.default.date_dim (48)


-(48) CometScan [native_comet] parquet spark_catalog.default.date_dim
+(48) CometScan [native_iceberg_compat] parquet spark_catalog.default.date_dim
Output [3]: [d_date_sk#9, d_year#10, d_moy#11]
Batched: true
Location [not included in comparison]/{warehouse_dir}/date_dim]