Spark 3.4: Action to compute table stats #11106

karuppayya · 2024-09-10T18:16:04Z

Backport of #10288

huaxingao · 2024-09-11T18:11:25Z

spark/v3.3/build.gradle

    implementation project(':iceberg-parquet')
    implementation project(':iceberg-arrow')
    implementation("org.scala-lang.modules:scala-collection-compat_${scalaVersion}:${libs.versions.scala.collection.compat.get()}")
+    implementation("org.apache.datasketches:datasketches-java:${libs.versions.datasketches.get()}")


Do we need to change v3.3 build.gradle?

dramaticlly · 2024-09-11T20:40:07Z

spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/actions/NDVSketchUtil.java

+    return spark
+        .read()
+        .format("iceberg")
+        .option(SparkReadOptions.SNAPSHOT_ID, snapshot.snapshotId())
+        .load(table.name())
+        .select(toAggColumns(colNames))
+        .first();


Do we need to backport #10984 for Spark 3.4 as well, per Anton's comment in https://github.com/apache/iceberg/pull/10288/files#r1726000959? Happy to help.
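For readers new to NDV (number of distinct values) sketches: judging by the `datasketches-java` dependency added in this PR, the aggregation in `NDVSketchUtil` appears to build on Apache DataSketches. The snippet below is not that code. It is a minimal, standard-library-only sketch of the related K-Minimum-Values idea, assuming a hypothetical `KmvSketch` class and a SplitMix64-style hash chosen only for illustration, to show how a bounded summary can estimate a column's distinct count.

```java
import java.util.TreeSet;

// Illustrative K-Minimum-Values (KMV) distinct-count estimator.
// Hypothetical class for explanation only; NOT the DataSketches or Iceberg implementation.
final class KmvSketch {
  private final int k;
  // The k smallest distinct hash values seen so far (63-bit, non-negative).
  private final TreeSet<Long> smallest = new TreeSet<>();

  KmvSketch(int k) {
    if (k < 2) {
      throw new IllegalArgumentException("k must be >= 2");
    }
    this.k = k;
  }

  // SplitMix64-style mixing step, used here as a simple 64-bit hash (an assumption of this sketch).
  static long mix(long z) {
    z += 0x9E3779B97F4A7C15L;
    z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
    z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
    return z ^ (z >>> 31);
  }

  void update(long value) {
    long h = mix(value) >>> 1; // drop one bit so the hash is non-negative
    if (smallest.size() < k) {
      smallest.add(h); // duplicates are ignored by the set
    } else if (h < smallest.last() && smallest.add(h)) {
      smallest.pollLast(); // keep only the k smallest hashes
    }
  }

  double estimate() {
    if (smallest.size() < k) {
      return smallest.size(); // fewer than k distinct hashes: the count is exact
    }
    // Normalize the k-th smallest hash to (0, 1] and apply the (k - 1) / kth estimator.
    double kth = smallest.last() / (double) Long.MAX_VALUE;
    return (k - 1) / kth;
  }

  public static void main(String[] args) {
    KmvSketch sketch = new KmvSketch(1024);
    for (long i = 0; i < 100_000; i++) {
      sketch.update(i);
    }
    System.out.println("estimated NDV: " + sketch.estimate());
  }
}
```

The memory used is bounded by `k` regardless of how many rows are scanned, which is why a sketch of this kind is practical to compute per column in a single aggregation pass. A larger `k` trades memory for a tighter estimate.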

dramaticlly

LGTM, looks like CI ran into a transient test failure.

Backport of apache#10984; tests can be backported together with apache#11106

szehon-ho · 2024-09-13T18:27:34Z

Merged, thanks @karuppayya and all for additional review.

tedyu · 2024-10-03T23:24:28Z

spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/actions/NDVSketchUtil.java

+        .option(SparkReadOptions.SNAPSHOT_ID, snapshot.snapshotId())
+        .load(table.name())
+        .select(toAggColumns(colNames))
+        .first();


Should we consider calling .cache() before .first()?

…n) (apache#1343)

* API, Spark 3.5: Action to compute table stats (apache#10288) (cherry picked from commit 2f6e7e6)
* Spark 3.4: Action to compute table stats (apache#11106) (cherry picked from commit 5582b0c)
* Spark 3.4: Add utility to load table state reliably (apache#11115) (cherry picked from commit d5b21d8)
* Cherry-pick data-sketches lib version change from apache@cbe391d#diff-697f70cdd88ba88fe77eebda60c7e143f6ad1286bca75017421e93ad84fb87df

---------

Co-authored-by: Karuppayya <karuppayya1990@gmail.com>
Co-authored-by: Hongyue/Steve Zhang <steveiszhy@gmail.com>

github-actions bot added the spark and build labels Sep 10, 2024

karuppayya force-pushed the backport_stats_collection branch from 0d30697 to eee58dd on September 10, 2024 18:25

singhpk234 approved these changes Sep 11, 2024

huaxingao reviewed Sep 11, 2024

krajendran4 added 2 commits September 11, 2024 12:06

Spark 3.4: Action to compute table stats

e2c7190

Fix style check: BanJUnit5Assertions

230a19b

karuppayya force-pushed the backport_stats_collection branch from 68afca0 to 230a19b on September 11, 2024 19:06

dramaticlly reviewed Sep 11, 2024

dramaticlly approved these changes Sep 11, 2024

dramaticlly added a commit to dramaticlly/iceberg that referenced this pull request Sep 11, 2024

Spark 3.5: Add utility to load table state reliably

1fb5fd7

Backport of apache#10984; tests can be backported together with apache#11106

dramaticlly mentioned this pull request Sep 11, 2024

Spark 3.4: Add utility to load table state reliably #11115

Merged

szehon-ho approved these changes Sep 11, 2024

karuppayya closed this Sep 12, 2024

karuppayya reopened this Sep 12, 2024

szehon-ho merged commit 5582b0c into apache:main Sep 13, 2024

tedyu reviewed Oct 3, 2024

saitharun15 mentioned this pull request Nov 13, 2024

Spark 3.4: Support Spark Column Stats #11532

Merged

zachdisc pushed a commit to zachdisc/iceberg that referenced this pull request Dec 23, 2024

Spark 3.4: Action to compute table stats (apache#11106)

de12373
