Spark 3.4: Action to compute table stats #11106
Branch updated from 0d30697 to eee58dd.
spark/v3.3/build.gradle (outdated)

```groovy
implementation project(':iceberg-parquet')
implementation project(':iceberg-arrow')
implementation("org.scala-lang.modules:scala-collection-compat_${scalaVersion}:${libs.versions.scala.collection.compat.get()}")
implementation("org.apache.datasketches:datasketches-java:${libs.versions.datasketches.get()}")
```
Do we need to change the v3.3 build.gradle?
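The diff above adds `org.apache.datasketches:datasketches-java`, which the stats action presumably relies on for Theta sketches that estimate the number of distinct values (NDV) per column. As a hedged illustration of the underlying idea only, here is a toy K-Minimum-Values (KMV) estimator, a simpler relative of the Theta sketch. `KmvSketch` and its hash function are hypothetical choices for this example, not the DataSketches API.

```java
import java.nio.charset.StandardCharsets;
import java.util.TreeSet;

// Toy K-Minimum-Values (KMV) sketch for approximate distinct counts.
// Hypothetical illustration only; NOT the org.apache.datasketches API.
final class KmvSketch {
  private final int k; // sketch size: number of hashes retained
  private final TreeSet<Long> minHashes = new TreeSet<>(); // k smallest distinct hashes seen

  KmvSketch(int k) {
    this.k = k;
  }

  void update(String value) {
    // Drop the sign bit so hashes fall in [0, Long.MAX_VALUE].
    long h = hash64(value.getBytes(StandardCharsets.UTF_8)) >>> 1;
    // A duplicate value hashes identically, so add() returns false and nothing changes.
    if (minHashes.add(h) && minHashes.size() > k) {
      minHashes.pollLast(); // evict the largest hash to keep only the k smallest
    }
  }

  double estimate() {
    if (minHashes.size() < k) {
      return minHashes.size(); // fewer than k distinct hashes: the count is exact
    }
    // theta = k-th smallest hash as a fraction of the hash range; NDV ~ (k - 1) / theta.
    double theta = (double) minHashes.last() / Long.MAX_VALUE;
    return (k - 1) / theta;
  }

  // FNV-1a followed by the MurmurHash3 fmix64 finalizer (an arbitrary, illustrative choice).
  static long hash64(byte[] data) {
    long h = 0xcbf29ce484222325L; // FNV-1a 64-bit offset basis
    for (byte b : data) {
      h ^= (b & 0xff);
      h *= 0x100000001b3L; // FNV-1a 64-bit prime
    }
    h ^= h >>> 33;
    h *= 0xff51afd7ed558ccdL;
    h ^= h >>> 33;
    h *= 0xc4ceb93fe53ad495L;
    h ^= h >>> 33;
    return h;
  }

  public static void main(String[] args) {
    KmvSketch sketch = new KmvSketch(1024);
    for (int i = 0; i < 100_000; i++) {
      sketch.update("user-" + i);
      sketch.update("user-" + i); // duplicates do not change the sketch
    }
    System.out.printf("estimated NDV: %.0f (true NDV: 100000)%n", sketch.estimate());
  }
}
```

With k = 1024 the relative standard error of this estimator is roughly 1/sqrt(k - 2), about 3%. The DataSketches Theta sketch builds on the same k-minimum-values idea with more compact data structures and set operations.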
Branch updated from 68afca0 to 230a19b.
```java
return spark
    .read()
    .format("iceberg")
    // read the table as of the chosen snapshot
    .option(SparkReadOptions.SNAPSHOT_ID, snapshot.snapshotId())
    .load(table.name())
    // aggregate expressions for the requested columns; first() returns the single result row
    .select(toAggColumns(colNames))
    .first();
```
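The `.select(toAggColumns(colNames)).first()` call collapses the whole snapshot into one row of aggregates, presumably one per requested column. Sketch-based aggregates fit this shape because partial sketches built on separate partitions can be merged without losing accuracy. A minimal sketch of that property follows, using a hypothetical `MergeableKmv` class (a toy KMV sketch, not Iceberg's or Spark's aggregate implementation): a per-partition build followed by a merge yields exactly the same sketch state as a single pass over all rows.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical mergeable KMV sketch standing in for a sketch-based aggregate function.
final class MergeableKmv {
  final int k;
  final TreeSet<Long> minHashes = new TreeSet<>();

  MergeableKmv(int k) {
    this.k = k;
  }

  void update(String value) {
    offer(hash64(value.getBytes(StandardCharsets.UTF_8)) >>> 1);
  }

  // Keep only the k smallest distinct hashes.
  private void offer(long h) {
    if (minHashes.add(h) && minHashes.size() > k) {
      minHashes.pollLast();
    }
  }

  // Union of two sketches with the same k: every hash among the k smallest of the union
  // is also among the k smallest of its own input, so the merge loses nothing.
  MergeableKmv merge(MergeableKmv other) {
    MergeableKmv out = new MergeableKmv(k);
    minHashes.forEach(out::offer);
    other.minHashes.forEach(out::offer);
    return out;
  }

  double estimate() {
    if (minHashes.size() < k) {
      return minHashes.size();
    }
    return (k - 1) / ((double) minHashes.last() / Long.MAX_VALUE);
  }

  // FNV-1a plus the MurmurHash3 fmix64 finalizer (illustrative choice).
  static long hash64(byte[] data) {
    long h = 0xcbf29ce484222325L;
    for (byte b : data) {
      h ^= (b & 0xff);
      h *= 0x100000001b3L;
    }
    h ^= h >>> 33;
    h *= 0xff51afd7ed558ccdL;
    h ^= h >>> 33;
    h *= 0xc4ceb93fe53ad495L;
    h ^= h >>> 33;
    return h;
  }

  // "Partial" sketch per partition, then a "final" merge into a single result.
  static MergeableKmv aggregate(List<List<String>> partitions, int k) {
    MergeableKmv result = new MergeableKmv(k);
    for (List<String> partition : partitions) {
      MergeableKmv partial = new MergeableKmv(k);
      partition.forEach(partial::update);
      result = result.merge(partial);
    }
    return result;
  }

  public static void main(String[] args) {
    List<List<String>> partitions = new ArrayList<>();
    List<String> all = new ArrayList<>();
    for (int p = 0; p < 4; p++) {
      List<String> rows = new ArrayList<>();
      for (int j = p * 5_000; j < p * 5_000 + 10_000; j++) {
        rows.add("id-" + j); // neighbouring partitions overlap by 5,000 values
      }
      partitions.add(rows);
      all.addAll(rows);
    }
    MergeableKmv merged = aggregate(partitions, 1024);
    MergeableKmv single = aggregate(List.of(all), 1024);
    System.out.println(merged.minHashes.equals(single.minHashes)); // prints true
    System.out.printf("estimated NDV: %.0f (true NDV: 25000)%n", merged.estimate());
  }
}
```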
Do we need to backport #10984 to Spark 3.4 as well, per Anton's comment in https://github.com/apache/iceberg/pull/10288/files#r1726000959? Happy to help.
dramaticlly left a comment:
LGTM. It looks like CI ran into a transient test failure.
Backport of apache#10984; tests can be backported together with apache#11106.
Merged, thanks @karuppayya and everyone for the additional reviews.
```java
    .option(SparkReadOptions.SNAPSHOT_ID, snapshot.snapshotId())
    .load(table.name())
    .select(toAggColumns(colNames))
    .first();
```
Should we consider calling `.cache()` before `.first()`?
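On the `.cache()` question: as far as the snippet shows, the aggregated Dataset is consumed by exactly one action, `.first()`, so caching it beforehand would likely not avoid any recomputation. `cache()` is lazy and generally pays off only when the same Dataset feeds several actions, at the cost of executor memory. The toy model below (a hypothetical `LazyDataset` class, not Spark's API, and ignoring storage levels and how `first()` limits the scan) counts how often the plan runs with and without caching.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical toy model of lazy evaluation: every action re-runs the plan unless cached.
final class LazyDataset<T> {
  private final Supplier<List<T>> plan;
  private boolean cacheRequested = false;
  private List<T> cachedRows = null;

  LazyDataset(Supplier<List<T>> plan) {
    this.plan = plan;
  }

  // Like Spark's cache(), this only marks the dataset; nothing runs until an action.
  LazyDataset<T> cache() {
    cacheRequested = true;
    return this;
  }

  private List<T> evaluate() {
    if (!cacheRequested) {
      return plan.get(); // uncached: recompute on every action
    }
    if (cachedRows == null) {
      cachedRows = plan.get(); // cached: materialize on the first action only
    }
    return cachedRows;
  }

  T first() {
    return evaluate().get(0);
  }

  // Returns how many times the plan ran for the given number of first() calls.
  static int planRuns(boolean cache, int actions) {
    int[] runs = {0};
    LazyDataset<Integer> ds = new LazyDataset<>(() -> {
      runs[0]++;
      return List.of(42);
    });
    if (cache) {
      ds.cache();
    }
    for (int i = 0; i < actions; i++) {
      ds.first();
    }
    return runs[0];
  }

  public static void main(String[] args) {
    System.out.println("one action, no cache:    " + planRuns(false, 1)); // 1
    System.out.println("one action, cached:      " + planRuns(true, 1)); // 1
    System.out.println("three actions, no cache: " + planRuns(false, 3)); // 3
    System.out.println("three actions, cached:   " + planRuns(true, 3)); // 1
  }
}
```

With a single action the cached and uncached runs cost the same, which is the situation in the snippet under review.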
…n) (apache#1343)

* API, Spark 3.5: Action to compute table stats (apache#10288) (cherry picked from commit 2f6e7e6)
* Spark 3.4: Action to compute table stats (apache#11106) (cherry picked from commit 5582b0c)
* Spark 3.4: Add utility to load table state reliably (apache#11115) (cherry picked from commit d5b21d8)
* Cherry-pick datasketches lib version change from apache@cbe391d#diff-697f70cdd88ba88fe77eebda60c7e143f6ad1286bca75017421e93ad84fb87df

---------

Co-authored-by: Karuppayya <karuppayya1990@gmail.com>
Co-authored-by: Hongyue/Steve Zhang <steveiszhy@gmail.com>
Backport of #10288
cc: @aokolnychyi @szehon-ho