Contributor

@karuppayya karuppayya commented Sep 10, 2024

Backport of #10288

cc: @aokolnychyi @szehon-ho

implementation project(':iceberg-parquet')
implementation project(':iceberg-arrow')
implementation("org.scala-lang.modules:scala-collection-compat_${scalaVersion}:${libs.versions.scala.collection.compat.get()}")
implementation("org.apache.datasketches:datasketches-java:${libs.versions.datasketches.get()}")
Contributor

Do we need to change v3.3 build.gradle?

@karuppayya karuppayya force-pushed the backport_stats_collection branch from 68afca0 to 230a19b Compare September 11, 2024 19:06
Comment on lines +76 to +82
return spark
.read()
.format("iceberg")
.option(SparkReadOptions.SNAPSHOT_ID, snapshot.snapshotId())
.load(table.name())
.select(toAggColumns(colNames))
.first();
Contributor

Do we need to backport #10984 for Spark 3.4 as well, per Anton's comment in https://github.com/apache/iceberg/pull/10288/files#r1726000959? Happy to help.

Contributor

@dramaticlly dramaticlly left a comment

LGTM. It looks like CI ran into a transient test failure.

dramaticlly added a commit to dramaticlly/iceberg that referenced this pull request Sep 11, 2024
Backport of apache#10984; tests can be backported together with apache#11106
@karuppayya karuppayya closed this Sep 12, 2024
@karuppayya karuppayya reopened this Sep 12, 2024
@szehon-ho szehon-ho merged commit 5582b0c into apache:main Sep 13, 2024
@szehon-ho
Member

Merged, thanks @karuppayya and all for additional review.

.option(SparkReadOptions.SNAPSHOT_ID, snapshot.snapshotId())
.load(table.name())
.select(toAggColumns(colNames))
.first();

Should we consider calling .cache() before .first()?
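
One way to reason about the question above: Spark's `Dataset.cache()` is lazy, so it can only save work when the cached result is consumed by more than one action, and the snippet under review runs a single `.first()`. The following is a plain-Java analogy of that trade-off, not Spark or Iceberg code; `Memo`, `CacheBeforeFirstSketch`, and `scansFor` are hypothetical names introduced only for illustration, and the constant returned by the "scan" is a placeholder.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Computes the delegate's value at most once (a stand-in for a cached plan). */
final class Memo<T> implements Supplier<T> {
    private final Supplier<T> delegate;
    private boolean computed;
    private T value;

    Memo(Supplier<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public T get() {
        if (!computed) {
            value = delegate.get();
            computed = true;
        }
        return value;
    }
}

final class CacheBeforeFirstSketch {
    /**
     * Returns how many times the expensive "scan" ran when the plan is
     * consumed by the given number of actions, with or without memoization.
     */
    static int scansFor(boolean cached, int actions) {
        AtomicInteger scans = new AtomicInteger();
        // Hypothetical expensive aggregate; each call models one full table scan.
        Supplier<Long> aggregate = () -> {
            scans.incrementAndGet();
            return 42L; // placeholder result, not real statistics
        };
        Supplier<Long> plan = cached ? new Memo<>(aggregate) : aggregate;
        for (int i = 0; i < actions; i++) {
            plan.get(); // one "action" such as first()
        }
        return scans.get();
    }

    public static void main(String[] args) {
        System.out.println("single action, uncached: " + scansFor(false, 1));
        System.out.println("single action, cached:   " + scansFor(true, 1));
        System.out.println("three actions, uncached: " + scansFor(false, 3));
        System.out.println("three actions, cached:   " + scansFor(true, 3));
    }
}
```

In this model a single action costs one scan either way, so memoizing buys nothing; it only pays off once the same result is read repeatedly. Whether that carries over to the action in this PR depends on whether the aggregated row is reused elsewhere, which the thread does not say.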

zachdisc pushed a commit to zachdisc/iceberg that referenced this pull request Dec 23, 2024
parthchandra pushed a commit to parthchandra/iceberg that referenced this pull request Oct 22, 2025
…n) (apache#1343)

* API, Spark 3.5: Action to compute table stats (apache#10288)

(cherry picked from commit 2f6e7e6)

* Spark 3.4: Action to compute table stats (apache#11106)

(cherry picked from commit 5582b0c)

* Spark 3.4: Add utility to load table state reliably (apache#11115)

(cherry picked from commit d5b21d8)

* Cherry-pick data-sketches lib version change from
apache@cbe391d#diff-697f70cdd88ba88fe77eebda60c7e143f6ad1286bca75017421e93ad84fb87df

---------

Co-authored-by: Karuppayya <karuppayya1990@gmail.com>
Co-authored-by: Hongyue/Steve Zhang <steveiszhy@gmail.com>