Spark 3.5: Add utility to load table state reliably #10984

aokolnychyi · 2024-08-22T00:47:44Z

While reviewing #10288, I realized we don't have a reliable way to load Iceberg table state as a Dataset in Spark. We shouldn't use load(table.name()) because it is unclear whether the name already includes the catalog name. This PR extends what we currently do for metadata tables to regular tables.
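
To illustrate the concern, here is a minimal sketch of how name-based resolution can pick the wrong catalog. It is a simplified model written for this discussion, not Spark's or Iceberg's actual resolution code; the class, the method, and the catalog names (`spark_catalog`, `lake`) are all hypothetical.

```java
import java.util.Arrays;
import java.util.Set;

// Hypothetical, simplified model of name-based table resolution.
// None of these names come from Spark or Iceberg.
final class IdentifierResolutionSketch {

  private IdentifierResolutionSketch() {}

  // Returns {catalog, identifier}. If the first name part matches a registered
  // catalog, it is treated as the catalog; otherwise the whole name is resolved
  // against the current catalog.
  static String[] resolve(String name, Set<String> registeredCatalogs, String currentCatalog) {
    String[] parts = name.split("\\.");
    if (parts.length > 1 && registeredCatalogs.contains(parts[0])) {
      String rest = String.join(".", Arrays.copyOfRange(parts, 1, parts.length));
      return new String[] {parts[0], rest};
    }
    return new String[] {currentCatalog, name};
  }

  public static void main(String[] args) {
    Set<String> catalogs = Set.of("spark_catalog", "lake");
    // Name without a catalog prefix: silently resolved against the current catalog.
    System.out.println(Arrays.toString(resolve("db.events", catalogs, "spark_catalog")));
    // Name that already carries the catalog prefix: resolved against that catalog.
    System.out.println(Arrays.toString(resolve("lake.db.events", catalogs, "spark_catalog")));
  }
}
```

In this model, a table that was loaded from `lake` but reports its name as `db.events` would be looked up in `spark_catalog`. Building the Dataset from the `Table` object avoids depending on the name at all.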

spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/SparkTableUtil.java

karuppayya · 2024-08-22T02:03:59Z

spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/SparkTableUtil.java


+  public static Dataset<Row> loadTable(SparkSession spark, Table table, long snapshotId) {
+    SparkTable sparkTable = new SparkTable(table, snapshotId, false);
+    DataSourceV2Relation relation = createRelation(sparkTable, ImmutableMap.of());


The snapshotId (and timestamp) could also be supplied as an option in future Spark versions.
Should we have a method that takes options as well?

We actually bypass the resolution completely and create the Dataset manually in this case.

We may need to pass options in the future, but let's add that once there is a use case (we will simply overload this method).
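
A rough sketch of what "simply overload" could look like, using only the JDK so it runs on its own. Everything here is a hypothetical stand-in: `describeLoad` returns a String instead of a Dataset, and neither the class nor the option key is part of this PR.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the overload pattern; not the SparkTableUtil API.
final class LoadOverloadSketch {

  private LoadOverloadSketch() {}

  // Current-style entry point: no options, delegates with an empty map.
  static String describeLoad(String table, long snapshotId) {
    return describeLoad(table, snapshotId, Map.of());
  }

  // Possible future overload that also accepts read options.
  static String describeLoad(String table, long snapshotId, Map<String, String> options) {
    // TreeMap gives a stable key order in the rendered description.
    return table + "@" + snapshotId + " " + new TreeMap<>(options);
  }

  public static void main(String[] args) {
    System.out.println(describeLoad("db.events", 42L));
    System.out.println(describeLoad("db.events", 42L, Map.of("split-size", "1048576")));
  }
}
```

Existing callers keep using the two-argument form unchanged, which is why the options variant can wait until there is a use case.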

aokolnychyi · 2024-08-23T23:39:19Z

...k/v3.5/spark/src/test/java/org/apache/iceberg/spark/actions/TestComputeTableStatsAction.java

          required(5, "stringCol", Types.StringType.get()));

+  @TestTemplate
+  public void testLoadingTableDirectly() {


This test would previously fail.

Suggestion: Should we move this test to org.apache.iceberg.spark.TestSparkTableUtil?

I feel it belongs here because it is important to check that the action can be invoked without loading tables via the Spark catalog (which sets the catalog name correctly).

This is the only test that goes via validationCatalog.

karuppayya

LGTM, left a nitpick. Thanks @aokolnychyi for the change and @nastra for reviewing.

aokolnychyi · 2024-08-24T00:31:29Z

Thanks, @karuppayya @nastra!

github-actions bot added the spark label Aug 22, 2024

aokolnychyi mentioned this pull request Aug 22, 2024

Spark Action to Analyze table #10288

Merged

karuppayya reviewed Aug 22, 2024

nastra approved these changes Aug 23, 2024

Spark 3.5: Add utility to load table state reliably

e834e11

aokolnychyi force-pushed the load-spark-tables branch from 1fa595e to e834e11 Compare August 23, 2024 23:38

aokolnychyi commented Aug 23, 2024

karuppayya approved these changes Aug 24, 2024

aokolnychyi merged commit 5864850 into apache:main Aug 24, 2024

dramaticlly mentioned this pull request Sep 11, 2024

Spark 3.4: Action to compute table stats #11106

Merged

dramaticlly added a commit to dramaticlly/iceberg that referenced this pull request Sep 11, 2024

Spark 3.5: Add utility to load table state reliably

1fb5fd7

Backport of apache#10984; tests can be backported together with apache#11106

dramaticlly mentioned this pull request Sep 11, 2024

Spark 3.4: Add utility to load table state reliably #11115

Merged

dramaticlly added a commit to dramaticlly/iceberg that referenced this pull request Sep 13, 2024

Spark 3.4: Add utility to load table state reliably

a565199

backport of apache#10984

zachdisc pushed a commit to zachdisc/iceberg that referenced this pull request Dec 23, 2024

Spark 3.5: Add utility to load table state reliably (apache#10984)

8636c17
