Skip to content

Conversation

@varun-lakhyani
Copy link
Contributor

@varun-lakhyani varun-lakhyani commented Dec 27, 2025

Description

Implements location overlap validation for SnapshotTableAction to prevent destination table location from overlapping with source table location.

Resolves TODO comment in SnapshotTableSparkAction.java:127 in spark v4.1.

Type of change

  • Bug fix (non-breaking change which fixes an issue)

How Has This Been Tested?

  • Added unit tests covering snapshot and source location overlap scenarios:
    • Exact same location
    • Source as a subdirectory of destination
    • Destination as a subdirectory of source
    • Completely non-overlapping locations
  • All CI checks pass (31/31)

Checklist

  • Implementation follows existing code patterns and Iceberg contribution guidelines
  • Code compiles without errors
  • Test coverage added
  • No breaking changes to public APIs

…ng destination table location from overlaping source table location. Resolves TODO comment in SnapshotTableSparkAction.java for spark v4.1
…ng destination table location from overlaping source table location. Resolves TODO comment in SnapshotTableSparkAction.java for spark v4.1
@github-actions github-actions bot added the spark label Dec 27, 2025
…ng destination table location from overlaping source table location. Resolves TODO comment in SnapshotTableSparkAction.java for spark v4.1
@RussellSpitzer
Copy link
Member

Definitely needs a test case, not sure why windows would make testing impossible since our locations are stored as strings

@RussellSpitzer
Copy link
Member

Iceberg errors follow this format

"Cannot X because Y (possibly fix Z)"

So in this case

Cannot create a snapshot at location ... because it would overlap with source table ... Overlapping snapshot and source would ....

@varun-lakhyani
Copy link
Contributor Author

@RussellSpitzer

Iceberg errors follow this format

"Cannot X because Y (possibly fix Z)"

So in this case

Cannot create a snapshot at location ... because it would overlap with source table ... Overlapping snapshot and source would ....

Thanks for the feedback.
Added unit tests covering overlapping snapshot and source locations (same path, parent/child cases, and non-overlapping paths) and have updated the error message to follow the Iceberg format.

All CI checks are passing now.

@varun-lakhyani
Copy link
Contributor Author

varun-lakhyani commented Dec 31, 2025

Iceberg errors follow this format

"Cannot X because Y (possibly fix Z)"

So in this case

Cannot create a snapshot at location ... because it would overlap with source table ... Overlapping snapshot and source would ....

@RussellSpitzer
I have addressed all your feedback - added tests and updated error formatting.
All CI checks passing. Would appreciate a review when you are available, Happy to make any changes if required.
Thanks

@varun-lakhyani
Copy link
Contributor Author

varun-lakhyani commented Jan 7, 2026

@RussellSpitzer PTAL

@varun-lakhyani
Copy link
Contributor Author

PTAL @pvary


// TODO: Check the dest table location does not overlap with the source table location
String sourceTableLocation = sourceTableLocation();
String actualDestTableLocation = (icebergTable.location());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe stagedTableLocation?
Also, please remove the extra parentheses.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely Looks more appropriate, Will Commit with other changes.

@TestTemplate
public void testSnapshotWithOverlappingLocation() throws IOException {
String catalogType = catalogConfig.get(ICEBERG_CATALOG_TYPE);
assumeThat(catalogType).isNotEqualTo(ICEBERG_CATALOG_TYPE_HADOOP);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we don't test with hadoop?

Copy link
Contributor Author

@varun-lakhyani varun-lakhyani Jan 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we don't test with hadoop?

Hadoop doesn't allow custom destination location via .tableLocation() when done this it fails even before the check while StagedSparkTable stagedTable = stageDestTable();

Error Message:
TestSnapshotTableAction > testSnapshotWithOverlappingLocation() > catalogName = testhadoop, implementation = org.apache.iceberg.spark.SparkCatalog, config = {type=hadoop, cache-enabled=false} FAILED java.lang.IllegalArgumentException: Cannot set a custom location for a path-based table. Expected file:/tmp/warehouse62334905503302693.tmp/default/table but got /tmp/junit-17415081344650343329/newJunit10343828823716601592

because of HadoopCatalog withLocation() check

public TableBuilder withLocation(String location) { Preconditions.checkArgument( location == null || location.equals(defaultLocation), "Cannot set a custom location for a path-based table. Expected " + defaultLocation + " but got " + location); return this; }

So, when we can't give custom location and TestBase predefine the warehouse using which hadoop always derives non-overlapping location the test seems useless unless we change the warehouse.
For the same reason I skipped Hadoop catalog for the tests including .tableLocation().
Do let me know If I should think otherwise

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Best to just add a note here, "//Hadoop Catalogs do not Support Custom Table Locations"

Comment on lines 127 to 133
String validDestLocation = new Path(parentLocation, "newDestination").toUri().toString();
SparkActions.get()
.snapshotTable(SOURCE_NAME)
.as(tableName)
.tableLocation(validDestLocation)
.execute();
assertThat(sql("SELECT * FROM %s", tableName)).hasSize(2);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this test here?
Should this be a different test case for the happy path?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Along with Overlapping location tests, A Non Overlapping test should also be tested.

Yes, This test should not be in TestSnapshotWithOverlappingLocation so I have separated TestSnapshotWithNonOverlappingLocation and placed this assertion in it

@varun-lakhyani
Copy link
Contributor Author

varun-lakhyani commented Jan 9, 2026

@pvary I have addressed the reviews. Please take another look and let me know if any concerns

import java.nio.file.Files;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;
import org.apache.hadoop.fs.Path;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure we need Hadoop's Path class here, let's stick to Java built ins if we can

Copy link
Member

@RussellSpitzer RussellSpitzer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My only remaining concern here is we are using the Hadoop Path class and I"m pretty sure we don't need that dependency here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants