Summary
In format-version 3, SparkWriteConf.deleteFileFormat() unconditionally returns FileFormat.PUFFIN for all non-metadata tables, bypassing the FormatModel API entirely for position deletes:
// SparkWriteConf.java
public FileFormat deleteFileFormat() {
  if (!(table instanceof BaseMetadataTable) && TableUtil.formatVersion(table) >= 3) {
    return FileFormat.PUFFIN; // hardcoded, no table property override
  }
  // v2: reads write.delete.format.default → delegates to FormatModel
}
This means FormatModelRegistry.positionDeleteWriteBuilder() and any registered FormatModel<PositionDelete<?>, Void> implementations are dead code for v3 tables. SparkPositionDeltaWrite.newDeleteWriter() routes directly to PartitioningDVWriter → BaseDVFileWriter → Puffin, with no way to override.
Question
Was this intentional to simplify the v3 spec, or is there room for the deletion vector format to be pluggable via FormatModel (or a similar registry)?
Context
We're building a custom storage format that integrates with Iceberg via the FormatModel API. For v2, we registered a FormatModel<PositionDelete<?>, Void> that writes position deletes in our native format — this works correctly.
When we upgraded to v3 for row lineage, we discovered that our delete FormatModel is never called. The Puffin DV path works fine (Iceberg handles it internally), but it means:
- Custom formats cannot control the delete file representation in v3
- The FormatModel<PositionDelete<?>, Void> registration pattern that works for v2 silently becomes unused
- There's no way to opt into position delete files (via FormatModel) instead of DVs for v3 tables, even via table properties
Possible approaches
- Status quo: DVs are always Puffin in v3. Document that FormatModel<PositionDelete<?>, Void> is v2-only.
- Make it configurable: Allow write.delete.format.default to override the DV format in v3, similar to how it works in v2. Fall back to Puffin if no override is set.
- Extend FormatModel for DVs: Add a DV-aware FormatModel variant that custom formats can implement.
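To make the "configurable" option concrete, here is a minimal, self-contained sketch of the resolution order it implies: an explicit write.delete.format.default wins, otherwise v3 falls back to Puffin. Everything in it is a hypothetical stand-in rather than Iceberg code: DeleteFormatSketch, the Format enum, and resolve() are invented for illustration, the Parquet fallback for v2 is an assumption of this sketch, and the metadata-table check from SparkWriteConf is left out.

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch only; none of these types are Iceberg classes. */
final class DeleteFormatSketch {

  // Stand-in for Iceberg's FileFormat enum, reduced to the values used here.
  enum Format { PUFFIN, PARQUET, AVRO, ORC }

  // Property name as quoted in this issue.
  static final String DELETE_FORMAT_PROP = "write.delete.format.default";

  // Assumed fallback for pre-v3 tables (an assumption of this sketch).
  static final Format PRE_V3_FALLBACK = Format.PARQUET;

  /**
   * Resolves the delete file format for a table.
   * An explicit property value takes precedence in every format version;
   * without one, v3+ tables use Puffin and older tables use the fallback.
   * An unrecognized property value throws IllegalArgumentException.
   */
  static Format resolve(int formatVersion, Map<String, String> props) {
    String override = props.get(DELETE_FORMAT_PROP);
    if (override != null) {
      return Format.valueOf(override.trim().toUpperCase(Locale.ROOT));
    }
    return formatVersion >= 3 ? Format.PUFFIN : PRE_V3_FALLBACK;
  }
}
```

With this ordering, existing v3 tables that set no property keep writing Puffin DVs, so the change would be opt-in. Whether readers and the v3 spec could accept non-Puffin deletes is a separate question that this sketch does not address.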
Happy to hear the rationale for the current design.