Spark Iceberg manifest reports wrong parquet file sizes. #1980

@dmgcodevil

We are using Spark with Iceberg, and some Iceberg manifest files report the wrong data file (Parquet) size: the reported size is roughly 2x larger than the actual Parquet file size. The issue was found while investigating Presto Iceberg iss6369.

The problem might be in the ParquetWriter#length() method:

return writer.getPos() + (writeStore.isColumnFlushNeeded() ? writeStore.getBufferedSize() : 0);

Maybe that's why the Parquet file size recorded in the manifest is larger than the actual file size on disk.
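To make the suspicion concrete, here is a toy Java model of that expression. It is only a sketch and not Iceberg's or Parquet's actual code: ToyWriter, its fields, and the compression ratio are hypothetical. It assumes the buffered size counts in-memory bytes that shrink once they are encoded and compressed on flush, which has not been verified here.

```java
// Toy model only: hypothetical names, not Iceberg's or Parquet's real classes.
public class LengthEstimateSketch {

    /** Stand-in for a writer that has flushed some bytes and still buffers others. */
    public static final class ToyWriter {
        private final long flushedBytes;       // plays the role of writer.getPos()
        private final long bufferedBytes;      // plays the role of writeStore.getBufferedSize()
        private final boolean flushNeeded;     // plays the role of writeStore.isColumnFlushNeeded()
        private final double compressionRatio; // ASSUMED on-disk / in-memory ratio for buffered data

        public ToyWriter(long flushedBytes, long bufferedBytes,
                         boolean flushNeeded, double compressionRatio) {
            this.flushedBytes = flushedBytes;
            this.bufferedBytes = bufferedBytes;
            this.flushNeeded = flushNeeded;
            this.compressionRatio = compressionRatio;
        }

        /** Mirrors the shape of the expression quoted above. */
        public long length() {
            return flushedBytes + (flushNeeded ? bufferedBytes : 0);
        }

        /** Size on disk if the buffered bytes shrink by the assumed ratio when flushed. */
        public long sizeAfterClose() {
            return flushedBytes + Math.round(bufferedBytes * compressionRatio);
        }
    }

    public static void main(String[] args) {
        // Made-up numbers chosen only to illustrate the effect.
        ToyWriter w = new ToyWriter(1_000_000L, 3_000_000L, true, 0.25);
        System.out.println("reported by length(): " + w.length());
        System.out.println("on disk after close:  " + w.sizeAfterClose());
    }
}
```

Under these made-up numbers, length() returns a value more than 2x the final size. If length() is sampled before the last flush and that value ends up in the manifest, it would produce the kind of mismatch described here.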
