S3DataSegmentPusher writes incomplete descriptor.json segment data to S3 #4170

@jerchung

S3DataSegmentPusher appears to build an outSegment that adds some data (size, loadSpec, binaryVersion), but it then writes the original inSegment, without the added data, to deep storage. As a result, the descriptor.json files found in S3 deep storage are missing that information.

https://github.com/druid-io/druid/blob/druid-0.10.0-rc2/extensions-core/s3-extensions/src/main/java/io/druid/storage/s3/S3DataSegmentPusher.java#L109-L123
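
To make the suspected pattern concrete, here is a minimal, self-contained sketch. It is not the Druid code: `Segment`, `deepStorage`, `push`, and the `writeEnriched` flag are hypothetical stand-ins, and the bucket name and binary version are arbitrary illustrative values. With `writeEnriched` set to `false`, the descriptor is built from the input segment, as described above; with `true`, it is built from the enriched copy, which is the change I would propose.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DescriptorSketch {
    // Hypothetical, simplified stand-in for an immutable segment descriptor.
    static final class Segment {
        final String id;
        final long size;
        final Map<String, Object> loadSpec;
        final Integer binaryVersion;

        Segment(String id, long size, Map<String, Object> loadSpec, Integer binaryVersion) {
            this.id = id;
            this.size = size;
            this.loadSpec = loadSpec;
            this.binaryVersion = binaryVersion;
        }

        // Each "with" method returns a new copy; the receiver is never modified.
        Segment withSize(long newSize) {
            return new Segment(id, newSize, loadSpec, binaryVersion);
        }

        Segment withLoadSpec(Map<String, Object> newLoadSpec) {
            return new Segment(id, size, newLoadSpec, binaryVersion);
        }

        Segment withBinaryVersion(int newVersion) {
            return new Segment(id, size, loadSpec, newVersion);
        }

        // Stand-in for JSON serialization of the descriptor.
        String toDescriptor() {
            return "{id=" + id + ", size=" + size + ", loadSpec=" + loadSpec
                + ", binaryVersion=" + binaryVersion + "}";
        }
    }

    // Stand-in for S3 deep storage: object key -> descriptor contents.
    static final Map<String, String> deepStorage = new LinkedHashMap<>();

    static Segment push(Segment inSegment, long indexSize, boolean writeEnriched) {
        Map<String, Object> loadSpec = new LinkedHashMap<>();
        loadSpec.put("type", "s3_zip");
        loadSpec.put("bucket", "example-bucket");           // assumed bucket name
        loadSpec.put("key", inSegment.id + "/index.zip");

        // The enriched copy carries size, loadSpec and binaryVersion.
        Segment outSegment = inSegment
            .withSize(indexSize)
            .withLoadSpec(loadSpec)
            .withBinaryVersion(9);                          // arbitrary example version

        // Suspected bug: the descriptor is serialized from inSegment.
        // Proposed fix: serialize outSegment instead.
        Segment toWrite = writeEnriched ? outSegment : inSegment;
        deepStorage.put(inSegment.id + "/descriptor.json", toWrite.toDescriptor());

        // The caller always receives the enriched segment.
        return outSegment;
    }

    public static void main(String[] args) {
        Segment input = new Segment("seg1", 0L, null, null);

        Segment returned = push(input, 1234L, false);
        System.out.println("returned:         " + returned.toDescriptor());
        System.out.println("written (buggy):  " + deepStorage.get("seg1/descriptor.json"));

        push(input, 1234L, true);
        System.out.println("written (fixed):  " + deepStorage.get("seg1/descriptor.json"));
    }
}
```

In the buggy mode the returned segment and the stored descriptor disagree, which would explain why the metadata store looks fine while anything that reads descriptor.json from S3 sees incomplete data.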

The metadata pushed to the metadata datastore is correct, because the function returns the outSegment and that returned value is what gets written to the datastore.

https://github.com/druid-io/druid/blob/druid-0.10.0-rc2/extensions-core/s3-extensions/src/main/java/io/druid/storage/s3/S3DataSegmentPusher.java#L140

https://github.com/druid-io/druid/blob/druid-0.10.0-rc2/server/src/main/java/io/druid/segment/realtime/plumber/RealtimePlumber.java#L430-L435

I ran into this issue while using the insert-segment-to-db tool to copy data into an entirely new cluster: the segments could not be loaded into the new historical nodes.

If this is an actual issue, and I didn't miss something when setting up my Druid cluster, I would be happy to make the changes to S3DataSegmentPusher and S3DataSegmentPusherTest to fix it.
