In the code for `S3DataSegmentPusher`, it appears that the pusher builds an `outSegment` with additional fields (`size`, `loadSpec`, `binaryVersion`), but then writes the original `inSegment`, without those fields, to deep storage. As a result, the `descriptor.json` files found in S3 deep storage are missing that information.
https://github.com/druid-io/druid/blob/druid-0.10.0-rc2/extensions-core/s3-extensions/src/main/java/io/druid/storage/s3/S3DataSegmentPusher.java#L109-L123
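A minimal sketch of the suspected mix-up (`Segment` here is a hypothetical stand-in for Druid's `DataSegment`, and the method names are mine, not Druid's): the enriched copy is built and returned to the caller, but the original object is what gets serialized to `descriptor.json`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Druid's DataSegment; only the fields relevant here.
class Segment {
    final long size;
    final Map<String, Object> loadSpec;

    Segment(long size, Map<String, Object> loadSpec) {
        this.size = size;
        this.loadSpec = loadSpec;
    }
}

public class DescriptorMixupSketch {
    // Mirrors the suspected bug: the enriched outSegment exists, but the
    // ORIGINAL inSegment is the one serialized into descriptor.json.
    static Segment segmentWrittenToDescriptor(Segment inSegment, Segment outSegment) {
        return inSegment; // the fix would be: return outSegment
    }

    public static void main(String[] args) {
        // inSegment as handed to the pusher: no size, empty loadSpec.
        Segment inSegment = new Segment(0L, new HashMap<>());

        // The pusher computes the enriched copy...
        Map<String, Object> loadSpec = new HashMap<>();
        loadSpec.put("type", "s3_zip");
        Segment outSegment = new Segment(4096L, loadSpec);

        // ...but the descriptor written to deep storage misses the enrichment,
        // while the metadata store (which receives outSegment) is correct.
        Segment descriptor = segmentWrittenToDescriptor(inSegment, outSegment);
        System.out.println("descriptor size=" + descriptor.size
                + ", metadata size=" + outSegment.size);
    }
}
```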
The metadata pushed to the metadata store is correct, since it is the `outSegment` returned from the function that gets written there.
https://github.com/druid-io/druid/blob/druid-0.10.0-rc2/extensions-core/s3-extensions/src/main/java/io/druid/storage/s3/S3DataSegmentPusher.java#L140
https://github.com/druid-io/druid/blob/druid-0.10.0-rc2/server/src/main/java/io/druid/segment/realtime/plumber/RealtimePlumber.java#L430-L435
I ran into this while using the `insert-segment-to-db` tool to copy data into an entirely new cluster: the new historical nodes were unable to load the segments.
If this is an actual issue, and I didn't miss something when setting up my Druid cluster, I'd be happy to make the changes to `S3DataSegmentPusher` and `S3DataSegmentPusherTest` to fix it.