After upgrading Druid from 0.17.2 to 25.0.0, I noticed a huge performance degradation in Hadoop ingestion.
The MapReduce job takes about 30 minutes, but now, after it's done, the task performs multiple S3 renames, which take 5 hours!
Affected Version
25.0.0
Description
I run a daily Hadoop ingestion job that performs "multi" ingestion from S3 and the datasource itself.
It used to run in ~30 minutes on Druid 0.17.2, but after upgrading to 25.0.0 it takes 5h30m!
The MapReduce phase still takes ~30 minutes, and then for about 5 hours the logs show repeated messages like these:
2023-10-24T00:15:46,199 INFO [task-runner-0-priority-0] org.apache.druid.indexer.JobHelper - Attempting rename from [s3n://xl8druid-storage/druid/segments/reach/2023-10-22T00:00:00.000Z_2023-10-23T00:00:00.000Z/2023-10-23T18:41:39.192Z/98/index.zip.0] to [s3n://xl8druid-storage/druid/segments/reach/2023-10-22T00:00:00.000Z_2023-10-23T00:00:00.000Z/2023-10-23T18:41:39.192Z/98/index.zip]
2023-10-24T00:16:16,545 INFO [task-runner-0-priority-0] org.apache.druid.indexer.JobHelper - Attempting rename from [s3n://xl8druid-storage/druid/segments/reach/2023-10-22T00:00:00.000Z_2023-10-23T00:00:00.000Z/2023-10-23T18:41:39.192Z/99/index.zip.0] to [s3n://xl8druid-storage/druid/segments/reach/2023-10-22T00:00:00.000Z_2023-10-23T00:00:00.000Z/2023-10-23T18:41:39.192Z/99/index.zip]
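To put a number on how long each rename takes, the gap between consecutive "Attempting rename" lines can be measured from the task log. Below is a minimal sketch in Python; the function name `rename_gaps` and the regular expression are my own and only assume the log line format shown above.

```python
import re
from datetime import datetime

# Captures the leading timestamp of a JobHelper "Attempting rename" log line.
# The pattern is an assumption based on the two log lines quoted above.
RENAME_LINE = re.compile(
    r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2},\d{3}) .*JobHelper - Attempting rename from"
)


def rename_gaps(lines):
    """Return the seconds elapsed between consecutive rename log lines."""
    stamps = []
    for line in lines:
        match = RENAME_LINE.match(line)
        if match:
            # %f accepts the three-digit millisecond field after the comma.
            stamps.append(datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S,%f"))
    return [(later - earlier).total_seconds() for earlier, later in zip(stamps, stamps[1:])]
```

For the two lines quoted above this gives a gap of about 30.3 seconds between renames. If that rate holds for the whole task, 5 hours would correspond to roughly 600 renames, though I have not counted them.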