Adjust HadoopIndexTask temp segment renaming to avoid potential race conditions #11075
Conversation
```java
{
  boolean succeeded = job.run();

  if (!config.getSchema().getTuningConfig().isLeaveIntermediate()) {
```
Hm, would removing the cleanup here prevent HadoopDruidDetermineConfigurationJob from doing any necessary cleanup?
Good catch! Fixed by adding a subsequent call to do cleanup in HadoopDruidDetermineConfigurationJob. Also found another place, in CliInternalHadoopIndexer, where we were possibly not doing necessary cleanup, and fixed it in a similar way.
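The cleanup-on-completion pattern discussed above can be sketched as follows. This is an illustrative stand-in, not the actual Druid code: `runJob`, `cleanupIntermediatePath`, and the `leaveIntermediate` flag are hypothetical names mirroring the shapes in the diff, and the real cleanup would delete the working path through the Hadoop `FileSystem` API.

```java
// Hypothetical sketch: cleanup runs after the job completes, in the caller,
// rather than inside the job itself, so every caller gets the same behavior.
import java.util.ArrayList;
import java.util.List;

public class DetermineConfigJobSketch
{
  static final List<String> deletedPaths = new ArrayList<>();

  static void cleanupIntermediatePath(String workingPath)
  {
    // The real code would delete the working path on the FileSystem.
    deletedPaths.add(workingPath);
  }

  static boolean runJob(boolean jobSucceeds, boolean leaveIntermediate, String workingPath)
  {
    boolean succeeded = jobSucceeds;
    // Cleanup happens after the job runs, unless the user asked to keep
    // intermediate files (mirroring isLeaveIntermediate() in the diff).
    if (!leaveIntermediate) {
      cleanupIntermediatePath(workingPath);
    }
    return succeeded;
  }

  public static void main(String[] args)
  {
    runJob(true, false, "/tmp/druid-working");
    System.out.println(deletedPaths.contains("/tmp/druid-working"));
  }
}
```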
```java
public class FileSystemHelper
{
  public static FileSystem get(URI uri, Configuration conf) throws IOException
```
This class seems unnecessary
I needed it for the test that I'm using it in. I wasn't able to mock the raw `FileSystem.get` routine; I kept running into issues.
Can you add a comment here about that?
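The motivation for a thin wrapper like `FileSystemHelper` can be sketched as below: a static call such as `FileSystem.get(uri, conf)` is hard to stub without PowerMock, but an instance method on a small helper can be overridden with a fake. The class and method names here are illustrative stand-ins, not the actual Druid or Hadoop signatures.

```java
// Hypothetical sketch: wrapping a hard-to-mock static call behind an
// overridable instance method, so tests can substitute a fake.
import java.net.URI;

public class FileSystemHelperSketch
{
  // Stand-in for org.apache.hadoop.fs.FileSystem.
  public static class FakeFileSystem
  {
    public final URI uri;

    public FakeFileSystem(URI uri)
    {
      this.uri = uri;
    }
  }

  // Production code calls through this instance method...
  public FakeFileSystem get(URI uri)
  {
    // The real helper would delegate to FileSystem.get(uri, conf).
    return new FakeFileSystem(uri);
  }

  public static void main(String[] args)
  {
    // ...so a test can override it and return a canned object, with no
    // bytecode manipulation needed.
    FileSystemHelperSketch mocked = new FileSystemHelperSketch()
    {
      @Override
      public FakeFileSystem get(URI uri)
      {
        return new FakeFileSystem(URI.create("mock://fs"));
      }
    };
    System.out.println(mocked.get(URI.create("hdfs://real")).uri);
  }
}
```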
```java
    HadoopIngestionSpec indexerSchema)
{
  HadoopDruidIndexerConfig config = HadoopDruidIndexerConfig.fromSpec(indexerSchema);
  final Configuration configuration = JobHelper.injectSystemProperties(new Configuration(), config);
```
I think you could reuse this Configuration within the try block below
```java
}

public static void maybeDeleteIntermediatePath(
    boolean indexerGeneratorJobSucceeded,
```
I think this should just be `jobSucceeded`, since it's used by more than one job.
```java
  );
}

public static void writeSegmentDescriptor(
```
This method also deletes and creates a file. I think the descriptor creation should also be moved into the main task (it could be handled in `renameIndexFile`, where you have access to a `FileSystem`). The mappers/reducers would produce a segment file at a temp location, and the main task would handle the rename -> create descriptor -> publish flow.
We seem to be storing the information about the segments that we publish in the descriptor file, and then reading the data written to this file/directory in the main task in order to know the list of segments that were produced. If we don't create this file in the sub task/job, how will we know what segments were created?
Changed it so that the segment descriptor for a segment is not deleted, and is overwritten if it already exists, as we discussed.
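The "do not delete, overwrite if it already exists" semantics described above can be sketched with plain `java.nio.file`. The real code writes the descriptor through the Hadoop `FileSystem` API, so this is only an illustration of the behavior, with a hypothetical `writeSegmentDescriptor` signature.

```java
// Sketch: CREATE + TRUNCATE_EXISTING creates the descriptor when absent and
// overwrites it in place when present, with no separate delete step that a
// racing retry could observe.
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class DescriptorWriteSketch
{
  static void writeSegmentDescriptor(Path descriptorPath, String json) throws IOException
  {
    Files.write(
        descriptorPath,
        json.getBytes(StandardCharsets.UTF_8),
        StandardOpenOption.CREATE,
        StandardOpenOption.TRUNCATE_EXISTING
    );
  }

  public static void main(String[] args) throws IOException
  {
    Path p = Files.createTempFile("descriptor", ".json");
    writeSegmentDescriptor(p, "{\"version\":\"v1\"}");
    // A second attempt simply overwrites; nothing is deleted in between.
    writeSegmentDescriptor(p, "{\"version\":\"v2\"}");
    System.out.println(new String(Files.readAllBytes(p), StandardCharsets.UTF_8));
  }
}
```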
Hm, we don't have any existing unit tests for HadoopIndexTask, so I think it'd be fine to ignore the coverage failure from that. Can you run ITHadoopIndexTest locally and check if that passes?
Ran the IT; everything passed.
I have a suspicion that this PR may have had an impact on jdk11 execution of the Indexing Modules Test. @zachjsh, were you able to pass tests locally with jdk11 by chance? If so, I could be wrong here. Edit: It seems to be PowerMock trying to access Java internals, from what I can tell.
…al race conditions (apache#11075)" This reverts commit a2892d9.
@capistrant Thanks, I've reverted that patch.
… potential race conditions (apache#11075)" (apache#11151)" This reverts commit 49a9c3f.
Thanks @capistrant! I believe the recent updates I've added fix the issue. I ran both the
* Do stuff
* Do more stuff
* Do more stuff
* Do more stuff
* working
* cleanup
* more cleanup
* more cleanup
* add license header
* Add unit tests
* add java docs
* add more unit tests
* Cleanup test
* Move removing of workingPath to index task rather than in hadoop job.
* Address review comments
* remove unused import
* Address review comments
* Do not overwrite segment descriptor for segment if it already exists.
* add comments to FileSystemHelper class
* fix local hadoop integration test
* Fix failing test failures when running with java11
* Revert "Revert "Adjust HadoopIndexTask temp segment renaming to avoid potential race conditions (#11075)" (#11151)" — This reverts commit 49a9c3f.
* remove JobHelperPowerMockTest
* remove FileSystemHelper class
Description
Segment index file zips are now renamed from the index task, rather than in the hadoop reduce job. When index file renaming occurred in the hadoop reduce job, it was found that at times the final index file would get deleted because of a race condition between job retries.
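The race the description refers to can be illustrated with plain `java.nio.file` standing in for the Hadoop `FileSystem` rename: if each reduce attempt renames its own temp zip, a late retry can clobber or delete the final file, whereas renaming once from the single index task avoids that. `renameIndexFile` below is a hypothetical stand-in for the task-side rename, not the actual Druid method body.

```java
// Sketch: the rename happens once, in the index task, after the job reports
// success — not in every reduce attempt.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class RenameInTaskSketch
{
  // Called by the single-threaded index task, so no two attempts can race
  // on the final path.
  static Path renameIndexFile(Path tempZip, Path finalZip) throws IOException
  {
    return Files.move(tempZip, finalZip, StandardCopyOption.REPLACE_EXISTING);
  }

  public static void main(String[] args) throws IOException
  {
    Path dir = Files.createTempDirectory("segments");
    // Attempt-numbered temp file produced by the reduce job.
    Path temp = Files.createFile(dir.resolve("index.zip.0"));
    Path fin = renameIndexFile(temp, dir.resolve("index.zip"));
    System.out.println(Files.exists(fin) && !Files.exists(temp));
  }
}
```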
Manually tested this using the hadoop ingest tutorial: https://druid.apache.org/docs/latest/tutorials/tutorial-batch-hadoop.html
This PR has: