Skip to content

Hadoop segment index file rename#11194

Merged
zachjsh merged 30 commits intoapache:masterfrom
zachjsh:hadoop-segment-index-file-rename2
May 5, 2021
Merged

Hadoop segment index file rename#11194
zachjsh merged 30 commits intoapache:masterfrom
zachjsh:hadoop-segment-index-file-rename2

Conversation

@zachjsh
Copy link
Copy Markdown
Contributor

@zachjsh zachjsh commented May 4, 2021

Description

Segment index file zips are now renamed from the index task, rather than in the hadoop reduce job. When index file renaming
occurs in the hadoop reduce job, it was found that at times, the final index file would get deleted because of a race condition
between between job retries.

Manually tested this using the hadoop ingest tutorial: https://druid.apache.org/docs/latest/tutorials/tutorial-batch-hadoop.html

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

zachjsh added 29 commits April 6, 2021 20:12
@zachjsh zachjsh requested a review from jon-wei May 4, 2021 00:55
@zachjsh zachjsh changed the title Hadoop segment index file rename2 Hadoop segment index file rename May 4, 2021
* This class exists for testing purposes, see {@link JobHelperPowerMockTest}. Using the
* raw {@link FileSystem} class resulted in errors with java assist.
*/
public class FileSystemHelper
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this class still needed?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed. Thanks

@zachjsh zachjsh requested a review from jon-wei May 4, 2021 05:38
@zachjsh
Copy link
Copy Markdown
Contributor Author

zachjsh commented May 4, 2021

I ran both the indexing-hadoop and indexing-service module unit tests with both java8 and java11 locally, and ran ITHadoopIndexTest locally with both java8 and java11.

Copy link
Copy Markdown
Contributor

@jon-wei jon-wei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 , the JDK11 tests are passing now (see #11075), and I think the coverage failures for the HadoopIndexTask related changes can be ignored due to lack of existing tests, and you've run ITHadoopIndexTask successfully

@zachjsh zachjsh merged commit 99f39c7 into apache:master May 5, 2021
@zachjsh zachjsh deleted the hadoop-segment-index-file-rename2 branch May 5, 2021 00:22
@clintropolis clintropolis added this to the 0.22.0 milestone Aug 12, 2021
@gianm
Copy link
Copy Markdown
Contributor

gianm commented Dec 7, 2023

Some additional color- this was done to fix a race condition that was suspected to be something with s3 deep storage (using Hadoop's s3n adapter) that goes like:

  1. Reducer A launches
  2. Reducer A successfully uploads a segment to temp location
  3. Reducer A starts renaming the temp segment to the final location (this is really a delete of the final location if it exists, then a copy of the temp to the final, and finally deletion of the temp)
  4. Reducer A takes too long, triggering a replica Reducer B to be launched speculatively
  5. While Reducer A is still copying the temp to the final, Reducer B finishes ingesting and uploads its temp file.
  6. While Reducer A is still copying the temp to the final, Reducer B issues a delete for the final file being written by A, because is "renaming" its temp file
  7. Reducer A finishes its rename
  8. Deletion issued by Reducer B in step 6 takes effect and deletes the final segment
  9. Reducer A completes, because it finished the rename
  10. Reducer A finishes, so it's the "winning" speculative execution replica, and Reducer B gets terminated before it can finish moving its temp file to the final location, leaving just its temp file (is this how Hadoop speculative execution works?)

The intent was to fix a problem where sometimes, instead of index.zip being present in the final location, it's index.zip.1 or something like that. We do believe this fixed the problem, although the cost is that all the renames got moved to the launcher index_hadoop job rather than being spread throughout the various reducers. It would potentially slow down jobs that create large numbers of segments using large numbers of reducers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants