Lazy init of fullyQualifiedStorageDirectory in HDFS pusher#5684

Merged
b-slim merged 4 commits into apache:master from jon-wei:lazy_hdfs_pusher
Apr 29, 2018

Contributor

jon-wei commented Apr 24, 2018

Some users have encountered issues when running Druid 0.12.0 against a Hadoop cluster with namenode HA enabled, where the mapper throws an UnknownHostException when it tries to interpret the logical nameservice string as a hostname (e.g., #5552). The users encountering this issue also report that their Hadoop ingestion worked in Druid 0.10.0 with the same Hadoop configs.

In a local test environment, I was able to run a Hadoop ingestion task successfully using namenode HA (#5552 (comment)), so this issue might be a misconfiguration in some cases.

However, in another case, we observed this issue in a customer environment where the Hadoop .xml files in the Druid classpath were correctly configured for namenode HA, but the mapper failed to pick up the configurations. The root cause was unclear.

Given that the mapper does not actually need the DataSegmentPusher, this PR is a workaround that lazily evaluates the fullyQualifiedStorageDirectory variable in HdfsDataSegmentPusher, to unblock users who are encountering this issue until a more complete understanding of the issue is reached.

I suspect that these users did not encounter issues in 0.10.0 because in that version HadoopDruidIndexerConfig did not have the DataSegmentPusher as a field (see #4116).
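
For readers skimming the thread, the lazy evaluation described in this PR can be sketched in plain Java. This is a minimal illustration, not the code from this PR: `LazyPathPusher`, `qualifier`, and `qualifiedStorageDirectory()` are hypothetical names, and the lambda stands in for whatever filesystem call resolves the nameservice. The point is only that constructing the object no longer triggers the lookup.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for a pusher; not Druid's actual HdfsDataSegmentPusher.
class LazyPathPusher
{
  private final String storageDir;
  private final UnaryOperator<String> qualifier; // stands in for the filesystem lookup
  private String fullyQualifiedStorageDirectory; // resolved on first use, not in the constructor

  LazyPathPusher(String storageDir, UnaryOperator<String> qualifier)
  {
    // The constructor only records its inputs; nothing is resolved here.
    this.storageDir = storageDir;
    this.qualifier = qualifier;
  }

  String qualifiedStorageDirectory()
  {
    // Plain null check: fine for one thread, but not safe under concurrent access.
    if (fullyQualifiedStorageDirectory == null) {
      fullyQualifiedStorageDirectory = qualifier.apply(storageDir);
    }
    return fullyQualifiedStorageDirectory;
  }
}

class LazyPathPusherDemo
{
  public static void main(String[] args)
  {
    AtomicInteger lookups = new AtomicInteger();
    LazyPathPusher pusher = new LazyPathPusher(
        "/druid/segments",
        dir -> {
          lookups.incrementAndGet();
          return "hdfs://mycluster" + dir; // assumed nameservice, for illustration only
        }
    );
    System.out.println(lookups.get()); // 0: no lookup at construction time
    System.out.println(pusher.qualifiedStorageDirectory());
    System.out.println(lookups.get()); // 1: the lookup ran on first use
  }
}
```

A code path that builds the object but never asks for the qualified directory, as the mapper is described as doing, would never run the lookup in this sketch.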

Contributor

b-slim commented Apr 24, 2018

@jon-wei LGTM, but I don't think this is a bug; I bet it is a config issue.
Also, I'm not really sure how the Reducer will be able to pick up the right configs; if my memory is correct, the classpath is the same.

);
}

private void initFullyQualifiedStorageDirectory()
Contributor

Would you please add a comment about why this should be lazily initialized?

Contributor Author

Whoops, forgot to push that change, added a comment

jon-wei removed the Bug label Apr 24, 2018

Contributor Author

jon-wei commented Apr 24, 2018

@b-slim Thanks for the review, I removed the bug label for now

private final ObjectMapper jsonMapper;
private final String fullyQualifiedStorageDirectory;
private final Path storageDir;
private String fullyQualifiedStorageDirectory;
Contributor

This should be volatile since the object is shared amongst multiple threads.

Contributor Author

Went with the memoize approach
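
The thread does not show the final code, so the following is only a sketch of what a memoize approach generally looks like: the expensive lookup is wrapped in a supplier that computes the value at most once and is safe to share between threads. Guava offers `Suppliers.memoize` for this purpose; the hand-rolled `MemoizingSupplier` below is a hypothetical stand-in so the example has no dependencies, and the path string is made up.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical, dependency-free stand-in for a memoizing supplier.
final class MemoizingSupplier<T> implements Supplier<T>
{
  private final Supplier<T> delegate;
  private boolean initialized;
  private T value;

  MemoizingSupplier(Supplier<T> delegate)
  {
    this.delegate = delegate;
  }

  @Override
  public synchronized T get()
  {
    // The lock makes check-then-compute atomic and publishes the value safely.
    if (!initialized) {
      value = delegate.get();
      initialized = true;
    }
    return value;
  }
}

class MemoizedPusherDemo
{
  public static void main(String[] args)
  {
    AtomicInteger lookups = new AtomicInteger();
    // The field can stay final because it holds the supplier, not the resolved value.
    final Supplier<String> fullyQualifiedStorageDirectory = new MemoizingSupplier<>(() -> {
      lookups.incrementAndGet();
      return "hdfs://mycluster/druid/segments"; // assumed value, for illustration only
    });
    fullyQualifiedStorageDirectory.get();
    fullyQualifiedStorageDirectory.get();
    System.out.println(lookups.get()); // the delegate ran once despite two calls
  }
}
```

One appeal of this shape is that the thread-safety logic lives in one reusable wrapper instead of being repeated next to each lazily initialized field.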

private void initFullyQualifiedStorageDirectory()
{
try {
if (fullyQualifiedStorageDirectory == null) {
Contributor

If it was a conscious decision to not synchronize this, say why in a comment. Or if it wasn't a conscious decision, consider synchronizing it.

Contributor Author

Went with the memoize approach
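
For comparison, the two review concerns raised here (a `volatile` field and a synchronized null check) together describe the hand-written alternative to memoization, commonly called double-checked locking. A minimal sketch, assuming hypothetical names (`DoubleCheckedPusher`, `qualifier`, `qualifiedStorageDirectory()`) and a lambda in place of the real filesystem lookup:

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.UnaryOperator;

// Hypothetical class illustrating double-checked locking; not the PR's code.
class DoubleCheckedPusher
{
  private final String storageDir;
  private final UnaryOperator<String> qualifier; // stands in for the filesystem lookup
  // volatile so that a value written by one thread is visible to the others
  private volatile String fullyQualifiedStorageDirectory;

  DoubleCheckedPusher(String storageDir, UnaryOperator<String> qualifier)
  {
    this.storageDir = storageDir;
    this.qualifier = qualifier;
  }

  String qualifiedStorageDirectory()
  {
    String result = fullyQualifiedStorageDirectory; // one volatile read on the fast path
    if (result == null) {
      synchronized (this) {
        // Re-check under the lock: another thread may have initialized it meanwhile.
        result = fullyQualifiedStorageDirectory;
        if (result == null) {
          result = qualifier.apply(storageDir);
          fullyQualifiedStorageDirectory = result;
        }
      }
    }
    return result;
  }
}

class DoubleCheckedPusherDemo
{
  public static void main(String[] args) throws InterruptedException
  {
    AtomicInteger lookups = new AtomicInteger();
    DoubleCheckedPusher pusher = new DoubleCheckedPusher(
        "/druid/segments",
        dir -> {
          lookups.incrementAndGet();
          return "hdfs://mycluster" + dir; // assumed nameservice, for illustration only
        }
    );
    Thread[] threads = new Thread[8];
    for (int i = 0; i < threads.length; i++) {
      threads[i] = new Thread(pusher::qualifiedStorageDirectory);
      threads[i].start();
    }
    for (Thread t : threads) {
      t.join();
    }
    System.out.println(lookups.get()); // the lookup ran once across all threads
  }
}
```

This gives the same at-most-once behavior as a memoized supplier, at the cost of more code next to the field that is easy to get subtly wrong.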

}


// We lazily initialiize fullQualifiedStorageDirectory to avoid potential issues with Hadoop namenode HA.
Contributor

initialize (spelling)

Contributor Author

Fixed

b-slim merged commit 513fab7 into apache:master Apr 29, 2018
sathishsri88 pushed a commit to sathishs/druid that referenced this pull request May 8, 2018
* Lazy init of fullyQualifiedStorageDirectory in HDFS pusher

* Comment

* Fix test

* PR comments
dclim added this to the 0.13.0 milestone Oct 8, 2018