-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-11105][yarn] Distribute log4j.properties to executors #9118
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@vanzin can you please review the pull request. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I'd prefer vendor neutral vocabulary here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion, done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As Mark mentioned, this can throw a NPE.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, a more scala-y way of doing this would be:
val log4jConf = oldLog4jConf.orElse(Option(getClass.getResource("/log4j.properties")).map(_.toString))
Also, it might be better to use Utils.getContextOrSparkClassLoader.getResource instead, so that it picks up the user's log4j.properties it they embed it in their app's jar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
ok to test |
|
LGTM pending tests. |
|
Test build #43756 has finished for PR 9118 at commit
|
|
LGTM too, the tests seem to have passed as well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might be over-estimating the intended scope, but this is not the only place that log4j.properties can occur. Generally callers can set -Dlog4j.configuration=... to set its location.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, but we're trying to make the default case easier. If the user is messing with log4j.configuration then he probably knows what he's doing, and this change would not affect that.
|
So, I thought about this a little bit more and I think it would be worth it to put that file in Could you try that? Thanks! |
|
Test build #43929 has finished for PR 9118 at commit
|
|
This isn't really a bug as it was kind of designed this way. Originally you had to either set SPARK_LOG4J_CONF (which does SPARK_LOG4J_CONF still work properly?) or you had to upload it with the --files. I'm definitely fine with having this happen automatically but we need to update the documentation at http://spark.apache.org/docs/latest/running-on-yarn.html. It talks about using the --files and setting the java options. I assume --files still works because it will show up on the classpath first because it looks like it will still upload the one from the conf dir? |
|
Thanks for the review. Yes, it make sense to update the documentation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically, you should only look at this URL if its protocol is file, otherwise even if the path matches a local file, it wouldn't be correct.
Also, style:
.map { path =>
// code
}
|
Test build #43943 has finished for PR 9118 at commit
|
|
Test build #43952 has finished for PR 9118 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: url => goes on the previous line.
|
Test build #43958 has finished for PR 9118 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does --files work? Is it just because it gets put on classpath first?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, thats correct and also in this case user knows what he is doing and his intension is to overwrite the default behavior
|
@vundela Does SPARK_LOG4J_CONF still work? |
|
@tgravescs Yes, it still works and user see a deprecated message. |
|
changes lgtm. @vanzin are you ok with these? |
|
I have one more noob question. Let's say I have a 4 node hadoop/yarn cluster and a 5th node, I am driving a yarn-client app from. I use puppet to set up my log4j configuration (in $SPARK_CONF_DIR of each of the executor nodes) and it's set up differently on the 4 cluster nodes, vs. the 5th driver node. After this change, would the 5th node's log4j configuration always override my log4j configuration on the 4 executor nodes? Would that come as a surprise to some users? |
|
@markgrover you can't have that scenario on YARN because when running Spark through YARN there is no "cluster configuration". Everything is configured based on the gateway node's configuration. You don't even need Spark jars available in the cluster. The change LGTM, I'll merge to master. |
|
Got it, thanks! |
…ally with distributed cache ## What changes were proposed in this pull request? 1. Currently log4j which uses distributed cache only adds to AM's classpath, not executor's, this is introduced in #9118, which breaks the original meaning of that PR, so here add log4j file to the classpath of both AM and executors. 2. Automatically upload metrics.properties to distributed cache, so that it could be used by remote driver and executors implicitly. ## How was this patch tested? Unit test and integration test is done. Author: jerryshao <sshao@hortonworks.com> Closes #11885 from jerryshao/SPARK-14062.
Currently log4j.properties file is not uploaded to executor's which is leading them to use the default values. This fix will make sure that file is always uploaded to distributed cache so that executor will use the latest settings.
If user specifies log configurations through --files then executors will be picking configs from --files instead of $SPARK_CONF_DIR/log4j.properties