Conversation

@vundela commented Oct 14, 2015

Currently the log4j.properties file is not uploaded to the executors, which leads them to use the default values. This fix makes sure the file is always uploaded to the distributed cache, so that executors use the latest settings.

If the user specifies a log configuration through --files, then the executors will pick up the configuration from --files instead of $SPARK_CONF_DIR/log4j.properties.
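
For context, the manual route this change automates looked roughly like the following (an illustrative sketch: com.example.MyApp, myapp.jar, and the paths are placeholders, not from this PR):

spark-submit --master yarn-cluster \
  --files /etc/spark/conf/log4j.properties \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.properties" \
  --class com.example.MyApp myapp.jar

Files shipped via --files are localized into each container's working directory, which is on the classpath, so log4j can find the file there.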

@vundela commented Oct 14, 2015

@vanzin, can you please review the pull request?

Member (review comment):

Nit: I'd prefer vendor neutral vocabulary here.

@vundela (Author) replied:

Thanks for the suggestion, done.

Contributor (review comment):

As Mark mentioned, this can throw an NPE.

@vundela (Author) replied:

Fixed.

Contributor (review comment):

So, a more scala-y way of doing this would be:

val log4jConf = oldLog4jConf.orElse(Option(getClass.getResource("/log4j.properties")).map(_.toString))

Also, it might be better to use Utils.getContextOrSparkClassLoader.getResource instead, so that it picks up the user's log4j.properties if they embed it in their app's jar.
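
Combining the two suggestions, the lookup might end up roughly like this (a sketch only; Utils.getContextOrSparkClassLoader is Spark's helper that prefers the thread's context classloader over Spark's own):

val log4jConf = oldLog4jConf.orElse(
  Option(Utils.getContextOrSparkClassLoader.getResource("log4j.properties"))
    .map(_.toString))

Note that ClassLoader.getResource takes the resource name without a leading slash, unlike Class.getResource.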

@vundela (Author) replied:

Done

@vanzin commented Oct 15, 2015

ok to test

@vanzin commented Oct 15, 2015

LGTM pending tests.

@SparkQA commented Oct 15, 2015

Test build #43756 has finished for PR 9118 at commit 9669a32.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@markgrover commented:

LGTM too; the tests seem to have passed as well.

Member (review comment):

I might be over-estimating the intended scope, but this is not the only place that log4j.properties can come from. Generally, callers can set -Dlog4j.configuration=... to set its location.
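
For example, a caller can point log4j at an explicit location through the JVM options, along these lines (the path is a placeholder; the file must exist on each node where it is used):

spark-submit \
  --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:/path/to/log4j.properties" \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:/path/to/log4j.properties" \
  ...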

Contributor (reply):

Sure, but we're trying to make the default case easier. If the user is messing with log4j.configuration then he probably knows what he's doing, and this change would not affect that.

@vanzin commented Oct 15, 2015

So, I thought about this a little more, and I think it would be worth putting that file in createConfArchive instead. That way it's more efficient, since it's one less tiny file we're uploading to HDFS before starting the job. It would also maintain the current behavior of not overriding --files, since the configuration dir is added later in the classpath.

Could you try that? Thanks!
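
A rough sketch of that approach (hypothetical names; the real createConfArchive in Client.scala zips up the gateway's configuration directory):

import java.io.File

// Sketch: include the gateway's log4j.properties in the conf archive that is
// shipped to the cluster, rather than uploading it as a separate file.
// `confFiles` is a hypothetical buffer of files to be zipped.
sys.env.get("SPARK_CONF_DIR").foreach { dir =>
  val log4jFile = new File(dir, "log4j.properties")
  if (log4jFile.isFile()) {
    confFiles += log4jFile
  }
}

Since the conf directory comes after --files entries on the classpath, a user-supplied log4j.properties would still win.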

@SparkQA commented Oct 19, 2015

Test build #43929 has finished for PR 9118 at commit 187eb99.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tgravescs commented:

This isn't really a bug, as it was kind of designed this way. Originally, you had to either set SPARK_LOG4J_CONF (does SPARK_LOG4J_CONF still work properly?) or upload the file with --files.

I'm definitely fine with having this happen automatically, but we need to update the documentation at http://spark.apache.org/docs/latest/running-on-yarn.html. It talks about using --files and setting the Java options.

I assume --files still works because it will show up first on the classpath, since it looks like the one from the conf dir will still be uploaded?

@vundela commented Oct 19, 2015

@tgravescs Thanks for the review. Yes, it makes sense to update the documentation.

Contributor (review comment):

Technically, you should only look at this URL if its protocol is file; otherwise, even if the path matches a local file, it wouldn't be correct.

Also, style:

.map { path =>
  // code
}
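
Put together, the guard might look roughly like this (a sketch; url is assumed to be a java.net.URL):

// Only treat the resource as a local file when the protocol is "file";
// a matching path under any other protocol would not be the same file.
Option(url).filter(_.getProtocol == "file").map { url =>
  // handle the local log4j.properties path here
}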

@SparkQA commented Oct 19, 2015

Test build #43943 has finished for PR 9118 at commit 39c75a2.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Oct 20, 2015

Test build #43952 has finished for PR 9118 at commit e1e474a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor (review comment):

style: url => goes on the previous line.

@SparkQA commented Oct 20, 2015

Test build #43958 has finished for PR 9118 at commit b97a2b5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor (review comment):

Why does --files work? Is it just because it gets put on the classpath first?

@vundela (Author) replied:

Yes, that's correct. Also, in this case the user knows what they are doing, and their intention is to override the default behavior.

@tgravescs commented:

@vundela Does SPARK_LOG4J_CONF still work?

@vundela commented Oct 20, 2015

@tgravescs Yes, it still works, and the user sees a deprecation message.

@tgravescs commented:

Changes LGTM. @vanzin, are you OK with these?

@markgrover commented:

I have one more noob question. Let's say I have a 4-node Hadoop/YARN cluster plus a 5th node that I am driving a yarn-client app from. I use Puppet to set up my log4j configuration (in $SPARK_CONF_DIR on each of the executor nodes), and it's set up differently on the 4 cluster nodes vs. the 5th driver node. After this change, would the 5th node's log4j configuration always override my log4j configuration on the 4 executor nodes? Would that come as a surprise to some users?

@vanzin commented Oct 20, 2015

@markgrover you can't have that scenario on YARN because when running Spark through YARN there is no "cluster configuration". Everything is configured based on the gateway node's configuration. You don't even need Spark jars available in the cluster.

The change LGTM, I'll merge to master.

@asfgit closed this in 2f6dd63 on Oct 20, 2015
@markgrover commented:

Got it, thanks!

asfgit pushed a commit that referenced this pull request on Mar 31, 2016 ("…ally with distributed cache"):

## What changes were proposed in this pull request?

1. Currently, the log4j file shipped through the distributed cache is only added to the AM's classpath, not the executors'. This was introduced in #9118 and breaks the original intent of that PR, so here the log4j file is added to the classpath of both the AM and the executors.
2. Automatically upload metrics.properties to the distributed cache, so that it can be used by the remote driver and executors implicitly.

## How was this patch tested?

Unit tests and an integration test were done.

Author: jerryshao <sshao@hortonworks.com>

Closes #11885 from jerryshao/SPARK-14062.
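
For reference, before that follow-up the metrics configuration had to be shipped and wired up by hand, roughly like this (an illustrative sketch; the path is a placeholder):

spark-submit \
  --files /etc/spark/conf/metrics.properties \
  --conf spark.metrics.conf=metrics.properties \
  ...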