
Conversation

@jerryshao
Contributor

What changes were proposed in this pull request?

  1. Currently the log4j file shipped via the distributed cache is only added to the AM's classpath, not the executors'. This was introduced in [SPARK-11105][yarn] Distribute log4j.properties to executors #9118 and breaks the original intent of that PR, so this change adds the log4j file to the classpath of both the AM and the executors.
  2. Automatically upload metrics.properties to the distributed cache so that it can be used implicitly by the remote driver and executors.
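As a rough illustration of the classpath lookup involved in the second change, the sketch below (not the patch itself; the object and method names are invented for this example) shows how optional config files can be located on the classpath before being handed to the distributed cache:

```scala
// Illustrative sketch only: find optional Spark config files on the classpath,
// roughly the way the patch does before uploading them to the YARN distributed
// cache. ConfigFileLookup and findOnClasspath are names made up for this example.
object ConfigFileLookup {
  // Returns the local URLs of the config files that are actually present.
  def findOnClasspath(names: Seq[String]): Seq[java.net.URL] =
    names.flatMap(n => Option(Thread.currentThread().getContextClassLoader.getResource(n)))

  def main(args: Array[String]): Unit = {
    // The patch considers both files; copies passed via --files take precedence.
    findOnClasspath(Seq("log4j.properties", "metrics.properties"))
      .foreach(url => println(s"would upload: $url"))
  }
}
```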

How was this patch tested?

Unit tests and integration tests are done.

@SparkQA

SparkQA commented Mar 22, 2016

Test build #53763 has finished for PR 11885 at commit 6c20d37.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jerryshao jerryshao closed this Mar 22, 2016
@jerryshao jerryshao reopened this Mar 23, 2016
@jerryshao jerryshao changed the title from "[SPARK=14062][Yarn] Upload metrics.properties automatically with distributed cache" to "[SPARK-14062][Yarn] Fix log4j and upload metrics.properties automatically with distributed cache" Mar 23, 2016
@SparkQA

SparkQA commented Mar 23, 2016

Test build #53912 has finished for PR 11885 at commit f9cb06b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 23, 2016

Test build #53911 has finished for PR 11885 at commit 6c20d37.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jerryshao
Contributor Author

CC @vanzin @tgravescs, please help review, thanks a lot.

@SparkQA

SparkQA commented Mar 23, 2016

Test build #53918 has finished for PR 11885 at commit 260ff0e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Member

srowen commented Mar 23, 2016

LGTM, FWIW. You're just uploading an additional file here and cleaning up the code.

```scala
// Also uploading metrics.properties to distributed cache if exists in classpath.
// If user specify this file using --files then executors will use the one
// from --files instead.
for { prop <- Seq("log4j.properties", "metrics.properties")
```
Contributor

Does this break the oldLog4jConf functionality above? I think it will throw an exception if both exist.

Contributor Author

I haven't tried yet, I will do a quick test on this.

Contributor Author

Hi @tgravescs , I just did a quick test on this.

If oldLog4jConf points to the same log4j file as the one under <SPARK_HOME>/conf, it is added to the distributed cache once and a warning is logged for the duplicate. If oldLog4jConf points to a different log4j file, the one under <SPARK_HOME>/conf takes precedence.

Since SPARK_LOG4J_CONF is deprecated, I think there should be no problem, and the semantics stay consistent.

Contributor

Since it's deprecated and I would like to see it removed, I don't think it's that big a deal, but I disagree with the ordering if we are keeping it.

If I explicitly specify something in SPARK_LOG4J_CONF, it should take precedence over anything in the <SPARK_HOME>/conf dir.

Contributor

Agree with Tom, but I'd rather just remove support for that env variable now. It's basically one line of code and a warning log in this file...

Contributor Author

Sure, I will remove the support of this env variable.

@vanzin
Contributor

vanzin commented Mar 24, 2016

I'd prefer if these files were uploaded inside the config archive generated by Spark, as the code you're deleting does for log4j.properties. That avoids creating more small files in HDFS and speeds things up even if a tiny bit.

Is the problem here that the archive is not distributed to executors? If so, then maybe the better solution is to do that instead.

@jerryshao
Contributor Author

@vanzin, thanks for your review. I know that putting these files into the conf archive is a more elegant way, but currently the conf archive is only added to the AM's classpath. Your patch explains why it is only added to the AM's classpath:

These are only used by the AM, since executors will use the configuration object broadcast by
the driver. The files are zipped and added to the job as an archive, so that YARN will explode
it when distributing to the AM. This directory is then added to the classpath of the AM
process, just to make sure that everybody is using the same default config.

So I'm not sure whether there would be any side effect if we add the conf archive to the executor's classpath.
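For illustration only, putting the exploded conf archive on a container's classpath via the YARN launch environment might look like the sketch below (the constant and helper are stand-ins written for this example, not the actual Client.scala code):

```scala
// Illustrative sketch: add the localized conf dir (the exploded __spark_conf__
// archive) to a YARN container's CLASSPATH environment variable, so executors
// see it as well as the AM. addConfDirToClasspath is a name made up here.
import org.apache.hadoop.yarn.api.ApplicationConstants

val LOCALIZED_CONF_DIR = "__spark_conf__"

def addConfDirToClasspath(env: scala.collection.mutable.Map[String, String]): Unit = {
  // Environment.PWD.$$() expands to the container's working directory,
  // where YARN explodes localized archives.
  val entry = ApplicationConstants.Environment.PWD.$$() + "/" + LOCALIZED_CONF_DIR
  val existing =
    env.get("CLASSPATH").map(_ + ApplicationConstants.CLASS_PATH_SEPARATOR).getOrElse("")
  env("CLASSPATH") = existing + entry
}
```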

@vanzin
Contributor

vanzin commented Mar 24, 2016

I don't think there's any harm in using the archive everywhere; it's currently only used in the AM mostly as an optimization, since it wasn't really used in the executors (aside from the oversight of log4j.properties).

@jerryshao
Contributor Author

My concern is about Hadoop-related configurations: which copy takes precedence if several paths on the classpath have different configurations?

@vanzin
Contributor

vanzin commented Mar 24, 2016

There's no "several paths". Spark will broadcast the hadoop configs before running tasks and use that in the executors, so Spark won't use whatever is in the executor's classpath anyway.

@jerryshao
Contributor Author

Thanks a lot for your explanation.

I'm not sure if I understand correctly. Currently we add <hadoop_home>/etc/hadoop to the classpath by default for the AM and executors. If we now also add __spark_conf__ to the executors' classpath, there will be another copy of the Hadoop conf; in addition, we create a Configuration() at executor start, which picks up specific settings such as s3 and spark.hadoop.xxx sent from the driver.

If the two copies, one in the cluster's Hadoop home and one sent from the client, differ, I'm not sure whether there is any side effect.

It's just a concern; we haven't actually hit such an issue.

@vanzin
Contributor

vanzin commented Mar 24, 2016

As I've said above, Spark does not use the Hadoop configuration from the classpath in the executors. It uses the Hadoop configuration broadcast from the driver.

So no matter what you add to the executor's classpath, it will not be used.

And in any case, using the configuration present in the submitting node is more correct than using whatever configuration might or might not be available on the cluster nodes, which was the whole point of uploading the configuration archive to the AM in the first place.
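A sketch of the broadcast mechanism described above: this mirrors the approach of Spark's internal SerializableConfiguration, but the wrapper class below is a stand-in written for this example, since a plain Hadoop Configuration is not Java-serializable.

```scala
// Illustrative sketch: wrap a Hadoop Configuration so it can be broadcast from
// the driver. Tasks read the broadcast value instead of building a Configuration
// from whatever conf files happen to be on the executor's classpath.
import java.io.{ObjectInputStream, ObjectOutputStream}
import org.apache.hadoop.conf.Configuration

class SerializableConfig(@transient var value: Configuration) extends Serializable {
  // Configuration implements Writable, so delegate to its own (de)serialization.
  private def writeObject(out: ObjectOutputStream): Unit = {
    out.defaultWriteObject()
    value.write(out)
  }
  private def readObject(in: ObjectInputStream): Unit = {
    in.defaultReadObject()
    value = new Configuration(false)
    value.readFields(in)
  }
}

// Driver side (sketch): val confBc = sc.broadcast(new SerializableConfig(sc.hadoopConfiguration))
// Task side:            confBc.value.value.get("fs.defaultFS")
```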

@jerryshao
Contributor Author

OK, thanks a lot for your explanation 😄 .

@SparkQA

SparkQA commented Mar 24, 2016

Test build #54011 has finished for PR 11885 at commit 6702927.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 24, 2016

Test build #54019 has finished for PR 11885 at commit b1da8e5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin
Contributor

vanzin commented Mar 25, 2016

LGTM, just need to fix the env variable thing one way or another.


```scala
val statCache: Map[URI, FileStatus] = HashMap[URI, FileStatus]()

val oldLog4jConf = Option(System.getenv("SPARK_LOG4J_CONF"))
```
Contributor Author

Here I removed support for SPARK_LOG4J_CONF. I already did this in #11603 as well; I can handle the merge conflicts.

@SparkQA

SparkQA commented Mar 28, 2016

Test build #54293 has finished for PR 11885 at commit ea17176.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jerryshao
Contributor Author

The MiMa failure is not related to this patch. Jenkins, retest this please.

@SparkQA

SparkQA commented Mar 28, 2016

Test build #54294 has finished for PR 11885 at commit ea17176.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jerryshao
Contributor Author

@vanzin, please help review again, thanks a lot.

@jerryshao
Contributor Author

CC @tgravescs @vanzin, any further comments on this patch?

@@ -545,8 +528,7 @@ private[spark] class Client(
// Distribute an archive with Hadoop and Spark configuration for the AM.
Contributor

Update the comment, since the archive now goes everywhere.

Contributor Author

Thanks, I will update the comment.

@tgravescs
Contributor

minor comment update, otherwise +1

@SparkQA

SparkQA commented Mar 31, 2016

Test build #54627 has finished for PR 11885 at commit a619dfd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin
Contributor

vanzin commented Mar 31, 2016

LGTM, merging to master.

@asfgit asfgit closed this in 3b3cc76 Mar 31, 2016