[SPARK-17802] Improved caller context logging. #15377
Conversation
cc @weiqingy (who worked on SPARK-16757).
Why not just Boolean?
Makes sense. I have simplified the logic.
This was also superfluous; just end the method with the right return value.
done
Just put false at the end of the method like before. Do you really want to log the exception at info level? It looks like a problem when it's not. Nits: Builder -> builder, hadoop -> Hadoop. Don't put a space before the colons, and the new callerContextSupported flag should be private.
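The "end the method with the right return value" nit is about ordinary Scala style: the last expression of a method is its value, so explicit `return`s (or a mutable result variable) are unnecessary. A hedged sketch, using the Hadoop class name from this PR but illustrative method names:

```scala
// Verbose form with explicit returns, as flagged by the reviewer.
def supportedVerbose(): Boolean = {
  try {
    Class.forName("org.apache.hadoop.ipc.CallerContext")
    return true
  } catch {
    case _: ClassNotFoundException => return false
  }
}

// Idiomatic form: the try/catch is itself an expression ending in the
// right value, no `return` needed.
def supportedIdiomatic(): Boolean =
  try {
    Class.forName("org.apache.hadoop.ipc.CallerContext")
    true
  } catch {
    case _: ClassNotFoundException => false
  }
```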
done
Force-pushed from 489295d to 112ddcb.
Test build #66440 has finished for PR 15377 at commit
srowen
left a comment
Also, in the case of another non-fatal error, would you not also consider the context unsupported? It seems more like a warning than an error, because you can complete without it.
I still think you perhaps should not log the exception but just log the message (so that the caller sees what class was not found), like logInfo(s"... or later: ${e.getMessage}").
done
Test build #66443 has finished for PR 15377 at commit
Test build #66447 has finished for PR 15377 at commit
I guess I mean: set callerContextSupported = false here too, right? Or else you're still going to fail over and over and log it if something besides ClassNotFoundException happens. Which is probably a pretty fatal error actually, but hey.
Emm, I prefer not to suppress the non-ClassNotFound errors, because they are real errors (either on the Spark side or on the Hadoop side). The ClassNotFound error, by contrast, just reflects a conditional feature that depends on the Hadoop environment.
There are a few different deployment situations here.
- You are running on Hadoop 2.8 and want the caller context. A failure here is something to mention.
- You are running on Hadoop <= 2.7 and don't want the context or care about it. Here another stack trace is going to be a distraction; if it's not a support call then it gets added to the list of "error messages you learn to ignore". (This is my current state, BTW.)
- You want the caller context, but are running on an incompatible version of Hadoop. Again, here, logging the CNFE makes sense.
The question is: do you need anything if the caller context is disabled? I don't see that you do. And there's a Hadoop config option, hadoop.caller.context.enabled (default false), which controls that.
What about looking for the config option: if it is set, go through the introspection work, reporting problems with stack traces; if unset, don't even bother with the introspection?
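The gating described above can be sketched as follows. This is a hedged illustration, not the merged code: `getBoolean` stands in for Hadoop's `Configuration#getBoolean` (in Spark it would come from `SparkHadoopUtil.get.conf`), and passing it as a function keeps the sketch self-contained:

```scala
import scala.util.Try

// Only when the feature flag is on do we pay for (and potentially log
// about) the reflection probe; with the default (false) nothing happens.
def callerContextSupported(getBoolean: (String, Boolean) => Boolean): Boolean =
  getBoolean("hadoop.caller.context.enabled", false) &&
    Try(Class.forName("org.apache.hadoop.ipc.CallerContext")).isSuccess
```

With the flag unset, `&&` short-circuits and `Class.forName` is never reached, so no ClassNotFoundException can pollute the logs.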
Thanks for the tip about hadoop.caller.context.enabled and the suggestion, I'll do that.
Test build #66449 has finished for PR 15377 at commit
srowen
left a comment
OK, it's a corner case anyway.
Thanks for proposing this PR. LGTM.
@srowen I updated the code to make callerContextSupported a static variable, because the usage pattern of CallerContext is new CallerContext(...).setCurrentContext().
OK, could it even be private[util]?
Makes sense, done.
Test build #66507 has finished for PR 15377 at commit
Test build #66506 has finished for PR 15377 at commit
Test build #66556 has finished for PR 15377 at commit
Test build #66715 has finished for PR 15377 at commit
Test build #66719 has finished for PR 15377 at commit
What is the usage of this flag? I don't see any other place that uses it; they're all just setters.
It's used below, and you have commented on it :)
IIUC, the code will never fall into this branch, since you initialize it as true.
Nope, when there is a ClassNotFound exception, it is set to false.
The usage of CallerContext.callerContextSupported is quite weird to me; it looks like you're implementing a singleton-factory-like pattern to avoid re-calling this method if the class is not found.
Will this introduce a threading issue for the state of CallerContext.callerContextSupported?
Also, is there a side effect to calling setCurrentContext multiple times in one JVM?
I would be in favor of avoiding such usage of a global flag.
Here, it won't matter if it's called multiple times. It just means some extra work is done but the result is the same. The flag is at least local to the class here. I don't know of a better way to record this state because it is global, and properly so, as the things it depends on can't vary within one JVM.
Alright, so the main purpose of this flag is to avoid re-executing the code below. AFAIK, unless we can guarantee sequential calls to this method, callerContextSupported may introduce a concurrency issue, especially in task-running code. Tasks that start running simultaneously may all see the flag as true and execute the code below, which makes the flag useless.
Also, in the AM/Client it will only be called once, so the flag is not used in those JVMs.
Yes, but re-executing the code an extra time doesn't do anything bad except log an additional message. We're trying to prevent it from executing a bunch of times. The flag can only go from true to false, and it keeps executing the same deterministic path until something stops it. Unless you mean this really never executes more than once anyway, I don't think there's a problem right here.
If you cannot fully prevent re-executing this code, and in the worst case all threads will execute the same logic again, is it necessary to add such a flag? To me it seems nondeterministic in a way that will confuse the user (some tasks print the log while others don't).
The point is to avoid a bunch of exception stack traces in the log over and over. I don't think thread-safety is an issue here. If it prints twice instead of once, no big deal. Otherwise this error/warning prints on every task execution. That's not great.
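The pattern under discussion is a benign race: the flag only ever flips from true to false, so the worst case is a handful of duplicate log lines rather than one per task. A hedged sketch (names are illustrative, not the PR's exact code):

```scala
object CallerContextSketch {
  // @volatile so the false value becomes visible to other threads; a race
  // can only cause the probe (and its log line) to run a few extra times.
  @volatile private var supported = true

  def trySetContext(probe: () => Unit): Unit = {
    if (supported) {
      try probe() catch {
        case _: ClassNotFoundException =>
          supported = false // recorded once; later calls skip the probe
      }
    }
  }
}
```

Once any thread records the failure, subsequent calls skip the probe entirely, which is exactly the "log it once, not on every task" behavior being argued for.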
Please use SparkHadoopUtil#conf.
done
Please use Utils#classForName.
done
Nit: the indent is not correct; use 2 spaces.
done
Force-pushed from 884de64 to c42019b.
Remove this comment.
Good catch, done.
Another thing: did you verify it locally? There's no unit test to cover it.
Test build #66886 has finished for PR 15377 at commit
Test build #66885 has finished for PR 15377 at commit
@jerryshao Yeah, I did test it locally to ensure the error is only logged once.
Test build #66891 has finished for PR 15377 at commit
These two values are not used at all; I think you could remove them.
LGTM, just some minor things.
@weiqingy Emm, then we would also need to add the logic of checking "hadoop.caller.context.enabled" in the test code, which would make the test code simply duplicate the code path it is testing. On the other hand, I do agree we should test the code path, but it seems not easy to do. @srowen @jerryshao What do you think?
srowen
left a comment
I see the issue with the tests, yeah, and it's a fair point. My personal opinion is that the test here is sufficient: one would have to reimplement the code in question to test it, we'll soon drop support for versions that don't support this anyway, and the failure this wouldn't catch is not bad.
I don't think this needs to be lazy (?), and it doesn't need to be broken out into methods. It can just be one val initialized directly upfront in one expression.
Test build #67201 has finished for PR 15377 at commit
The above code can be simplified with Try { xxxx }.isSuccess; please check scala.util.Try. Then these two methods can be merged into one expression.
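The Try-based simplification being suggested could look roughly like this. A hedged sketch: the Builder inner-class name is taken from the PR discussion and may not match the final code exactly:

```scala
import scala.util.Try

// A Try is a Failure if any lookup inside it throws, so .isSuccess
// collapses "does this class exist on the classpath?" into one expression.
def classPresent(name: String): Boolean = Try(Class.forName(name)).isSuccess

// Both reflection probes merged into a single boolean expression.
val callerContextSupported: Boolean =
  classPresent("org.apache.hadoop.ipc.CallerContext") &&
    classPresent("org.apache.hadoop.ipc.CallerContext$Builder")
```

Compared with nested try/catch blocks, this keeps the "supported?" computation a single expression suitable for initializing a val.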
Test build #67239 has finished for PR 15377 at commit
srowen
left a comment
The rest seems OK to me.
Let's find a different way to wrap this; it's non-standard and the condition is hard to read. Maybe break out conf as a local val to make it simpler.
I see why you're using Try, though I wonder if we are suppressing potentially important errors this way. For example, ClassNotFoundException obviously means 'not supported' and that's normal. A fatal error should probably just propagate. But a NonFatal error could be logged with a warning, because that would be unexpected. It would still mean the caller context isn't supported rather than cause a failure.
Test build #67368 has finished for PR 15377 at commit
srowen
left a comment
The rest is looking good. Let's keep this simple.
This doesn't need to be a separate member; it can be a local val.
Still, why have more than one member here? Initializing callerContextSupported in one block of code is sufficient and simpler.
Test build #67394 has finished for PR 15377 at commit
Force-pushed from c93bed3 to b57f009.
private[util] object CallerContext extends Logging {
  val callerContextSupported: Boolean = {
    val conf = SparkHadoopUtil.get.conf
    conf.getBoolean("hadoop.caller.context.enabled", false) && {
Although this construct looks funny to me, a more conventional structure would just be a little longer. I'm OK with it.
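The construct in question is the `cond && { ... }` idiom: the block after `&&` is only evaluated when the left side is true, so with the feature flag off the reflection work never runs at all. A stand-alone illustration (all names here are illustrative):

```scala
// Track whether the right-hand block ever executed.
var probed = false
def probe(): Boolean = { probed = true; true }

// With the flag off, && short-circuits and probe() is never called.
val enabled = false
val supported = enabled && probe()
```

This short-circuiting is why the whole check can live in one val initializer without doing wasted work in the common (disabled) case.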
Right, I removed the local conf variable.
Test build #67445 has finished for PR 15377 at commit
Test build #67447 has finished for PR 15377 at commit
Test build #3372 has finished for PR 15377 at commit
@srowen Could we get this merged since the tests are now green? Not sure why it failed previously; it just turned green without me doing anything.
Merged to master.
## What changes were proposed in this pull request? [SPARK-16757](https://issues.apache.org/jira/browse/SPARK-16757) sets the Hadoop `CallerContext` when calling hadoop/hdfs APIs to make Spark applications more diagnosable in hadoop/hdfs logs. However, the `org.apache.hadoop.ipc.CallerContext` class is only added since [hadoop 2.8](https://issues.apache.org/jira/browse/HDFS-9184), which is not officially released yet. So each time `utils.CallerContext.setCurrentContext()` is called (e.g. [when a task is created](https://github.com/apache/spark/blob/b678e46/core/src/main/scala/org/apache/spark/scheduler/Task.scala#L95-L96)), a "java.lang.ClassNotFoundException: org.apache.hadoop.ipc.CallerContext" error is logged, which pollutes the Spark logs when there are lots of tasks. This patch improves this behaviour by only logging the `ClassNotFoundException` once. ## How was this patch tested? Existing tests. Author: Shuai Lin <linshuai2012@gmail.com> Closes apache#15377 from lins05/spark-17802-improve-callercontext-logging.
…op.caller.context.enabled` ### What changes were proposed in this pull request? This PR aims to fix `CallerContext` test by enabling `hadoop.caller.context.enabled` during tests. ### Why are the changes needed? This test case was disabled since Apache Spark 2.1.0 because `CallerContext` was supported since Apache Hadoop 2.8+. - #15377 We need to recover `CallerContext` test coverage. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the CIs. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #49893 from dongjoon-hyun/SPARK-51164. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>