
Conversation

@lins05
Contributor

@lins05 lins05 commented Oct 6, 2016

What changes were proposed in this pull request?

SPARK-16757 sets the Hadoop CallerContext when calling Hadoop/HDFS APIs to make Spark applications more diagnosable in Hadoop/HDFS logs. However, the org.apache.hadoop.ipc.CallerContext class was only added in Hadoop 2.8, which has not been officially released yet. So each time utils.CallerContext.setCurrentContext() is called (e.g. when a task is created), a "java.lang.ClassNotFoundException: org.apache.hadoop.ipc.CallerContext" error is logged, which pollutes the Spark logs when there are lots of tasks.

This patch improves this behaviour by only logging the ClassNotFoundException once.
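The behaviour described above can be sketched roughly as follows. This is a minimal illustration, not Spark's actual code; the names and the use of a simple counter in place of a logger are assumptions for the example.

```scala
// Minimal sketch (not Spark's actual code) of "log the missing class only once":
// a flag starts true and flips to false on the first ClassNotFoundException,
// so later tasks skip both the reflection and the log line.
object CallerContextSketch {
  private var callerContextSupported = true
  var logCount = 0 // counts emitted log lines, for illustration only

  def setCurrentContext(): Boolean = {
    callerContextSupported && {
      try {
        // Stand-in for reflectively loading org.apache.hadoop.ipc.CallerContext.
        Class.forName("org.apache.hadoop.ipc.CallerContext")
        true
      } catch {
        case _: ClassNotFoundException =>
          callerContextSupported = false
          logCount += 1 // the real code would logInfo here, exactly once
          false
      }
    }
  }
}
```

However many times setCurrentContext() is called afterwards, the "log" fires at most once.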

How was this patch tested?

Existing tests.

@lins05
Contributor Author

lins05 commented Oct 6, 2016

cc @weiqingy (who worked on SPARK-16757.)

Member

Why not just Boolean?

Contributor Author

Makes sense. I have simplified the logic.

Member

This was also superfluous, just end the method with the right return value

Contributor Author

done

Member

Just put false at the end of the method like before. Do you really want to log the exception at info level? It looks like a problem when it's not. Nits: Builder -> builder, hadoop -> Hadoop. Don't put a space before the colons, and the new callerContextSupported flag should be private.

Contributor Author

done

@lins05 lins05 force-pushed the spark-17802-improve-callercontext-logging branch from 489295d to 112ddcb Compare October 6, 2016 09:16
@SparkQA

SparkQA commented Oct 6, 2016

Test build #66440 has finished for PR 15377 at commit b8621ab.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@srowen srowen left a comment

Also, in the case of another non-fatal error, would you not also consider the context unsupported? It seems more like a warning than an error because you can complete without it.

Member

I still think you perhaps should not log the exception but just log the message (so that the caller sees what class was not found), like logInfo(s"... or later: ${e.getMessage}")

Contributor Author

done

@SparkQA

SparkQA commented Oct 6, 2016

Test build #66443 has finished for PR 15377 at commit 3b5f9e4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 6, 2016

Test build #66447 has finished for PR 15377 at commit 112ddcb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

I guess I mean: set callerContextSupported = false here too right? or else you're still going to fail over and over and log it if something besides ClassNotFoundException happens. Which is probably a pretty fatal error actually, but hey.

Contributor Author

@lins05 lins05 Oct 6, 2016

Emm, I prefer not to suppress the non-ClassNotFound errors, because they are real errors (either on the Spark side or on the Hadoop side), whereas the ClassNotFound error rather reflects a conditional feature that depends on the Hadoop environment.

Contributor

There are some different deployment situations here.

  1. you are running on Hadoop 2.8, want caller context. A failure here is something to mention.
  2. you are running on Hadoop <= 2.7, don't want context or care about it. Here another stack trace is going to be a distraction; if it's not a support call then it gets added to the list of "error messages you learn to ignore". (this is my current state, BTW)
  3. you want caller context, but are running on an incompatible version of Hadoop. Again, here, logging the CNFE makes sense.

Question is: do you need anything if the caller context is disabled? As far as I can see, you don't. And there's a Hadoop config option hadoop.caller.context.enabled (default false), which controls that.

What about looking for the config option: if it is set, go through the introspection work, reporting problems with stack traces; and if unset, don't even bother with the introspection?
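The suggested gating could be sketched as below. This is an illustration under the stated assumption that the gating option is hadoop.caller.context.enabled with a default of false; the config lookup is abstracted as a function so the sketch does not require hadoop-common on the classpath.

```scala
object ConfigGateSketch {
  // Only attempt the reflective class lookup when the Hadoop option is set.
  // The option lookup itself is passed in as a function, standing in for
  // Configuration.getBoolean, so this sketch has no Hadoop dependency.
  def callerContextSupported(getBoolean: (String, Boolean) => Boolean): Boolean =
    getBoolean("hadoop.caller.context.enabled", false) && {
      try {
        Class.forName("org.apache.hadoop.ipc.CallerContext")
        true
      } catch {
        case _: ClassNotFoundException => false
      }
    }
}
```

With the option unset, the short-circuiting `&&` means the introspection never runs at all.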

Contributor Author

Thanks for the tip about hadoop.caller.context.enabled and the suggestion, I'll do that.

@SparkQA

SparkQA commented Oct 6, 2016

Test build #66449 has finished for PR 15377 at commit bc2fb25.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@srowen srowen left a comment

OK, it's a corner case anyway.

@weiqingy
Contributor

weiqingy commented Oct 6, 2016

Thanks for proposing this PR. LGTM.

Contributor Author

@srowen I updated the code to make callerContextSupported a static variable because the usage pattern of CallerContext is new CallerContext(...).setCurrentContext()

Member

OK, could it even be private[util]?

Contributor Author

Makes sense, done.

@SparkQA

SparkQA commented Oct 7, 2016

Test build #66507 has finished for PR 15377 at commit f6557d6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 7, 2016

Test build #66506 has finished for PR 15377 at commit f1962e4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 8, 2016

Test build #66556 has finished for PR 15377 at commit dc6951d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 11, 2016

Test build #66715 has finished for PR 15377 at commit 7485ffa.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 11, 2016

Test build #66719 has finished for PR 15377 at commit df28bdd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

What is the usage of this flag? I don't see any other place using it; they're all just setters.

Contributor Author

It's used below, and you have commented on it :)

Contributor

IIUC, the code will never fall into this branch, since you initialize it as true.

Contributor Author

Nope, when there is a ClassNotFound exception, it would be set to false.

Contributor

The usage of CallerContext.callerContextSupported is quite weird to me; it looks like you're implementing a singleton-factory-like pattern to avoid re-calling this method when the class is not found.

Will this introduce a threading issue for the state of CallerContext.callerContextSupported?

Also, are there side effects to calling setCurrentContext multiple times in one JVM?

I would be in favor of avoiding such usage of a global flag.

Member

Here, it won't matter if it's called multiple times. It just means some extra work is done but the result is the same. The flag is at least local to the class here. I don't know of a better way to record this state because it is global, and properly so, as the things it depends on can't vary within one JVM.

Contributor

Alright, so the main purpose of this flag is to avoid re-executing the code below. AFAIK, unless we can guarantee sequential calls to this method, callerContextSupported may introduce a concurrency issue, especially in task-running code. Tasks that start running simultaneously may all see the flag as true and execute the code below, which makes the flag useless.

Also, in the AM/Client it will only be called once, so this flag is not used in those JVMs.

Member

Yes, but re-executing the code an extra time doesn't do anything bad except log an additional message. We're trying to prevent it from executing a bunch of times. The flag can only go from true to false. It keeps executing the same deterministic path until something stops it. Unless you mean this really never executes more than once anyway, I don't think there's a problem right here.

Contributor

If you cannot fully prevent re-executing this code, and in the worst case all threads will execute the same logic again, is it really necessary to add such a flag? To me it seems nondeterministic in a way that will confuse the user (some tasks print the log while others don't).

Member

The point is to avoid a bunch of exception stack traces in the log over and over. I don't think thread-safety is an issue here. If it prints twice instead of once, no big deal. Otherwise this error/warning prints on every task execution. That's not great.
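The benign-race argument can be made concrete with a small model. This is an illustration only, with assumed names, not Spark's code: the flag only ever goes from true to false, so each thread can "log" at most once, however many concurrent tasks hammer it.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Illustrative model of the race on the flag: worst case is one duplicate
// log line per racing thread, never one per call.
object FlagRaceSketch {
  @volatile private var callerContextSupported = true
  val logCount = new AtomicInteger(0)

  def trySetContext(): Unit = {
    if (callerContextSupported) {
      // Stand-in for the reflective lookup failing with ClassNotFoundException.
      callerContextSupported = false
      logCount.incrementAndGet() // stands in for the single log message
    }
  }

  // Hammer the flag from several threads; return how often "logging" ran.
  def runDemo(nThreads: Int, callsPerThread: Int): Int = {
    val threads = (1 to nThreads).map { _ =>
      new Thread(() => (1 to callsPerThread).foreach(_ => trySetContext()))
    }
    threads.foreach(_.start())
    threads.foreach(_.join())
    logCount.get()
  }
}
```

With 4 threads making 1000 calls each, the count lands between 1 and 4, not 4000, which is exactly the "prints twice instead of once, no big deal" outcome.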

Contributor

Please use SparkHadoopUtil#conf.

Contributor Author

done

Contributor

Please use Utils#classForName.

Contributor Author

done

Contributor

Nit: the indent is not correct; use 2 spaces.

Contributor Author

done

@lins05 lins05 force-pushed the spark-17802-improve-callercontext-logging branch from 884de64 to c42019b Compare October 13, 2016 09:47
Contributor

Remove this comment.

Contributor Author

Good catch, done.

@jerryshao
Contributor

Another thing: did you verify it locally, since there's no unit test to cover it?

@SparkQA

SparkQA commented Oct 13, 2016

Test build #66886 has finished for PR 15377 at commit c42019b.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 13, 2016

Test build #66885 has finished for PR 15377 at commit 884de64.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@lins05
Contributor Author

lins05 commented Oct 13, 2016

Another thing: did you verify it locally, since there's no unit test to cover it?

@jerryshao Yeah, I did test it locally to ensure the error is only logged once.

@SparkQA

SparkQA commented Oct 13, 2016

Test build #66891 has finished for PR 15377 at commit e83e101.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

These two values are not used at all, I think you could remove them.

@jerryshao
Contributor

LGTM just some minor things.

@lins05
Contributor Author

lins05 commented Oct 19, 2016

@weiqingy Emm, then we would also have to add the logic of checking "hadoop.caller.context.enabled" to the test code, which means the test code simply duplicates the code path of CallerContext.callerContextSupported, and IMHO this doesn't make much sense...

On the other hand, I do agree we should test the code path, but it doesn't seem easy to do.

@srowen @jerryshao What do you think?

Member

@srowen srowen left a comment

I see the issue with the tests, yeah, and it's a fair point. My personal opinion is that the test here is sufficient, because one would have to reimplement the code in question to test it; we'll soon drop support for versions that don't support this anyway; and the failure that this wouldn't catch is not bad.

Member

I don't think this needs to be lazy (?) and it doesn't need to be broken out into methods. It can just be one val initialized directly upfront in one expression.

@SparkQA

SparkQA commented Oct 19, 2016

Test build #67201 has finished for PR 15377 at commit b3b794c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

The above code can be simplified with Try { xxxx }.isSuccess, please check scala.util.Try. Then these two methods can be merged into one expression.
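The Try-based simplification suggested here could look like the following sketch; the object and method names are illustrative, not the PR's actual code.

```scala
import scala.util.Try

// Try { ... }.isSuccess collapses the try/catch used for the
// class-availability check into a single boolean expression.
// Note that Try only catches non-fatal throwables; fatal errors propagate.
object TrySketch {
  def classAvailable(name: String): Boolean =
    Try(Class.forName(name)).isSuccess
}
```

One caveat raised later in the review: Try swallows ClassNotFoundException and every other non-fatal error alike, so nothing distinguishes "expected: old Hadoop" from an unexpected failure.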

@SparkQA

SparkQA commented Oct 20, 2016

Test build #67239 has finished for PR 15377 at commit 9cffcf3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@srowen srowen left a comment

The rest seems OK to me.

Member

Let's find a different way to wrap this; it's non-standard and hard to read the condition here. Maybe break out conf as a local val to make it simpler.

I see why you're using Try, though I wonder if we are suppressing potentially important errors this way. For example, ClassNotFoundException obviously means 'not supported' and that's normal. A fatal error should probably just propagate. But a NonFatal error could be logged with a warning, because that would be unexpected. It would still mean the caller context isn't supported rather than cause a failure.

@SparkQA

SparkQA commented Oct 22, 2016

Test build #67368 has finished for PR 15377 at commit af5a95b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@srowen srowen left a comment

The rest is looking good. Let's keep this simple.

Member

This doesn't need to be a separate member; it can be a local val.
Still, why have more than one member here? Initializing callerContextSupported in one block of code is sufficient and simpler.

@SparkQA

SparkQA commented Oct 22, 2016

Test build #67394 has finished for PR 15377 at commit c93bed3.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@lins05 lins05 force-pushed the spark-17802-improve-callercontext-logging branch from c93bed3 to b57f009 Compare October 24, 2016 08:46
private[util] object CallerContext extends Logging {
  val callerContextSupported: Boolean = {
    val conf = SparkHadoopUtil.get.conf
    conf.getBoolean("hadoop.caller.context.enabled", false) && {
Member

Although this construct looks funny to me, a more conventional structure would just be a little longer. I'm OK with it.

Contributor Author

Right, I removed the local conf variable.

@SparkQA

SparkQA commented Oct 24, 2016

Test build #67445 has finished for PR 15377 at commit b57f009.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 24, 2016

Test build #67447 has finished for PR 15377 at commit 5acefad.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 25, 2016

Test build #3372 has finished for PR 15377 at commit 5acefad.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@lins05
Contributor Author

lins05 commented Oct 26, 2016

@srowen Could we get this merged since the tests are now green? Not sure why it failed previously; it just turned green without me doing anything.

@srowen
Member

srowen commented Oct 26, 2016

Merged to master

@asfgit asfgit closed this in 402205d Oct 26, 2016
@lins05 lins05 deleted the spark-17802-improve-callercontext-logging branch October 26, 2016 17:39
robert3005 pushed a commit to palantir/spark that referenced this pull request Nov 1, 2016
## What changes were proposed in this pull request?

[SPARK-16757](https://issues.apache.org/jira/browse/SPARK-16757) sets the Hadoop `CallerContext` when calling Hadoop/HDFS APIs to make Spark applications more diagnosable in Hadoop/HDFS logs. However, the `org.apache.hadoop.ipc.CallerContext` class was only added in [hadoop 2.8](https://issues.apache.org/jira/browse/HDFS-9184), which has not been officially released yet. So each time `utils.CallerContext.setCurrentContext()` is called (e.g. [when a task is created](https://github.com/apache/spark/blob/b678e46/core/src/main/scala/org/apache/spark/scheduler/Task.scala#L95-L96)), a "java.lang.ClassNotFoundException: org.apache.hadoop.ipc.CallerContext" error is logged, which pollutes the Spark logs when there are lots of tasks.

This patch improves this behaviour by only logging the `ClassNotFoundException` once.

## How was this patch tested?

Existing tests.

Author: Shuai Lin <linshuai2012@gmail.com>

Closes apache#15377 from lins05/spark-17802-improve-callercontext-logging.
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017
dongjoon-hyun added a commit that referenced this pull request Feb 11, 2025
…op.caller.context.enabled`

### What changes were proposed in this pull request?

This PR aims to fix `CallerContext` test by enabling `hadoop.caller.context.enabled` during tests.

### Why are the changes needed?

This test case was disabled since Apache Spark 2.1.0 because `CallerContext` was supported since Apache Hadoop 2.8+.
- #15377

We need to recover `CallerContext` test coverage.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #49893 from dongjoon-hyun/SPARK-51164.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
dongjoon-hyun added a commit that referenced this pull request Feb 11, 2025
…op.caller.context.enabled`

dongjoon-hyun added a commit that referenced this pull request Feb 11, 2025
…op.caller.context.enabled`

zifeif2 pushed a commit to zifeif2/spark that referenced this pull request Nov 14, 2025
…op.caller.context.enabled`
