Conversation

@vanzin
Contributor

@vanzin vanzin commented Sep 16, 2019

This situation can happen when an external system (e.g. Oozie) generates
delegation tokens for a Spark application. The Spark driver will then run
against secured services and have proper credentials (the tokens), but no
Kerberos credentials. So anything that requires a Kerberos credential fails.

Instead, if no Kerberos credentials are detected, just skip the whole
delegation token code.

Tested with an application that simulates Oozie; fails before the fix,
passes with the fix. Also with other DT-related tests to make sure other
functionality keeps working.
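The gating logic described above can be sketched roughly as follows. This is a hypothetical model, not actual Spark code: `Creds`, its fields, and `should_start_token_manager` are invented stand-ins for what Hadoop's `UserGroupInformation` would report.

```python
from dataclasses import dataclass


@dataclass
class Creds:
    """Hypothetical stand-in for the credentials Hadoop's
    UserGroupInformation would report; not the real API."""
    has_kerberos_tgt: bool
    has_delegation_tokens: bool


def should_start_token_manager(creds: Creds) -> bool:
    # Tokens alone (the Oozie case) cannot be renewed without a TGT,
    # so the whole delegation token code is skipped unless real
    # Kerberos credentials are present.
    return creds.has_kerberos_tgt


oozie_style = Creds(has_kerberos_tgt=False, has_delegation_tokens=True)
kinit_style = Creds(has_kerberos_tgt=True, has_delegation_tokens=False)

print(should_start_token_manager(oozie_style))  # False: just run with the supplied tokens
print(should_start_token_manager(kinit_style))  # True: normal kerberized submission
```

The point is that the presence of delegation tokens is deliberately *not* part of the check: tokens are enough to talk to secured services, but not enough to create or renew further tokens.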

Contributor

@gaborgsomogyi gaborgsomogyi left a comment


LGTM (pending tests).

@SparkQA

SparkQA commented Sep 16, 2019

Test build #110629 has finished for PR 25805 at commit e6efb06.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Different scenario because the tests don't go through spark-submit,
which logs in the user with the keytab. Just need to tweak the check
so that a keytab is considered as "having kerberos creds".
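That tweak amounts to widening the check, roughly as below. Again a hypothetical sketch, not Spark's actual code: the field names are invented, and the real implementation consults `UserGroupInformation` and Spark's keytab configuration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Creds:
    # Invented fields for illustration only.
    has_kerberos_tgt: bool
    keytab: Optional[str] = None


def has_kerberos_creds(creds: Creds) -> bool:
    # A configured keytab means the application can log itself in,
    # so it counts as "having kerberos creds" even when the test
    # harness did not go through spark-submit's login step.
    return creds.has_kerberos_tgt or creds.keytab is not None


print(has_kerberos_creds(Creds(has_kerberos_tgt=False, keytab="user.keytab")))  # True
print(has_kerberos_creds(Creds(has_kerberos_tgt=False)))  # False
```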
@SparkQA

SparkQA commented Sep 17, 2019

Test build #110671 has finished for PR 25805 at commit 29b069d.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member

Retest this please.

@SparkQA

SparkQA commented Sep 17, 2019

Test build #110723 has finished for PR 25805 at commit 29b069d.

  • This patch fails to generate documentation.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

Doc failure seems unrelated.

@HyukjinKwon
Member

retest this please

@SparkQA

SparkQA commented Sep 17, 2019

Test build #110744 has finished for PR 25805 at commit 29b069d.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gaborgsomogyi
Contributor

[error] running /home/jenkins/workspace/SparkPullRequestBuilder@2/build/sbt -Phadoop-2.7 -Pkubernetes -Phive-thriftserver -Phadoop-cloud -Pkinesis-asl -Pyarn -Pspark-ganglia-lgpl -Phive -Pmesos test:package streaming-kinesis-asl-assembly/assembly ; process was terminated by signal 9

@gaborgsomogyi
Contributor

retest this please

@SparkQA

SparkQA commented Sep 17, 2019

Test build #110775 has finished for PR 25805 at commit 29b069d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gaborgsomogyi
Contributor

Maybe it's related, since the "No UGI" and "Proxy user" tests are failing.

@vanzin
Contributor Author

vanzin commented Sep 17, 2019

Hmm, that's a new test IIRC, I'll take a look at what's going on.

@vanzin
Contributor Author

vanzin commented Sep 17, 2019

The Hive tests passed locally for me. Maybe just needed a merge with master...

@vanzin
Contributor Author

vanzin commented Sep 17, 2019

retest this please

@SparkQA

SparkQA commented Sep 17, 2019

Test build #110814 has finished for PR 25805 at commit c46d76d.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 17, 2019

Test build #110815 has finished for PR 25805 at commit c150238.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin
Contributor Author

vanzin commented Sep 17, 2019

retest this please

@SparkQA

SparkQA commented Sep 17, 2019

Test build #110820 has finished for PR 25805 at commit c150238.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin
Contributor Author

vanzin commented Sep 17, 2019

retest this please

@vanzin
Contributor Author

vanzin commented Sep 17, 2019

(Looks like maven metadata is corrupt everywhere after Jenkins was restarted...)

@SparkQA

SparkQA commented Sep 17, 2019

Test build #110798 has finished for PR 25805 at commit c46d76d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin
Contributor Author

vanzin commented Sep 17, 2019

Ack. Updated the wrong branch...

@SparkQA

SparkQA commented Sep 17, 2019

Test build #110823 has finished for PR 25805 at commit c150238.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin
Contributor Author

vanzin commented Sep 18, 2019

Ok, no more comments, I'll merge to master.

@vanzin vanzin closed this in f32f16f Sep 18, 2019
@dongjoon-hyun
Member

dongjoon-hyun commented Sep 19, 2019

Hi, All.
This seems to break one of our Jenkins profiles consistently. Could you guys take a look, please?

- zero-partition RDD *** FAILED ***
  java.io.IOException: Can't get Master Kerberos principal for use as renewer
  at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:134)
  at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:102)
  at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:81)
  at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:216)
  at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:325)
  at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:205)
  at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:256)
  at scala.Option.getOrElse(Option.scala:189)
  at org.apache.spark.rdd.RDD.partitions(RDD.scala:254)
  at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)

After this PR, all runs have failed consistently since build 433 ...

@dongjoon-hyun
Member

Sorry guys. So far I haven't been able to find a solution for this. I'll revert this one to unblock our JDK11 monitoring. Please make another PR and test with [test-hadoop3.2][test-java11].

@dongjoon-hyun
Member

dongjoon-hyun commented Sep 20, 2019

FYI, all the failures have recovered and Jenkins is continuing to the next step. If this didn't hide another failure during the outage, I guess Jenkins will pass.

@srowen
Member

srowen commented Sep 20, 2019

Yeah, I found this caused tests to fail locally too, so it's at least not specific to Jenkins (or Java 11).

@dongjoon-hyun
Member

In addition to this, another big issue is that a Python UT failure has been blocking spark-master-test-sbt-hadoop-2.7 for a long time now.

@gaborgsomogyi
Contributor

Having a look, but JDK11 has to be set up locally, which I'm having some difficulty with...

@gaborgsomogyi
Contributor

We've found the issue with the help of @squito, so I'm going to file a PR soon with the required change...

@dongjoon-hyun
Member

Great! Thank you, @gaborgsomogyi and @squito !

dongjoon-hyun pushed a commit that referenced this pull request Sep 24, 2019
…s are available

This PR is an enhanced version of #25805, so I've kept the original text. The problem with the original PR can be found in the comments.

This situation can happen when an external system (e.g. Oozie) generates
delegation tokens for a Spark application. The Spark driver will then run
against secured services and have proper credentials (the tokens), but no
Kerberos credentials. So anything that requires a Kerberos credential fails.

Instead, if no Kerberos credentials are detected, just skip the whole
delegation token code.

Tested with an application that simulates Oozie; fails before the fix,
passes with the fix. Also with other DT-related tests to make sure other
functionality keeps working.

Closes #25901 from gaborgsomogyi/SPARK-29082.

Authored-by: Gabor Somogyi <gabor.g.somogyi@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
@vanzin vanzin deleted the SPARK-29082 branch October 23, 2019 15:54