[SPARK-1825] Fixes cross-platform submit problem #899
```diff
@@ -32,6 +32,7 @@ import org.apache.hadoop.mapreduce.MRJobConfig
 import org.apache.hadoop.security.UserGroupInformation
 import org.apache.hadoop.util.StringUtils
 import org.apache.hadoop.yarn.api._
+import org.apache.hadoop.yarn.api.ApplicationConstants
 import org.apache.hadoop.yarn.api.ApplicationConstants.Environment
 import org.apache.hadoop.yarn.api.protocolrecords._
 import org.apache.hadoop.yarn.api.records._
```
```diff
@@ -309,7 +310,7 @@ trait ClientBase extends Logging {
     // Add Xmx for AM memory
     JAVA_OPTS += "-Xmx" + amMemory + "m"

-    val tmpDir = new Path(Environment.PWD.$(), YarnConfiguration.DEFAULT_CONTAINER_TEMP_DIR)
+    val tmpDir = new Path(Environment.PWD.$$(), YarnConfiguration.DEFAULT_CONTAINER_TEMP_DIR)
     JAVA_OPTS += "-Djava.io.tmpdir=" + tmpDir

     // TODO: Remove once cpuset version is pushed out.
```
```diff
@@ -357,7 +358,7 @@ trait ClientBase extends Logging {
     JAVA_OPTS += ClientBase.getLog4jConfiguration(localResources)

     // Command for the ApplicationMaster
-    val commands = Seq(Environment.JAVA_HOME.$() + "/bin/java", "-server") ++
+    val commands = Seq(Environment.JAVA_HOME.$$() + "/bin/java", "-server") ++
       JAVA_OPTS ++
       Seq(args.amClass, "--class", args.userClass, "--jar ", args.userJar,
         userArgsToString(args),
```
```diff
@@ -396,7 +397,7 @@ object ClientBase {
     if (classpathEntries != null) {
       for (c <- classpathEntries) {
         YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim,
-          File.pathSeparator)
+          ApplicationConstants.CLASS_PATH_SEPARATOR)
       }
     }
```
```diff
@@ -406,7 +407,7 @@ object ClientBase {
     if (mrClasspathEntries != null) {
       for (c <- mrClasspathEntries) {
         YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim,
-          File.pathSeparator)
+          ApplicationConstants.CLASS_PATH_SEPARATOR)
       }
     }
   }
```
```diff
@@ -473,15 +474,15 @@ object ClientBase {
     if (localPath != null) {
       val parentPath = new File(localPath).getParent()
       YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, parentPath,
-        File.pathSeparator)
+        ApplicationConstants.CLASS_PATH_SEPARATOR)
```
Member:
As you say, this value only appeared in Hadoop 2.4.0: http://hadoop.apache.org/docs/r2.4.0/api/org/apache/hadoop/yarn/api/ApplicationConstants.html. File.pathSeparator should already be ":" vs ";", which is what you intend, right? I'm missing what this changes, then. I understand that the char may vary on the client vs the cluster, and that's why it's right to reference a symbolic constant, but these seem to be the same in that respect.

Author:
The value of ApplicationConstants.CLASS_PATH_SEPARATOR is "&lt;CPS&gt;" - neither ":" nor ";".

Author:
BTW, maybe this is a 'yarn-client' mode-only problem.

Member:
Oh, I get it. Because the client forms the path but sends it to the server, it's produced and consumed in different places. The bad news is that you can't use this constant, but I suppose you can literally specify "&lt;CPS&gt;".

Author:
I think it will fail for Hadoop < 2.4.
```diff
     }
   }

   /** Add entry to the classpath. */
   def addClasspathEntry(path: String) = YarnSparkHadoopUtil.addToEnvironment(env,
-    Environment.CLASSPATH.name, path, File.pathSeparator)
+    Environment.CLASSPATH.name, path, ApplicationConstants.CLASS_PATH_SEPARATOR)
   /** Add entry to the classpath. Interpreted as a path relative to the working directory. */
-  def addPwdClasspathEntry(entry: String) = addClasspathEntry(Environment.PWD.$() + Path.SEPARATOR + entry)
+  def addPwdClasspathEntry(entry: String) = addClasspathEntry(Environment.PWD.$$() + Path.SEPARATOR + entry)

   extraClassPath.foreach(addClasspathEntry)

@@ -500,7 +501,7 @@ object ClientBase {
     cachedSecondaryJarLinks.foreach(addPwdClasspathEntry)
   }
   // Append all class files and jar files under the working directory to the classpath.
-  addClasspathEntry(Environment.PWD.$())
+  addClasspathEntry(Environment.PWD.$$())
   addPwdClasspathEntry("*")
 }
```
Member:
I'm just curious about what the $ vs $$ change does here.

Author:
This is similar to ApplicationConstants.CLASS_PATH_SEPARATOR. $$() wraps the variable name with "{{" and "}}", rather than 'hardcoding' "%" or "$" as $() does. The cluster then replaces it with the real script variable marker when it finds the "{{" and "}}" pair.
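As a rough illustration of the two expansion styles described above (the marker strings are taken from this discussion; the helper itself is a sketch, not the Hadoop Environment API):

```java
public class EnvMarkers {
    // $() bakes in the *client* platform's shell-variable syntax,
    // so the string breaks if the cluster runs a different OS.
    public static String dollar(String name, boolean clientIsWindows) {
        return clientIsWindows ? "%" + name + "%" : "$" + name;
    }

    // $$() (Hadoop 2.4+) wraps the name in {{ }}, deferring the choice
    // of marker to the NodeManager that launches the container.
    public static String dollarDollar(String name) {
        return "{{" + name + "}}";
    }
}
```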
Member:
Dumb question again that maybe someone else has -- is that going to work on Hadoop < 2.4? And I presume it works on both Windows and Linux, of course.

Author:
It's the same as "&lt;CPS&gt;": Hadoop < 2.4 will not recognize it. BTW, CLASS_PATH_SEPARATOR and $$() are annotated with @Unstable.

Member:
Perhaps the best compromise here is to use reflection? Check whether those constants exist, and use them if they do; otherwise maintain current behavior. That way someone who has a 2.4-based installation will get the benefits, and things will keep working as they do today for those who don't.
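The reflection compromise could look roughly like this hypothetical helper. The class and field names are from the discussion; everything else (the helper's name and shape) is an assumption, not Spark's eventual implementation.

```java
import java.io.File;

public class CpSeparator {
    // Use ApplicationConstants.CLASS_PATH_SEPARATOR ("<CPS>") when the
    // Hadoop 2.4+ field is present on the classpath; otherwise keep
    // today's behavior, File.pathSeparator.
    public static String classPathSeparator() {
        try {
            Class<?> c = Class.forName("org.apache.hadoop.yarn.api.ApplicationConstants");
            return (String) c.getField("CLASS_PATH_SEPARATOR").get(null);
        } catch (ReflectiveOperationException e) {
            // Class or field absent: pre-2.4 Hadoop.
            return File.pathSeparator;
        }
    }
}
```

Since the lookup happens at run time, the same Spark build works against both old and new Hadoop versions.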
Author:
Can we have a new Spark configuration variable, like "spark.yarn.submit.crossplatform"? If so, the code here can be applied conditionally, "hard-coding" the required separators without using reflection. (But currently my knowledge of the Spark source code and Scala is too limited to implement it myself.)

Member:
I'm not quite sure what you mean by a config option. Do you mean build time? Otherwise I would expect anything < 2.4 to fail to build.

Author:
@tgravescs No, I mean run time. To apply the config option idea, the code in this PR would have to be modified to use a hard-coding approach that does not use 2.4-only methods.

Member:
It seems the only way to special-case this is through reflection; otherwise it won't even compile for < 2.4. Also, I'm not sure we need a config here. We can just try to load the $$ method, and if it's not there we fall back to $. Same goes for ApplicationConstants.CLASS_PATH_SEPARATOR.

Author:
@andrewor14 What I meant by hard-coding is to use the string values "{{" and "}}" directly, without even calling $$() in the Spark code, when the Hadoop version is >= 2.4 (or the config is set). But I agree that reflection is the cleaner solution.
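The "try $$, fall back to $" idea can be sketched with stand-in classes. OldEnv and NewEnv below are hypothetical mocks of the pre-2.4 and 2.4+ shapes of Hadoop's Environment enum, used only to show the reflective dispatch; this is not the actual Spark patch.

```java
import java.lang.reflect.Method;

public class EnvVarRef {
    // Pre-2.4 shape: only $() exists (client-side expansion).
    public static class OldEnv {
        public String $() { return "$PWD"; }
    }
    // 2.4+ shape: $$() is added (cluster-side expansion).
    public static class NewEnv extends OldEnv {
        public String $$() { return "{{PWD}}"; }
    }

    // Prefer $$() when the running Hadoop version provides it,
    // otherwise fall back to $() so older clusters keep working.
    public static String expand(Object env) {
        try {
            Method m = env.getClass().getMethod("$$");
            return (String) m.invoke(env);
        } catch (NoSuchMethodException missing) {
            try {
                return (String) env.getClass().getMethod("$").invoke(env);
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException(e);
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A single code path then serves both Hadoop generations, with no build-time flag or user-facing config needed.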