@weiqingy
Contributor

@weiqingy weiqingy commented Nov 13, 2016

What changes were proposed in this pull request?

Remove spark.driver.memory, spark.executor.memory, spark.driver.cores, and spark.executor.cores from running-on-yarn.md, as they are not YARN-specific and are also defined in configuration.md.

How was this patch tested?

Build passed, and checked manually.

Member

@srowen srowen left a comment

CC @vanzin

Amount of memory to use per executor process (e.g. <code>2g</code>, <code>8g</code>).
</td>
</tr>
<tr>
Member

I think this is actually YARN-specific right now? At least, it is according to SparkSubmit.scala.

Contributor Author

@weiqingy weiqingy Nov 15, 2016

Thanks for reviewing this PR, @srowen.

Actually, I am confused by this. From the information below, it does not seem to be YARN-specific:

<tr>
  <td><code>spark.dynamicAllocation.initialExecutors</code></td>
  <td><code>spark.dynamicAllocation.minExecutors</code></td>
  <td>
    Initial number of executors to run if dynamic allocation is enabled.
    <br /><br />
    If `--num-executors` (or `spark.executor.instances`) is set and larger than this value, it will
    be used as the initial number of executors.
  </td>
</tr>
  • Currently, spark.executor.instances is defined in config/package.scala. If it were YARN-specific, it would be defined in yarn/config.scala instead of config/package.scala.
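The rule in the quoted table row can be made concrete with a small function. This is a hypothetical helper (`initial_executors` is not a Spark API) that assumes only the documented behavior: `spark.dynamicAllocation.initialExecutors` defaults to `spark.dynamicAllocation.minExecutors`, and `spark.executor.instances` (or `--num-executors`) replaces it only when it is set and larger. It follows the doc text and ignores any validation Spark itself may perform.

```python
from typing import Optional


def initial_executors(
    min_executors: int,
    initial: Optional[int] = None,
    instances: Optional[int] = None,
) -> int:
    """Hypothetical sketch of the documented rule, not Spark's implementation.

    min_executors -> spark.dynamicAllocation.minExecutors
    initial       -> spark.dynamicAllocation.initialExecutors (None = unset)
    instances     -> spark.executor.instances / --num-executors (None = unset)
    """
    # initialExecutors falls back to minExecutors when it is not set.
    start = min_executors if initial is None else initial
    # spark.executor.instances is used only if it is set and larger.
    if instances is not None and instances > start:
        return instances
    return start


if __name__ == "__main__":
    print(initial_executors(2))                           # 2
    print(initial_executors(2, initial=5))                # 5
    print(initial_executors(2, initial=5, instances=10))  # 10
    print(initial_executors(2, initial=5, instances=3))   # 5
```

Keeping "unset" as `None` rather than `0` mirrors the doc's wording ("if ... is set"), so an explicit small value is still distinguishable from an absent one.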

Use lower-case suffixes, e.g. <code>k</code>, <code>m</code>, <code>g</code>, <code>t</code>, and <code>p</code>, for kibi-, mebi-, gibi-, tebi-, and pebibytes, respectively.
</td>
</tr>
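As an illustration of the suffix convention, here is a hypothetical parser (`to_mebibytes` is illustrative, not Spark's code) that turns a size string such as `2g` into MiB. It assumes binary (1024-based) units, accepts only the lower-case suffixes the docs recommend, and treats a bare number as MiB, which is an assumption made here for memory settings.

```python
# Exponent of 1024 for each lower-case suffix, relative to bytes.
_SUFFIX_POWER = {"k": 1, "m": 2, "g": 3, "t": 4, "p": 5}


def to_mebibytes(size: str) -> int:
    """Hypothetical parser: convert '2g', '512m', '1t', ... to MiB.

    Assumes binary units (1k = 1024 bytes) and treats a bare number as MiB.
    """
    size = size.strip()
    if not size:
        raise ValueError("empty size string")
    suffix = size[-1]
    if suffix.isdigit():
        return int(size)  # no suffix: assume the value is already in MiB
    if suffix not in _SUFFIX_POWER:
        raise ValueError(f"unsupported suffix {suffix!r}; use k, m, g, t or p")
    number = int(size[:-1])
    total_bytes = number * 1024 ** _SUFFIX_POWER[suffix]
    # Integer division truncates sub-MiB amounts such as '512k' to 0.
    return total_bytes // (1024 ** 2)
```

For example, `to_mebibytes("2g")` gives 2048, matching the `2g` example in the executor-memory description, while an upper-case `2G` is rejected.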
<tr>
Member

Yes, I see that these are duplicated in the YARN docs and the main config docs. So are things like spark.driver.cores, but there the YARN docs describe its meaning in slightly more YARN-specific terms. The spark.driver.memory doc isn't different, but for consistency it may not hurt to leave a note about these key, generic properties in the YARN docs.

Contributor Author

</td>
</tr>
<tr>
<td><code>spark.yarn.report.interval</code></td>
Member

I am not sure whether these are meant to be exposed to users. They're not secret exactly, but I'm not sure they're meant to be advertised. Maybe it's fine to document them. I'm neutral.

Contributor

If it wasn't documented, it was on purpose. If users find it a useful config, we can expose it.

Contributor Author

Thanks for this information.

<td><code>spark.yarn.report.interval</code></td>
<td>1s</td>
<td>
Interval between reports of the current app status in Yarn cluster mode.
Member

Yarn -> YARN

Contributor Author

Done.

</tr>
<tr>
<td><code>spark.yarn.services</code></td>
<td>Nil</td>
Member

As far as the user is concerned, I think the best way to express the default is "(none)". Nil is an implementation detail.

Contributor Author

Done.

@weiqingy
Contributor Author

Hi @srowen, I have replied to your comments and updated the PR. Could you please review it again? Thanks.

@rxin
Contributor

rxin commented Nov 15, 2016

Can you just put some of the more trivial pull requests into one, e.g. the two documentation update ones?

</td>
</tr>
<tr>
<td><code>spark.yarn.services</code></td>
Contributor

I don't think we should expose this. I think it was added in a PR that started integrating with ATS, which I'm not sure we officially support yet; this config is just for testing at this point. @steveloughran to verify.

Contributor Author

Thanks for pointing this out.

@weiqingy
Contributor Author

@rxin Thanks for commenting. The reason I created two PRs is that I noticed these issues at different times. This one is still under discussion and I am not sure it will be merged, whereas I am sure the other one will be. I did not think too much about it; I just thought that whenever I found an issue, no matter how small, I should submit a PR directly. I will pay attention next time. So do you mean I should close PR #15886 and merge it into this PR? I can do that. Thanks.

@weiqingy
Contributor Author

weiqingy commented Nov 15, 2016

Hi @tgravescs, what do you think about spark.driver.memory, spark.executor.memory, and spark.executor.instances? I am considering the following:

  • Move spark.executor.instances from running-on-yarn.md to configuration.md.
  • Remove spark.driver.memory and spark.executor.memory from running-on-yarn.md (they are also defined in configuration.md).

Actually, I noticed this issue while working on PR #15563. If they are documented the current way on purpose, I can just close this PR. I would appreciate your suggestion. Thanks.

@tgravescs
Contributor

  • spark.driver.memory and spark.executor.memory are fine to remove from the YARN side, since they duplicate the general docs now that they have been added for the other cluster managers.
    spark.driver.cores and spark.executor.cores could also be removed from the YARN config docs, as they are now in the general ones too.
  • spark.executor.instances is still only used in YARN, and --num-executors is also YARN-only. I'm not sure of the exact behavior of spark.dynamicAllocation.initialExecutors on Mesos or in standalone mode, so I can't tell whether its description should be updated for those.

@weiqingy
Contributor Author

@tgravescs Thanks for the reply. I have updated the PR.

Member

@srowen srowen left a comment

OK, but can you now update the title and description? "Update Yarn doc" is too generic, and the description doesn't match the change now.

@steveloughran
Contributor

The plugin point is more generic than ATS integration; it lets you plug in anything you want to come up in the driver. Weakness: it's actually YARN-specific; I could imagine uses in standalone mode too. The update interval flag? I don't remember it or what it does. Sorry.

@weiqingy weiqingy changed the title [YARN][DOC] Update Yarn configuration doc [YARN][DOC] Remove non-Yarn specific configurations from running-on-yarn.md Nov 16, 2016
@weiqingy
Contributor Author

Hi @srowen, I have updated the title and description. Thanks.

@SparkQA

SparkQA commented Nov 17, 2016

Test build #3429 has finished for PR 15869 at commit dcb11ae.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Member

srowen commented Nov 17, 2016

Merged to master/2.1

@asfgit asfgit closed this in a3cac7b Nov 17, 2016
asfgit pushed a commit that referenced this pull request Nov 17, 2016
…arn.md

## What changes were proposed in this pull request?

Remove `spark.driver.memory`, `spark.executor.memory`, `spark.driver.cores`, and `spark.executor.cores` from `running-on-yarn.md` as they are not Yarn-specific, and they are also defined in `configuration.md`.

## How was this patch tested?
Build passed & Manually check.

Author: Weiqing Yang <yangweiqing001@gmail.com>

Closes #15869 from weiqingy/yarnDoc.

(cherry picked from commit a3cac7b)
Signed-off-by: Sean Owen <sowen@cloudera.com>
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017
6 participants