
Conversation

@ryan-williams
Contributor

Remove spark-tags' compile-scope dependency (and, indirectly, spark-core's compile-scope transitive dependency) on scalatest by splitting the test-oriented tags into spark-tags' test JAR.

Alternative to #16303.

@ryan-williams
Contributor Author

While preparing this PR, I noted that spark-tags has three test-tags, each specifically directed at one module's tests:

  • DockerTest -> external/docker-integration-tests
  • ExtendedHiveTest -> sql/hive
  • ExtendedYarnTest -> resource-managers/yarn

It may make more sense to just move each test-tag to its corresponding module, and do away with the spark-tags test-JAR altogether; I don't see that their status as "tags" is the most important axis on which to physically group them.

LMK if anyone would prefer that and I'll do it.

cc @vanzin @srowen

@srowen
Member

srowen commented Dec 16, 2016

This looks reasonable to me as a solution. I am not sure whether there was a reason the tags needed to be in one module (like enabling some specific usage in the SBT build). I suppose the nice thing about having them in one place is that, being just tags, the whole module can be dropped at runtime if desired; hence pushing the whole thing to provided scope probably remains possible.

@vanzin
Contributor

vanzin commented Dec 16, 2016

It may make more sense to just move each test-tag to its corresponding module

I tried that at first when I introduced those tags, but because they're passed as command-line arguments to disable those tests, scalatest throws a fit if a tag class doesn't exist when running tests for a particular module.

@ryan-williams
Contributor Author

@vanzin interesting; so at what level do they have to be brought in as dependencies in order to avoid that? Is that why spark-core needs to depend on the test-tags? I didn't see a reason for it to, so I did not add such a dep here; I wonder if that will cause some tests to fail?

@vanzin
Contributor

vanzin commented Dec 16, 2016

so at what level do they have to be brought in as dependencies

They need to be a dependency in every module. That's why the current artifact is declared in every module, even though most of them do not use any tags.
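Concretely, the per-module declaration under discussion looks roughly like this (a sketch following Spark's Maven conventions; the exact artifactId suffix and version property are assumptions):

```xml
<!-- test-jar dependency on spark-tags, declared in each module's pom.xml -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-tags_${scala.binary.version}</artifactId>
  <version>${project.version}</version>
  <type>test-jar</type>
  <scope>test</scope>
</dependency>
```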

And you can't just declare that in the root pom's dependencies because then "spark-tags" cannot inherit from the root pom... fun all around.

@vanzin
Contributor

vanzin commented Dec 16, 2016

BTW, I agree this is a better fix and my version kinda abuses maven's "provided" scope, but I was mainly trying to avoid the above (having to modify every pom file in the repo).

@vanzin
Contributor

vanzin commented Dec 16, 2016

One more, forgot:

I wonder if that will cause some tests to fail?

A missing dep will only cause tests to fail when the tags are excluded from the test run. E.g., if you make a SQL change, that will trigger the exclusion of "ExtendedYarnTest", and that would cause test failures if the tag is not available in some module.

@ryan-williams
Contributor Author

And you can't just declare that in the root pom's dependencies because then "spark-tags" cannot inherit from the root pom... fun all around.

Should spark-tags not inherit from the root POM? It seems like it barely does anything as a module; does it need the root POM?

if you make a SQL change, that will trigger the exclusion of "ExtendedYarnTest"

Can you elaborate on why the second part of this follows from the first?

Can we add a test-jar-test-dep to the root POM? Wouldn't that be picked up by every module?

@vanzin
Contributor

vanzin commented Dec 16, 2016

me: if you make a SQL change, that will trigger the exclusion of "ExtendedYarnTest"
Can you elaborate on why the second part of this follows from the first?

Sorry, I think that's not accurate, actually. I think it works more like this: if you make a change, but you don't change any files under the YARN module, then some slower YARN tests are skipped by excluding that tag. I don't remember exactly what the code does, so it might add the exclusion in more cases, but that is the idea at least.

You can try to run the tests with the argument to exclude the tags locally to see if that works, but I really do not remember the command.

Should spark-tags not inherit from the root POM?

It would require some duplication (e.g. version of shared dependencies) but is probably doable. Don't know if there are any other implications.

@ryan-williams
Contributor Author

ryan-williams commented Dec 16, 2016

if you make a change, but you don't change any files under the YARN module, then some slower YARN tests are skipped by excluding that tag. I don't remember exactly what the code does, so it might add the exclusion in more cases, but that is the idea at least.

Interesting, where is the code that does this that you referred to? Some kind of hard-coded rules about what has to be rebuilt/re-tested when various things change?

It would require some duplication (e.g. version of shared dependencies) but is probably doable. Don't know if there are any other implications.

Looking at the spark-tags pom and libraries, it has no dependencies other than scalatest (test scope; I had missed changing this from "compile" to "test" scope in this PR, just updated, thanks!) which it gets from the root POM.

Off the top of my head, spark-tags does get at least some necessary configuration from the root POM, for publishing a test-jar; if that's all there is, we could duplicate it and remove spark-tags' inheritance from spark-parent, or we could factor the shared configs into a root-root-POM that spark-tags and spark-parent each inherit from, and have spark-parent depend on spark-tags.

Obviously this is all pretty sticky, but while we're here I'm interested in at least discussing what the most idiomatic solution would be, whether or not we ultimately decide that it's worth the trouble.

@vanzin
Contributor

vanzin commented Dec 16, 2016

The metadata is defined in "dev/sparktestsupport/modules.py", and there's other code around that area that sets the commands.
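For readers following along, here is a hedged sketch (not the actual file) of the kind of metadata `dev/sparktestsupport/modules.py` keeps: modules map source trees to the test tags that should stay enabled when those trees change. All class, function, and field names below are illustrative assumptions, not Spark's real definitions.

```python
# Illustrative sketch of modules.py-style metadata; names are assumptions.
class Module:
    def __init__(self, name, source_dirs, test_tags=()):
        self.name = name
        self.source_dirs = list(source_dirs)
        self.test_tags = list(test_tags)

MODULES = [
    Module("yarn", ["resource-managers/yarn/"],
           test_tags=["org.apache.spark.tags.ExtendedYarnTest"]),
    Module("hive", ["sql/hive/"],
           test_tags=["org.apache.spark.tags.ExtendedHiveTest"]),
]

def excluded_tags(changed_files):
    """Tags to pass via -Dtest.exclude.tags: a module's tags are excluded
    unless some changed file falls under one of its source dirs."""
    excluded = []
    for module in MODULES:
        touched = any(f.startswith(d)
                      for d in module.source_dirs
                      for f in changed_files)
        if not touched:
            excluded.extend(module.test_tags)
    return excluded
```

Under this model, a change touching only `core/` would exclude both tags, which matches the shape of the `-Dtest.exclude.tags=...` argument the PR builder passes.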

Another option I'd throw in the pit is just getting rid of the test tags... these days they're probably not that useful anymore. Tests take a long time even if those tags are excluded, so a few minutes more or less won't make things much worse.

@ryan-williams
Contributor Author

ryan-williams commented Dec 16, 2016

OK, well lmk if I should make a version of this that:

  • un-inherits spark-tags from spark-parent,
  • copies the test-jar-publishing config (and any other relevant ones I see) into spark-tags, and
  • adds a spark-tags test-dep to spark-parent, thereby having all submodules inherit it.

@vanzin
Contributor

vanzin commented Dec 16, 2016

You'll either have to do that or fix the current change to have the test artifact in every pom, too.

But I'm warming up to the idea of getting rid of the test tags altogether. @JoshRosen do you have any insight into whether that would cause issues? I know you added Maven caches for the "VersionsSuite" test, which used to take a long time. And the YARN tests are only really a couple of extra minutes.

Perhaps for 2.1(.x) we can go with the smaller change and remove the tags in 2.2.

@SparkQA

SparkQA commented Dec 16, 2016

Test build #70255 has finished for PR 16311 at commit fe77117.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ryan-williams
Contributor Author

I suppose that build-pass above can be considered erroneous since I'd not actually removed the compile-scope scalatest-dep from spark-tags in the first commit in this PR… let's see how the second build fares.

@vanzin
Contributor

vanzin commented Dec 16, 2016

The builds in this PR will not trigger the issue I explained before. Because you're changing both SQL and YARN (which is where the test tags are used), the tags won't be excluded, and the error will not be triggered.

@SparkQA

SparkQA commented Dec 16, 2016

Test build #70259 has finished for PR 16311 at commit 46135a2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Member

srowen commented Dec 19, 2016

@ryan-williams I favor continuing with this change if you're able to finish off the changes per @vanzin . It does indeed require editing all the POMs but it's straightforward.

@vanzin
Contributor

vanzin commented Dec 21, 2016

@ryan-williams do you plan on updating the PR? It can't be checked in as is. One of the two previous solutions needs to be implemented.

@ryan-williams
Contributor Author

@srowen thx, one comment:

It does indeed require editing all the POMs but it's straightforward.

The other option that I described above is to express the "each module needs a spark-tags test-dep" requirement once, in the spark-parent POM's <dependencies/> tag.
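That once-in-the-parent approach would amount to something like this inside spark-parent's top-level <dependencies> section (a sketch; it assumes spark-tags itself no longer inherits from spark-parent, per the discussion above, so the declaration isn't circular):

```xml
<!-- in spark-parent's pom.xml: inherited by every submodule -->
<dependencies>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-tags_${scala.binary.version}</artifactId>
    <version>${project.version}</version>
    <type>test-jar</type>
    <scope>test</scope>
  </dependency>
</dependencies>
```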

@vanzin on that note, I guess I am proximally blocked on this question of whether to just hard-code a spark-tags dep into every submodule's POM, or try to factor it out a little more cleanly, albeit with a bit more pom-inheritance contortions.

I'm also balking at a higher-level at some apparent messiness of the overall build landscape that this has run into:

  • we have submodule-specific tags that have been module-sharded by their tag-ness instead of by their submodule-relevance, which are used to manually control whether to run certain tests in certain situations (as opposed to, say, the build system understanding when they'd need to be run based on changes to things they depend on).
  • apparently the way they are used is broken by this PR, but the already-3hr build does not know about that, and passes.

So I'm not really sure what the best way to proceed is; I could almost squint and say the requirement for all possible tags to be on the classpath when running unrelated modules' tests is a problem with Scalatest, or Scalatest's interactions with Spark's submodule-landscape, or something else altogether.

Anyway we can keep discussing or someone is welcome to fork this PR themselves and push through some seemingly-least-evil solution.

@vanzin
Contributor

vanzin commented Dec 21, 2016

we have submodule-specific tags that have been module-sharded by their tag-ness instead of by their submodule-relevance

Again, that is because of the issue with sbt complaining if you try to run tests excluding tags that don't exist. My original version of the change, which caused test breakages, had the tags in the "test" directory of the respective modules.

If you want more history: those tags used to live in a purely test-scope "test-tags" module, but at some point more non-test tags were added and the module was re-purposed, which caused the current scalatest leakage issue.

So you have 3 options, actually, here:

  • add the test-jar to all poms
  • don't inherit from the Spark root pom
  • remove the test tags and run those tests in all PR builds

Given how slow the current tests already are, and what I've said before about VersionsSuite / the YARN tests, I think the last option is ok. Maybe even ok as a 2.1 patch, now that 2.1.0 is out. But the other two are fine too.

@ryan-williams
Contributor Author

Thx for that info.

Are these tag-gated tests ever run? If so, when?

@vanzin
Contributor

vanzin commented Dec 21, 2016

See previous comments. The exclusions are triggered when you don't make changes in the affected modules (e.g. SQL or YARN). E.g., see:

https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70383/consoleFull

[info] Running Spark tests using SBT with these arguments:  -Phadoop-2.3 -Phive -Pyarn -Pmesos -Phive-thriftserver -Pkinesis-asl -Dtest.exclude.tags=org.apache.spark.tags.ExtendedHiveTest,org.apache.spark.tags.ExtendedYarnTest test

@ryan-williams
Contributor Author

ryan-williams commented Dec 21, 2016

OK, I looked at what the more-correct-but-invasive POM-refactoring would take and could not stomach it, so I pasted a spark-tags test-dep into every relevant submodule's POM*.

*I only did POMs that already had a compile-scope dep on spark-tags; these POMs were skipped:

assembly/pom.xml
examples/pom.xml
external/flume-assembly/pom.xml
external/kafka-0-10-assembly/pom.xml
external/kafka-0-8-assembly/pom.xml
external/kinesis-asl-assembly/pom.xml
external/spark-ganglia-lgpl/pom.xml
resource-managers/mesos/pom.xml
tools/pom.xml

lmkwyt

@vanzin
Contributor

vanzin commented Dec 21, 2016

LGTM pending tests, which should pass. Weird that some components don't have it; maybe we've just been lucky, or maybe scalatest changed and fixed the issue we saw originally. Either way, we'll fix things if they become a problem.

@ryan-williams
Contributor Author

ryan-williams commented Dec 21, 2016

Thanks @vanzin; do you have a cmdline handy that will reproduce the problem with the tags not being found (i.e. if I remove the spark-tags test-dep from the core module)?

It'd be nice to sanity-check this for myself but I don't see what to do with the configs in dev/sparktestsupport/modules.py.

Sorry if you've described it already and I missed it, still glazing over at some parts of the whole menagerie 😀.

@vanzin
Contributor

vanzin commented Dec 21, 2016

The closest I have is pasted in a comment above: #16311 (comment)

@SparkQA

SparkQA commented Dec 21, 2016

Test build #70489 has finished for PR 16311 at commit fb16afb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin
Contributor

vanzin commented Dec 22, 2016

Merging to master. If builds (jenkins and PR builders) are happy I'll later merge to 2.1.

@asfgit asfgit closed this in afd9bc1 Dec 22, 2016
@ryan-williams
Contributor Author

Thanks @vanzin !

@vanzin
Contributor

vanzin commented Dec 22, 2016

Builds look good as far as I can tell, so I'm also merging this into 2.1.

asfgit pushed a commit that referenced this pull request Dec 23, 2016
Remove spark-tag's compile-scope dependency (and, indirectly, spark-core's compile-scope transitive-dependency) on scalatest by splitting test-oriented tags into spark-tags' test JAR.

Alternative to #16303.

Author: Ryan Williams <ryan.blake.williams@gmail.com>

Closes #16311 from ryan-williams/tt.

(cherry picked from commit afd9bc1)
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
@vanzin
Contributor

vanzin commented Dec 23, 2016

Merged to 2.0 also (after a trivial conflict resolution).

uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017
Remove spark-tag's compile-scope dependency (and, indirectly, spark-core's compile-scope transitive-dependency) on scalatest by splitting test-oriented tags into spark-tags' test JAR.

Alternative to apache#16303.

Author: Ryan Williams <ryan.blake.williams@gmail.com>

Closes apache#16311 from ryan-williams/tt.