-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-17807][core] split test-tags into test-JAR #16311
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
While preparing this PR, I noted that spark-tags has 3 test-tags that are each specifically directed at each of three modules' tests:
It may make more sense to just move each test-tag to its corresponding module, and do away with the spark-tags test-JAR altogether; I don't see that their status as "tags" is the most important axis on which to physically group them. LMK if anyone would prefer that and I'll do it. |
|
This looks reasonable to me as a solution. I am not sure if there was a reason the tags needed to be in one module (like to enable some certain usage of it in the SBT build or something). I suppose the nice thing about having them in one place is that, being just tags, this whole module can be dropped at runtime if desired. Hence pushing the whole thing to |
I tried that at first when I introduced those tags, but because they're provided as arguments in the command line to disable those tests, scalatest throws a fit if the tag class doesn't exist when running tests for a particular module. |
|
@vanzin interesting, so at what level do they have to be brought in as dependencies in order to avoid that? Is that why spark-core needs to depend on the test-tags? I didn't see a reason for it to so I did not add such a dep here, I wonder if that will cause some tests to fail? |
They need to be a dependency in every module. That's why the current artifact is declared in every module, even though most of them do not use any tags. And you can't just declare that in the root pom's dependencies because then "spark-tags" cannot inherit from the root pom... fun all around. |
|
BTW, I agree this is a better fix and my version kinda abuses maven's "provided" scope, but I was mainly trying to avoid the above (having to modify every pom file in the repo). |
|
One more, forgot:
They will only cause tests to fail when the tags are excluded from the tests. e.g., if you make a SQL change, that will trigger the exclusion of "ExtendedYarnTest", and that would cause test failures if the tag is not available in some module. |
Should spark-tags not inherit from the root POM? it seems like it barely does anything as a module, does it need the root POM?
Can you elaborate on why the second part of this follows from the first? Can we add a test-jar-test-dep to the root POM? Wouldn't that be picked up by every module? |
Sorry, I think that's not accurate, actually. I think it works more like this: if you make a change, but you don't change any files under the YARN module, then some slower YARN tests are skipped by excluding that tag. I don't remember exactly what the code does, so it might add the exclusion in more cases, but that is the idea at least. You can try to run the tests with the argument to exclude the tags locally to see if that works, but I really do not remember the command.
It would require some duplication (e.g. version of shared dependencies) but is probably doable. Don't know if there are any other implications. |
Interesting, where is the code that does this that you referred to? Some kind of hard-coded rules about what has to be rebuilt/re-tested when various things change?
Looking at the spark-tags pom and libraries, it has no dependencies other than scalatest (test scope; I had missed changing this from "compile" to "test" scope in this PR, just updated, thanks!) which it gets from the root POM. OTOMH, spark-tags does get at least some necessary configuration from the root POM, for publishing a Obviously this is all pretty sticky, but while we're here I'm interested in at least discussing what the most idiomatic solution would be, whether or not we ultimately decide that it's worth the trouble. |
|
The metadata is defined in "dev/sparktestsupport/modules.py", and there's other code around that area that sets the commands. Another option I'd throw in the pit is just getting rid of the test tags... these days they're probably not that useful anymore. Tests take a long time even if those tags are excluded, so a few minutes more or less won't make things much worse. |
|
OK, well lmk if I should make a version of this that:
|
|
You'll either have to do that or fix the current change to have the test artifact in every pom, too. But I'm warming up to the idea of getting rid of the test tags altogether. @JoshRosen do you have any insight of whether that would cause issues? I know you added maven caches for the "VersionSuite" test which used to take a long time. And the YARN tests are only really a couple of extra minutes. Perhaps for 2.1(.x) we can go with the smaller change and remove the tags in 2.2. |
|
Test build #70255 has finished for PR 16311 at commit
|
|
I suppose that build-pass above can be considered erroneous since I'd not actually removed the compile-scope scalatest-dep from spark-tags in the first commit in this PR… let's see how the second build fares. |
|
The builds in this PR will not trigger the issue I explained before. Because you're changing both SQL and YARN (which is where the test tags are used), the tags won't be excluded, and the error will not be triggered. |
|
Test build #70259 has finished for PR 16311 at commit
|
|
@ryan-williams I favor continuing with this change if you're able to finish off the changes per @vanzin . It does indeed require editing all the POMs but it's straightforward. |
|
@ryan-williams do you plan on updating the PR? It can't be checked in as is. One of the two previous solutions needs to be implemented. |
|
@srowen thx, one comment:
The other option that I described above is to express the "each module needs a spark-tags test-dep" requirement once, in the spark-parent POM's @vanzin on that note, I guess I am proximally blocked on this question of whether to just hard-code a spark-tags dep into every submodule's POM, or try to factor it out a little more cleanly, albeit with a bit more pom-inheritance contortions. I'm also balking at a higher-level at some apparent messiness of the overall build landscape that this has run into:
So I'm not really sure what the best way to proceed is; I could almost squint and say the requirement for all possible tags to be on the classpath when running unrelated modules' tests is a problem with Scalatest, or Scalatest's interactions with Spark's submodule-landscape, or something else altogether. Anyway we can keep discussing or someone is welcome to fork this PR themselves and push through some seemingly-least-evil solution. |
Again, that is because of the issue with sbt complaining if you try to run tests excluding the tags, and they don't exist. My original version of the change, which caused test breakages, had the tags in the "test" directory of the respective modules. If you want more history, those tags used to be in a purely "test-tags" module which was only used in test scope, but at some point more, non-test tags were added to the same module and it was re-purposed, but caused the current scalatest leakage issue. So you have 3 options, actually, here:
Given how slow the current tests already are, and what I've said before about VersionsSuite / YARN tests, I think the latter is ok. Maybe even ok as a 2.1 patch, now that 2.1.0 is out. But the other two are fine too. |
|
Thx for that info. Are these test-gated tags ever run? If so, when? |
|
See previous comments. They are triggered when you don't make explicit changes in the affected modules (e.g. SQL or YARN). e.g., see: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70383/consoleFull |
|
OK, I looked at what the more-correct-but-invasive POM-refactoring would take and could not stomach it, so I pasted a spark-tags test-dep into every relevant submodule's POM*. *I only did POMs that already had a compile-scope dep on spark-tags; these POMs were skipped: lmkwyt |
|
LGTM pending tests, which should pass. Weird that some components don't have it; maybe we've just been lucky. Or maybe scalatest changed and fixed the issue we saw originally. Either way, we'll fix things if they become a problem. |
|
Thanks @vanzin; do you have a cmdline handy that will reproduce the problem with the tags not being found (i.e. if I remove the spark-tags test-dep from the core module)? It'd be nice to sanity-check this for myself but I don't see what to do with the configs in Sorry if you've described it already and I missed it, still glazing over at some parts of the whole menagerie 😀. |
|
The closest I have is pasted in a comment above: #16311 (comment) |
|
Test build #70489 has finished for PR 16311 at commit
|
|
Merging to master. If builds (jenkins and PR builders) are happy I'll later merge to 2.1. |
|
Thanks @vanzin ! |
|
Builds look good as far as I can tell, so I'm also merging this into 2.1. |
Remove spark-tag's compile-scope dependency (and, indirectly, spark-core's compile-scope transitive-dependency) on scalatest by splitting test-oriented tags into spark-tags' test JAR. Alternative to #16303. Author: Ryan Williams <ryan.blake.williams@gmail.com> Closes #16311 from ryan-williams/tt. (cherry picked from commit afd9bc1) Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
Remove spark-tag's compile-scope dependency (and, indirectly, spark-core's compile-scope transitive-dependency) on scalatest by splitting test-oriented tags into spark-tags' test JAR. Alternative to #16303. Author: Ryan Williams <ryan.blake.williams@gmail.com> Closes #16311 from ryan-williams/tt. (cherry picked from commit afd9bc1) Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
|
Merged to 2.0 also (after a trivial conflict resolution). |
Remove spark-tag's compile-scope dependency (and, indirectly, spark-core's compile-scope transitive-dependency) on scalatest by splitting test-oriented tags into spark-tags' test JAR. Alternative to apache#16303. Author: Ryan Williams <ryan.blake.williams@gmail.com> Closes apache#16311 from ryan-williams/tt.
Remove spark-tag's compile-scope dependency (and, indirectly, spark-core's compile-scope transitive-dependency) on scalatest by splitting test-oriented tags into spark-tags' test JAR.
Alternative to #16303.