[BEAM-258] Configure RunnableOnService tests for Flink runner #291
Conversation
I'm thinking that it would be good to get the configuration in place and then work towards re-enabling the tests. We'll need to disable many more, but I first wanted to get feedback on whether I just missed something obvious that would get all the batch tests to run. I think it still makes sense to pull the first two commits either way.
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-surefire-plugin</artifactId>
      <executions>
@davorbonaci @jasonkuster @lukecwik my maven-fu might be lacking. I think I can remove the phase and goals here, since the execution has an id, and maybe lift some of the configuration, but most importantly, I could not get runnableOnServicePipelineOptions to do anything, instead having to set beamTestPipelineOptions. This seems mostly fine but please advise.
I think you want to drop the whole plugin configuration and either:
- do as we did in the Dataflow runner pom: have a profile that depends on the runnableOnService property existing, so that Jenkins can be configured to build the Flink pipeline and dependent modules with a system property on the Jenkins command line that sets the runnableOnServicePipelineOptions as you have listed before
- always run these tests as part of the regular integration-test run by hardcoding runnableOnServicePipelineOptions in this module and having a trivial configuration as part of build/plugins:
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <executions>
    <execution>
      <id>runnable-on-service-tests</id>
    </execution>
  </executions>
</plugin>

I don't know if you need the executions block or not; I would think it would be inherited from the plugin configuration, but I could be wrong.
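For the first option, a sketch of what such a property-activated profile might look like (the id and structure here are illustrative; the actual Dataflow runner pom may differ):

```xml
<!-- Sketch only: a profile activated when the property is set on the
     command line, e.g. mvn verify -DrunnableOnServicePipelineOptions=...
     The profile id is an illustrative placeholder. -->
<profiles>
  <profile>
    <id>runnable-on-service</id>
    <activation>
      <property>
        <name>runnableOnServicePipelineOptions</name>
      </property>
    </activation>
    <!-- surefire execution for the RunnableOnService tests goes here -->
  </profile>
</profiles>
```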
We definitely need two executions for now, since we need to pass --streaming=true and --streaming=false. It actually seems to confuse Jenkins, since the test class is duplicated. It probably will also confuse it with the direct runner and spark runner tests.
For Flink and Spark, running against a local endpoint as a unit test (not really an integration test per se) is reasonable. We want to support both, so Jenkins can also run a real integration test against a cluster by overrides on the command line.
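As a sketch (not the exact pom wording; the execution ids and option strings below are illustrative), the two executions would differ only in the options they pass:

```xml
<!-- Sketch: two surefire executions, one per streaming mode. The
     execution ids and the runner/option values are placeholders. -->
<executions>
  <execution>
    <id>runnable-on-service-tests-batch</id>
    <configuration>
      <systemPropertyVariables>
        <beamTestPipelineOptions>
          ["--runner=TestFlinkRunner","--streaming=false"]
        </beamTestPipelineOptions>
      </systemPropertyVariables>
    </configuration>
  </execution>
  <execution>
    <id>runnable-on-service-tests-streaming</id>
    <configuration>
      <systemPropertyVariables>
        <beamTestPipelineOptions>
          ["--runner=TestFlinkRunner","--streaming=true"]
        </beamTestPipelineOptions>
      </systemPropertyVariables>
    </configuration>
  </execution>
</executions>
```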
Force-pushed 750a49d to 9bf7ec0
@kennknowles I created a PR against your branch. The problem was that the Flink Batch API does not allow dangling operators. I fixed it by keeping track of dangling operators and terminating them with a dummy sink before execution. I also changed the […]
@aljoscha I've had to disable parallel execution of the tests before. The Flink testing clusters are not lightweight; they basically start a real cluster environment inside a single JVM.
Should be resolved with Aljoscha's fix. Thanks for fixing it, Aljoscha. @kennknowles Could we add Aljoscha's fix to the commits of this PR? Your changes look good otherwise. Thanks for adding the […]
I wonder if we could fix that by making […]
I added some more fixes to my PR. I got it down quite a bit, but these are still failing (in addition to the tests that @kennknowles already disabled in the pom): […] The last one fails because of missing window support in Flink batch. The first two fail because of a pretty fundamental problem: Flink batch does not allow emission of null values from inputs or user operations (except in some corner cases). The […] In one of the commits I removed the special TypeSerializer for VoidCoder and also our special implementation of Create in favor of the […]
Force-pushed 1bc6957 to b98ee40
Nice use of a branch-to-branch PR :-) I added a couple more tweaks to get things closer.
Incidentally, running this on my desktop I'm often getting […]
Hehe, yes 😃 Some of it is hacky, but I think we're getting there. I'll finally start working for real on side inputs for streaming next week; this should also be interesting. The […] We also have more failing tests now that the […]
I fixed another round of tests and did a PR against your branch. Most of them were caused by Flink not supporting null values, so I replaced all of the dummy […] The last set of failing tests is caused by Flink Batch not using a […]
I think we can solve this last one with another JUnit category for timestamp control, which is a row on the capability matrix, so it fits.
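Sketching that idea (the category name below is hypothetical, not an existing Beam category): a JUnit 4 category is just a marker interface; tests that assume timestamp control get tagged with it, and a runner lacking that capability excludes the group in its surefire configuration.

```java
// Hypothetical marker interface for a JUnit 4 category; the name
// UsesCustomTimestamps is illustrative, not an actual Beam category.
public interface UsesCustomTimestamps {}

// A test that assumes timestamp control would then be tagged like this:
//
//   @Test
//   @org.junit.experimental.categories.Category(UsesCustomTimestamps.class)
//   public void testTimestampedOutput() { ... }
//
// and a runner without timestamp control would list the category in the
// surefire <excludedGroups> configuration element.
```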
Yeah, I thought so as well.
One thing that we need to consider: null values are a language-specific concept. The language-agnostic conception is really "PCollection containing elements encoded with […]". More broadly, as long as a user can provide a […] So it may be more appropriate to have the behavior you mentioned before, where the Flink runner notices the use of root-level […]
There is no real problem preventing this, no. But you're right that the system should be able to handle everything that the user can provide a […]
I thought about it some more. The problem with […]
@kennknowles Are you planning to merge this soon, or will we leave it open until the other stuff is fixed as well?
@aljoscha I think we can start to get useful coverage from this before all the tests pass. I just want to use some combination of […] So, considering the new ideas about […] What do you think?
Sounds good, you can also revert the […]
Cool. I feel like rebasing this to remove some of the experiments would be nice. I don't want to conflict with your current work, so let me know if/when you think it would be safe for me to rebase, add exclusions, and merge.
I think you can go ahead and rebase/clean up. I've got it working except for merging windows, but I still want to go over the code another time, so I'll rebase on master then. By the way, for the batch windowing I'm completely ignoring triggers. Is this also what the Dataflow runner does? This essentially does a shuffle by key and window and merges elements using a […]
Yes, ignoring triggers in batch is fine. I have a design doc to share about that as soon as I can patch it up...
Force-pushed 78aae1e to 0b61c8c
@amitsela the changes to […] Given that this shouldn't actually be a semantic change, my thought is that there might be something going on with null values in the Spark runner as well (possibly causing a no-op […])
I saw this as well in the Flink tests. Once I fixed the […]
The existing […]
If I take your prior comments as "LGTM" I will merge when Travis finishes up.
Ah yes, I wasn't aware that we really need to see the actual text "LGTM". 😉 I also have #328, which fixes the remaining tests. It got a bit more complicated than I thought, especially getting […]
Force-pushed e9c6ce5 to c5b88f2
This makes the runner available for selection by integration tests.
Today Flink batch supports only global windows. This is a situation we intend our build to allow, eventually via JUnit category filtering. For now, all the test classes that use non-global windows are excluded entirely via Maven configuration; in the future, this should happen on a per-test-method basis.
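The whole-class exclusions might look like the following sketch (the class names below are illustrative placeholders, not the actual excluded list):

```xml
<!-- Sketch: excluding entire test classes that rely on non-global
     windows; the class name patterns here are illustrative only. -->
<configuration>
  <excludes>
    <exclude>**/WindowingTest.java</exclude>
    <exclude>**/TriggerExampleTest.java</exclude>
  </excludes>
</configuration>
```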
We're now using a PerKeyCombineFnRunner for all interaction with the CombineFn. This required adding a proper ProcessContext in FlinkReduceFunction and FlinkPartialReduceFunction, along with adding support for side inputs there.
This does not work because Flink Batch does not allow sending null elements. This is a pretty deep issue and hard to fix. In an earlier commit I removed the special TypeSerializer for VoidCoder. Before, we got away with it by always intercepting the VoidCoder and wrapping it in a TypeSerializer that would emit a VoidValue instead of a proper null. If the user fn reads this value, however, the behavior is not consistent with what it should be.
The single null value is only used as a dummy, thus can also be an integer. This makes it work with runners that don't support sending null values.
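A plain-Java sketch of this workaround (no Beam dependency; the class and method names here are illustrative, not the actual patch): a fixed, non-null integer stands in for the null dummy, since the payload is never read.

```java
import java.util.List;
import java.util.stream.Collectors;

// Illustrative sketch of the workaround described in the commit message:
// runners that cannot transport null still need a placeholder element,
// so a fixed non-null integer is used instead. The value is never read.
public class NullFreeDummy {
    public static final Integer DUMMY = 0; // arbitrary; any fixed value works

    // Replace a list of Void "signals" (which are all null) with non-null
    // dummy integers that null-rejecting runners can carry.
    public static List<Integer> replaceNulls(List<Void> signals) {
        return signals.stream().map(v -> DUMMY).collect(Collectors.toList());
    }
}
```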
Force-pushed c5b88f2 to 31cb37d
I believe this is superseded by #328.
Be sure to do all of the following to help us incorporate your contribution quickly and easily:

- Make sure the PR title is formatted like: [BEAM-<Jira issue #>] Description of pull request
- Make sure tests pass via mvn clean verify. (Even better, enable Travis-CI on your fork and ensure the whole test matrix passes.)
- Replace <Jira issue #> in the title with the actual Jira issue number, if there is one.
- If this contribution is large, please file an Apache Individual Contributor License Agreement.
This is a sample configuration for now.
There are these kinds of failures in the tests right now:

- […] I added an `UnsupportedOperationException` to the windowing translator so I could distinguish them.
- […] `NullPointerException`.
- […] how `PAssert` works, so they cannot work in streaming mode.