[BEAM-12931] Allow for DoFn#getAllowedTimestampSkew() when checking the output timestamp #15540
Is there an associated JIRA?
I think you have to update the similar check in FnApiDoFnRunner.
I added checks in FnApiDoFnRunner, which was not checking for skew on timers and not checking timestamps at all for normal emits. PTAL and make sure it makes sense; I can probably add some tests in FnApiRunner. PTAL @reuvenlax @lukecwik. Thanks!
Force-pushed from abbc86d to a88948b.
lukecwik left a comment:
Won't this allow for infinite skew? If you have a timer at X and a skew of -1, then the first time the timer is processed you can output at time X-1, and when it gets scheduled again you can output at X-2, since the new timer's timestamp is X-1.
The changes to the FnApiDoFnRunner to check timestamp output validity make sense.
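The drift described in this comment can be illustrated with a small toy model. This is only a sketch under stated assumptions: it uses plain `long` timestamps instead of Beam's `Instant`, treats the "skew of -1" as an allowed skew of 1, and the class and method names (`ChainedSkewDemo`, `earliestAllowed`, `timestampAfterRounds`) are hypothetical, not part of Beam.

```java
/** Toy model: the skew check is re-applied relative to each new timer timestamp. */
final class ChainedSkewDemo {

    /** Earliest output timestamp permitted when skew is measured from the firing timer's timestamp. */
    static long earliestAllowed(long timerTimestamp, long allowedSkew) {
        return timerTimestamp - allowedSkew;
    }

    /** Fires the timer `rounds` times, each time re-setting it at the earliest permitted timestamp. */
    static long timestampAfterRounds(long start, long allowedSkew, int rounds) {
        long timestamp = start;
        for (int i = 0; i < rounds; i++) {
            timestamp = earliestAllowed(timestamp, allowedSkew);
        }
        return timestamp;
    }

    public static void main(String[] args) {
        long x = 100L; // hypothetical initial timer timestamp "X"
        for (int rounds = 0; rounds <= 3; rounds++) {
            System.out.println("after " + rounds + " firings: " + timestampAfterRounds(x, 1L, rounds));
        }
    }
}
```

Each firing moves the reference point back by the skew, so the total drift grows with the number of chained timers rather than being bounded by a single skew.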
You can handle the proper bounds via:
// Earliest timestamp the DoFn may output; fall back to the global minimum if the subtraction overflows.
Instant lowerBound;
try {
  lowerBound = elementInputTimestamp.minus(fn.getAllowedTimestampSkew());
} catch (ArithmeticException e) {
  lowerBound = BoundedWindow.TIMESTAMP_MIN_VALUE;
}
if (outputTimestamp.isBefore(lowerBound)) {
  ...
}
Finally, it would make sense to check the upper bound as well, against BoundedWindow.TIMESTAMP_MAX_VALUE.
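A self-contained sketch of the combined lower and upper bound check follows. It is an illustration, not the code merged in this PR: it uses `long` milliseconds and `Math.subtractExact` in place of Joda-Time's `Instant.minus`, the two constants are stand-ins for `BoundedWindow.TIMESTAMP_MIN_VALUE` and `BoundedWindow.TIMESTAMP_MAX_VALUE` with assumed values, the class and method names are hypothetical, and the skew is assumed to be non-negative.

```java
/** Sketch of an output-timestamp validity check with overflow-safe bounds. */
final class OutputTimestampBounds {

    // Stand-ins for BoundedWindow.TIMESTAMP_MIN_VALUE / TIMESTAMP_MAX_VALUE (assumed values).
    static final long TIMESTAMP_MIN_VALUE = Long.MIN_VALUE / 1000;
    static final long TIMESTAMP_MAX_VALUE = Long.MAX_VALUE / 1000;

    /** Earliest permitted output timestamp; assumes allowedSkewMillis >= 0. */
    static long lowerBound(long elementTimestampMillis, long allowedSkewMillis) {
        try {
            long candidate = Math.subtractExact(elementTimestampMillis, allowedSkewMillis);
            return Math.max(candidate, TIMESTAMP_MIN_VALUE);
        } catch (ArithmeticException e) {
            // The subtraction overflowed (e.g. an "infinite" skew), so clamp to the minimum.
            return TIMESTAMP_MIN_VALUE;
        }
    }

    /** True when the output timestamp lies within [lowerBound, TIMESTAMP_MAX_VALUE]. */
    static boolean isValid(long outputMillis, long elementTimestampMillis, long allowedSkewMillis) {
        return outputMillis >= lowerBound(elementTimestampMillis, allowedSkewMillis)
            && outputMillis <= TIMESTAMP_MAX_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(isValid(70L, 100L, 30L)); // exactly on the lower bound
        System.out.println(isValid(69L, 100L, 30L)); // one millisecond too early
    }
}
```

Clamping in the `catch` branch means a very large skew such as `Long.MAX_VALUE` degrades to "any timestamp down to the minimum is allowed" instead of throwing.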
Sounds good. Done.
Can you use individual timers instead of TimerMap? Using TimerMap limits the number of runners this can run on.
The elements need to share a timer then, but I suppose the same logic will be hit whenever we set the output timestamp. I removed the output of elements, though, since that would be harder. It wasn't really needed anyway.
Need to tag with UsesTimersInParDo.class and/or UsesTimerMap.class.
Done. I'm assuming only the ValidatesRunner tests need this, because I don't see it elsewhere in the file. Let me know if not.
So my understanding of the reason for these checks is to stop people from doing the wrong thing without realizing it. We don't even take any different action based on this variable. It seems okay to apply this to each specific output timestamp and let you skew more if you chain timers in this fashion. On a more practical note, there are reasons why you might want a timer to output an earlier element if you've properly set up watermark holds. There's currently no way to do that, so we need some allowance. It would probably be better if we could constrain skew from the first output timestamp, but I don't think that's available in the later timers, right? If you disagree with the approach, I can bring this up on the email thread for others to chime in, in case they are not checking here.
PTAL, @lukecwik @reuvenlax
Force-pushed from bf7d2c3 to 7cd3818.
je-ik left a comment:
+1
Do we have a follow-up for un-deprecating the method?
I think users will be surprised that their data is dropped as late if they output past the watermark skew bound. The existing logic had explicit guards for this because it would be surprising for users, so I believe it is important enough to discuss whether there is another approach to solve this or whether we are OK with this happening.
We chatted a bit about this offline. There's actually no guarantee that the watermark is held back when using DoFn#getAllowedTimestampSkew. The allowedTimestampSkew just removes the check that we have to avoid accidentally dropping late data. See the javadoc [1] and the relevant reply from Jan [2].
[1] https://beam.apache.org/releases/javadoc/2.5.0/org/apache/beam/sdk/transforms/DoFn.html#getAllowedTimestampSkew--
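The distinction drawn here, that passing the skew check says nothing about where the watermark is, can be sketched with a deliberately simplified toy model. It is not Beam's late-data logic: "late" is reduced to "timestamp behind the watermark", timestamps are plain `long` values with no overflow handling, and all names (`SkewVersusWatermark`, `passesSkewCheck`, `isBehindWatermark`) are hypothetical.

```java
/** Toy model: the skew check and the watermark are independent conditions. */
final class SkewVersusWatermark {

    /** The skew check only compares the output timestamp with the input element's timestamp. */
    static boolean passesSkewCheck(long outputTs, long elementTs, long allowedSkew) {
        return outputTs >= elementTs - allowedSkew; // toy values only, no overflow handling
    }

    /** Simplified notion of lateness: the output timestamp is behind the current watermark. */
    static boolean isBehindWatermark(long outputTs, long watermark) {
        return outputTs < watermark;
    }

    public static void main(String[] args) {
        long elementTs = 100L, allowedSkew = 10L, watermark = 95L, outputTs = 90L;
        // Both conditions can hold at once: the output is accepted by the skew check
        // yet is already behind the watermark, so downstream may treat it as late.
        System.out.println("passes skew check: " + passesSkewCheck(outputTs, elementTs, allowedSkew));
        System.out.println("behind watermark:  " + isBehindWatermark(outputTs, watermark));
    }
}
```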
Force-pushed from fa5c8c8 to ba23c23.
PTAL @lukecwik
lukecwik left a comment:
This is looking great. We still need to fix up the OnWindowExpiration case and add a ValidatesRunner test for it.
Codecov Report
@@             Coverage Diff             @@
##           master    #15540      +/-   ##
===========================================
+ Coverage   46.10%    83.53%   +37.42%
===========================================
  Files         196       445      +249
  Lines       19498     61210    +41712
===========================================
+ Hits         8990     51131    +42141
- Misses       9538     10079      +541
+ Partials      970         0      -970
Force-pushed from 7cdde98 to bfaac5a.
Force-pushed from bfaac5a to 9794db7.
Force-pushed from 9794db7 to fbcfd0b.
@lukecwik PTAL. I still need to check that everything runs internally, but otherwise this should be good.
lukecwik left a comment:
Looks great and I really like the simplification in the testing.
Force-pushed from bafa6ce to 5a95bf6.
Run Java PreCommit
Force-pushed from 5a95bf6 to a720c16.
Force-pushed from b373e0a to 3249a0c.
Force-pushed from 8954610 to b3192dc.
Force-pushed from b3192dc to c43415e.
A DoFn may emit elements with a timestamp up to DoFn#getAllowedTimestampSkew() before the current element's timestamp. This change applies the same allowance to timers: a timer may now have an output timestamp up to DoFn#getAllowedTimestampSkew() before the current element's timestamp. Before this change, a timer's output timestamp could not be before the current element's timestamp.
Additional Context: https://lists.apache.org/thread.html/r7554658114ddde86c5d82e1c39fe7e1ef587fe926b8e406d1130d501%40%3Cdev.beam.apache.org%3E
@reuvenlax