[BEAM-3499, BEAM-2607] Gives the runner access to positions of SDF claimed blocks #4483

jkff · 2018-01-25T01:53:12Z

This addresses the following issues:

https://issues.apache.org/jira/browse/BEAM-3499 Watch can make no progress if a single poll takes more than checkpoint interval
https://issues.apache.org/jira/browse/BEAM-2607 Enforce that SDF must return stop() after a failed tryClaim() call

The former is the primary motivation for this PR. This PR changes SDF checkpointing timer countdown to start from the first claimed block, rather than from the beginning of @ProcessElement. This requires giving the runner visibility into claimed blocks. Such visibility enables fixing BEAM-2607 as well. It also is a required part of implementing SDF splitting over Fn API (tracked separately).

This PR also, of course, changes the Watch transform to the new API and, while we're at it, makes some related improvements:

Compresses Watch.GrowthState using Snappy. E.g. with 100k files, the encoded state is about 3MB instead of 8MB. Compressing it much more is difficult because the state includes uncompressible hashes. To address this, one must shard the filepattern, or implement the improvements suggested in https://issues.apache.org/jira/browse/BEAM-2680 .
Makes direct runner create a clone of state cells - I did this mainly because I noticed that GrowthStateCoder was never called on the Watch state, which risks missing coder bugs when testing with direct runner.
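
The point about incompressible hashes in the Snappy item can be illustrated with a small sketch. Snappy needs a third-party library, so this example swaps in Deflate from `java.util.zip` as a stand-in; the ratios will differ from Snappy's, but the contrast between repetitive data (such as file paths sharing a prefix) and random bytes (such as hashes) is the same in kind. `CompressionSketch` and its methods are hypothetical names, not part of Beam.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

/** Hypothetical helper: Deflate is used here only as a stand-in for Snappy. */
final class CompressionSketch {
  /** Compresses the given bytes with Deflate. */
  static byte[] compress(byte[] raw) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try (DeflaterOutputStream deflater = new DeflaterOutputStream(out)) {
      deflater.write(raw);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return out.toByteArray();
  }

  /** Reverses compress(). */
  static byte[] decompress(byte[] compressed) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try (InflaterInputStream inflater =
        new InflaterInputStream(new ByteArrayInputStream(compressed))) {
      byte[] buffer = new byte[4096];
      int n;
      while ((n = inflater.read(buffer)) != -1) {
        out.write(buffer, 0, n);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return out.toByteArray();
  }
}
```

Feeding this a buffer of repeated path prefixes shrinks it substantially, while a buffer of random bytes comes out at roughly its original size, which is why the hash portion of the state limits the achievable ratio.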

This PR is update-incompatible for users of the Watch transform, e.g. FileIO.match().continuously(). This is an experimental and very recent transform, so I'm going to ignore the incompatibility. It also requires a traditional Dataflow worker dance to get the worker container in sync with these runners-core changes - I'll perform that when the rest of the PR is approved.

R: @tgroh @chamikaramj
CC: @kennknowles @reuvenlax

jkff · 2018-01-25T19:46:23Z

retest this please

tgroh

There may be a couple of duplicate comments, because I went through the commits one by one first.

tgroh · 2018-01-25T20:43:48Z

...ava/core/src/main/java/org/apache/beam/sdk/transforms/splittabledofn/RestrictionTracker.java

+    this.claimObserver = claimObserver;
+  }
+
+  public final boolean tryClaim(PositionT position) {


I would strongly consider inverting these names in some way (so the author implements tryClaim, and this is executeTryClaim or something that signals that it is using the tryClaim method)

jkff · 2018-01-25T21:20:08Z

I'm coming from the assumption that new RestrictionTrackers are written much more rarely than new SDFs using existing trackers, and I'd like the caller to use tryClaim. I guess I could rename tryClaimImpl to executeTryClaim but it seems about equally descriptive. (Side note: I considered a number of other alternatives for this design, e.g. passing a claim callback as a context parameter to @ProcessElement, or allowing a RestrictionTracker to simply refuse a checkpoint, to address just the checkpointing issue, but they all were much worse in various ways.)
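
To make the naming trade-off concrete, here is a minimal sketch of the shape under discussion: a `final` `tryClaim` that callers use, which delegates to an implementer-provided `tryClaimImpl` and reports the outcome to an observer. It is not the Beam source. Only `tryClaim`, `tryClaimImpl`, `ClaimObserver`, and `PositionT` appear in this thread; `SketchTracker`, `SketchOffsetTracker`, `RecordingObserver`, `setClaimObserver`, and the observer method names `onClaimed`/`onClaimFailed` are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical runner-facing callback; the method names are assumptions. */
interface ClaimObserver<PositionT> {
  void onClaimed(PositionT position);

  void onClaimFailed(PositionT position);
}

/** Simplified tracker base: SDF authors call tryClaim, tracker authors override tryClaimImpl. */
abstract class SketchTracker<PositionT> {
  private ClaimObserver<PositionT> claimObserver;

  void setClaimObserver(ClaimObserver<PositionT> claimObserver) {
    this.claimObserver = claimObserver;
  }

  /** Final, so every claim attempt is reported to the observer regardless of the subclass. */
  public final boolean tryClaim(PositionT position) {
    boolean claimed = tryClaimImpl(position);
    if (claimObserver != null) {
      if (claimed) {
        claimObserver.onClaimed(position);
      } else {
        claimObserver.onClaimFailed(position);
      }
    }
    return claimed;
  }

  /** The actual claiming logic, supplied by the tracker implementation. */
  protected abstract boolean tryClaimImpl(PositionT position);
}

/** Toy tracker over the half-open offset range [from, to); ordering checks are omitted. */
class SketchOffsetTracker extends SketchTracker<Long> {
  private final long from;
  private final long to;

  SketchOffsetTracker(long from, long to) {
    this.from = from;
    this.to = to;
  }

  @Override
  protected boolean tryClaimImpl(Long position) {
    return position >= from && position < to;
  }
}

/** Observer that simply records what it was told, standing in for a runner. */
class RecordingObserver implements ClaimObserver<Long> {
  final List<Long> claimed = new ArrayList<>();
  final List<Long> failed = new ArrayList<>();

  @Override
  public void onClaimed(Long position) {
    claimed.add(position);
  }

  @Override
  public void onClaimFailed(Long position) {
    failed.add(position);
  }
}
```

With this shape the SDF author keeps the short name `tryClaim`, and the less pleasant name falls on the rarer tracker author, which is the trade-off described in the reply above.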

tgroh · 2018-01-25T20:47:02Z

sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Watch.java

      this.terminationState = state.terminationState;
-      this.pending = Lists.newLinkedList(state.pending);
+      this.pending = Maps.newLinkedHashMapWithExpectedSize(state.pending.size());
+      for (Map.Entry<HashCode, TimestampedValue<OutputT>> entry : state.pending.entrySet()) {


this.pending.putAll(state.pending)? or this.pending = new LinkedHashMap<>(state.pending)
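
A small sketch of why the suggested one-liner is equivalent, assuming the loop in the diff only copies entries unchanged (the excerpt is truncated, so that is an assumption). `PendingCopySketch` and its method names are hypothetical; the point is that `new LinkedHashMap<>(source)` preserves the source's iteration order and produces an independent map.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper comparing an explicit copy loop with the copy constructor. */
final class PendingCopySketch {
  /** Copies entry by entry, as an explicit loop would. */
  static <K, V> Map<K, V> copyWithLoop(Map<K, V> source) {
    Map<K, V> copy = new LinkedHashMap<>();
    for (Map.Entry<K, V> entry : source.entrySet()) {
      copy.put(entry.getKey(), entry.getValue());
    }
    return copy;
  }

  /** Copies via the constructor; iteration order follows the source map's order. */
  static <K, V> Map<K, V> copyWithConstructor(Map<K, V> source) {
    return new LinkedHashMap<>(source);
  }
}
```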

tgroh · 2018-01-25T20:56:58Z

...n/java/org/apache/beam/runners/core/OutputAndTimeBoundedSplittableProcessElementInvoker.java

-              },
-              maxDuration.getMillis(),
-              TimeUnit.MILLISECONDS);
+    void checkClaimHasNotFailed() {


Can this just be inlined?

tgroh · 2018-01-25T20:59:30Z

...ava/core/src/main/java/org/apache/beam/sdk/transforms/splittabledofn/RestrictionTracker.java

 */
-public interface RestrictionTracker<RestrictionT> {
+public abstract class RestrictionTracker<RestrictionT, PositionT> {
+  interface ClaimObserver<PositionT> {


Do we expect the ClaimObserver to ever interact with the PositionT?

tgroh · 2018-01-25T21:02:09Z

runners/core-java/src/main/java/org/apache/beam/runners/core/InMemoryStateInternals.java

    }
  }
+
+  private static <T> T unsafeClone(Coder<T> coder, T value) {


s/unsafe/unchecked/

tgroh · 2018-01-25T21:03:00Z

runners/core-java/src/main/java/org/apache/beam/runners/core/InMemoryStateInternals.java

+          new InMemoryCombiningState<>(combineFn, accumCoder);
      if (!this.isCleared) {
        that.isCleared = this.isCleared;
        that.addAccum(accum);


Should this be cloned?

jkff · 2018-01-25T21:14:19Z

Yup, thanks for the catch.

tgroh · 2018-01-25T21:07:16Z

...ava/core/src/main/java/org/apache/beam/sdk/transforms/splittabledofn/RestrictionTracker.java

+public abstract class RestrictionTracker<RestrictionT, PositionT> {
+  /** Internal interface allowing a runner to observe the calls to {@link #tryClaim}. */
+  @Internal
+  public interface ClaimObserver<PositionT> {


Do we expect this to ever do anything notable with the position? I can't think of a case where the invoker would be concerned with the actual position, which is an implementation detail of the DoFn.

If you've got an idea of when it might, I'd love an example; otherwise I'd remove the parameters from this interface

jkff · 2018-01-25T21:17:36Z

Yes, the observer will eventually need to store the positions and pass them back to new methods of SDF for splitting, as part of the implementation of splitting/checkpointing over the Fn API.

jkff

Thanks!

jkff · 2018-01-25T21:37:14Z

Apologies, forgot to actually push the changes.

chamikaramj

Thanks.

chamikaramj · 2018-01-25T22:17:33Z

...n/java/org/apache/beam/runners/core/OutputAndTimeBoundedSplittableProcessElementInvoker.java


    private void noteOutput() {
+      checkState(!hasClaimFailed, "Output is not allowed after a failed tryClaim()");
+      checkState(numClaimedBlocks > 0, "Output is not allowed before tryClaim()");
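
The two checks in this excerpt can be read as a small state machine driven by claim notifications. The sketch below is a simplified stand-in, not the invoker from this PR: `numClaimedBlocks`, `hasClaimFailed`, `noteOutput`, and the two messages come from the diff, while the class name `ClaimInvariants`, the `onClaimed`/`onClaimFailed` methods, and the local `checkState` helper (standing in for Guava's) are assumptions.

```java
/** Simplified stand-in that enforces "no output before a claim or after a failed claim". */
final class ClaimInvariants {
  private int numClaimedBlocks = 0;
  private boolean hasClaimFailed = false;

  /** Called when a tryClaim() succeeded. */
  synchronized void onClaimed() {
    numClaimedBlocks++;
  }

  /** Called when a tryClaim() failed. */
  synchronized void onClaimFailed() {
    hasClaimFailed = true;
  }

  /** Called before each output; throws if the SDF violates either invariant. */
  synchronized void noteOutput() {
    checkState(!hasClaimFailed, "Output is not allowed after a failed tryClaim()");
    checkState(numClaimedBlocks > 0, "Output is not allowed before tryClaim()");
  }

  /** Minimal local replacement for Guava's Preconditions.checkState. */
  private static void checkState(boolean condition, String message) {
    if (!condition) {
      throw new IllegalStateException(message);
    }
  }
}
```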


chamikaramj · 2018-01-25T22:17:33Z

...n/java/org/apache/beam/runners/core/OutputAndTimeBoundedSplittableProcessElementInvoker.java


    @Override
    public synchronized void updateWatermark(Instant watermark) {
+      // Updating the watermark without any claimed blocks is allowed.


Why? Should we at least warn?

jkff · 2018-01-25T23:00:08Z

Clarified in a comment.

chamikaramj · 2018-01-25T22:17:33Z

sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Watch.java


    @Override
    public synchronized GrowthState<OutputT, KeyT, TerminationStateT> checkpoint() {
+      checkState(


Should we be rejecting the checkpoint request instead of failing here?

jkff · 2018-01-25T23:02:02Z

Rejecting the checkpoint is not allowed. Allowing it is one of the alternatives I considered, but since the runner needs access to positions anyway, I preferred to just give it that access.

chamikaramj · 2018-01-25T22:17:33Z

sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Watch.java

      // unless output is complete or termination condition is reached.
      if (tracker.shouldPollMore()) {
+        LOG.info(
+            "{} - emitted all known results so far; will resume polling in {} ms",


Mention numEmitted (total) here?

chamikaramj · 2018-01-25T22:17:33Z

sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Watch.java

            KV.of(c.element(), nextPending.getValue()), nextPending.getTimestamp());
+        ++numEmitted;
      }
+      LOG.debug("{} - emitted {} new results.", c.element(), numEmitted);


This log might be a bit confusing. It says "new results", but numEmitted is not reset after this log.

jkff · 2018-01-25T23:01:16Z

Made this log more comprehensive and added some clarifying variables.

chamikaramj · 2018-01-25T22:17:33Z

...n/java/org/apache/beam/runners/core/OutputAndTimeBoundedSplittableProcessElementInvoker.java

+          // of any work to be done at the moment, but more might emerge later. In this case,
+          // we must simply reschedule the original restriction - checkpointing a tracker that
+          // hasn't claimed any work is not allowed.
+          residual = KV.of(tracker.currentRestriction(), processContext.getLastReportedWatermark());


Why not just fail? This might result in an infinite scheduling loop due to a bug, no?

jkff · 2018-01-25T23:00:00Z

Clarified in a comment.
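
The rule stated in the code comment above (reschedule the original restriction when nothing was claimed, because checkpointing such a tracker is not allowed) can be sketched as a small decision function. `ResidualSketch`, `chooseResidual`, and the use of a `Supplier` for the checkpoint call are hypothetical simplifications; the real invoker also carries the last reported watermark alongside the residual.

```java
import java.util.function.Supplier;

/** Hypothetical helper showing how the residual restriction is chosen. */
final class ResidualSketch {
  /**
   * Returns the original restriction if no block was claimed (the tracker must not be
   * checkpointed in that case); otherwise asks the tracker for a checkpoint.
   */
  static <RestrictionT> RestrictionT chooseResidual(
      int numClaimedBlocks,
      RestrictionT originalRestriction,
      Supplier<RestrictionT> checkpoint) {
    if (numClaimedBlocks == 0) {
      return originalRestriction;
    }
    return checkpoint.get();
  }
}
```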

chamikaramj · 2018-01-25T22:17:33Z

runners/core-java/src/main/java/org/apache/beam/runners/core/InMemoryStateInternals.java

    }
  }
+
+  private static <T> T uncheckedClone(Coder<T> coder, T value) {


Why "unchecked"? Add a comment?

jkff · 2018-01-25T22:59:51Z

Added a comment.
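
For readers without the diff at hand, here is a sketch of what cloning a value through its coder looks like: encode to bytes, then decode a fresh instance. It does not use Beam's `Coder`; `SketchCoder`, `StringListCoder`, and `CoderClones` are hypothetical stand-ins, and the reading that "unchecked" refers to wrapping the checked exception in an unchecked one is an assumption, since the added comment is not shown in this thread.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, minimal coder interface used only for this sketch. */
interface SketchCoder<T> {
  void encode(T value, OutputStream out) throws IOException;

  T decode(InputStream in) throws IOException;
}

/** Encodes a list of strings as a length prefix followed by each element. */
class StringListCoder implements SketchCoder<List<String>> {
  @Override
  public void encode(List<String> value, OutputStream out) throws IOException {
    DataOutputStream data = new DataOutputStream(out);
    data.writeInt(value.size());
    for (String s : value) {
      data.writeUTF(s);
    }
    data.flush();
  }

  @Override
  public List<String> decode(InputStream in) throws IOException {
    DataInputStream data = new DataInputStream(in);
    int size = data.readInt();
    List<String> result = new ArrayList<>(size);
    for (int i = 0; i < size; i++) {
      result.add(data.readUTF());
    }
    return result;
  }
}

final class CoderClones {
  /** Clones by round-tripping through the coder; checked failures are rethrown unchecked. */
  static <T> T uncheckedClone(SketchCoder<T> coder, T value) {
    try {
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      coder.encode(value, out);
      return coder.decode(new ByteArrayInputStream(out.toByteArray()));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

The round trip exercises the coder on every copy, which is what surfaces coder bugs in tests, but it also costs an encode and a decode per value.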

chamikaramj · 2018-01-25T22:17:34Z

sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Watch.java

    }
  }

+  private static class SnappyCoder<T> extends StructuredCoder<T> {


Should we make this public (and put it in its own Java file)? This might be useful for other transforms.

jkff

Thanks!

chamikaramj

Thanks.

LGTM other than one comment.

chamikaramj · 2018-01-26T00:15:01Z

...n/java/org/apache/beam/runners/core/OutputAndTimeBoundedSplittableProcessElementInvoker.java

+        // the original restriction, i.e. pointless.
+        this.scheduledCheckpoint =
+            executor.schedule(
+                this::takeCheckpointNow, maxDuration.getMillis(), TimeUnit.MILLISECONDS);


This should be max((maxDuration - "time up to now"), 0), no?

I'm not sure. I think both "10 seconds of claimed work" and "10 seconds of total work" are valid options, but I'm slightly in favor of the former because it's less likely to lead to pathological behavior. For example, imagine that opening a connection to Kafka consistently takes 10+ seconds due to network issues. With "10 seconds of total work", each call would repeatedly read just 1 record from Kafka (compared to 0 before this PR...), whereas "10 seconds of claimed work" will provide 10 seconds of useful work.

Sounds good.
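
The behavior agreed on in this thread, a full `maxDuration` countdown that starts at the first successful claim, can be sketched as follows. `scheduledCheckpoint`, `takeCheckpointNow`, `maxDuration`, and the `executor.schedule(...)` call mirror the diff excerpt; the class name `FirstClaimCheckpointTimer` and its accessor methods are assumptions, and the surrounding invoker logic is omitted.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Simplified stand-in: schedules the checkpoint only once the first block is claimed. */
final class FirstClaimCheckpointTimer {
  private final ScheduledExecutorService executor;
  private final long maxDurationMillis;
  private final Runnable takeCheckpointNow;

  // Stays null until the first successful claim, so slow setup work does not eat the budget.
  private ScheduledFuture<?> scheduledCheckpoint;
  private int numClaimedBlocks = 0;

  FirstClaimCheckpointTimer(
      ScheduledExecutorService executor, long maxDurationMillis, Runnable takeCheckpointNow) {
    this.executor = executor;
    this.maxDurationMillis = maxDurationMillis;
    this.takeCheckpointNow = takeCheckpointNow;
  }

  /** Called on every successful claim; only the first one starts the countdown. */
  synchronized void onClaimed() {
    numClaimedBlocks++;
    if (scheduledCheckpoint == null) {
      scheduledCheckpoint =
          executor.schedule(takeCheckpointNow, maxDurationMillis, TimeUnit.MILLISECONDS);
    }
  }

  synchronized boolean isCheckpointScheduled() {
    return scheduledCheckpoint != null;
  }

  synchronized int numClaimedBlocks() {
    return numClaimedBlocks;
  }
}
```

The alternative raised in the review would pass `max(maxDuration - elapsed, 0)` as the delay instead of the full `maxDurationMillis`.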

jkff · 2018-01-26T00:20:14Z

Would like @tgroh 's LGTM as well before proceeding with Dataflow worker changes.

chamikaramj · 2018-01-26T00:22:33Z

LGTM

jkff · 2018-01-26T01:41:01Z

Test failures are unrelated.

tgroh

My naming thing is because I really dislike the use of Impl as a signifier of the implementation, and generally want to give users the nicer name if possible, but it's purely a style thing.

jkff · 2018-02-01T22:35:13Z

Rebased but still waiting for some Dataflow worker side Google-internal stuff before I can merge.

jkff · 2018-02-06T02:42:57Z

Run Dataflow ValidatesRunner

jkff · 2018-02-06T18:06:58Z

Dataflow runner tests failed somewhere towards the end due to unrelated issues - I confirmed by looking at Jenkins output that SDF tests passed. Merging.

kennknowles · 2019-01-02T20:04:02Z

Noting here, too, that InMemoryStateInternals is not part of the direct runner, but a general utility. The cloning changes caused perf regressions in multiple other contexts and need to be reverted and re-instantiated only in the direct runner.

jkff requested review from chamikaramj and tgroh January 25, 2018 01:53

tgroh reviewed Jan 25, 2018

jkff commented Jan 25, 2018

chamikaramj reviewed Jan 25, 2018

jkff commented Jan 25, 2018

chamikaramj reviewed Jan 26, 2018

jkff assigned tgroh Jan 26, 2018

tgroh approved these changes Jan 26, 2018

jkff force-pushed the sdf-claim-callback branch from c46da1b to 44ad0fe February 1, 2018 22:34

jkff and others added 6 commits February 5, 2018 18:42

e003431 Adds PositionT and claim callback to RestrictionTracker
eca41b9 Changes OutputAndTimeBounded invoker to start checkpoint timer after first claim, and verifies more invariants
0371848 Compresses encoded GrowthState with Snappy - about 2x-3x more compact
32a427c InMemoryStateInternals.copy clones the values using the coder
8151d82 Final fixups
6857cb9 Bump worker to 20180205

jkff force-pushed the sdf-claim-callback branch from 44ad0fe to 6857cb9 February 6, 2018 02:42

jkff merged commit 2826362 into apache:master Feb 6, 2018

jkff deleted the sdf-claim-callback branch February 6, 2018 18:07
