Description
Support the bundle finalization feature. This will enable certain tests in the Validates Runner Suites to pass.
SDKs can mark a ParDo as needing finalization at pipeline submission time, and require that the runner support the feature.
DoFns require finalization so that state held in external services can be cleared once the runner has "durably persisted" the bundle's output. Prism does not provide durability at this time, but the lack of finalization remains a blocker even for small test cases of interesting DoFns.
Places to look when implementing this feature:
Pipeline Proto:
The requirement in the proto file:
https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/org/apache/beam/model/pipeline/v1/beam_runner_api.proto#L1720
The proto field in question for ParDoPayload:
https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/org/apache/beam/model/pipeline/v1/beam_runner_api.proto#L542
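As a rough illustration of what honoring the requirement means at submission time, here is a minimal, standalone sketch of requirement checking. It is not Prism's actual code: `checkRequirements` and `unsupportedRequirements` are hypothetical helpers, and the URN constant is what I believe the linked proto declares, so confirm it against the file before relying on it.

```go
package main

import (
	"fmt"
	"sort"
)

// finalizationURN is assumed to be the URN of the bundle finalization
// requirement; verify it against beam_runner_api.proto.
const finalizationURN = "beam:requirement:pardo:finalization:v1"

// unsupportedRequirements returns the requested requirement URNs that are
// missing from the supported set, sorted so error messages are stable.
func unsupportedRequirements(requested []string, supported map[string]bool) []string {
	var missing []string
	for _, urn := range requested {
		if !supported[urn] {
			missing = append(missing, urn)
		}
	}
	sort.Strings(missing)
	return missing
}

// checkRequirements rejects a pipeline whose requirements include anything
// the runner has not declared support for.
func checkRequirements(requested []string, supported map[string]bool) error {
	if missing := unsupportedRequirements(requested, supported); len(missing) > 0 {
		return fmt.Errorf("runner does not support requirements: %v", missing)
	}
	return nil
}
```

With this shape, supporting the feature amounts to adding the finalization URN to the supported set once the runner can actually send finalize requests.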
FnAPI:
ProcessBundleResponses set the following field if they need finalization after bundle persistence.
The runner is expected to send the following InstructionRequest back to the SDK.
https://github.com/apache/beam/blob/master/model/fn-execution/src/main/proto/org/apache/beam/model/fn_execution/v1/beam_fn_api.proto#L156
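The ordering described above (process the bundle, persist its output, then ask the SDK to finalize) can be sketched as follows. This is a simplified model, not the FnAPI types: `bundleResponse`, `sdkClient`, `completeBundle`, and `recordingClient` are all hypothetical names, and the struct simply pairs an instruction ID with the finalization flag for illustration.

```go
package main

// bundleResponse is a hypothetical stand-in that pairs a bundle's
// instruction ID with whether the SDK asked for finalization.
type bundleResponse struct {
	InstructionID        string
	RequiresFinalization bool
}

// sdkClient is a hypothetical view of the control channel: it sends a
// finalize request for a previously processed bundle.
type sdkClient interface {
	FinalizeBundle(instructionID string) error
}

// completeBundle persists the bundle's output first, and only then requests
// finalization, and only when the SDK asked for it.
func completeBundle(resp bundleResponse, persist func() error, sdk sdkClient) error {
	if err := persist(); err != nil {
		// Output was not persisted, so the SDK must not be told to finalize.
		return err
	}
	if !resp.RequiresFinalization {
		return nil
	}
	return sdk.FinalizeBundle(resp.InstructionID)
}

// recordingClient is a test double that records which bundles were finalized.
type recordingClient struct {
	finalized []string
}

func (c *recordingClient) FinalizeBundle(instructionID string) error {
	c.finalized = append(c.finalized, instructionID)
	return nil
}
```

The key design point is that a persistence failure short-circuits before any finalize request is sent, since finalization tells the SDK it may discard external state.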
Prism implementation tips.
Requirement filtering occurs here:
https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/runners/prism/internal/jobservices/job.go#L44
The Execute method handles the ProcessBundle lifecycle.
https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/runners/prism/internal/stage.go#L79
Data is persisted to the runner here:
https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/runners/prism/internal/stage.go#L272
Bundle Related SDK callbacks are implemented here:
https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/runners/prism/internal/worker/bundle.go#L208
Testing
Both Python and Java Validates Runner tests require this feature. Since the feature is also implemented in the Go SDK, it's possible to author a Go pipeline that exercises it and validates that the callback works.
It would likely be similar to the Separation Harness tests for Splittable DoFns, which spin up a small local server, so the test only executes in LoopBack mode. https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/runners/prism/internal/separate_test.go
It wouldn't need to be complicated: just validate that the finalization callback is called, in a way that can be checked outside of the pipeline execution itself. Bonus points for also validating it within the pipeline somehow. It is only required to be validated in loopback mode execution, though. That will show up in execution coverage for the prism/internal package.
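One possible shape for the out-of-pipeline check is a local server that counts finalization notifications. The sketch below is standalone and makes several assumptions: it uses `net/http/httptest` as a simple stand-in for whatever server the Separation Harness uses, all names (`finalizeCounter`, `newFinalizeCallback`, `runAndCount`) are hypothetical, and the loop in `runAndCount` merely simulates the runner invoking callbacks. In a real test the callback would be registered from inside a DoFn and triggered by Prism.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// finalizeCounter counts finalization notifications received over HTTP.
type finalizeCounter struct {
	n atomic.Int64
}

func (c *finalizeCounter) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	c.n.Add(1)
	w.WriteHeader(http.StatusNoContent)
}

// newFinalizeCallback returns a callback that notifies the server at url.
// A DoFn would register a function like this as its finalization callback.
func newFinalizeCallback(url string) func() error {
	return func() error {
		resp, err := http.Post(url, "text/plain", nil)
		if err != nil {
			return err
		}
		defer resp.Body.Close()
		if resp.StatusCode != http.StatusNoContent {
			return fmt.Errorf("unexpected status %d", resp.StatusCode)
		}
		return nil
	}
}

// runAndCount simulates a runner finalizing the given number of bundles and
// reports how many notifications the server observed.
func runAndCount(bundles int) (int64, error) {
	counter := &finalizeCounter{}
	srv := httptest.NewServer(counter)
	defer srv.Close()

	callback := newFinalizeCallback(srv.URL)
	for i := 0; i < bundles; i++ {
		if err := callback(); err != nil {
			return 0, err
		}
	}
	return counter.n.Load(), nil
}
```

Because the count lives in the test process rather than in the pipeline, the assertion works only when the SDK worker can reach the server, which matches the loopback-only constraint noted above.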