
[Prism] Support Bundle Finalization #31912

@lostluck

Support the bundle finalization feature. This will enable certain tests in the ValidatesRunner suites to pass.

At pipeline submission time, SDKs can mark a ParDo as needing finalization and declare that the runner must support the feature.

DoFns use finalization to clear state held in external services once the runner has "durably persisted" the bundle's output. Prism doesn't provide durability at this time, but missing finalization still blocks even small test cases of interesting DoFns.

Places to look when implementing this feature:

Pipeline Proto:

The requirement URN (beam:requirement:pardo:finalization:v1) in the proto file:
https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/org/apache/beam/model/pipeline/v1/beam_runner_api.proto#L1720

The ParDoPayload field in question, requests_finalization:
https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/org/apache/beam/model/pipeline/v1/beam_runner_api.proto#L542
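A runner could cross-check these two fields: if any ParDo sets requests_finalization, the pipeline's requirements should include the bundle finalization URN. The sketch below uses hypothetical, trimmed stand-ins for the generated proto types (parDoPayload, pipeline), not the real pipeline_v1 Go package.

```go
package main

import "fmt"

// Requirement URN for REQUIRES_BUNDLE_FINALIZATION in beam_runner_api.proto.
const requirementBundleFinalization = "beam:requirement:pardo:finalization:v1"

// parDoPayload is a hypothetical, trimmed stand-in for ParDoPayload.
type parDoPayload struct {
	RequestsFinalization bool
}

// pipeline is a hypothetical, trimmed stand-in for Pipeline.
type pipeline struct {
	Requirements []string
	ParDos       map[string]parDoPayload // keyed by transform ID
}

// finalizingParDos returns the IDs of ParDos that request finalization, and
// an error if any exist but the pipeline doesn't declare the requirement.
func finalizingParDos(p pipeline) ([]string, error) {
	var ids []string
	for id, pd := range p.ParDos {
		if pd.RequestsFinalization {
			ids = append(ids, id)
		}
	}
	if len(ids) == 0 {
		return nil, nil
	}
	for _, r := range p.Requirements {
		if r == requirementBundleFinalization {
			return ids, nil
		}
	}
	return ids, fmt.Errorf("transforms %v request finalization but requirement %q is missing",
		ids, requirementBundleFinalization)
}

func main() {
	p := pipeline{
		Requirements: []string{requirementBundleFinalization},
		ParDos: map[string]parDoPayload{
			"ackMessages": {RequestsFinalization: true},
			"format":      {},
		},
	}
	ids, err := finalizingParDos(p)
	fmt.Println(ids, err)
}
```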

FnAPI:

A ProcessBundleResponse sets the following field (requires_finalization) if the bundle needs finalization after its output has been persisted:

https://github.com/apache/beam/blob/master/model/fn-execution/src/main/proto/org/apache/beam/model/fn_execution/v1/beam_fn_api.proto#L434

The runner is then expected to send the following InstructionRequest (a FinalizeBundleRequest) back to the SDK:
https://github.com/apache/beam/blob/master/model/fn-execution/src/main/proto/org/apache/beam/model/fn_execution/v1/beam_fn_api.proto#L156
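The two messages are linked by instruction IDs: the control request gets a fresh instruction ID of its own, while its FinalizeBundleRequest.instruction_id names the earlier ProcessBundle instruction whose response set requires_finalization. A minimal sketch using hypothetical stand-in structs (the real types are generated in the fnexecution_v1 Go package):

```go
package main

import "fmt"

// Hypothetical stand-ins for the generated fnexecution_v1 types.
type processBundleResponse struct {
	RequiresFinalization bool
}

type finalizeBundleRequest struct {
	InstructionId string // ID of the completed ProcessBundle instruction
}

type instructionRequest struct {
	InstructionId  string // fresh ID for this control request
	FinalizeBundle *finalizeBundleRequest
}

// finalizeRequestFor builds the control request a runner would send after
// persisting the output of bundle bundleID, or nil if none is needed.
func finalizeRequestFor(bundleID, newInstID string, resp processBundleResponse) *instructionRequest {
	if !resp.RequiresFinalization {
		return nil
	}
	return &instructionRequest{
		InstructionId:  newInstID,
		FinalizeBundle: &finalizeBundleRequest{InstructionId: bundleID},
	}
}

func main() {
	req := finalizeRequestFor("inst001", "inst002", processBundleResponse{RequiresFinalization: true})
	fmt.Printf("send %s: finalize bundle %s\n", req.InstructionId, req.FinalizeBundle.InstructionId)
}
```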

Prism implementation tips.

Requirement filtering occurs here:
https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/runners/prism/internal/jobservices/job.go#L44
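Supporting the feature likely includes accepting the finalization URN in that filter. The sketch below is hypothetical: checkRequirements and its supported set are illustrative, not the actual contents of job.go.

```go
package main

import (
	"fmt"
	"strings"
)

// Requirement URNs from StandardRequirements in beam_runner_api.proto.
const (
	reqStateful     = "beam:requirement:pardo:stateful:v1"
	reqSplittable   = "beam:requirement:pardo:splittable_dofn:v1"
	reqFinalization = "beam:requirement:pardo:finalization:v1"
)

// checkRequirements rejects a job that declares any requirement the runner
// doesn't support. Hypothetical; Prism's real list may differ.
func checkRequirements(reqs []string) error {
	var unsupported []string
	for _, r := range reqs {
		switch r {
		case reqStateful, reqSplittable, reqFinalization: // finalization newly accepted
		default:
			unsupported = append(unsupported, r)
		}
	}
	if len(unsupported) > 0 {
		return fmt.Errorf("unsupported requirements: %s", strings.Join(unsupported, ", "))
	}
	return nil
}

func main() {
	fmt.Println(checkRequirements([]string{reqStateful, reqFinalization}))
	fmt.Println(checkRequirements([]string{"beam:requirement:pardo:time_sorted_input:v1"}))
}
```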

The Execute method handles the ProcessBundle lifecycle.
https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/runners/prism/internal/stage.go#L79

Bundle output data is persisted in the runner here:
https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/runners/prism/internal/stage.go#L272

Bundle-related SDK callbacks are implemented here:
https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/runners/prism/internal/worker/bundle.go#L208

Testing

Both the Python and Java ValidatesRunner suites have tests that require this feature. Since the Go SDK already implements bundle finalization, a Go pipeline can exercise the feature and validate that the callback works.

Such a test would likely resemble the separation harness tests for Splittable DoFns, which spin up a small local server, so those tests only execute in loopback mode: https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/runners/prism/internal/separate_test.go

The test needn't be complicated. It only needs to verify that the finalization callback is called, in a way that can be checked outside of the pipeline execution itself. Bonus points for also validating it within the pipeline somehow. Validation is only required in loopback mode, which will show up in execution coverage for the prism/internal package.
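A simpler alternative to the separation harness's local server is a process-global registry that the DoFn's finalization callback writes to and the test reads after the job finishes. This only works in loopback mode, where the SDK worker runs in the test's process. A stdlib-only sketch; finalizedRegistry, markFinalized, finalizedCount, and the key scheme are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// finalizedRegistry is a hypothetical process-global record of finalization
// callbacks. In loopback mode the SDK worker shares the test's process, so
// the test can read what the DoFn's callback wrote.
var finalizedRegistry = struct {
	mu    sync.Mutex
	count map[string]int
}{count: map[string]int{}}

// markFinalized would be called from the DoFn's finalization callback,
// keyed by something unique to the test (e.g. the test name).
func markFinalized(key string) {
	finalizedRegistry.mu.Lock()
	defer finalizedRegistry.mu.Unlock()
	finalizedRegistry.count[key]++
}

// finalizedCount is read by the test after the pipeline completes.
func finalizedCount(key string) int {
	finalizedRegistry.mu.Lock()
	defer finalizedRegistry.mu.Unlock()
	return finalizedRegistry.count[key]
}

func main() {
	// Simulate three bundles finalizing concurrently on worker goroutines.
	var wg sync.WaitGroup
	for i := 0; i < 3; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			markFinalized("TestFinalization")
		}()
	}
	wg.Wait()
	fmt.Println(finalizedCount("TestFinalization"))
}
```

The mutex is needed because bundles may be finalized concurrently on different worker goroutines.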
