Skip to content

Conversation

@lostluck
Copy link
Contributor

Until now, side inputs were handled separately from most of the data handling, leading to
them remaining un-garbage-collected, leading to a memory leak for long running streaming pipelines. This PR resolves that by moving Side Input Data handling into the elementmanager, which allows the data to be GCd once the output watermark for a stage has moved beyond where a given window's side input could be referenced.

As a refresher: A transform T with a side input S, can only access data from S that is in the Same Window W as a given Element. This is what gives the "blocking" behavior of side inputs for consumers. On the other side, the side input can thus be cleaned up if there's no data that remains to be processed by T, that can be projected onto W.

Necessary precursor to add state and timers in #28543, which has the same garbage collection behavior WRT a stages watermark.

Bonus changes:

  • Annotated some coder look up errors in the SDK's execution bundle translation, to clarify what was actually failing. The issue that needed this debugging was due to removing the calls that added the coders to the bundle's components.
  • Removed some extra map nesting around side inputs.

Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Mention the appropriate issue in your description (for example: addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment fixes #<ISSUE NUMBER> instead.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels
Python tests
Java tests
Go tests

See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.

@codecov
Copy link

codecov bot commented Nov 14, 2023

Codecov Report

Attention: 30 lines in your changes are missing coverage. Please review.

Comparison is base (618d7a8) 38.34% compared to head (d4c28f0) 38.34%.
Report is 2 commits behind head on master.

Files Patch % Lines
...am/runners/prism/internal/engine/elementmanager.go 73.46% 13 Missing ⚠️
sdks/go/pkg/beam/core/runtime/graphx/coder.go 0.00% 7 Missing ⚠️
sdks/go/pkg/beam/runners/prism/internal/stage.go 86.53% 6 Missing and 1 partial ⚠️
sdks/go/pkg/beam/core/runtime/exec/translate.go 0.00% 1 Missing ⚠️
sdks/go/pkg/beam/runners/prism/internal/execute.go 83.33% 0 Missing and 1 partial ⚠️
...o/pkg/beam/runners/prism/internal/worker/worker.go 50.00% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##           master   #29423   +/-   ##
=======================================
  Coverage   38.34%   38.34%           
=======================================
  Files         693      693           
  Lines      102237   102240    +3     
=======================================
+ Hits        39202    39204    +2     
  Misses      61444    61444           
- Partials     1591     1592    +1     
Flag Coverage Δ
go 53.51% <74.78%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@github-actions
Copy link
Contributor

Assigning reviewers. If you would like to opt out of this review, comment assign to next reviewer:

R: @jrmccluskey for label go.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

@lostluck
Copy link
Contributor Author

Run Go Prism Precommit

@lostluck
Copy link
Contributor Author

It's looking like the standard PreCommit and Prism PreCommit (so, same runner) have been broken since I merged in Fusion. Looking into it.

Copy link
Contributor

@jrmccluskey jrmccluskey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For being a decently sized delta, this was surprisingly easy to follow for the most part.

@lostluck
Copy link
Contributor Author

Thanks! Holding off on merging this for a bit while I'm sorting out prism stability. Very likely not hurt or helped by this PR, but prudence.

@lostluck
Copy link
Contributor Author

Scratch that. My client has extra println... so that "stability" bit was just the Examples failing. whoops.

@lostluck lostluck merged commit 1f4bc68 into apache:master Nov 15, 2023
@lostluck lostluck deleted the prismSideInputProper branch November 15, 2023 17:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants