@stankiewicz
Contributor

Stacked on top of #36523

  1. Adds the proto definition for drain.
  2. Extends the WindowedValue interface and plumbs a default implementation through where needed.
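A minimal sketch of item 2, assuming the new accessor is added as a Java default method that returns false (the PR may instead implement it explicitly in each class). SimpleWindowedValue, LegacyValue and DrainAwareValue are hypothetical stand-ins, not Beam classes. The sketch only shows that implementations written before the change keep compiling and report "not draining" unless they opt in. It requires Java 16+ for records.

```java
import java.util.List;

public class DefaultDrainFlagSketch {

  // Hypothetical, heavily simplified stand-in for a windowed element interface.
  // This is NOT Beam's WindowedValue; it only illustrates the shape of the change.
  interface SimpleWindowedValue<T> {
    T getValue();

    // New accessor added with a default body, so implementations written
    // before the change still compile and report "not draining".
    default boolean isDraining() {
      return false;
    }
  }

  // An implementation that predates the new method and does not override it.
  record LegacyValue<T>(T value) implements SimpleWindowedValue<T> {
    @Override
    public T getValue() {
      return value;
    }
  }

  // An implementation that carries the flag explicitly and overrides the default.
  record DrainAwareValue<T>(T value, boolean draining) implements SimpleWindowedValue<T> {
    @Override
    public T getValue() {
      return value;
    }

    @Override
    public boolean isDraining() {
      return draining;
    }
  }

  public static void main(String[] args) {
    List<SimpleWindowedValue<String>> values =
        List.of(new LegacyValue<>("a"), new DrainAwareValue<>("b", true));
    for (SimpleWindowedValue<String> v : values) {
      System.out.println(v.getValue() + " draining=" + v.isDraining());
    }
  }
}
```

Running main prints "a draining=false" followed by "b draining=true". A default method keeps the change source-compatible, at the cost of making it easy for a new implementation to forget to carry the flag.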

Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Mention the appropriate issue in your description (for example: addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment fixes #<ISSUE NUMBER> instead.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels
Python tests
Java tests
Go tests

See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.

@gemini-code-assist
Contributor

Summary of Changes

Hello @stankiewicz, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces foundational changes to support a 'drain' mechanism within Apache Beam's data processing model. It extends the WindowedValue interface and updates the protocol buffer definitions to include a draining state for elements. These changes let a streaming pipeline mark elements that are produced while the job is draining, so that downstream code can tell them apart (for example, output that may be incomplete because it fired during the drain). The changes also update core data structures and simplify DoFn output contexts by consolidating output methods and relying on a builder pattern for element construction.

Highlights

  • Drain Protocol Buffer Definition: A new DrainMode enum and an optional drain field have been added to the ElementMetadata within the beam_fn_api.proto definition. This introduces a standardized way to signal the draining state of elements in the Beam Fn API.
  • WindowedValue Interface Extension: The WindowedValue interface has been extended to include an isDraining() method, and the OutputBuilder now supports a setDraining() method. This change propagates the draining state through the core data model.
  • Simplified DoFn Output Contexts: Multiple output and outputWindowedValue overloads in DoFn.FinishBundleContext and DoFn.ProcessContext that previously accepted currentRecordId and currentRecordOffset parameters have been removed. This streamlines the API by consolidating output methods and leveraging the OutputBuilder for detailed element construction.
  • Draining State Plumbing: The new draining state has been plumbed through various core Java runners (Dataflow, Spark), SDK components (WindowedValues, DoFnTester), and the Fn Harness. This ensures that the draining information is consistently available and propagated across the Beam ecosystem.
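The plumbing described in the highlights can be sketched roughly as follows. Everything here is hypothetical and simplified: DrainMode, ElementMetadata, Element and its builder are stand-ins, not the generated proto classes or Beam's OutputBuilder, and the enum values (UNSPECIFIED, NOT_DRAINING, DRAINING) are assumptions. The sketch decodes an optional drain field into a boolean, treating an unset field as "not draining", and then copies the flag from an input element onto an output built from it. It requires Java 16+ for records.

```java
import java.util.Objects;
import java.util.Optional;

public class DrainPlumbingSketch {

  // Hypothetical mirror of a proto enum; the real field and value names may differ.
  enum DrainMode { UNSPECIFIED, NOT_DRAINING, DRAINING }

  // Hypothetical element metadata with an optional drain field:
  // an empty Optional models "field not set on the wire".
  record ElementMetadata(Optional<DrainMode> drain) {}

  // Only an explicit DRAINING value turns the flag on; an unset field or any
  // other value is treated as "not draining", so older senders stay compatible.
  static boolean decodeDraining(ElementMetadata metadata) {
    return metadata.drain().map(mode -> mode == DrainMode.DRAINING).orElse(false);
  }

  // Hypothetical immutable element plus builder, loosely modelled on an
  // output builder that exposes setDraining(...).
  static final class Element<T> {
    private final T value;
    private final boolean draining;

    private Element(Builder<T> b) {
      this.value = Objects.requireNonNull(b.value, "value");
      this.draining = b.draining;
    }

    T getValue() { return value; }

    boolean isDraining() { return draining; }

    static <T> Builder<T> builder() { return new Builder<>(); }

    // Starts an output builder that inherits this element's drain flag,
    // so the flag survives a transform that only replaces the value.
    <O> Builder<O> toOutput(O newValue) {
      return Element.<O>builder().setValue(newValue).setDraining(draining);
    }

    static final class Builder<T> {
      private T value;
      private boolean draining;

      Builder<T> setValue(T value) { this.value = value; return this; }

      Builder<T> setDraining(boolean draining) { this.draining = draining; return this; }

      Element<T> build() { return new Element<>(this); }
    }
  }

  public static void main(String[] args) {
    ElementMetadata meta = new ElementMetadata(Optional.of(DrainMode.DRAINING));
    Element<String> input =
        Element.<String>builder().setValue("hello").setDraining(decodeDraining(meta)).build();
    Element<Integer> output = input.toOutput(input.getValue().length()).build();
    System.out.println(output.getValue() + " draining=" + output.isDraining());
  }
}
```

Running main prints "5 draining=true": the output built from a draining input keeps the flag even though its value and type changed. Starting outputs from the input (rather than from an empty builder) is what keeps the flag from being silently dropped along the way.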

@github-actions
Contributor

Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment assign set of reviewers

@stankiewicz
Contributor Author

:sdks:java:io:mqtt:test is flaky.
Python tests are flaky.

@github-actions
Contributor

Assigning reviewers:

R: @ahmedabu98 for label java.

Note: If you would like to opt out of this review, comment assign to next reviewer.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

@kennknowles
Member

R: @kennknowles

@github-actions
Contributor

Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control. If you'd like to restart, comment assign set of reviewers

@Nullable
Long getRecordOffset();

boolean isDraining();
Member


Nit: this might be named causedByDrain or firedDueToDrain or some way of saying "element exists because of drain" or "element impacted by drain". I don't have a good idea, really. But I couldn't help commenting, because technically this is a piece of metadata about the element ("might be incomplete because fired while draining"), not about the state of the world ("is draining").

But I'm going to approve and merge anyhow, because it isn't worth kicking all our flaky tests just to rename, unless we have a much better name.

Contributor Author


I can rename it. :)

@kennknowles kennknowles merged commit 2bd497f into apache:master Oct 26, 2025
146 of 149 checks passed