Skip to content

Conversation

@shunping
Copy link
Collaborator

@shunping shunping commented Jun 11, 2025

fixes #35244
fixes #28151

The date type in Java is encoded with the same logic as Timestamp in Python.

When we construct the schema proto of a python date field, it first resolves its logical type to JdbcDateType. From there, it finds its representation type.

Originally, the representation type of JdbcDateType is Timestamp. Because Timestamp is not an atomic type, it tries to find what is the corresponding logical type of Timestamp. However, there are two logical types defining Timestamp as their language type, MillisInstant and MicrosInstant. The latter one overrides the registration of the former. Therefore, we resolve the representation type of JdbcDataType as MicrosInstant, which is a logical type that can be expanded to a representation type of a named tuple of int64.

This causes the conflict on the coders. The data that comes out from the JDBCIO transform is encoded with Timestamp-like coder (in Java), but the downstream step (in Python) thinks the values are named tuples.

@shunping shunping marked this pull request as ready for review June 11, 2025 17:35
@shunping shunping requested a review from Abacn June 11, 2025 17:35
@github-actions
Copy link
Contributor

Assigning reviewers:

R: @tvalentyn for label python.

Note: If you would like to opt out of this review, comment assign to next reviewer.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

@shunping shunping changed the title Fix a logical type issue about JdbcDateType Fix a logical type issue about JdbcDateType and JdbcTimeType Jun 11, 2025
@Abacn
Copy link
Contributor

Abacn commented Jun 11, 2025

Can you confirm the tests now work without this line?

LogicalType.register_logical_type(MillisInstant)

@shunping
Copy link
Collaborator Author

shunping commented Jun 11, 2025

xlang_jdbcio_it_test.py

Can you confirm the tests now work without this line?

LogicalType.register_logical_type(MillisInstant)

I didn't run this test, but I have some local ones that has a similar logic (like the one in #35244). They failed prior to my change with the errors, and now they run successfully.

@Abacn
Copy link
Contributor

Abacn commented Jun 11, 2025

To run this test, just remove this line in the test, make any change to https://github.com/apache/beam/blob/master/.github/trigger_files/beam_PostCommit_Python.json and push a commit to this PR

@github-actions github-actions bot added yaml and removed yaml labels Jun 12, 2025
@shunping
Copy link
Collaborator Author

remind me after tests pass

@github-actions
Copy link
Contributor

Ok - I'll remind @shunping after tests pass

@github-actions github-actions bot added yaml and removed yaml labels Jun 12, 2025
@shunping
Copy link
Collaborator Author

I checked and verified that the failure is not related to the change here.

@shunping
Copy link
Collaborator Author

@Abacn, could you please take another look? Thanks!

Copy link
Contributor

@Abacn Abacn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@shunping shunping merged commit b7262c3 into apache:master Jun 12, 2025
100 of 102 checks passed
razvanculea pushed a commit to razvanculea/beam that referenced this pull request Jun 17, 2025
…35243)

* Fix a logical type issue about JdbcDateType

* Fix typo and also fix the logical class for java time.

* Get rid of the workaround on logical type registration. Trigger tests.

* Fix lints.
@claudevdm
Copy link
Collaborator

claudevdm commented Jun 23, 2025

I think this PR is causing failures in the xlang direct tests https://github.com/apache/beam/actions/workflows/beam_PostCommit_Python_Xlang_Gcp_Direct.yml?page=2

@shunping

@shunping
Copy link
Collaborator Author

I think this PR is causing failures in the xlang direct tests https://github.com/apache/beam/actions/workflows/beam_PostCommit_Python_Xlang_Gcp_Direct.yml?page=2

@shunping

Do you see the error messages? I checked but they are not very useful.

We can try to revert this PR and run xlang postcommit and see if it passes.

@claudevdm
Copy link
Collaborator

Here is one

ValueError: Error decoding input stream with coder WindowedValueCoder[window_coder=GlobalWindowCoder, value_coder=RowCoder]
self = <apache_beam.runners.worker.bundle_processor.DataInputOperation object at 0x7917f5f45d60>
encoded_windowed_values = b'\x7f\xdf;dZ\x1c\xac\t\x00\x00\x00\x01\x0fQ\x0b\x00\xfd\xff\xff\xff\x0f\xfd\xff\xff\xff\xff\xff\xff\xff\xff\x01?\xb9\...\x05Test9\x05Test9\x05Test9\x80\x00\x01\x97{"\xe0C\x02\x02\x037\x80\x00\x00@t\x17p\x00\x80\x00\x00\x00\x01\xf6\xc3\x11'

    def process_encoded(self, encoded_windowed_values: bytes) -> None:
      input_stream = coder_impl.create_InputStream(encoded_windowed_values)
      while input_stream.size() > 0:
        with self.splitting_lock:
          if self.index == self.stop - 1:
            return
          self.index += 1
        try:
>         decoded_value = self.windowed_coder_impl.decode_from_stream(
              input_stream, True)

apache_beam/runners/worker/bundle_processor.py:231: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
apache_beam/coders/coder_impl.py:1549: in decode_from_stream
    value = self._value_coder.decode_from_stream(in_stream, nested)
apache_beam/coders/coder_impl.py:1641: in decode_from_stream
    return self._value_coder.decode(in_stream.read(value_length))
apache_beam/coders/coder_impl.py:239: in decode
    return self.decode_from_stream(create_InputStream(encoded), False)
apache_beam/coders/coder_impl.py:1952: in decode_from_stream
    item = component_coder.decode_from_stream(in_stream, True)
apache_beam/coders/coder_impl.py:2005: in decode_from_stream
    return self.logical_type.to_language_type(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <apache_beam.typehints.schemas.MicrosInstant object at 0x79182e358fa0>
value = BeamSchema_ee15bbf4_32de_4611_a9dc_b1b5c5981f57(seconds=None, micros=None)

    def to_language_type(self, value):
      # type: (MicrosInstantRepresentation) -> Timestamp
>     return Timestamp(seconds=int(value.seconds), micros=int(value.micros))
E     TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'

apache_beam/typehints/schemas.py:934: TypeError

The above exception was the direct cause of the following exception:

a = (<apache_beam.io.external.xlang_jdbcio_it_test.CrossLanguageJdbcIOTest testMethod=test_xlang_jdbc_write_read_0_postgres>,)
kw = {}

    @wraps(func)
    def standalone_func(*a, **kw):
>       return func(*(a + p.args), **p.kwargs, **kw)

../../build/gradleenv/1398941893/lib/python3.9/site-packages/parameterized/parameterized.py:620: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
apache_beam/io/external/xlang_jdbcio_it_test.py:268: in test_xlang_jdbc_write_read
    assert_that(result, equal_to(expected_rows))
apache_beam/pipeline.py:661: in __exit__
    self.result = self.run()
apache_beam/testing/test_pipeline.py:115: in run
    result = super().run(
apache_beam/pipeline.py:635: in run
    return self.runner.run_pipeline(self, self._options)
apache_beam/runners/direct/test_direct_runner.py:42: in run_pipeline
    self.result = super().run_pipeline(pipeline, options)
apache_beam/runners/direct/direct_runner.py:185: in run_pipeline
    return runner.run_pipeline(pipeline, options)
apache_beam/runners/portability/fn_api_runner/fn_runner.py:196: in run_pipeline
    self._latest_run_result = self.run_via_runner_api(
apache_beam/runners/portability/fn_api_runner/fn_runner.py:223: in run_via_runner_api
    return self.run_stages(stage_context, stages)
apache_beam/runners/portability/fn_api_runner/fn_runner.py:470: in run_stages
    bundle_results = self._execute_bundle(
apache_beam/runners/portability/fn_api_runner/fn_runner.py:795: in _execute_bundle
    self._run_bundle(
apache_beam/runners/portability/fn_api_runner/fn_runner.py:1034: in _run_bundle
    result, splits = bundle_manager.process_bundle(
apache_beam/runners/portability/fn_api_runner/fn_runner.py:1360: in process_bundle
    result_future = self._worker_handler.control_conn.push(process_bundle_req)
apache_beam/runners/portability/fn_api_runner/worker_handlers.py:389: in push
    response = self.worker.do_instruction(request)
apache_beam/runners/worker/sdk_worker.py:657: in do_instruction
    return getattr(self, request_type)(
apache_beam/runners/worker/sdk_worker.py:695: in process_bundle
    bundle_processor.process_bundle(instruction_id))
apache_beam/runners/worker/bundle_processor.py:1274: in process_bundle
    input_op_by_transform_id[element.transform_id].process_encoded(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <apache_beam.runners.worker.bundle_processor.DataInputOperation object at 0x7917f5f45d60>
encoded_windowed_values = b'\x7f\xdf;dZ\x1c\xac\t\x00\x00\x00\x01\x0fQ\x0b\x00\xfd\xff\xff\xff\x0f\xfd\xff\xff\xff\xff\xff\xff\xff\xff\x01?\xb9\...\x05Test9\x05Test9\x05Test9\x80\x00\x01\x97{"\xe0C\x02\x02\x037\x80\x00\x00@t\x17p\x00\x80\x00\x00\x00\x01\xf6\xc3\x11'

    def process_encoded(self, encoded_windowed_values: bytes) -> None:
      input_stream = coder_impl.create_InputStream(encoded_windowed_values)
      while input_stream.size() > 0:
        with self.splitting_lock:
          if self.index == self.stop - 1:
            return
          self.index += 1
        try:
          decoded_value = self.windowed_coder_impl.decode_from_stream(
              input_stream, True)
        except Exception as exn:
>         raise ValueError(
              "Error decoding input stream with coder " +
              str(self.windowed_coder)) from exn
E         ValueError: Error decoding input stream with coder WindowedValueCoder[window_coder=GlobalWindowCoder, value_coder=RowCoder]

apache_beam/runners/worker/bundle_processor.py:234: ValueError


# Register MillisInstant logical type to override the mapping from Timestamp
# originally handled by MicrosInstant.
LogicalType.register_logical_type(MillisInstant)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to add this back?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, removing that line is the main purpose of this PR. Let's revert the PR for now and I will take a look at that failed test later.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created a PR that reverts this one and #35191, which also contributed to the failures of PostCommit Python XLang GCP Direct.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way I understand it is your fix avoids the need for using this to workaround to work withdate (beam:logical_type:javasdk_date:v1) and time (beam:logical_type:javasdk_time:v1) types.

For jdbc timestamp types, java encodes as beam:logical_type:millis_instant:v1, which python interprets as a language type Timestamp, but MicrosInstant is registered to handle timestamps?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the two logical types MillisInstant (

class MillisInstant(NoArgumentLogicalType[Timestamp, np.int64]):
) and MicrosInstant (
class MicrosInstant(NoArgumentLogicalType[Timestamp,
) both register Timestamp as their language type and MillisInstant goes first and then MicrosIntant goes second in the file.

There is a map internally to map a language type to its logical type and it is used when it tries to determine the coder. Without my fix and the workaround, the internal map is going to map Timestamp to MicroInstant.

Then the whole code path when determining the coder of MillisInstant will be MillisInstant -> TimeStamp -> Logical Type of TimeStamp -> MicroInstant and its representation type.

The workaround is to calling the register function again for MillisInstant, so that it overwrite the internal mapping to Timestamp -> Millistant.

My fix was trying to eliminate the need of the workaround.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we read from jdbc, we get a schema with a millis_instant urn e.g.

fields {
  name: "created_at"
  type {
    nullable: true
    logical_type {
      urn: "beam:logical_type:millis_instant:v1"
      representation {
        atomic_type: INT64
      }
    }
  }
  id: 2
  encoding_position: 2
}

Then in named_tuple_from_schema we try map the schema fields to python types

return LogicalType.from_runner_api(
.

[('created_at', typing.Optional[apache_beam.typehints.schemas.Timestamp])]

The coder for this tuple will use MicrosInstant because that was registered last for Timestamp.

I am testing a fix here to use a stub type so that
beam:logical_type:millis_instant:v1 -> Stub Type -> MillisInstant -> Timestamp
#35400

akashorabek added a commit to akvelon/beam that referenced this pull request Jun 23, 2025
shunping added a commit that referenced this pull request Jul 30, 2025
* Add support for streaming writes in IOBase (Python)

* add triggering_frequency in iobase.Sink

* fix whitespaces/newlines

* fixes per #35137 (review)

* implement pre_finalize_windowed and finalize_windowed_write in FileBasedSink
handle file naming based on windowing for streaming in internal methods

TextIO - expose the new filebasedsink capabilities to write in streaming

* update changes.md

* fix formatting

* space formatting

* space nagging

* keep in sync with refactor in #35137

* keep in sync with iobase changes in #35137

* [Website] add akvelon case study (#34943)

* feat: add akvelon logo

* feat: add akvelon case study

* fix: remove white space

* feat: add akvelon to main page

* feat: use new images

* fix: typos

* fix: change order of akvelon case-study

* fix: update text

* fix: update mainPage text

* fix: update images

* fix: about akvelon section update

* fix: update akvelon card

* fix: update akvelon header

* fix: update code tag

* fix: update about akvelon

* fix: update date and order

* fix: add link and change img

* fix: change CDAP text

* fix: add bold weight

* fix: solve conflicts

* fix: remove unused code

* fix: delete whitespace

* fix: indents format

* fix: add bold text

---------

Co-authored-by: Bulat Safiullin <v-safiullinb@microsoft.com>

* added the rate test for GenerateSequence (#35108)

* added the rate test for GenerateSequence

* keep the master yaml

* Re-enable logging after importing vllm. (#35103)

Co-authored-by: Claude <cvandermerwe@google.com>

* Deprecate Java 8 (#35064)

* Deprecate Java 8

* Java 8 client now using Java 11 SDK container

* adjust non LTS fallback to use the next LTS instead of the nearest LTS. Previously Java18 falls back to Java17, which won't work

* Emit warning when Java 8 is used. Java8 is still
  supported until Beam 3.0

* Clean up subproject build file requiring Java9+

* Require java 11 to build SDK container

* fix workflow

* Simplify XVR test workflow

* Fix Samza PVR

* Remove beam college banners (#35123)

* feat: change text (#35130)

Co-authored-by: Bulat Safiullin <v-safiullinb@microsoft.com>

* Update Custom Remote Inference example to use RemoteModelHandler (#35066)

* Update Custom Remote Inference example to use RemoteModelHandler

* restore old kernel metadata

* Update examples/notebooks/beam-ml/custom_remote_inference.ipynb

Co-authored-by: Danny McCormick <dannymccormick@google.com>

* Add DLQ

---------

Co-authored-by: Danny McCormick <dannymccormick@google.com>

* Remove Java 8 container (#35125)

* add extra_transforms block documentation to chain transform documentation (#35101)

* add note about testing (#35075)

* fix whitespaces/newlines

* fixes per #35137 (review)

* implement pre_finalize_windowed and finalize_windowed_write in FileBasedSink
handle file naming based on windowing for streaming in internal methods

TextIO - expose the new filebasedsink capabilities to write in streaming

* update changes.md

* fix formatting

* space formatting

* space nagging

* keep in sync with refactor in #35137

* keep in sync with iobase changes in #35137

* [Website] update akvelon case study: update text and fix  landing page (#35133)

* feat: change text

* fix: add learn more to quotes Akvelon

---------

Co-authored-by: Bulat Safiullin <v-safiullinb@microsoft.com>

* Fix PostCommit Python Xlang IO Dataflow job (#35131)

* Add support for iterable type

* Fix formatting

* Bump google.golang.org/grpc from 1.72.0 to 1.72.2 in /sdks (#35113)

Bumps [google.golang.org/grpc](https://github.com/grpc/grpc-go) from 1.72.0 to 1.72.2.
- [Release notes](https://github.com/grpc/grpc-go/releases)
- [Commits](grpc/grpc-go@v1.72.0...v1.72.2)

---
updated-dependencies:
- dependency-name: google.golang.org/grpc
  dependency-version: 1.72.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump cloud.google.com/go/bigquery from 1.67.0 to 1.69.0 in /sdks (#35061)

Bumps [cloud.google.com/go/bigquery](https://github.com/googleapis/google-cloud-go) from 1.67.0 to 1.69.0.
- [Release notes](https://github.com/googleapis/google-cloud-go/releases)
- [Changelog](https://github.com/googleapis/google-cloud-go/blob/main/CHANGES.md)
- [Commits](googleapis/google-cloud-go@spanner/v1.67.0...spanner/v1.69.0)

---
updated-dependencies:
- dependency-name: cloud.google.com/go/bigquery
  dependency-version: 1.69.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Add known issues. (#35138)

* Add known issues.

* Add fixes notes.

---------

Co-authored-by: Claude <cvandermerwe@google.com>

* Bump @octokit/plugin-paginate-rest and @octokit/rest (#34167)

Bumps [@octokit/plugin-paginate-rest](https://github.com/octokit/plugin-paginate-rest.js) to 11.4.3 and updates ancestor dependency [@octokit/rest](https://github.com/octokit/rest.js). These dependencies need to be updated together.


Updates `@octokit/plugin-paginate-rest` from 2.17.0 to 11.4.3
- [Release notes](https://github.com/octokit/plugin-paginate-rest.js/releases)
- [Commits](octokit/plugin-paginate-rest.js@v2.17.0...v11.4.3)

Updates `@octokit/rest` from 18.12.0 to 21.1.1
- [Release notes](https://github.com/octokit/rest.js/releases)
- [Commits](octokit/rest.js@v18.12.0...v21.1.1)

---
updated-dependencies:
- dependency-name: "@octokit/plugin-paginate-rest"
  dependency-type: indirect
- dependency-name: "@octokit/rest"
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Explicitly handle singleton iterators instead of using helper and catching exceptions which may be from generating iterable (#35124)

* Build last snapshot against RC00 tag instead of release branch (#35142)

The last snapshot Beam Java SDK (aka "RC00" release) build is triggered manually, to verify a RC1 build will be successful.

It has been built against release-2.xx branch, where the Dataflow container tag replaced from beam-master to the 2.xx.0

However, the versioned containers are not yet released, causing a timing gap that Beam 2.xx.0-SNAPSHOT won't work on Dataflow between release branch cut and RC1 rolled out to Dataflow

Since we now have a v2.xx.0-RC00 tag, build RC00 against this tag resolves the issue.

* Bump nodemailer from 6.7.5 to 6.9.9 in /scripts/ci/issue-report (#35143)

Bumps [nodemailer](https://github.com/nodemailer/nodemailer) from 6.7.5 to 6.9.9.
- [Release notes](https://github.com/nodemailer/nodemailer/releases)
- [Changelog](https://github.com/nodemailer/nodemailer/blob/master/CHANGELOG.md)
- [Commits](nodemailer/nodemailer@v6.7.5...v6.9.9)

---
updated-dependencies:
- dependency-name: nodemailer
  dependency-version: 6.9.9
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Fix tests affected by Java 8 container turn down (#35145)

* Fix tests affected by Java 8 container turn down

* still use Java 8 for Samza runner

* Bump github.com/aws/aws-sdk-go-v2/service/s3 in /sdks (#35146)

Bumps [github.com/aws/aws-sdk-go-v2/service/s3](https://github.com/aws/aws-sdk-go-v2) from 1.79.3 to 1.80.0.
- [Release notes](https://github.com/aws/aws-sdk-go-v2/releases)
- [Changelog](https://github.com/aws/aws-sdk-go-v2/blob/main/changelog-template.json)
- [Commits](aws/aws-sdk-go-v2@service/s3/v1.79.3...service/s3/v1.80.0)

---
updated-dependencies:
- dependency-name: github.com/aws/aws-sdk-go-v2/service/s3
  dependency-version: 1.80.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* fix jdbc transform validation (#35141)

* fix jdbc transform  validation

* add test

* annotations

* spotless

* Fix Java Example ARM PostCommit

* fix: add missed word (#35163)

Co-authored-by: Bulat Safiullin <v-safiullinb@microsoft.com>

* Add postcommit yaml xlang workflow and split tests accordingly  (#35119)

* move precommit yaml xlang to postcommit

* add json file to trigger path

* update pull request targets

* add readme workflow changes

* add cloud sdk setup

* switch out to ubuntu-latest

* add back precommit workflow

* switch names and add postcommit

* switch out to postCommitYamlIT

* add test_files_dir flag

* add conftest.py file for capturing directory flag

* shift yaml tests around to appropriate locations

* add back precommit to readme

* add license for conftest.py

* revert precommit to previous name

* remove github.event.comment.body trigger

* Replace usages of deprecated pkg_resources package (#35153)

* Remove usages of deprecated pkg_resources package

* use stdlib importlib.resources

* remove extra comma

* linting

* import order

* Improve error message when accidentally using PBegin/Pipeline (#35156)

* Create test

* Implement new error message

* Add beam.Create into unit test pipeline

* add friendly error message for when transform is applied to no output (#35160)

* add friendly error message for when transform is applied to no output

* update test name

* Fix pubsub unit tests that depend on old behavior

* Add warning if temp location bucket has soft delete enabled for Go SD… (#34996)

* Add warning if temp location bucket has soft delete enabled for Go SDK (resolves #31606)

* Corrected Formatting

* Applied suggested changes

* Constrain DequeCoder type correctly, as it does not support nulls

The DequeCoder uses ArrayDeque internally, which disallows null elements.
We could switch Deque implementations, but this change is better. To quote the
JDK docs: "While Deque implementations are not strictly required to prohibit
the insertion of null elements, they are strongly encouraged to do so. Users of
any Deque implementations that do allow null elements are strongly encouraged
not to take advantage of the ability to insert nulls. This is so because null
is used as a special return value by various methods to indicated that the
deque is empty."

* Do not overwrite class states if a cached dynamic class is returned in cloudpickle.load (#35063)

* Fix class states overwritten after cloudpickle.load

* Fix further

* Fix lint

* Make SDK harness change effective on Iceberg Dataflow test (#35173)

* Fix beam_PostCommit_Java_Examples_Dataflow_V2 (#35172)

* [YAML]: Update postgres IT test and readme (#35169)

* update postgres test without driver_class_name

* update readme on how to run integration tests

* fix misspelling

* fix trailing whitespace

* Bump Java beam-master container (#35170)

* Make WindowedValue a public interface

The following mostly-automated changes:

 - Moved WindowedValue from util to values package
 - Make WindowedValue an interface with companion class WindowedValues

* Run integration tests for moving WindowedValue and making public

* Add timer tests to make sure event-time timer firing at the right time. (#35109)

* Add timer tests to make sure event-time timer firing at the right time.

* Add more tests.

* Disable the failed event-time timer tests for FnApiRunner.

* Fix lints and reformat.

* Disable another new test in FnApiRunnerTest and PortableRunnerTest due to flakiness.

* Disable a new test in FlinkRunnerTest

* Take out the early firing test case because it depends on bundle size.

* change to ubuntu-20.04 (#35182)

* Fix IntelliJ sync project failure due to circular Beam dependency (#35167)

* Fix IntelliJ sync project failure due to circular Beam dependency

* address comments

* Update workflows categories (#35162)

* Add cloudpickle coder. (#35166)

* Move examples from sql package (#35183)

* Fix the beam interactive install problem when on Google Colab (#35148)

* Fix Google Colab Issue

* Update CHANGES.md

* Bump github.com/docker/docker in /sdks (#35112)

Bumps [github.com/docker/docker](https://github.com/docker/docker) from 28.0.4+incompatible to 28.2.2+incompatible.
- [Release notes](https://github.com/docker/docker/releases)
- [Commits](moby/moby@v28.0.4...v28.2.2)

---
updated-dependencies:
- dependency-name: github.com/docker/docker
  dependency-version: 28.2.2+incompatible
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump github.com/nats-io/nats-server/v2 from 2.11.3 to 2.11.4 in /sdks (#35161)

Bumps [github.com/nats-io/nats-server/v2](https://github.com/nats-io/nats-server) from 2.11.3 to 2.11.4.
- [Release notes](https://github.com/nats-io/nats-server/releases)
- [Changelog](https://github.com/nats-io/nats-server/blob/main/.goreleaser.yml)
- [Commits](nats-io/nats-server@v2.11.3...v2.11.4)

---
updated-dependencies:
- dependency-name: github.com/nats-io/nats-server/v2
  dependency-version: 2.11.4
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Touch JPMS test trigger file

* Use staged SDK harness & Dataflow worker jar in JPMS tests

* Bump cloud.google.com/go/storage from 1.52.0 to 1.55.0 in /sdks (#35114)

Bumps [cloud.google.com/go/storage](https://github.com/googleapis/google-cloud-go) from 1.52.0 to 1.55.0.
- [Release notes](https://github.com/googleapis/google-cloud-go/releases)
- [Changelog](https://github.com/googleapis/google-cloud-go/blob/main/CHANGES.md)
- [Commits](googleapis/google-cloud-go@spanner/v1.52.0...spanner/v1.55.0)

---
updated-dependencies:
- dependency-name: cloud.google.com/go/storage
  dependency-version: 1.55.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump github.com/nats-io/nats.go from 1.42.0 to 1.43.0 in /sdks (#35147)

Bumps [github.com/nats-io/nats.go](https://github.com/nats-io/nats.go) from 1.42.0 to 1.43.0.
- [Release notes](https://github.com/nats-io/nats.go/releases)
- [Commits](nats-io/nats.go@v1.42.0...v1.43.0)

---
updated-dependencies:
- dependency-name: github.com/nats-io/nats.go
  dependency-version: 1.43.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Move non-dill specific code out of dill_pickler.py (#35139)

* Create consistent_pickle.py

* Update dill_pickler.py

* Update consistent_pickle.py

* Add Apache license to consistent_pickle.py

* Update dill_pickler.py

* Update and rename consistent_pickle.py to code_object_pickler.py

* Update dill_pickler.py to change consistent_pickle to code_object_pickler

* Fix formatting issue in code_object_pickler.py

* Fix formatting issue in dill_pickler.py

* Remove apache license from code_object_pickler.py

* Add apache license to code_object_pickler.py

* Fix some lints introduced in a recent PR. (#35193)

* small filesystem fixes (#34956)

* Fix typehint for get_filesystem

* Fix bug with creating files in cwd with LocalFileSystem

* lint

* revert precommit config change

* isort

* add file to start PR builder

* Revert "add file to start PR builder"

This reverts commit f80b361.

* fix isort again

* [YAML] Add a spec provider for transforms taking specifiable arguments (#35187)

* Add a test provider for specifiable and try it on AnomalyDetection.

Also add support on callable in spec.

* Minor renaming

* Fix lints.

* Touch trigger files to test WindowedValueReceiver in runners

* Introduce WindowedValue receivers and consolidate runner code to them

We need a receiver for WindowedValue to create an OutputBuilder. This change
introduces the interface and uses it in many places where it is appropriate,
replacing and simplifying internal runner code.

* Eliminate nullness errors from ByteBuddyDoFnInvokerFactory and DoFnOutputReceivers

* Fix null check when fetching driverJars from value provider

* Fix PostCommit Python ValidatesRunner Samza / Spark jobs (#35210)

* Skip Samza and Spark runner tests

* Fix formatting

* Update pypi documentation 30145 (#34329)

* Add Pypi friendly documentation

* provided full link path

* Add more YAML examples involving Kafka and Iceberg (#35151)

* Add more YAML examples involving Kafka and Iceberg

* Fix some missed details from rebasing

* Adding unit tests for YAML examples

* Clean up and address PR comments

* Formatting

* Formatting

* Evaluate namedTuples as equivalent to rows (#35218)

* Evaluate namedTuples as equivalent to rows

* lint

* Add a new experiment flag to enable real-time clock as processing time. (#35202)

* Touch trigger files for lightweight runners

* Eliminate WindowedValue.getPane() in preparation for making it a user-facing interface

* Do not fail if there were failed deletes (#35222)

* Force FnApiRunner in cases where prism can't handle use case (#35219)

* Force FnApiRunner in cases where prism can't handle use case

* yapf

* Bump golang.org/x/net from 0.40.0 to 0.41.0 in /sdks (#35206)

Bumps [golang.org/x/net](https://github.com/golang/net) from 0.40.0 to 0.41.0.
- [Commits](golang/net@v0.40.0...v0.41.0)

---
updated-dependencies:
- dependency-name: golang.org/x/net
  dependency-version: 0.41.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Fix incorrect typehints generated by FlatMap with default identity function (#35164)

* create unit test

* add some unit tests

* fix bug where T is considered iterable

* update strip_iterable to return Any for "stripped iterable" type of TypeVariable

* remove typehint from identity function and add a test to test for proper typechecking

* Move callablewrapp typehint test

* Remove print

* isort

* isort

* return any for yielded type of T

* Parse values returned from Dataflow API to BoundedTrieData (#34738)

* Parse struct returned from Dataflow API to BoundedTrieData

fix checkstyle

Use getBoundedTrie

add debug log

adapt to ArrayMap

* Add test, clean up

* Fix test: setTrie -> setBoundedTrie

* Fix comments

* Remove breaking PDone change (#35224)

* Remove breaking PDone change

* Update pipeline.py

* Update pipeline.py

* Update pipeline_test.py

* Move none check

* yapf

* fmt

* Generic Postgres + Cloudsql postgres embeddings. (#35215)

* Add base Postgres vector writer, CloudSQL vector writer and refactor.

* Trigger tests.

* Linter fixes.

* Fix test

* Add back tests. Update changes.md. Fix unrelated lint.

* Drop test tables.

* Fix test

* Fix tests.

* Fix test.

* Fix tests.

---------

Co-authored-by: Claude <cvandermerwe@google.com>

* Allow only one thread at a time to start the VLLM server. (#35234)

* [IcebergIO] Create namespaces if needed (#35228)

* create namespaces dynamically

* cleanup namespaces in ITs

* optimization

* Support configuring flush_count and max_row_bytes of WriteToBigTable (#34761)

* Update CHANGES.md (#35242)

Highlight improvements to the vllm model handler.

* [Beam SQL] Implement Catalog and CatalogManager (#35223)

* beam sql catalog

* api adjustment

* cover more naming syntax (quotes, backticks, none)

* spotless

* fix

* add documentation and cleanup

* rename to dropCatalog; mark BeamSqlCli @internal; rename to EmptyCatalogManager

* use registrars instead; remove initialize() method from Catalog

* cleanupo

* [IcebergIO] Fix conversion logic for arrays of structs and maps of structs; fix output Schema resolution with column pruning (#35230)

* fix complex types; filx filter schema logic

* update Changes

* add missing changes from other PRs

* trigger ITs

* make separate impl for iterable

* fix

* fix long_description when the md file cannot be found (#35246)

* fix long_description when the md file cannot be found

* yapf

* [IcebergIO] Create tables with a partition spec (#34966)

* create table with partition spec

* add description and regenerate config doc; trigger tests

* spotless

* log partition spec

* fixes

* Fix typo java-dependencies.md (#35251)

* Adding project and database support in write transform for firestoreIO (#35017)

* Adding project and database support in write transform for firestoreIO

* Spotless Apply

* Resolving comments

* Removing useless comments

* public to private

* Spotless apply

* Fix a logical type issue about JdbcDateType and JdbcTimeType (#35243)

* Fix a logical type issue about JdbcDateType

* Fix typo and also fix the logical class for java time.

* Get rid of the workaround on logical type registration. Trigger tests.

* Fix lints.

* Remove internal code. (#35239)

Co-authored-by: Claude <cvandermerwe@google.com>

* enable setting max_writer_per_bundle for avroIO and other IO (#35092)

* enable setting max_writer_per_bundle for avroIO and other IO

* enable setting max_writer_per_bundle for avroIO and other IO

* enable for TFRecordIO and corrections

* Updated standard_external_transforms.yaml

* [Java FnApi] Fix race in BeamFnStatusClient by initializing all fields before starting rpc. (#35252)

* [Java FnApi] Fix race in BeamFnStatusClient by initializing all fields before starting rpc.

* spotless

* [IcebergIO] Integrate with Beam SQL (#34799)

* add Iceberg table provider and tests

* properties go in the tableprovider initialization

* tobuilder

* streaming integration test

* spotless

* extend test to include multi nested types; fix iceberg <-> conversion logic

* add to changes.md

* spotless

* fix tests

* clean

* update CHANGES

* add projection pushdown and column pruning

* spotless

* fixes

* fixes

* updates

* sync with HEAD and use new Catalog implementation

* mark new interfaces @internal

* spotless

* fix unparse method

* update changes (#35256)

* [YAML]: fix postcommit oracle bug and reorganize postcommit tests (#35191)

* fix postcommit oracle test

* add revision

* switch to hosted runner to try with kafka test

* add extended timeout

* upgrade to 4.10 testcontainers

* switch out to redpanda for kafka

* remove redpandacontainer

* tmp comment

* add postgres comment

* revert to old kafkaContainer

* revert commented out code and revert testcontainer version change

* change mysql image version

* revert image change

* revert image change again :)

* add comments to mysql again

* shift post commit files to different folders

* rename to Data version

* add databases version

* add messaging version

* update readme for three post commits

* update gradle with new post commits

* fix job names

* uncomment fixture on mysql

* switch back to one workflow and update readme as such

* remove old workflow files

* update order and remove comment

* update gradle with parameterized options

* update gradle command calls to correct location

* update workflow to three jobs explicit

* add back Bigquery deselect

* fix mysql teardown error

* Simplify down to one from three explicit jobs

* remove tab

* remove Data

* fix parsing parameters

* update hadoop prefix (#35257)

* Set go version 1.24.4 (#35261)

* Set go version 1.24.4

* Set go version 1.24.4

* upgrade org.apache.parquet:parquet-avro to to 1.15.2 (#35037)

* upgrade org.apache.parquet:parquet-avro to to 1.15.2

* keep iceberg to 1.6.1

* use 1.9.0 for iceberg

* run all iceberg tests

* require Java 11

* fixed java11 for iceberg

* switched back to iceberg_version = "1.6.1"

* merged master

---------

Co-authored-by: Vitaly Terentyev <vitaly.terentyev.akv@gmail.com>

* Bump google.golang.org/grpc from 1.72.2 to 1.73.0 in /sdks (#35236)

Bumps [google.golang.org/grpc](https://github.com/grpc/grpc-go) from 1.72.2 to 1.73.0.
- [Release notes](https://github.com/grpc/grpc-go/releases)
- [Commits](grpc/grpc-go@v1.72.2...v1.73.0)

---
updated-dependencies:
- dependency-name: google.golang.org/grpc
  dependency-version: 1.73.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* add more doc strings for integration tests (#35171)

* [IcebergIO] extend table partitioning to Beam SQL (#35268)

* add sql partitioning

* trigger ITs and add to CHANGES

* simplify test

* include emoty namespace fix

* add error when using partitioning for unsupported tables

* fix test

* Moving to 2.67.0-SNAPSHOT on master branch.

* Update CHANGES.md to have fields for 2.67.0 release

* add free disk space step (#35260)

* [yaml]: Fix post commit yaml once more (#35273)

* switch back to self hosted runner and comment out most of kafka test for now

* add git issue to track

* revision

* remove free space step - seems to be causing more issues than helping

* Polish anomaly detection notebook and get ready to be imported in public devsite. (#35278)

* Polish anomaly detection zscore notebook for public doc.

* Adjust formatting.

* Adjust formatting.

* Suppress Findbugs (#35274)

* Support Java 17+ compiled Beam components for build, release, and xlang tests (#35232)

* Install JDK 21 in release build

Support Beam components requiring Java17+ for release workflows
They will be compiled with JDK21 with byte code compatibility
configured by applyJavaNature(requireJavaVersion)

* Use Java 21 for Python PostCommit

* Honor JAVA_HOME in JavaJarServer checks and tests

* Disable Debezium test on Java17-

* add example line

* [IO] Update Debezium in DebeziumIO to 3.1.1 (#34763)

* Updating Debezium IO to 3.1.1
Enforce JDK17 in build.gradle of Debezium IO

* adjust to review comments by @Abacn

* cleanup

* mention which PR needs to merge before unpinning

* Mention Beam version 2.66 in the README for Debezium

---------

Co-authored-by: Shunping Huang <shunping@google.com>

* Document BQ Storage API pipeline options (#35259)

* Document BQ Storage API pipeline options

* Fix whitespace

* Fix whitespace

* Speed up StreamingDataflowWorkerTest by removing 10 second wait from shutdown path. (#35275)

Takes test time from 5m18s to 2m40s.

The previous shutdown() call doesn't do anything since that just means
future scheduling won't trigger but we only schedule on the executor
once.

Also cleanup test logs by making sure to stop all workers we start so they
don't continue to run in the background and log.

This shutdown paths is only used in testing.

* Bump org.ajoberstar.grgit:grgit-gradle from 4.1.1 to 5.3.2 (#35301)

Bumps [org.ajoberstar.grgit:grgit-gradle](https://github.com/ajoberstar/grgit) from 4.1.1 to 5.3.2.
- [Release notes](https://github.com/ajoberstar/grgit/releases)
- [Commits](ajoberstar/grgit@4.1.1...5.3.2)

---
updated-dependencies:
- dependency-name: org.ajoberstar.grgit:grgit-gradle
  dependency-version: 5.3.2
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump google.golang.org/api from 0.235.0 to 0.237.0 in /sdks (#35302)

Bumps [google.golang.org/api](https://github.com/googleapis/google-api-go-client) from 0.235.0 to 0.237.0.
- [Release notes](https://github.com/googleapis/google-api-go-client/releases)
- [Changelog](https://github.com/googleapis/google-api-go-client/blob/main/CHANGES.md)
- [Commits](googleapis/google-api-go-client@v0.235.0...v0.237.0)

---
updated-dependencies:
- dependency-name: google.golang.org/api
  dependency-version: 0.237.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump github.com/linkedin/goavro/v2 from 2.13.1 to 2.14.0 in /sdks (#35205)

Bumps [github.com/linkedin/goavro/v2](https://github.com/linkedin/goavro) from 2.13.1 to 2.14.0.
- [Release notes](https://github.com/linkedin/goavro/releases)
- [Changelog](https://github.com/linkedin/goavro/blob/master/debug_release.go)
- [Commits](linkedin/goavro@v2.13.1...v2.14.0)

---
updated-dependencies:
- dependency-name: github.com/linkedin/goavro/v2
  dependency-version: 2.14.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Update spotbugs version, fix runner ubuntu version, fix found spotbugs issues (#35303)

* feat: Add option to control resource cleanup failure for IT (#35287)

* Revert "Bump org.ajoberstar.grgit:grgit-gradle from 4.1.1 to 5.3.2 (#35301)" (#35305)

This reverts commit 0fd77ce.

* Add PeriodicStream in the new time series folder. (#35300)

* Add PeriodicStream in the new time series folder.

* Add some more docstrings and minor fix on test name.

* Fix lints and docs.

* try buildah to replace kaniko (#35289)

* try buildah to replace kaniko

* trigger post-commit

* Adding error handler for SpannerReadSchemaTransformProvider and missi… (#35241)

* Adding error handler for SpannerReadSchemaTransformProvider and missing tests for SpannerSchemaTransformProvider

* Removed not used logging

* Spotless Apply

* Spotless Apply

* Spotless Apply

* Typo correction

* requests vulnerability. (#35308)

* [IcebergIO] create custom java container image for tests (#35307)

* create custom java container image for tests

* syntax

* eval depends on df

* add streaming support to iobase (python) (#35137)

* Add support for streaming writes in IOBase (Python)

* add triggering_frequency in iobase.Sink

* fix whitespaces/newlines

* fixes per #35137 (review)

* refactor for code redability

* refactor _expand_unbounded , default num_shards to 1 , if undef or 0

* fix formatter

* space

* keep num_shards = 0 the same as before for bounded write

* reafactor and code clean

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: bullet03 <bulatkazan@yahoo.com>
Co-authored-by: Bulat Safiullin <v-safiullinb@microsoft.com>
Co-authored-by: liferoad <huxiangqian@gmail.com>
Co-authored-by: claudevdm <33973061+claudevdm@users.noreply.github.com>
Co-authored-by: Claude <cvandermerwe@google.com>
Co-authored-by: Yi Hu <yathu@google.com>
Co-authored-by: Danny McCormick <dannymccormick@google.com>
Co-authored-by: Jack McCluskey <34928439+jrmccluskey@users.noreply.github.com>
Co-authored-by: Derrick Williams <derrickaw@google.com>
Co-authored-by: Vitaly Terentyev <vitaly.terentyev.akv@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: scwhittle <scwhittle@users.noreply.github.com>
Co-authored-by: Radosław Stankiewicz <radoslaws@google.com>
Co-authored-by: Hai Joey Tran <joey.tran@schrodinger.com>
Co-authored-by: Tanu Sharma <53229637+TanuSharma2511@users.noreply.github.com>
Co-authored-by: Kenneth Knowles <klk@google.com>
Co-authored-by: Minbo Bae <49642083+baeminbo@users.noreply.github.com>
Co-authored-by: Shunping Huang <shunping@google.com>
Co-authored-by: Chenzo <120361592+Chenzo1001@users.noreply.github.com>
Co-authored-by: kristynsmith <kristynsmith@google.com>
Co-authored-by: Rakesh Kumar <rakeshcusat@gmail.com>
Co-authored-by: Charles Nguyen <phucnh402@gmail.com>
Co-authored-by: tvalentyn <tvalentyn@users.noreply.github.com>
Co-authored-by: Ahmed Abualsaud <65791736+ahmedabu98@users.noreply.github.com>
Co-authored-by: Minh Son Nguyen <minh.son.nguyen.1209@gmail.com>
Co-authored-by: Amar3tto <actions@GitHub Actions 1000279405.local>
Co-authored-by: Vitaly Terentyev <vitaly.terentyev@akvelon.com>
Co-authored-by: Tobias Kaymak <tobias.kaymak@gmail.com>
Co-authored-by: Veronica Wasson <3992422+VeronicaWasson@users.noreply.github.com>
shunping added a commit that referenced this pull request Aug 7, 2025
* Add support for streaming writes in IOBase (Python)

* add triggering_frequency in iobase.Sink

* fix whitespaces/newlines

* fixes per #35137 (review)

* implement pre_finalize_windowed and finalize_windowed_write in FileBasedSink
handle file naming based on windowing for streaming in internal methods

TextIO - expose the new filebasedsink capabilities to write in streaming

* update changes.md

* fix formatting

* space formatting

* space nagging

* keep in sync with refactor in #35137

* keep in sync with iobase changes in #35137

* [Website] add akvelon case study (#34943)

* feat: add akvelon logo

* feat: add akvelon case study

* fix: remove white space

* feat: add akvelon to main page

* feat: use new images

* fix: typos

* fix: change order of akvelon case-study

* fix: update text

* fix: update mainPage text

* fix: update images

* fix: about akvelon section update

* fix: update akvelon card

* fix: update akvelon header

* fix: update code tag

* fix: update about akvelon

* fix: update date and order

* fix: add link and change img

* fix: change CDAP text

* fix: add bold weight

* fix: solve conflicts

* fix: remove unused code

* fix: delete whitespace

* fix: indents format

* fix: add bold text

---------

Co-authored-by: Bulat Safiullin <v-safiullinb@microsoft.com>

* added the rate test for GenerateSequence (#35108)

* added the rate test for GenerateSequence

* keep the master yaml

* Re-enable logging after importing vllm. (#35103)

Co-authored-by: Claude <cvandermerwe@google.com>

* Deprecate Java 8 (#35064)

* Deprecate Java 8

* Java 8 client now using Java 11 SDK container

* adjust non LTS fallback to use the next LTS instead of the nearest LTS. Previously Java18 falls back to Java17, which won't work

* Emit warning when Java 8 is used. Java8 is still
  supported until Beam 3.0

* Clean up subproject build file requiring Java9+

* Require java 11 to build SDK container

* fix workflow

* Simplify XVR test workflow

* Fix Samza PVR

* Remove beam college banners (#35123)

* feat: change text (#35130)

Co-authored-by: Bulat Safiullin <v-safiullinb@microsoft.com>

* Update Custom Remote Inference example to use RemoteModelHandler (#35066)

* Update Custom Remote Inference example to use RemoteModelHandler

* restore old kernel metadata

* Update examples/notebooks/beam-ml/custom_remote_inference.ipynb

Co-authored-by: Danny McCormick <dannymccormick@google.com>

* Add DLQ

---------

Co-authored-by: Danny McCormick <dannymccormick@google.com>

* Remove Java 8 container (#35125)

* add extra_transforms block documentation to chain transform documentation (#35101)

* add note about testing (#35075)

* fix whitespaces/newlines

* fixes per #35137 (review)

* implement pre_finalize_windowed and finalize_windowed_write in FileBasedSink
handle file naming based on windowing for streaming in internal methods

TextIO - expose the new filebasedsink capabilities to write in streaming

* update changes.md

* fix formatting

* space formatting

* space nagging

* keep in sync with refactor in #35137

* keep in sync with iobase changes in #35137

* [Website] update akvelon case study: update text and fix  landing page (#35133)

* feat: change text

* fix: add learn more to quotes Akvelon

---------

Co-authored-by: Bulat Safiullin <v-safiullinb@microsoft.com>

* Fix PostCommit Python Xlang IO Dataflow job (#35131)

* Add support for iterable type

* Fix formatting

* Bump google.golang.org/grpc from 1.72.0 to 1.72.2 in /sdks (#35113)

Bumps [google.golang.org/grpc](https://github.com/grpc/grpc-go) from 1.72.0 to 1.72.2.
- [Release notes](https://github.com/grpc/grpc-go/releases)
- [Commits](grpc/grpc-go@v1.72.0...v1.72.2)

---
updated-dependencies:
- dependency-name: google.golang.org/grpc
  dependency-version: 1.72.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump cloud.google.com/go/bigquery from 1.67.0 to 1.69.0 in /sdks (#35061)

Bumps [cloud.google.com/go/bigquery](https://github.com/googleapis/google-cloud-go) from 1.67.0 to 1.69.0.
- [Release notes](https://github.com/googleapis/google-cloud-go/releases)
- [Changelog](https://github.com/googleapis/google-cloud-go/blob/main/CHANGES.md)
- [Commits](googleapis/google-cloud-go@spanner/v1.67.0...spanner/v1.69.0)

---
updated-dependencies:
- dependency-name: cloud.google.com/go/bigquery
  dependency-version: 1.69.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Add known issues. (#35138)

* Add known issues.

* Add fixes notes.

---------

Co-authored-by: Claude <cvandermerwe@google.com>

* Bump @octokit/plugin-paginate-rest and @octokit/rest (#34167)

Bumps [@octokit/plugin-paginate-rest](https://github.com/octokit/plugin-paginate-rest.js) to 11.4.3 and updates ancestor dependency [@octokit/rest](https://github.com/octokit/rest.js). These dependencies need to be updated together.


Updates `@octokit/plugin-paginate-rest` from 2.17.0 to 11.4.3
- [Release notes](https://github.com/octokit/plugin-paginate-rest.js/releases)
- [Commits](octokit/plugin-paginate-rest.js@v2.17.0...v11.4.3)

Updates `@octokit/rest` from 18.12.0 to 21.1.1
- [Release notes](https://github.com/octokit/rest.js/releases)
- [Commits](octokit/rest.js@v18.12.0...v21.1.1)

---
updated-dependencies:
- dependency-name: "@octokit/plugin-paginate-rest"
  dependency-type: indirect
- dependency-name: "@octokit/rest"
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Explicitly handle singleton iterators instead of using helper and catching exceptions which may be from generating iterable (#35124)

* Build last snapshot against RC00 tag instead of release branch (#35142)

The last snapshot Beam Java SDK (aka "RC00" release) build is triggered manually, to verify a RC1 build will be successful.

It has been built against release-2.xx branch, where the Dataflow container tag replaced from beam-master to the 2.xx.0

However, the versioned containers are not yet released, causing a timing gap that Beam 2.xx.0-SNAPSHOT won't work on Dataflow between release branch cut and RC1 rolled out to Dataflow

Since we now have a v2.xx.0-RC00 tag, build RC00 against this tag resolves the issue.

* Bump nodemailer from 6.7.5 to 6.9.9 in /scripts/ci/issue-report (#35143)

Bumps [nodemailer](https://github.com/nodemailer/nodemailer) from 6.7.5 to 6.9.9.
- [Release notes](https://github.com/nodemailer/nodemailer/releases)
- [Changelog](https://github.com/nodemailer/nodemailer/blob/master/CHANGELOG.md)
- [Commits](nodemailer/nodemailer@v6.7.5...v6.9.9)

---
updated-dependencies:
- dependency-name: nodemailer
  dependency-version: 6.9.9
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Fix tests affected by Java 8 container turn down (#35145)

* Fix tests affected by Java 8 container turn down

* still use Java 8 for Samza runner

* Bump github.com/aws/aws-sdk-go-v2/service/s3 in /sdks (#35146)

Bumps [github.com/aws/aws-sdk-go-v2/service/s3](https://github.com/aws/aws-sdk-go-v2) from 1.79.3 to 1.80.0.
- [Release notes](https://github.com/aws/aws-sdk-go-v2/releases)
- [Changelog](https://github.com/aws/aws-sdk-go-v2/blob/main/changelog-template.json)
- [Commits](aws/aws-sdk-go-v2@service/s3/v1.79.3...service/s3/v1.80.0)

---
updated-dependencies:
- dependency-name: github.com/aws/aws-sdk-go-v2/service/s3
  dependency-version: 1.80.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* fix jdbc transform validation (#35141)

* fix jdbc transform  validation

* add test

* annotations

* spotless

* Fix Java Example ARM PostCommit

* fix: add missed word (#35163)

Co-authored-by: Bulat Safiullin <v-safiullinb@microsoft.com>

* Add postcommit yaml xlang workflow and split tests accordingly  (#35119)

* move precommit yaml xlang to postcommit

* add json file to trigger path

* update pull request targets

* add readme workflow changes

* add cloud sdk setup

* switch out to ubuntu-latest

* add back precommit workflow

* switch names and add postcommit

* switch out to postCommitYamlIT

* add test_files_dir flag

* add conftest.py file for capturing directory flag

* shift yaml tests around to appropriate locations

* add back precommit to readme

* add license for conftest.py

* revert precommit to previous name

* remove github.event.comment.body trigger

* Replace usages of deprecated pkg_resources package (#35153)

* Remove usages of deprecated pkg_resources package

* use stdlib importlib.resources

* remove extra comma

* linting

* import order

* Improve error message when accidentally using PBegin/Pipeline (#35156)

* Create test

* Implement new error message

* Add beam.Create into unit test pipeline

* add friendly error message for when transform is applied to no output (#35160)

* add friendly error message for when transform is applied to no output

* update test name

* Fix pubsub unit tests that depend on old behavior

* Add warning if temp location bucket has soft delete enabled for Go SD… (#34996)

* Add warning if temp location bucket has soft delete enabled for Go SDK (resolves #31606)

* Corrected Formatting

* Applied suggested changes

* Constrain DequeCoder type correctly, as it does not support nulls

The DequeCoder uses ArrayDeque internally, which disallows null elements.
We could switch Deque implementations, but this change is better. To quote the
JDK docs: "While Deque implementations are not strictly required to prohibit
the insertion of null elements, they are strongly encouraged to do so. Users of
any Deque implementations that do allow null elements are strongly encouraged
not to take advantage of the ability to insert nulls. This is so because null
is used as a special return value by various methods to indicated that the
deque is empty."

* Do not overwrite class states if a cached dynamic class is returned in cloudpickle.load (#35063)

* Fix class states overwritten after cloudpickle.load

* Fix further

* Fix lint

* Make SDK harness change effective on Iceberg Dataflow test (#35173)

* Fix beam_PostCommit_Java_Examples_Dataflow_V2 (#35172)

* [YAML]: Update postgres IT test and readme (#35169)

* update postgres test without driver_class_name

* update readme on how to run integration tests

* fix misspelling

* fix trailing whitespace

* Bump Java beam-master container (#35170)

* Make WindowedValue a public interface

The following mostly-automated changes:

 - Moved WindowedValue from util to values package
 - Make WindowedValue an interface with companion class WindowedValues

* Run integration tests for moving WindowedValue and making public

* Add timer tests to make sure event-time timer firing at the right time. (#35109)

* Add timer tests to make sure event-time timer firing at the right time.

* Add more tests.

* Disable the failed event-time timer tests for FnApiRunner.

* Fix lints and reformat.

* Disable another new test in FnApiRunnerTest and PortableRunnerTest due to flakiness.

* Disable a new test in FlinkRunnerTest

* Take out the early firing test case because it depends on bundle size.

* change to ubuntu-20.04 (#35182)

* Fix IntelliJ sync project failure due to circular Beam dependency (#35167)

* Fix IntelliJ sync project failure due to circular Beam dependency

* address comments

* Update workflows categories (#35162)

* Add cloudpickle coder. (#35166)

* Move examples from sql package (#35183)

* Fix the beam interactive install problem when on Google Colab (#35148)

* Fix Google Colab Issue

* Update CHANGES.md

* Bump github.com/docker/docker in /sdks (#35112)

Bumps [github.com/docker/docker](https://github.com/docker/docker) from 28.0.4+incompatible to 28.2.2+incompatible.
- [Release notes](https://github.com/docker/docker/releases)
- [Commits](moby/moby@v28.0.4...v28.2.2)

---
updated-dependencies:
- dependency-name: github.com/docker/docker
  dependency-version: 28.2.2+incompatible
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump github.com/nats-io/nats-server/v2 from 2.11.3 to 2.11.4 in /sdks (#35161)

Bumps [github.com/nats-io/nats-server/v2](https://github.com/nats-io/nats-server) from 2.11.3 to 2.11.4.
- [Release notes](https://github.com/nats-io/nats-server/releases)
- [Changelog](https://github.com/nats-io/nats-server/blob/main/.goreleaser.yml)
- [Commits](nats-io/nats-server@v2.11.3...v2.11.4)

---
updated-dependencies:
- dependency-name: github.com/nats-io/nats-server/v2
  dependency-version: 2.11.4
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Touch JPMS test trigger file

* Use staged SDK harness & Dataflow worker jar in JPMS tests

* Bump cloud.google.com/go/storage from 1.52.0 to 1.55.0 in /sdks (#35114)

Bumps [cloud.google.com/go/storage](https://github.com/googleapis/google-cloud-go) from 1.52.0 to 1.55.0.
- [Release notes](https://github.com/googleapis/google-cloud-go/releases)
- [Changelog](https://github.com/googleapis/google-cloud-go/blob/main/CHANGES.md)
- [Commits](googleapis/google-cloud-go@spanner/v1.52.0...spanner/v1.55.0)

---
updated-dependencies:
- dependency-name: cloud.google.com/go/storage
  dependency-version: 1.55.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump github.com/nats-io/nats.go from 1.42.0 to 1.43.0 in /sdks (#35147)

Bumps [github.com/nats-io/nats.go](https://github.com/nats-io/nats.go) from 1.42.0 to 1.43.0.
- [Release notes](https://github.com/nats-io/nats.go/releases)
- [Commits](nats-io/nats.go@v1.42.0...v1.43.0)

---
updated-dependencies:
- dependency-name: github.com/nats-io/nats.go
  dependency-version: 1.43.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Move non-dill specific code out of dill_pickler.py (#35139)

* Create consistent_pickle.py

* Update dill_pickler.py

* Update consistent_pickle.py

* Add Apache license to consistent_pickle.py

* Update dill_pickler.py

* Update and rename consistent_pickle.py to code_object_pickler.py

* Update dill_pickler.py to change consistent_pickle to code_object_pickler

* Fix formatting issue in code_object_pickler.py

* Fix formatting issue in dill_pickler.py

* Remove apache license from code_object_pickler.py

* Add apache license to code_object_pickler.py

* Fix some lints introduced in a recent PR. (#35193)

* small filesystem fixes (#34956)

* Fix typehint for get_filesystem

* Fix bug with creating files in cwd with LocalFileSystem

* lint

* revert precommit config change

* isort

* add file to start PR builder

* Revert "add file to start PR builder"

This reverts commit f80b361.

* fix isort again

* [YAML] Add a spec provider for transforms taking specifiable arguments (#35187)

* Add a test provider for specifiable and try it on AnomalyDetection.

Also add support on callable in spec.

* Minor renaming

* Fix lints.

* Touch trigger files to test WindowedValueReceiver in runners

* Introduce WindowedValue receivers and consolidate runner code to them

We need a receiver for WindowedValue to create an OutputBuilder. This change
introduces the interface and uses it in many places where it is appropriate,
replacing and simplifying internal runner code.

* Eliminate nullness errors from ByteBuddyDoFnInvokerFactory and DoFnOutputReceivers

* Fix null check when fetching driverJars from value provider

* Fix PostCommit Python ValidatesRunner Samza / Spark jobs (#35210)

* Skip Samza and Spark runner tests

* Fix formatting

* Update pypi documentation 30145 (#34329)

* Add Pypi friendly documentation

* provided full link path

* Add more YAML examples involving Kafka and Iceberg (#35151)

* Add more YAML examples involving Kafka and Iceberg

* Fix some missed details from rebasing

* Adding unit tests for YAML examples

* Clean up and address PR comments

* Formatting

* Formatting

* Evaluate namedTuples as equivalent to rows (#35218)

* Evaluate namedTuples as equivalent to rows

* lint

* Add a new experiment flag to enable real-time clock as processing time. (#35202)

* Touch trigger files for lightweight runners

* Eliminate WindowedValue.getPane() in preparation for making it a user-facing interface

* Do not fail if there were failed deletes (#35222)

* Force FnApiRunner in cases where prism can't handle use case (#35219)

* Force FnApiRunner in cases where prism can't handle use case

* yapf

* Bump golang.org/x/net from 0.40.0 to 0.41.0 in /sdks (#35206)

Bumps [golang.org/x/net](https://github.com/golang/net) from 0.40.0 to 0.41.0.
- [Commits](golang/net@v0.40.0...v0.41.0)

---
updated-dependencies:
- dependency-name: golang.org/x/net
  dependency-version: 0.41.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Fix incorrect typehints generated by FlatMap with default identity function (#35164)

* create unit test

* add some unit tests

* fix bug where T is considered iterable

* update strip_iterable to return Any for "stripped iterable" type of TypeVariable

* remove typehint from identity function and add a test to test for proper typechecking

* Move callablewrapp typehint test

* Remove print

* isort

* isort

* return any for yielded type of T

* Parse values returned from Dataflow API to BoundedTrieData (#34738)

* Parse struct returned from Dataflow API to BoundedTrieData

fix checkstyle

Use getBoundedTrie

add debug log

adapt to ArrayMap

* Add test, clean up

* Fix test: setTrie -> setBoundedTrie

* Fix comments

* Remove breaking PDone change (#35224)

* Remove breaking PDone change

* Update pipeline.py

* Update pipeline.py

* Update pipeline_test.py

* Move none check

* yapf

* fmt

* Generic Postgres + Cloudsql postgres embeddings. (#35215)

* Add base Postgres vector writer, CloudSQL vector writer and refactor.

* Trigger tests.

* Linter fixes.

* Fix test

* Add back tests. Update changes.md. Fix unrelated lint.

* Drop test tables.

* Fix test

* Fix tests.

* Fix test.

* Fix tests.

---------

Co-authored-by: Claude <cvandermerwe@google.com>

* Allow only one thread at a time to start the VLLM server. (#35234)

* [IcebergIO] Create namespaces if needed (#35228)

* create namespaces dynamically

* cleanup namespaces in ITs

* optimization

* Support configuring flush_count and max_row_bytes of WriteToBigTable (#34761)

* Update CHANGES.md (#35242)

Highlight improvements to the vllm model handler.

* [Beam SQL] Implement Catalog and CatalogManager (#35223)

* beam sql catalog

* api adjustment

* cover more naming syntax (quotes, backticks, none)

* spotless

* fix

* add documentation and cleanup

* rename to dropCatalog; mark BeamSqlCli @internal; rename to EmptyCatalogManager

* use registrars instead; remove initialize() method from Catalog

* cleanupo

* [IcebergIO] Fix conversion logic for arrays of structs and maps of structs; fix output Schema resolution with column pruning (#35230)

* fix complex types; filx filter schema logic

* update Changes

* add missing changes from other PRs

* trigger ITs

* make separate impl for iterable

* fix

* fix long_description when the md file cannot be found (#35246)

* fix long_description when the md file cannot be found

* yapf

* [IcebergIO] Create tables with a partition spec (#34966)

* create table with partition spec

* add description and regenerate config doc; trigger tests

* spotless

* log partition spec

* fixes

* Fix typo java-dependencies.md (#35251)

* Adding project and database support in write transform for firestoreIO (#35017)

* Adding project and database support in write transform for firestoreIO

* Spotless Apply

* Resolving comments

* Removing useless comments

* public to private

* Spotless apply

* Fix a logical type issue about JdbcDateType and JdbcTimeType (#35243)

* Fix a logical type issue about JdbcDateType

* Fix typo and also fix the logical class for java time.

* Get rid of the workaround on logical type registration. Trigger tests.

* Fix lints.

* Remove internal code. (#35239)

Co-authored-by: Claude <cvandermerwe@google.com>

* enable setting max_writer_per_bundle for avroIO and other IO (#35092)

* enable setting max_writer_per_bundle for avroIO and other IO

* enable setting max_writer_per_bundle for avroIO and other IO

* enable for TFRecordIO and corrections

* Updated standard_external_transforms.yaml

* [Java FnApi] Fix race in BeamFnStatusClient by initializing all fields before starting rpc. (#35252)

* [Java FnApi] Fix race in BeamFnStatusClient by initializing all fields before starting rpc.

* spotless

* [IcebergIO] Integrate with Beam SQL (#34799)

* add Iceberg table provider and tests

* properties go in the tableprovider initialization

* tobuilder

* streaming integration test

* spotless

* extend test to include multi nested types; fix iceberg <-> conversion logic

* add to changes.md

* spotless

* fix tests

* clean

* update CHANGES

* add projection pushdown and column pruning

* spotless

* fixes

* fixes

* updates

* sync with HEAD and use new Catalog implementation

* mark new interfaces @internal

* spotless

* fix unparse method

* update changes (#35256)

* [YAML]: fix postcommit oracle bug and reorganize postcommit tests (#35191)

* fix postcommit oracle test

* add revision

* switch to hosted runner to try with kafka test

* add extended timeout

* upgrade to 4.10 testcontainers

* switch out to redpanda for kafka

* remove redpandacontainer

* tmp comment

* add postgres comment

* revert to old kafkaContainer

* revert commented out code and revert testcontainer version change

* change mysql image version

* revert image change

* revert image change again :)

* add comments to mysql again

* shift post commit files to different folders

* rename to Data version

* add databases version

* add messaging version

* update readme for three post commits

* update gradle with new post commits

* fix job names

* uncomment fixture on mysql

* switch back to one workflow and update readme as such

* remove old workflow files

* update order and remove comment

* update gradle with parameterized options

* update gradle command calls to correct location

* update workflow to three jobs explicit

* add back Bigquery deselect

* fix mysql teardown error

* Simplify down to one from three explicit jobs

* remove tab

* remove Data

* fix parsing parameters

* update hadoop prefix (#35257)

* Set go version 1.24.4 (#35261)

* Set go version 1.24.4

* Set go version 1.24.4

* upgrade org.apache.parquet:parquet-avro to to 1.15.2 (#35037)

* upgrade org.apache.parquet:parquet-avro to to 1.15.2

* keep iceberg to 1.6.1

* use 1.9.0 for iceberg

* run all iceberg tests

* require Java 11

* fixed java11 for iceberg

* switched back to iceberg_version = "1.6.1"

* merged master

---------

Co-authored-by: Vitaly Terentyev <vitaly.terentyev.akv@gmail.com>

* Bump google.golang.org/grpc from 1.72.2 to 1.73.0 in /sdks (#35236)

Bumps [google.golang.org/grpc](https://github.com/grpc/grpc-go) from 1.72.2 to 1.73.0.
- [Release notes](https://github.com/grpc/grpc-go/releases)
- [Commits](grpc/grpc-go@v1.72.2...v1.73.0)

---
updated-dependencies:
- dependency-name: google.golang.org/grpc
  dependency-version: 1.73.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* add more doc strings for integration tests (#35171)

* [IcebergIO] extend table partitioning to Beam SQL (#35268)

* add sql partitioning

* trigger ITs and add to CHANGES

* simplify test

* include emoty namespace fix

* add error when using partitioning for unsupported tables

* fix test

* Moving to 2.67.0-SNAPSHOT on master branch.

* Update CHANGES.md to have fields for 2.67.0 release

* add free disk space step (#35260)

* [yaml]: Fix post commit yaml once more (#35273)

* switch back to self hosted runner and comment out most of kafka test for now

* add git issue to track

* revision

* remove free space step - seems to be causing more issues than helping

* Polish anomaly detection notebook and get ready to be imported in public devsite. (#35278)

* Polish anomaly detection zscore notebook for public doc.

* Adjust formatting.

* Adjust formatting.

* Suppress Findbugs (#35274)

* Support Java 17+ compiled Beam components for build, release, and xlang tests (#35232)

* Install JDK 21 in release build

Support Beam components requiring Java17+ for release workflows
They will be compiled with JDK21 with byte code compatibility
configured by applyJavaNature(requireJavaVersion)

* Use Java 21 for Python PostCommit

* Honor JAVA_HOME in JavaJarServer checks and tests

* Disable Debezium test on Java17-

* add example line

* [IO] Update Debezium in DebeziumIO to 3.1.1 (#34763)

* Updating Debezium IO to 3.1.1
Enforce JDK17 in build.gradle of Debezium IO

* adjust to review comments by @Abacn

* cleanup

* mention which PR needs to merge before unpinning

* Mention Beam version 2.66 in the README for Debezium

---------

Co-authored-by: Shunping Huang <shunping@google.com>

* Document BQ Storage API pipeline options (#35259)

* Document BQ Storage API pipeline options

* Fix whitespace

* Fix whitespace

* Speed up StreamingDataflowWorkerTest by removing 10 second wait from shutdown path. (#35275)

Takes test time from 5m18s to 2m40s.

The previous shutdown() call doesn't do anything since that just means
future scheduling won't trigger but we only schedule on the executor
once.

Also cleanup test logs by making sure to stop all workers we start so they
don't continue to run in the background and log.

This shutdown paths is only used in testing.

* Bump org.ajoberstar.grgit:grgit-gradle from 4.1.1 to 5.3.2 (#35301)

Bumps [org.ajoberstar.grgit:grgit-gradle](https://github.com/ajoberstar/grgit) from 4.1.1 to 5.3.2.
- [Release notes](https://github.com/ajoberstar/grgit/releases)
- [Commits](ajoberstar/grgit@4.1.1...5.3.2)

---
updated-dependencies:
- dependency-name: org.ajoberstar.grgit:grgit-gradle
  dependency-version: 5.3.2
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump google.golang.org/api from 0.235.0 to 0.237.0 in /sdks (#35302)

Bumps [google.golang.org/api](https://github.com/googleapis/google-api-go-client) from 0.235.0 to 0.237.0.
- [Release notes](https://github.com/googleapis/google-api-go-client/releases)
- [Changelog](https://github.com/googleapis/google-api-go-client/blob/main/CHANGES.md)
- [Commits](googleapis/google-api-go-client@v0.235.0...v0.237.0)

---
updated-dependencies:
- dependency-name: google.golang.org/api
  dependency-version: 0.237.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump github.com/linkedin/goavro/v2 from 2.13.1 to 2.14.0 in /sdks (#35205)

Bumps [github.com/linkedin/goavro/v2](https://github.com/linkedin/goavro) from 2.13.1 to 2.14.0.
- [Release notes](https://github.com/linkedin/goavro/releases)
- [Changelog](https://github.com/linkedin/goavro/blob/master/debug_release.go)
- [Commits](linkedin/goavro@v2.13.1...v2.14.0)

---
updated-dependencies:
- dependency-name: github.com/linkedin/goavro/v2
  dependency-version: 2.14.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Update spotbugs version, fix runner ubuntu version, fix found spotbugs issues (#35303)

* feat: Add option to control resource cleanup failure for IT (#35287)

* Revert "Bump org.ajoberstar.grgit:grgit-gradle from 4.1.1 to 5.3.2 (#35301)" (#35305)

This reverts commit 0fd77ce.

* Add PeriodicStream in the new time series folder. (#35300)

* Add PeriodicStream in the new time series folder.

* Add some more docstrings and minor fix on test name.

* Fix lints and docs.

* try buildah to replace kaniko (#35289)

* try buildah to replace kaniko

* trigger post-commit

* Adding error handler for SpannerReadSchemaTransformProvider and missi… (#35241)

* Adding error handler for SpannerReadSchemaTransformProvider and missing tests for SpannerSchemaTransformProvider

* Removed not used logging

* Spotless Apply

* Spotless Apply

* Spotless Apply

* Typo correction

* requests vulnerability. (#35308)

* [IcebergIO] create custom java container image for tests (#35307)

* create custom java container image for tests

* syntax

* eval depends on df

* add streaming support to iobase (python) (#35137)

* Add support for streaming writes in IOBase (Python)

* add triggering_frequency in iobase.Sink

* fix whitespaces/newlines

* fixes per #35137 (review)

* refactor for code redability

* refactor _expand_unbounded , default num_shards to 1 , if undef or 0

* fix formatter

* space

* keep num_shards = 0 the same as before for bounded write

* add streaming to AvroIO, ParquetIO, TFRecordsIO

* reformat

* typo and spaces

* carry on the refactor from #35253

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: bullet03 <bulatkazan@yahoo.com>
Co-authored-by: Bulat Safiullin <v-safiullinb@microsoft.com>
Co-authored-by: liferoad <huxiangqian@gmail.com>
Co-authored-by: claudevdm <33973061+claudevdm@users.noreply.github.com>
Co-authored-by: Claude <cvandermerwe@google.com>
Co-authored-by: Yi Hu <yathu@google.com>
Co-authored-by: Danny McCormick <dannymccormick@google.com>
Co-authored-by: Jack McCluskey <34928439+jrmccluskey@users.noreply.github.com>
Co-authored-by: Derrick Williams <derrickaw@google.com>
Co-authored-by: Vitaly Terentyev <vitaly.terentyev.akv@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: scwhittle <scwhittle@users.noreply.github.com>
Co-authored-by: Radosław Stankiewicz <radoslaws@google.com>
Co-authored-by: Hai Joey Tran <joey.tran@schrodinger.com>
Co-authored-by: Tanu Sharma <53229637+TanuSharma2511@users.noreply.github.com>
Co-authored-by: Kenneth Knowles <klk@google.com>
Co-authored-by: Minbo Bae <49642083+baeminbo@users.noreply.github.com>
Co-authored-by: Shunping Huang <shunping@google.com>
Co-authored-by: Chenzo <120361592+Chenzo1001@users.noreply.github.com>
Co-authored-by: kristynsmith <kristynsmith@google.com>
Co-authored-by: Rakesh Kumar <rakeshcusat@gmail.com>
Co-authored-by: Charles Nguyen <phucnh402@gmail.com>
Co-authored-by: tvalentyn <tvalentyn@users.noreply.github.com>
Co-authored-by: Ahmed Abualsaud <65791736+ahmedabu98@users.noreply.github.com>
Co-authored-by: Minh Son Nguyen <minh.son.nguyen.1209@gmail.com>
Co-authored-by: Amar3tto <actions@GitHub Actions 1000279405.local>
Co-authored-by: Vitaly Terentyev <vitaly.terentyev@akvelon.com>
Co-authored-by: Tobias Kaymak <tobias.kaymak@gmail.com>
Co-authored-by: Veronica Wasson <3992422+VeronicaWasson@users.noreply.github.com>
parveensania pushed a commit to parveensania/beam-dp that referenced this pull request Aug 17, 2025
* Add support for streaming writes in IOBase (Python)

* add triggering_frequency in iobase.Sink

* fix whitespaces/newlines

* fixes per apache#35137 (review)

* implement pre_finalize_windowed and finalize_windowed_write in FileBasedSink
handle file naming based on windowing for streaming in internal methods

TextIO - expose the new filebasedsink capabilities to write in streaming

* update changes.md

* fix formatting

* space formatting

* space nagging

* keep in sync with refactor in apache#35137

* keep in sync with iobase changes in apache#35137

* [Website] add akvelon case study (apache#34943)

* feat: add akvelon logo

* feat: add akvelon case study

* fix: remove white space

* feat: add akvelon to main page

* feat: use new images

* fix: typos

* fix: change order of akvelon case-study

* fix: update text

* fix: update mainPage text

* fix: update images

* fix: about akvelon section update

* fix: update akvelon card

* fix: update akvelon header

* fix: update code tag

* fix: update about akvelon

* fix: update date and order

* fix: add link and change img

* fix: change CDAP text

* fix: add bold weight

* fix: solve conflicts

* fix: remove unused code

* fix: delete whitespace

* fix: indents format

* fix: add bold text

---------

Co-authored-by: Bulat Safiullin <v-safiullinb@microsoft.com>

* added the rate test for GenerateSequence (apache#35108)

* added the rate test for GenerateSequence

* keep the master yaml

* Re-enable logging after importing vllm. (apache#35103)

Co-authored-by: Claude <cvandermerwe@google.com>

* Deprecate Java 8 (apache#35064)

* Deprecate Java 8

* Java 8 client now using Java 11 SDK container

* adjust non LTS fallback to use the next LTS instead of the nearest LTS. Previously Java18 falls back to Java17, which won't work

* Emit warning when Java 8 is used. Java8 is still
  supported until Beam 3.0

* Clean up subproject build file requiring Java9+

* Require java 11 to build SDK container

* fix workflow

* Simplify XVR test workflow

* Fix Samza PVR

* Remove beam college banners (apache#35123)

* feat: change text (apache#35130)

Co-authored-by: Bulat Safiullin <v-safiullinb@microsoft.com>

* Update Custom Remote Inference example to use RemoteModelHandler (apache#35066)

* Update Custom Remote Inference example to use RemoteModelHandler

* restore old kernel metadata

* Update examples/notebooks/beam-ml/custom_remote_inference.ipynb

Co-authored-by: Danny McCormick <dannymccormick@google.com>

* Add DLQ

---------

Co-authored-by: Danny McCormick <dannymccormick@google.com>

* Remove Java 8 container (apache#35125)

* add extra_transforms block documentation to chain transform documentation (apache#35101)

* add note about testing (apache#35075)

* fix whitespaces/newlines

* fixes per apache#35137 (review)

* implement pre_finalize_windowed and finalize_windowed_write in FileBasedSink
handle file naming based on windowing for streaming in internal methods

TextIO - expose the new filebasedsink capabilities to write in streaming

* update changes.md

* fix formatting

* space formatting

* space nagging

* keep in sync with refactor in apache#35137

* keep in sync with iobase changes in apache#35137

* [Website] update akvelon case study: update text and fix  landing page (apache#35133)

* feat: change text

* fix: add learn more to quotes Akvelon

---------

Co-authored-by: Bulat Safiullin <v-safiullinb@microsoft.com>

* Fix PostCommit Python Xlang IO Dataflow job (apache#35131)

* Add support for iterable type

* Fix formatting

* Bump google.golang.org/grpc from 1.72.0 to 1.72.2 in /sdks (apache#35113)

Bumps [google.golang.org/grpc](https://github.com/grpc/grpc-go) from 1.72.0 to 1.72.2.
- [Release notes](https://github.com/grpc/grpc-go/releases)
- [Commits](grpc/grpc-go@v1.72.0...v1.72.2)

---
updated-dependencies:
- dependency-name: google.golang.org/grpc
  dependency-version: 1.72.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump cloud.google.com/go/bigquery from 1.67.0 to 1.69.0 in /sdks (apache#35061)

Bumps [cloud.google.com/go/bigquery](https://github.com/googleapis/google-cloud-go) from 1.67.0 to 1.69.0.
- [Release notes](https://github.com/googleapis/google-cloud-go/releases)
- [Changelog](https://github.com/googleapis/google-cloud-go/blob/main/CHANGES.md)
- [Commits](googleapis/google-cloud-go@spanner/v1.67.0...spanner/v1.69.0)

---
updated-dependencies:
- dependency-name: cloud.google.com/go/bigquery
  dependency-version: 1.69.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Add known issues. (apache#35138)

* Add known issues.

* Add fixes notes.

---------

Co-authored-by: Claude <cvandermerwe@google.com>

* Bump @octokit/plugin-paginate-rest and @octokit/rest (apache#34167)

Bumps [@octokit/plugin-paginate-rest](https://github.com/octokit/plugin-paginate-rest.js) to 11.4.3 and updates ancestor dependency [@octokit/rest](https://github.com/octokit/rest.js). These dependencies need to be updated together.


Updates `@octokit/plugin-paginate-rest` from 2.17.0 to 11.4.3
- [Release notes](https://github.com/octokit/plugin-paginate-rest.js/releases)
- [Commits](octokit/plugin-paginate-rest.js@v2.17.0...v11.4.3)

Updates `@octokit/rest` from 18.12.0 to 21.1.1
- [Release notes](https://github.com/octokit/rest.js/releases)
- [Commits](octokit/rest.js@v18.12.0...v21.1.1)

---
updated-dependencies:
- dependency-name: "@octokit/plugin-paginate-rest"
  dependency-type: indirect
- dependency-name: "@octokit/rest"
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Explicitly handle singleton iterators instead of using helper and catching exceptions which may be from generating iterable (apache#35124)

* Build last snapshot against RC00 tag instead of release branch (apache#35142)

The last snapshot Beam Java SDK (aka "RC00" release) build is triggered manually, to verify a RC1 build will be successful.

It has been built against release-2.xx branch, where the Dataflow container tag replaced from beam-master to the 2.xx.0

However, the versioned containers are not yet released, causing a timing gap that Beam 2.xx.0-SNAPSHOT won't work on Dataflow between release branch cut and RC1 rolled out to Dataflow

Since we now have a v2.xx.0-RC00 tag, build RC00 against this tag resolves the issue.

* Bump nodemailer from 6.7.5 to 6.9.9 in /scripts/ci/issue-report (apache#35143)

Bumps [nodemailer](https://github.com/nodemailer/nodemailer) from 6.7.5 to 6.9.9.
- [Release notes](https://github.com/nodemailer/nodemailer/releases)
- [Changelog](https://github.com/nodemailer/nodemailer/blob/master/CHANGELOG.md)
- [Commits](nodemailer/nodemailer@v6.7.5...v6.9.9)

---
updated-dependencies:
- dependency-name: nodemailer
  dependency-version: 6.9.9
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Fix tests affected by Java 8 container turn down (apache#35145)

* Fix tests affected by Java 8 container turn down

* still use Java 8 for Samza runner

* Bump github.com/aws/aws-sdk-go-v2/service/s3 in /sdks (apache#35146)

Bumps [github.com/aws/aws-sdk-go-v2/service/s3](https://github.com/aws/aws-sdk-go-v2) from 1.79.3 to 1.80.0.
- [Release notes](https://github.com/aws/aws-sdk-go-v2/releases)
- [Changelog](https://github.com/aws/aws-sdk-go-v2/blob/main/changelog-template.json)
- [Commits](aws/aws-sdk-go-v2@service/s3/v1.79.3...service/s3/v1.80.0)

---
updated-dependencies:
- dependency-name: github.com/aws/aws-sdk-go-v2/service/s3
  dependency-version: 1.80.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* fix jdbc transform validation (apache#35141)

* fix jdbc transform  validation

* add test

* annotations

* spotless

* Fix Java Example ARM PostCommit

* fix: add missed word (apache#35163)

Co-authored-by: Bulat Safiullin <v-safiullinb@microsoft.com>

* Add postcommit yaml xlang workflow and split tests accordingly  (apache#35119)

* move precommit yaml xlang to postcommit

* add json file to trigger path

* update pull request targets

* add readme workflow changes

* add cloud sdk setup

* switch out to ubuntu-latest

* add back precommit workflow

* switch names and add postcommit

* switch out to postCommitYamlIT

* add test_files_dir flag

* add conftest.py file for capturing directory flag

* shift yaml tests around to appropriate locations

* add back precommit to readme

* add license for conftest.py

* revert precommit to previous name

* remove github.event.comment.body trigger

* Replace usages of deprecated pkg_resources package (apache#35153)

* Remove usages of deprecated pkg_resources package

* use stdlib importlib.resources

* remove extra comma

* linting

* import order

* Improve error message when accidentally using PBegin/Pipeline (apache#35156)

* Create test

* Implement new error message

* Add beam.Create into unit test pipeline

* add friendly error message for when transform is applied to no output (apache#35160)

* add friendly error message for when transform is applied to no output

* update test name

* Fix pubsub unit tests that depend on old behavior

* Add warning if temp location bucket has soft delete enabled for Go SD… (apache#34996)

* Add warning if temp location bucket has soft delete enabled for Go SDK (resolves apache#31606)

* Corrected Formatting

* Applied suggested changes

* Constrain DequeCoder type correctly, as it does not support nulls

The DequeCoder uses ArrayDeque internally, which disallows null elements.
We could switch Deque implementations, but this change is better. To quote the
JDK docs: "While Deque implementations are not strictly required to prohibit
the insertion of null elements, they are strongly encouraged to do so. Users of
any Deque implementations that do allow null elements are strongly encouraged
not to take advantage of the ability to insert nulls. This is so because null
is used as a special return value by various methods to indicated that the
deque is empty."

* Do not overwrite class states if a cached dynamic class is returned in cloudpickle.load (apache#35063)

* Fix class states overwritten after cloudpickle.load

* Fix further

* Fix lint

* Make SDK harness change effective on Iceberg Dataflow test (apache#35173)

* Fix beam_PostCommit_Java_Examples_Dataflow_V2 (apache#35172)

* [YAML]: Update postgres IT test and readme (apache#35169)

* update postgres test without driver_class_name

* update readme on how to run integration tests

* fix misspelling

* fix trailing whitespace

* Bump Java beam-master container (apache#35170)

* Make WindowedValue a public interface

The following mostly-automated changes:

 - Moved WindowedValue from util to values package
 - Make WindowedValue an interface with companion class WindowedValues

* Run integration tests for moving WindowedValue and making public

* Add timer tests to make sure event-time timer firing at the right time. (apache#35109)

* Add timer tests to make sure event-time timer firing at the right time.

* Add more tests.

* Disable the failed event-time timer tests for FnApiRunner.

* Fix lints and reformat.

* Disable another new test in FnApiRunnerTest and PortableRunnerTest due to flakiness.

* Disable a new test in FlinkRunnerTest

* Take out the early firing test case because it depends on bundle size.

* change to ubuntu-20.04 (apache#35182)

* Fix IntelliJ sync project failure due to circular Beam dependency (apache#35167)

* Fix IntelliJ sync project failure due to circular Beam dependency

* address comments

* Update workflows categories (apache#35162)

* Add cloudpickle coder. (apache#35166)

* Move examples from sql package (apache#35183)

* Fix the beam interactive install problem when on Google Colab (apache#35148)

* Fix Google Colab Issue

* Update CHANGES.md

* Bump github.com/docker/docker in /sdks (apache#35112)

Bumps [github.com/docker/docker](https://github.com/docker/docker) from 28.0.4+incompatible to 28.2.2+incompatible.
- [Release notes](https://github.com/docker/docker/releases)
- [Commits](moby/moby@v28.0.4...v28.2.2)

---
updated-dependencies:
- dependency-name: github.com/docker/docker
  dependency-version: 28.2.2+incompatible
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump github.com/nats-io/nats-server/v2 from 2.11.3 to 2.11.4 in /sdks (apache#35161)

Bumps [github.com/nats-io/nats-server/v2](https://github.com/nats-io/nats-server) from 2.11.3 to 2.11.4.
- [Release notes](https://github.com/nats-io/nats-server/releases)
- [Changelog](https://github.com/nats-io/nats-server/blob/main/.goreleaser.yml)
- [Commits](nats-io/nats-server@v2.11.3...v2.11.4)

---
updated-dependencies:
- dependency-name: github.com/nats-io/nats-server/v2
  dependency-version: 2.11.4
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Touch JPMS test trigger file

* Use staged SDK harness & Dataflow worker jar in JPMS tests

* Bump cloud.google.com/go/storage from 1.52.0 to 1.55.0 in /sdks (apache#35114)

Bumps [cloud.google.com/go/storage](https://github.com/googleapis/google-cloud-go) from 1.52.0 to 1.55.0.
- [Release notes](https://github.com/googleapis/google-cloud-go/releases)
- [Changelog](https://github.com/googleapis/google-cloud-go/blob/main/CHANGES.md)
- [Commits](googleapis/google-cloud-go@spanner/v1.52.0...spanner/v1.55.0)

---
updated-dependencies:
- dependency-name: cloud.google.com/go/storage
  dependency-version: 1.55.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump github.com/nats-io/nats.go from 1.42.0 to 1.43.0 in /sdks (apache#35147)

Bumps [github.com/nats-io/nats.go](https://github.com/nats-io/nats.go) from 1.42.0 to 1.43.0.
- [Release notes](https://github.com/nats-io/nats.go/releases)
- [Commits](nats-io/nats.go@v1.42.0...v1.43.0)

---
updated-dependencies:
- dependency-name: github.com/nats-io/nats.go
  dependency-version: 1.43.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Move non-dill specific code out of dill_pickler.py (apache#35139)

* Create consistent_pickle.py

* Update dill_pickler.py

* Update consistent_pickle.py

* Add Apache license to consistent_pickle.py

* Update dill_pickler.py

* Update and rename consistent_pickle.py to code_object_pickler.py

* Update dill_pickler.py to change consistent_pickle to code_object_pickler

* Fix formatting issue in code_object_pickler.py

* Fix formatting issue in dill_pickler.py

* Remove apache license from code_object_pickler.py

* Add apache license to code_object_pickler.py

* Fix some lints introduced in a recent PR. (apache#35193)

* small filesystem fixes (apache#34956)

* Fix typehint for get_filesystem

* Fix bug with creating files in cwd with LocalFileSystem

* lint

* revert precommit config change

* isort

* add file to start PR builder

* Revert "add file to start PR builder"

This reverts commit f80b361.

* fix isort again

* [YAML] Add a spec provider for transforms taking specifiable arguments (apache#35187)

* Add a test provider for specifiable and try it on AnomalyDetection.

Also add support on callable in spec.

* Minor renaming

* Fix lints.

* Touch trigger files to test WindowedValueReceiver in runners

* Introduce WindowedValue receivers and consolidate runner code to them

We need a receiver for WindowedValue to create an OutputBuilder. This change
introduces the interface and uses it in many places where it is appropriate,
replacing and simplifying internal runner code.

* Eliminate nullness errors from ByteBuddyDoFnInvokerFactory and DoFnOutputReceivers

* Fix null check when fetching driverJars from value provider

* Fix PostCommit Python ValidatesRunner Samza / Spark jobs (apache#35210)

* Skip Samza and Spark runner tests

* Fix formatting

* Update pypi documentation 30145 (apache#34329)

* Add Pypi friendly documentation

* provided full link path

* Add more YAML examples involving Kafka and Iceberg (apache#35151)

* Add more YAML examples involving Kafka and Iceberg

* Fix some missed details from rebasing

* Adding unit tests for YAML examples

* Clean up and address PR comments

* Formatting

* Formatting

* Evaluate namedTuples as equivalent to rows (apache#35218)

* Evaluate namedTuples as equivalent to rows

* lint

* Add a new experiment flag to enable real-time clock as processing time. (apache#35202)

* Touch trigger files for lightweight runners

* Eliminate WindowedValue.getPane() in preparation for making it a user-facing interface

* Do not fail if there were failed deletes (apache#35222)

* Force FnApiRunner in cases where prism can't handle use case (apache#35219)

* Force FnApiRunner in cases where prism can't handle use case

* yapf

* Bump golang.org/x/net from 0.40.0 to 0.41.0 in /sdks (apache#35206)

Bumps [golang.org/x/net](https://github.com/golang/net) from 0.40.0 to 0.41.0.
- [Commits](golang/net@v0.40.0...v0.41.0)

---
updated-dependencies:
- dependency-name: golang.org/x/net
  dependency-version: 0.41.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Fix incorrect typehints generated by FlatMap with default identity function (apache#35164)

* create unit test

* add some unit tests

* fix bug where T is considered iterable

* update strip_iterable to return Any for "stripped iterable" type of TypeVariable

* remove typehint from identity function and add a test to test for proper typechecking

* Move callablewrapp typehint test

* Remove print

* isort

* isort

* return any for yielded type of T

* Parse values returned from Dataflow API to BoundedTrieData (apache#34738)

* Parse struct returned from Dataflow API to BoundedTrieData

fix checkstyle

Use getBoundedTrie

add debug log

adapt to ArrayMap

* Add test, clean up

* Fix test: setTrie -> setBoundedTrie

* Fix comments

* Remove breaking PDone change (apache#35224)

* Remove breaking PDone change

* Update pipeline.py

* Update pipeline.py

* Update pipeline_test.py

* Move none check

* yapf

* fmt

* Generic Postgres + Cloudsql postgres embeddings. (apache#35215)

* Add base Postgres vector writer, CloudSQL vector writer and refactor.

* Trigger tests.

* Linter fixes.

* Fix test

* Add back tests. Update changes.md. Fix unrelated lint.

* Drop test tables.

* Fix test

* Fix tests.

* Fix test.

* Fix tests.

---------

Co-authored-by: Claude <cvandermerwe@google.com>

* Allow only one thread at a time to start the VLLM server. (apache#35234)

* [IcebergIO] Create namespaces if needed (apache#35228)

* create namespaces dynamically

* cleanup namespaces in ITs

* optimization

* Support configuring flush_count and max_row_bytes of WriteToBigTable (apache#34761)

* Update CHANGES.md (apache#35242)

Highlight improvements to the vllm model handler.

* [Beam SQL] Implement Catalog and CatalogManager (apache#35223)

* beam sql catalog

* api adjustment

* cover more naming syntax (quotes, backticks, none)

* spotless

* fix

* add documentation and cleanup

* rename to dropCatalog; mark BeamSqlCli @internal; rename to EmptyCatalogManager

* use registrars instead; remove initialize() method from Catalog

* cleanupo

* [IcebergIO] Fix conversion logic for arrays of structs and maps of structs; fix output Schema resolution with column pruning (apache#35230)

* fix complex types; filx filter schema logic

* update Changes

* add missing changes from other PRs

* trigger ITs

* make separate impl for iterable

* fix

* fix long_description when the md file cannot be found (apache#35246)

* fix long_description when the md file cannot be found

* yapf

* [IcebergIO] Create tables with a partition spec (apache#34966)

* create table with partition spec

* add description and regenerate config doc; trigger tests

* spotless

* log partition spec

* fixes

* Fix typo java-dependencies.md (apache#35251)

* Adding project and database support in write transform for firestoreIO (apache#35017)

* Adding project and database support in write transform for firestoreIO

* Spotless Apply

* Resolving comments

* Removing useless comments

* public to private

* Spotless apply

* Fix a logical type issue about JdbcDateType and JdbcTimeType (apache#35243)

* Fix a logical type issue about JdbcDateType

* Fix typo and also fix the logical class for java time.

* Get rid of the workaround on logical type registration. Trigger tests.

* Fix lints.

* Remove internal code. (apache#35239)

Co-authored-by: Claude <cvandermerwe@google.com>

* enable setting max_writer_per_bundle for avroIO and other IO (apache#35092)

* enable setting max_writer_per_bundle for avroIO and other IO

* enable setting max_writer_per_bundle for avroIO and other IO

* enable for TFRecordIO and corrections

* Updated standard_external_transforms.yaml

* [Java FnApi] Fix race in BeamFnStatusClient by initializing all fields before starting rpc. (apache#35252)

* [Java FnApi] Fix race in BeamFnStatusClient by initializing all fields before starting rpc.

* spotless

* [IcebergIO] Integrate with Beam SQL (apache#34799)

* add Iceberg table provider and tests

* properties go in the tableprovider initialization

* tobuilder

* streaming integration test

* spotless

* extend test to include multi nested types; fix iceberg <-> conversion logic

* add to changes.md

* spotless

* fix tests

* clean

* update CHANGES

* add projection pushdown and column pruning

* spotless

* fixes

* fixes

* updates

* sync with HEAD and use new Catalog implementation

* mark new interfaces @internal

* spotless

* fix unparse method

* update changes (apache#35256)

* [YAML]: fix postcommit oracle bug and reorganize postcommit tests (apache#35191)

* fix postcommit oracle test

* add revision

* switch to hosted runner to try with kafka test

* add extended timeout

* upgrade to 4.10 testcontainers

* switch out to redpanda for kafka

* remove redpandacontainer

* tmp comment

* add postgres comment

* revert to old kafkaContainer

* revert commented out code and revert testcontainer version change

* change mysql image version

* revert image change

* revert image change again :)

* add comments to mysql again

* shift post commit files to different folders

* rename to Data version

* add databases version

* add messaging version

* update readme for three post commits

* update gradle with new post commits

* fix job names

* uncomment fixture on mysql

* switch back to one workflow and update readme as such

* remove old workflow files

* update order and remove comment

* update gradle with parameterized options

* update gradle command calls to correct location

* update workflow to three jobs explicit

* add back Bigquery deselect

* fix mysql teardown error

* Simplify down to one from three explicit jobs

* remove tab

* remove Data

* fix parsing parameters

* update hadoop prefix (apache#35257)

* Set go version 1.24.4 (apache#35261)

* Set go version 1.24.4

* Set go version 1.24.4

* upgrade org.apache.parquet:parquet-avro to to 1.15.2 (apache#35037)

* upgrade org.apache.parquet:parquet-avro to to 1.15.2

* keep iceberg to 1.6.1

* use 1.9.0 for iceberg

* run all iceberg tests

* require Java 11

* fixed java11 for iceberg

* switched back to iceberg_version = "1.6.1"

* merged master

---------

Co-authored-by: Vitaly Terentyev <vitaly.terentyev.akv@gmail.com>

* Bump google.golang.org/grpc from 1.72.2 to 1.73.0 in /sdks (apache#35236)

Bumps [google.golang.org/grpc](https://github.com/grpc/grpc-go) from 1.72.2 to 1.73.0.
- [Release notes](https://github.com/grpc/grpc-go/releases)
- [Commits](grpc/grpc-go@v1.72.2...v1.73.0)

---
updated-dependencies:
- dependency-name: google.golang.org/grpc
  dependency-version: 1.73.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* add more doc strings for integration tests (apache#35171)

* [IcebergIO] extend table partitioning to Beam SQL (apache#35268)

* add sql partitioning

* trigger ITs and add to CHANGES

* simplify test

* include emoty namespace fix

* add error when using partitioning for unsupported tables

* fix test

* Moving to 2.67.0-SNAPSHOT on master branch.

* Update CHANGES.md to have fields for 2.67.0 release

* add free disk space step (apache#35260)

* [yaml]: Fix post commit yaml once more (apache#35273)

* switch back to self hosted runner and comment out most of kafka test for now

* add git issue to track

* revision

* remove free space step - seems to be causing more issues than helping

* Polish anomaly detection notebook and get ready to be imported in public devsite. (apache#35278)

* Polish anomaly detection zscore notebook for public doc.

* Adjust formatting.

* Adjust formatting.

* Suppress Findbugs (apache#35274)

* Support Java 17+ compiled Beam components for build, release, and xlang tests (apache#35232)

* Install JDK 21 in release build

Support Beam components requiring Java17+ for release workflows
They will be compiled with JDK21 with byte code compatibility
configured by applyJavaNature(requireJavaVersion)

* Use Java 21 for Python PostCommit

* Honor JAVA_HOME in JavaJarServer checks and tests

* Disable Debezium test on Java17-

* add example line

* [IO] Update Debezium in DebeziumIO to 3.1.1 (apache#34763)

* Updating Debezium IO to 3.1.1
Enforce JDK17 in build.gradle of Debezium IO

* adjust to review comments by @Abacn

* cleanup

* mention which PR needs to merge before unpinning

* Mention Beam version 2.66 in the README for Debezium

---------

Co-authored-by: Shunping Huang <shunping@google.com>

* Document BQ Storage API pipeline options (apache#35259)

* Document BQ Storage API pipeline options

* Fix whitespace

* Fix whitespace

* Speed up StreamingDataflowWorkerTest by removing 10 second wait from shutdown path. (apache#35275)

Takes test time from 5m18s to 2m40s.

The previous shutdown() call doesn't do anything since that just means
future scheduling won't trigger but we only schedule on the executor
once.

Also cleanup test logs by making sure to stop all workers we start so they
don't continue to run in the background and log.

This shutdown paths is only used in testing.

* Bump org.ajoberstar.grgit:grgit-gradle from 4.1.1 to 5.3.2 (apache#35301)

Bumps [org.ajoberstar.grgit:grgit-gradle](https://github.com/ajoberstar/grgit) from 4.1.1 to 5.3.2.
- [Release notes](https://github.com/ajoberstar/grgit/releases)
- [Commits](ajoberstar/grgit@4.1.1...5.3.2)

---
updated-dependencies:
- dependency-name: org.ajoberstar.grgit:grgit-gradle
  dependency-version: 5.3.2
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump google.golang.org/api from 0.235.0 to 0.237.0 in /sdks (apache#35302)

Bumps [google.golang.org/api](https://github.com/googleapis/google-api-go-client) from 0.235.0 to 0.237.0.
- [Release notes](https://github.com/googleapis/google-api-go-client/releases)
- [Changelog](https://github.com/googleapis/google-api-go-client/blob/main/CHANGES.md)
- [Commits](googleapis/google-api-go-client@v0.235.0...v0.237.0)

---
updated-dependencies:
- dependency-name: google.golang.org/api
  dependency-version: 0.237.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump github.com/linkedin/goavro/v2 from 2.13.1 to 2.14.0 in /sdks (apache#35205)

Bumps [github.com/linkedin/goavro/v2](https://github.com/linkedin/goavro) from 2.13.1 to 2.14.0.
- [Release notes](https://github.com/linkedin/goavro/releases)
- [Changelog](https://github.com/linkedin/goavro/blob/master/debug_release.go)
- [Commits](linkedin/goavro@v2.13.1...v2.14.0)

---
updated-dependencies:
- dependency-name: github.com/linkedin/goavro/v2
  dependency-version: 2.14.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Update spotbugs version, fix runner ubuntu version, fix found spotbugs issues (apache#35303)

* feat: Add option to control resource cleanup failure for IT (apache#35287)

* Revert "Bump org.ajoberstar.grgit:grgit-gradle from 4.1.1 to 5.3.2 (apache#35301)" (apache#35305)

This reverts commit 0fd77ce.

* Add PeriodicStream in the new time series folder. (apache#35300)

* Add PeriodicStream in the new time series folder.

* Add some more docstrings and minor fix on test name.

* Fix lints and docs.

* try buildah to replace kaniko (apache#35289)

* try buildah to replace kaniko

* trigger post-commit

* Adding error handler for SpannerReadSchemaTransformProvider and missi… (apache#35241)

* Adding error handler for SpannerReadSchemaTransformProvider and missing tests for SpannerSchemaTransformProvider

* Removed not used logging

* Spotless Apply

* Spotless Apply

* Spotless Apply

* Typo correction

* requests vulnerability. (apache#35308)

* [IcebergIO] create custom java container image for tests (apache#35307)

* create custom java container image for tests

* syntax

* eval depends on df

* add streaming support to iobase (python) (apache#35137)

* Add support for streaming writes in IOBase (Python)

* add triggering_frequency in iobase.Sink

* fix whitespaces/newlines

* fixes per apache#35137 (review)

* refactor for code redability

* refactor _expand_unbounded , default num_shards to 1 , if undef or 0

* fix formatter

* space

* keep num_shards = 0 the same as before for bounded write

* add streaming to AvroIO, ParquetIO, TFRecordsIO

* reformat

* typo and spaces

* carry on the refactor from apache#35253

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: bullet03 <bulatkazan@yahoo.com>
Co-authored-by: Bulat Safiullin <v-safiullinb@microsoft.com>
Co-authored-by: liferoad <huxiangqian@gmail.com>
Co-authored-by: claudevdm <33973061+claudevdm@users.noreply.github.com>
Co-authored-by: Claude <cvandermerwe@google.com>
Co-authored-by: Yi Hu <yathu@google.com>
Co-authored-by: Danny McCormick <dannymccormick@google.com>
Co-authored-by: Jack McCluskey <34928439+jrmccluskey@users.noreply.github.com>
Co-authored-by: Derrick Williams <derrickaw@google.com>
Co-authored-by: Vitaly Terentyev <vitaly.terentyev.akv@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: scwhittle <scwhittle@users.noreply.github.com>
Co-authored-by: Radosław Stankiewicz <radoslaws@google.com>
Co-authored-by: Hai Joey Tran <joey.tran@schrodinger.com>
Co-authored-by: Tanu Sharma <53229637+TanuSharma2511@users.noreply.github.com>
Co-authored-by: Kenneth Knowles <klk@google.com>
Co-authored-by: Minbo Bae <49642083+baeminbo@users.noreply.github.com>
Co-authored-by: Shunping Huang <shunping@google.com>
Co-authored-by: Chenzo <120361592+Chenzo1001@users.noreply.github.com>
Co-authored-by: kristynsmith <kristynsmith@google.com>
Co-authored-by: Rakesh Kumar <rakeshcusat@gmail.com>
Co-authored-by: Charles Nguyen <phucnh402@gmail.com>
Co-authored-by: tvalentyn <tvalentyn@users.noreply.github.com>
Co-authored-by: Ahmed Abualsaud <65791736+ahmedabu98@users.noreply.github.com>
Co-authored-by: Minh Son Nguyen <minh.son.nguyen.1209@gmail.com>
Co-authored-by: Amar3tto <actions@GitHub Actions 1000279405.local>
Co-authored-by: Vitaly Terentyev <vitaly.terentyev@akvelon.com>
Co-authored-by: Tobias Kaymak <tobias.kaymak@gmail.com>
Co-authored-by: Veronica Wasson <3992422+VeronicaWasson@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Python JDBCIO failed to handle DATE fields [Bug]: Reading from BigQuery provides inconsistent schemas

5 participants