Fix a flatten flaky test when two flattens are used sequentially. #34602

shunping · 2025-04-10T14:30:21Z

We have seen test flakiness in the Prism Python precommit (https://github.com/apache/beam/actions/workflows/beam_PreCommit_Prism_Python.yml) since #34582.

Here is a simple pipeline to reproduce:

import apache_beam as beam

# `options` is assumed to be a PipelineOptions instance that selects the Prism runner.
with beam.Pipeline(options=options) as p:
  side1 = p | 'side1' >> beam.Create([('a', 1)])
  side2 = p | 'side2' >> beam.Create([('b', 2)])
  third_element = [('another_type')]

  side3 = p | 'side3' >> beam.Create(third_element)
  side = (side1, side2) | 'Flatten1' >> beam.Flatten()
  _ = (side, side3) | 'Flatten2' >> beam.Flatten() | beam.Map(print)

In #34582, we replace the coder IDs of the input PCollections of each flatten transform. However, when multiple flatten transforms feed into one another, the order of replacement matters:

If we replace the coder of Flatten1 and then Flatten2:
- the coder of side1 and side2 will be Coder(Tuple[str, int])
- the coder of side (flattened output of side1 and side2) and side3 will be Coder(Tuple[str])

If we replace the coder of Flatten2 and then Flatten1:
- the coder of side (flattened output of side1 and side2) and side3 will be Coder(Tuple[str])
- the coder of side1 and side2 will also be Coder(Tuple[str]) (same as the coder of side)

The first scenario causes a problem when Prism handles the flatten as a runner transform: when it collects the elements for Flatten2, it receives elements encoded with Coder(Tuple[str, int]) from side1 and side2, and elements encoded with Coder(Tuple[str]) from side3.
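The order dependence can be reproduced with a toy model. This is only a sketch, not Prism's actual code: plain dicts stand in for the pipeline proto, the rule "each flatten input takes the current coder of that flatten's output" is an assumption chosen to match the two scenarios listed above, and `out` is a hypothetical name for the output PCollection of Flatten2. The initial coder labels are likewise assumptions taken from the description.

```python
# Toy model of the coder rewrite; plain dicts stand in for the pipeline proto.
def replace_input_coders(coders, flattens, order):
    """Return a copy of `coders` after rewriting flatten inputs in `order`."""
    coders = dict(coders)
    for name in order:
        inputs, output = flattens[name]
        for pcoll in inputs:
            # Assumed rule: every input takes the flatten output's current coder.
            coders[pcoll] = coders[output]
    return coders


# Coder labels follow the PR description; "out" is a hypothetical name
# for the output PCollection of Flatten2.
initial = {
    "side1": "Tuple[str, int]",
    "side2": "Tuple[str, int]",
    "side": "Tuple[str, int]",
    "side3": "Tuple[str]",
    "out": "Tuple[str]",
}
flattens = {
    "Flatten1": (["side1", "side2"], "side"),
    "Flatten2": (["side", "side3"], "out"),
}

# Flatten1 first: side1/side2 keep Tuple[str, int] while side moves to Tuple[str].
first = replace_input_coders(initial, flattens, ["Flatten1", "Flatten2"])
# Flatten2 first: the new coder of side propagates down to side1/side2.
second = replace_input_coders(initial, flattens, ["Flatten2", "Flatten1"])

print(first)
print(second)
```

In this model only the second ordering leaves every PCollection on one coder. The first leaves side1 and side2 out of step with side, which is the mismatch described above.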

addresses #34587

github-actions · 2025-04-10T15:07:38Z

Assigning reviewers. If you would like to opt out of this review, comment assign to next reviewer:

R: @lostluck for label go.

Available commands:

stop reviewer notifications - opt out of the automated review tooling
remind me after tests pass - tag the comment author after tests pass
waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

lostluck

I like it. Essentially marking that we've examined it before. Unlikely to collide with SDK coder names, so it's acceptable as an approach.

Fix a flatten flaky test when two flattens are used sequentially.

33522f5

github-actions bot added go runners prism labels Apr 10, 2025

github-actions bot added the Next Action: Reviewers label Apr 10, 2025

lostluck approved these changes Apr 10, 2025

lostluck merged commit 04251e7 into apache:master Apr 10, 2025
9 checks passed

This was referenced Apr 11, 2025

[Bug]: Prism overwrites coders while handling flatten #34587

Closed

[prism] Java FlattenTest.testFlattenMultipleCoders - worker crash #32930

Closed

Inject SDK-side flattens while handling input/output coder mismatch in flattens. #34641

Merged
