shunping (Collaborator) commented Apr 10, 2025

We see some test flakiness (https://github.com/apache/beam/actions/workflows/beam_PreCommit_Prism_Python.yml) after #34582.

Here is a simple pipeline to reproduce:

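import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Assumption: the original snippet does not show how `options` was built;
# the Prism runner is selected here to match the flaky precommit.
options = PipelineOptions(['--runner=PrismRunner'])

with beam.Pipeline(options=options) as p:
  side1 = p | 'side1' >> beam.Create([('a', 1)])
  side2 = p | 'side2' >> beam.Create([('b', 2)])
  # An element of a different type than side1 and side2.
  third_element = [('another_type')]
  side3 = p | 'side3' >> beam.Create(third_element)
  side = (side1, side2) | 'Flatten1' >> beam.Flatten()
  _ = (side, side3) | 'Flatten2' >> beam.Flatten() | beam.Map(print)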

In #34582, we replaced the coder IDs of the input PCollections of each flatten transform with the coder of that flatten's output. However, if there are multiple flatten transforms and they are connected to each other, the order of replacement matters:

  • If we replace the coder of Flatten1 and then Flatten2:
    • the coder of side1 and side2 will be Coder(Tuple[str, int])
    • the coder of side (flattened output of side1 and side2) and side3 will be Coder(Tuple[str])
  • If we replace the coder of Flatten2 and then Flatten1:
    • the coder of side (flattened output of side1 and side2) and side3 will be Coder(Tuple[str])
    • the coder of side1 and side2 will also be Coder(Tuple[str]) (same as the coder of side)

The first scenario causes a problem when prism handles flatten as a runner transform: when it collects the elements for Flatten2, it receives elements encoded with Coder(Tuple[str, int]) from side1 and side2, and elements encoded with Coder(Tuple[str]) from side3.
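The order dependence can be reproduced with a toy model in plain Python. This is only a sketch and not the prism code: the dictionary layout, the output name `out` for Flatten2, and the helpers `replace_input_coders` and `run` are all assumptions made for illustration, and coders are represented as plain strings.

```python
# Toy model of the coder replacement (hypothetical names, not prism code).
# Each flatten lists its input PCollections and its output PCollection.
FLATTENS = {
    "Flatten1": {"inputs": ["side1", "side2"], "output": "side"},
    "Flatten2": {"inputs": ["side", "side3"], "output": "out"},
}


def initial_coders():
    """Coder of each PCollection before any replacement (assumed values)."""
    return {
        "side1": "Coder(Tuple[str, int])",
        "side2": "Coder(Tuple[str, int])",
        "side": "Coder(Tuple[str, int])",
        "side3": "Coder(Tuple[str])",
        "out": "Coder(Tuple[str])",
    }


def replace_input_coders(coders, flatten):
    """Give every input of the flatten the coder of the flatten's output."""
    out_coder = coders[flatten["output"]]
    for pcoll in flatten["inputs"]:
        coders[pcoll] = out_coder


def run(order):
    """Apply the replacement to the flattens in the given order."""
    coders = initial_coders()
    for name in order:
        replace_input_coders(coders, FLATTENS[name])
    return coders


first = run(["Flatten1", "Flatten2"])
second = run(["Flatten2", "Flatten1"])

# Coders of the leaf PCollections whose elements end up in Flatten2.
leaves = ["side1", "side2", "side3"]
print("Flatten1 then Flatten2:", sorted({first[p] for p in leaves}))
print("Flatten2 then Flatten1:", sorted({second[p] for p in leaves}))
```

In this model, replacing Flatten1 first leaves two distinct coders among the leaf inputs, while replacing Flatten2 first leaves one, which mirrors the two scenarios listed above.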

Addresses #34587

github-actions (Contributor) commented

Assigning reviewers. If you would like to opt out of this review, comment "assign to next reviewer":

R: @lostluck for label go.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

lostluck (Contributor) left a comment

I like it. Essentially marking that we've examined it before. Unlikely to collide with SDK coder names, so it's acceptable as an approach.