[Bug]: YAML Flatten incorrectly drops fields when input PCollections' schemas are different #35666

@charlespnh

What happened?

Run the following YAML pipeline:

pipeline:
  transforms:
    - type: Create
      name: Create1
      config:
        elements:
          - {'ride_id': '1',
             'passenger_count': 1}
          - {'ride_id': '2',
             'passenger_count': 2}

    - type: Create
      name: Create2
      config:
        elements:
          - {'ride_id': '3'}
          - {'ride_id': '4'}

    - type: Flatten
      name: Flatten1
      input:
        - Create1
        - Create2

    - type: LogForTesting
      name: LogForTesting
      input: Flatten1

... gives the following result:

INFO:root:{"ride_id": "3"}
INFO:root:{"ride_id": "4"}
INFO:root:{"ride_id": "1"}
INFO:root:{"ride_id": "2"}

... But I'm expecting:

INFO:root:{"ride_id": "3"}
INFO:root:{"ride_id": "4"}
INFO:root:{"ride_id": "1", "passenger_count": 1}
INFO:root:{"ride_id": "2", "passenger_count": 2}

This occurs on apache_beam 2.66.0. I also tested with the Python SDK on Beam Playground and did not see fields being dropped there.
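
To make the difference between the two outputs concrete, here is a minimal plain-Python sketch of the symptom. It is not Beam code and does not reflect how YAML Flatten is implemented. The names `flatten_expected` and `flatten_observed` are hypothetical, and the "keep only fields common to every input" rule is just an assumption that happens to reproduce the logged output above, not a confirmed cause.

```python
# Illustration only: a plain-Python model of the two behaviours in this report.
# Nothing here calls Beam; function names are hypothetical.

create1 = [
    {"ride_id": "1", "passenger_count": 1},
    {"ride_id": "2", "passenger_count": 2},
]
create2 = [
    {"ride_id": "3"},
    {"ride_id": "4"},
]


def flatten_expected(*inputs):
    """Concatenate all inputs and leave every element untouched."""
    return [dict(row) for rows in inputs for row in rows]


def flatten_observed(*inputs):
    """Assumed model of the reported output: keep only the fields
    that appear in every input, dropping everything else."""
    fields_per_input = [
        set().union(*(row.keys() for row in rows)) for rows in inputs
    ]
    common = set.intersection(*fields_per_input)
    return [
        {key: value for key, value in row.items() if key in common}
        for rows in inputs
        for row in rows
    ]


if __name__ == "__main__":
    # Element order is not meaningful in a PCollection; a list is used
    # here only to keep the sketch simple.
    for row in flatten_observed(create1, create2):
        print("observed:", row)
    for row in flatten_expected(create1, create2):
        print("expected:", row)
```

With the two inputs from the pipeline, `flatten_observed` loses `passenger_count` on the rows from Create1, while `flatten_expected` keeps it, which is the behaviour this issue asks for.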

Issue Priority

Priority: 2 (default / most bugs should be filed as P2)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam YAML
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Infrastructure
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
