[Bug]: Iceberg sink is not resilient to worker crash #34074

@ahmedabu98

Description

What happened?

When a worker crashes right after a snapshot is committed, Beam retries the bundle and re-commits the same data files, which leaves duplicate data in the table.
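
To make the failure mode concrete, here is a minimal, library-free simulation of it. Nothing below uses the Iceberg or Beam APIs: `InMemoryTable`, `naive_commit`, `idempotent_commit` and the `commit_id` value are all hypothetical names invented for illustration. The idempotent variant is only one possible mitigation, sketched on the assumption that a retried bundle can reproduce the same commit id; it is not necessarily how this gets fixed in IcebergIO.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryTable:
    """Toy stand-in for a table: an ordered list of snapshots (hypothetical)."""
    snapshots: list = field(default_factory=list)

    def append(self, data_files, commit_id=None):
        # Each snapshot records the files it added and an optional commit id.
        self.snapshots.append({"files": list(data_files), "commit_id": commit_id})

    def all_files(self):
        return [f for s in self.snapshots for f in s["files"]]

    def has_commit(self, commit_id):
        return any(s["commit_id"] == commit_id for s in self.snapshots)


def naive_commit(table, bundle_files):
    # Appends unconditionally, so a retried bundle commits the same files again.
    table.append(bundle_files)


def idempotent_commit(table, bundle_files, commit_id):
    # Skips the append when a snapshot carrying this commit id already exists.
    if table.has_commit(commit_id):
        return False
    table.append(bundle_files, commit_id=commit_id)
    return True


def run_with_crash_after_commit(commit_fn):
    """Commit succeeds, the worker 'crashes', and the same bundle is retried."""
    table = InMemoryTable()
    files = ["data-0001.parquet", "data-0002.parquet"]
    commit_fn(table, files)  # first attempt: snapshot is committed, then crash
    commit_fn(table, files)  # retry of the same bundle after the crash
    return table.all_files()


naive_files = run_with_crash_after_commit(naive_commit)
safe_files = run_with_crash_after_commit(
    lambda table, files: idempotent_commit(table, files, commit_id="bundle-42")
)
print(len(naive_files), len(safe_files))
```

In the naive run every data file shows up twice, which is the duplication described above. The guarded run keeps one copy, but only because the retry presents the same `commit_id` as the first attempt.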

Issue Priority

Priority: 2 (default / most bugs should be filed as P2)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam YAML
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Infrastructure
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner

Metadata

Assignees

Labels

IcebergIO (IcebergIO: can only be used through ManagedIO), P2, bug, io, java
