[Bug]: KafkaIO maintains duplicate caches for record size and offset estimators #33097

@sjvanrossum

Description

What happened?

Offset consumers are created and cached for every KafkaSourceDescriptor processed, per ReadFromKafkaDoFn instance. Caching multiple open Kafka consumers for the same KafkaSourceDescriptor is expensive on both the client and the broker side, and may exhaust connection limits on some Kafka clusters.
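The deduplication the issue asks for could be sketched as a single process-wide cache keyed by the source descriptor, so that concurrent DoFn instances share one consumer per descriptor instead of each opening their own. This is only an illustrative sketch, not Beam's actual classes: `Descriptor`, `OffsetConsumer`, and `consumerFor` are hypothetical stand-ins for `KafkaSourceDescriptor`, the offset-estimating consumer, and the cache lookup.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: a shared, descriptor-keyed consumer cache.
// None of these names are Beam's real classes.
public class SharedConsumerCache {
    // Stand-in for KafkaSourceDescriptor: topic + partition identify a source.
    record Descriptor(String topic, int partition) {}

    // Counts how many consumers were actually opened, to show deduplication.
    static final AtomicInteger openConsumers = new AtomicInteger();

    // Stand-in for an offset-estimating consumer; real code would wrap a Kafka consumer.
    static class OffsetConsumer {
        OffsetConsumer(Descriptor d) { openConsumers.incrementAndGet(); }
    }

    // One process-wide cache, so all DoFn instances share it rather than
    // each instance holding its own map of open consumers.
    static final Map<Descriptor, OffsetConsumer> CACHE = new ConcurrentHashMap<>();

    static OffsetConsumer consumerFor(Descriptor d) {
        // computeIfAbsent opens at most one consumer per descriptor,
        // even under concurrent calls.
        return CACHE.computeIfAbsent(d, OffsetConsumer::new);
    }

    public static void main(String[] args) {
        Descriptor d = new Descriptor("orders", 0);
        // Two callers asking for the same descriptor reuse one consumer.
        consumerFor(d);
        consumerFor(d);
        consumerFor(new Descriptor("orders", 1));
        System.out.println(openConsumers.get()); // 2 consumers opened, not 3
    }
}
```

A real fix would also need eviction and `close()` handling for idle consumers, which the sketch omits.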

Issue Priority

Priority: 2 (default / most bugs should be filed as P2)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam YAML
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Infrastructure
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
