[Task]: Spark Dataset runner should use default storage level MEMORY_AND_DISK #25737

Description

@mosche

What needs to happen?

Depending on the API, Spark uses different default storage levels:

  • RDD API: MEMORY_ONLY
  • Dataset API: MEMORY_AND_DISK

Currently, the default storage level is set to MEMORY_ONLY for all Spark runner variants. However, the default storage level of the Dataset runner should match Spark's own Dataset default, MEMORY_AND_DISK.
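For reference, a minimal sketch in plain Spark (not Beam runner code) showing the two defaults side by side and how a MEMORY_AND_DISK default could be resolved from a string-valued option such as the runner's storageLevel setting; the setup code is illustrative only, not the runner's actual implementation:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.storage.StorageLevel;

public class StorageLevelDefaults {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .master("local[*]")
        .appName("storage-level-defaults")
        .getOrCreate();

    Dataset<Row> ds = spark.range(10).toDF();

    // RDD API: persist()/cache() without arguments uses MEMORY_ONLY.
    System.out.println(
        ds.toJavaRDD().persist(StorageLevel.MEMORY_ONLY()).getStorageLevel());

    // Dataset API: persist() without arguments uses MEMORY_AND_DISK.
    System.out.println(ds.persist().storageLevel());

    // The storage level option is configured as a string; a Dataset-runner
    // default of "MEMORY_AND_DISK" (instead of "MEMORY_ONLY") could be
    // resolved like this:
    StorageLevel level = StorageLevel.fromString("MEMORY_AND_DISK");
    spark.range(10, 20).toDF().persist(level);

    spark.stop();
  }
}
```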

Issue Priority

Priority: 3 (nice-to-have improvement)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
