[Java] Java Dataset API ScanOptions expansion #28866

@asfimport

Currently, the Java Dataset API exposes very few scan options [1].

Additionally, the options that do exist must always be set explicitly from Java; there is no way to fall back on the sensible default values defined in core Arrow.

For my use case, I want to be able to set the fragment_readahead option from the Java side.

It would be great if:
 + ScanOptions.java were expanded to let us set more, potentially all, of the options related to scanner creation.
 + Java users could omit options in order to use the default values, e.g. [2].
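
To make the two points above concrete, here is a minimal sketch of what such an options object could look like. It is only an illustration: the class name `ScanOptionsSketch`, its builder, and its accessors are hypothetical and are not part of the existing `org.apache.arrow.dataset.scanner.ScanOptions`. The idea is that an option left unset stays empty, so the native side would be free to apply its own default for it.

```java
import java.util.OptionalInt;
import java.util.OptionalLong;

/**
 * Hypothetical sketch only; this is NOT the existing Arrow ScanOptions class.
 * Every option is optional: an empty value means "let core Arrow decide".
 */
final class ScanOptionsSketch {
  private final OptionalLong batchSize;
  private final OptionalInt batchReadahead;
  private final OptionalInt fragmentReadahead;

  private ScanOptionsSketch(Builder b) {
    this.batchSize = b.batchSize;
    this.batchReadahead = b.batchReadahead;
    this.fragmentReadahead = b.fragmentReadahead;
  }

  OptionalLong batchSize() { return batchSize; }
  OptionalInt batchReadahead() { return batchReadahead; }
  OptionalInt fragmentReadahead() { return fragmentReadahead; }

  /** Builder whose setters are all optional (assumed API shape). */
  static final class Builder {
    private OptionalLong batchSize = OptionalLong.empty();
    private OptionalInt batchReadahead = OptionalInt.empty();
    private OptionalInt fragmentReadahead = OptionalInt.empty();

    Builder batchSize(long value) {
      if (value <= 0) {
        throw new IllegalArgumentException("batchSize must be positive");
      }
      this.batchSize = OptionalLong.of(value);
      return this;
    }

    Builder batchReadahead(int value) {
      if (value < 0) {
        throw new IllegalArgumentException("batchReadahead must be non-negative");
      }
      this.batchReadahead = OptionalInt.of(value);
      return this;
    }

    Builder fragmentReadahead(int value) {
      if (value < 0) {
        throw new IllegalArgumentException("fragmentReadahead must be non-negative");
      }
      this.fragmentReadahead = OptionalInt.of(value);
      return this;
    }

    ScanOptionsSketch build() {
      return new ScanOptionsSketch(this);
    }
  }
}
```

With this shape, `new ScanOptionsSketch.Builder().fragmentReadahead(16).build()` would set only fragment_readahead and leave the batch size and batch readahead empty. How the empty values would be passed across JNI and mapped onto the C++ defaults is left open here.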

It would be good to know what others think, and whether a PR for this would be useful.

[1] https://github.com/apache/arrow/blob/master/java/dataset/src/main/java/org/apache/arrow/dataset/scanner/ScanOptions.java
[2]

constexpr int64_t kDefaultBatchSize = 1 << 20;
constexpr int32_t kDefaultBatchReadahead = 32;
constexpr int32_t kDefaultFragmentReadahead = 8;

Reporter: Sebastiaan Alvarez Rodriguez

Note: This issue was originally created as ARROW-13166. Please see the migration documentation for further details.
