Description
Currently, the Java Dataset API exposes very few scanning options [1].
Additionally, the options that do exist must always be set from Java; there is no way to fall back on the sensible default values defined in core Arrow.
For my use case, I want to be able to set the fragment_readahead option from the Java side.
It would be great if:
+ ScanOptions.java were expanded to allow setting more, potentially all, of the options related to scanner creation.
+ Java users could omit options to fall back on the default values, e.g. [2].
It would be good to know what others think, and whether a PR for this is useful.
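To make the proposal concrete, here is a minimal sketch of what an options class with omittable values could look like. Everything in it is an assumption for illustration: `ScanOptionsSketch`, its builder, and the accessor names are hypothetical and are not the existing `ScanOptions.java`; only the three option names mirror the defaults in scanner.h. An empty value is meant to signal "use the core Arrow default" to whatever layer creates the native scanner.

```java
import java.util.OptionalInt;
import java.util.OptionalLong;

/**
 * Hypothetical sketch of scan options whose values may be omitted.
 * This is NOT the existing ScanOptions class in the Arrow Java Dataset API.
 */
final class ScanOptionsSketch {
  private final OptionalLong batchSize;
  private final OptionalInt batchReadahead;
  private final OptionalInt fragmentReadahead;

  private ScanOptionsSketch(Builder b) {
    this.batchSize = b.batchSize;
    this.batchReadahead = b.batchReadahead;
    this.fragmentReadahead = b.fragmentReadahead;
  }

  // An empty result means "not set from Java; let core Arrow pick its default".
  OptionalLong batchSize() { return batchSize; }
  OptionalInt batchReadahead() { return batchReadahead; }
  OptionalInt fragmentReadahead() { return fragmentReadahead; }

  static Builder builder() { return new Builder(); }

  static final class Builder {
    // Every option starts out unset.
    private OptionalLong batchSize = OptionalLong.empty();
    private OptionalInt batchReadahead = OptionalInt.empty();
    private OptionalInt fragmentReadahead = OptionalInt.empty();

    Builder batchSize(long value) {
      this.batchSize = OptionalLong.of(value);
      return this;
    }

    Builder batchReadahead(int value) {
      this.batchReadahead = OptionalInt.of(value);
      return this;
    }

    Builder fragmentReadahead(int value) {
      this.fragmentReadahead = OptionalInt.of(value);
      return this;
    }

    ScanOptionsSketch build() { return new ScanOptionsSketch(this); }
  }
}
```

With something along these lines, a caller who only cares about fragment readahead could write `ScanOptionsSketch.builder().fragmentReadahead(16).build()` and leave the batch size and batch readahead unset. How unset values would be passed across the JNI boundary is left open here.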
[2] arrow/cpp/src/arrow/dataset/scanner.h, lines 51 to 53 in ad5dc82:

    constexpr int64_t kDefaultBatchSize = 1 << 20;
    constexpr int32_t kDefaultBatchReadahead = 32;
    constexpr int32_t kDefaultFragmentReadahead = 8;

Reporter: Sebastiaan Alvarez Rodriguez
Note: This issue was originally created as ARROW-13166. Please see the migration documentation for further details.