Describe the enhancement requested
While reviewing, I realized that the following code path is never tested:
arrow/python/pyarrow/parquet/core.py, lines 1807 to 1838 at e2a5b4e:
```python
except ImportError:
    # fall back on ParquetFile for simple cases when pyarrow.dataset
    # module is not available
    if filters is not None:
        raise ValueError(
            "the 'filters' keyword is not supported when the "
            "pyarrow.dataset module is not available"
        )
    if partitioning != "hive":
        raise ValueError(
            "the 'partitioning' keyword is not supported when the "
            "pyarrow.dataset module is not available"
        )
    if schema is not None:
        raise ValueError(
            "the 'schema' argument is not supported when the "
            "pyarrow.dataset module is not available"
        )
    filesystem, path = _resolve_filesystem_and_path(source, filesystem)
    if filesystem is not None:
        source = filesystem.open_input_file(path)
    # TODO test that source is not a directory or a list
    dataset = ParquetFile(
        source, read_dictionary=read_dictionary,
        memory_map=memory_map, buffer_size=buffer_size,
        pre_buffer=pre_buffer,
        coerce_int96_timestamp_unit=coerce_int96_timestamp_unit,
        decryption_properties=decryption_properties,
        thrift_string_size_limit=thrift_string_size_limit,
        thrift_container_size_limit=thrift_container_size_limit,
        page_checksum_verification=page_checksum_verification,
    )
```
I've tried to enable Parquet locally on the minimal Python example build, but that would require building with Snappy and other compression libraries. It would be good to have a test that exercises this code path; a sketch of one possible test is below.
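A minimal sketch of such a test, assuming the `try`/`except` wraps an inline `import pyarrow.dataset` inside `read_table` (as the `except ImportError` fallback suggests), so that inserting `None` into `sys.modules` makes that import raise `ImportError`. The fixture name `no_dataset_module` and the exact assertions are illustrative, not an existing test:

```python
import sys

import pytest
import pyarrow as pa
import pyarrow.parquet as pq


@pytest.fixture
def no_dataset_module(monkeypatch):
    # A None entry in sys.modules makes the next
    # "import pyarrow.dataset" raise ImportError.
    monkeypatch.setitem(sys.modules, "pyarrow.dataset", None)


def test_read_table_without_dataset_module(tmp_path, no_dataset_module):
    path = tmp_path / "data.parquet"
    table = pa.table({"a": [1, 2, 3]})
    # compression="none" avoids depending on Snappy in a minimal build
    pq.write_table(table, path, compression="none")

    # The fallback should read a single file through ParquetFile.
    result = pq.read_table(path)
    assert result.equals(table)

    # Keywords that require pyarrow.dataset should raise ValueError.
    with pytest.raises(ValueError, match="filters"):
        pq.read_table(path, filters=[("a", ">", 1)])
    with pytest.raises(ValueError, match="schema"):
        pq.read_table(path, schema=table.schema)
    with pytest.raises(ValueError, match="partitioning"):
        pq.read_table(path, partitioning=None)
```

One caveat: this only works if `pyarrow.dataset` is imported lazily at the point of the `try` block; if the module object were already bound at module level in `core.py`, patching `sys.modules` would not reach it.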
This code path was exercised before through the `use_legacy_dataset` flag.
See:
Component(s)
Python