[C++] FileSystemDataset FilenamePartitioning error - fsspec filesystem #32471

@asfimport

Description

Unless this is user error (which it may well be!), Dataset FilenamePartitioning does not seem to work on read with an fsspec filesystem. From what I can glean, the filenames are parsed successfully when passed to the parse() method, but the fields are not extracted from the filenames passed to dataset(); instead, they appear as nulls. When I try the partitioning discover() method (assuming this is a reasonable thing to try), I get the traceback below. (Repro Python script attached.)

Traceback (most recent call last):
  File "/zip_of_csvs_test.py", line 82, in <module>
    ds_partitioned = pds.dataset(
  File "/.pyenv/versions/3.8.2/lib/python3.8/site-packages/pyarrow/dataset.py", line 697, in dataset
    return _filesystem_dataset(source, **kwargs)
  File "/.pyenv/versions/3.8.2/lib/python3.8/site-packages/pyarrow/dataset.py", line 449, in _filesystem_dataset
    return factory.finish(schema)
  File "pyarrow/_dataset.pyx", line 1857, in pyarrow._dataset.DatasetFactory.finish
  File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: No non-null segments were available for field 'frequency'; couldn't infer type
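
For readers unfamiliar with the convention involved, here is a minimal pure-Python sketch of how filename partitioning is understood to map a file name to fields: the base name is split on "_", the text after the last separator is ignored, and the remaining segments are assigned to the schema fields in order. This is an illustration under those assumptions, not pyarrow's implementation. The helper name `parse_filename_segments`, the example paths, and the `year` field are hypothetical; only `frequency` comes from the traceback above.

```python
import posixpath


def parse_filename_segments(path, field_names, sep="_"):
    """Illustrative only: map the '_'-separated prefix of a file's base name
    to field names, in order. Not pyarrow's actual code."""
    base = posixpath.basename(path)
    # Assumption: whatever follows the last separator (e.g. "data.csv")
    # is not a partition value, so it is dropped.
    values = base.split(sep)[:-1]
    # Fields with no matching segment are reported as None (null).
    return {
        name: (values[i] if i < len(values) else None)
        for i, name in enumerate(field_names)
    }


# Hypothetical file names, for illustration.
print(parse_filename_segments("archive/monthly_2020_data.csv", ["frequency", "year"]))
# {'frequency': 'monthly', 'year': '2020'}
print(parse_filename_segments("archive/data.csv", ["frequency", "year"]))
# {'frequency': None, 'year': None}
```

The second call shows the situation the error message describes: if no file name yields a value for a field, every value for `frequency` is null and there is nothing from which to infer a type.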

Reporter: Adam Kirby

Related issues:

Original Issue Attachments:

Note: This issue was originally created as ARROW-17174. Please see the migration documentation for further details.
