[C++] Null values in partitioning field for FilenamePartitioning #31689

@asfimport

When a dataset written with FilenamePartitioning is read back, for example from PyArrow, the partitioning field contains only null values instead of the values encoded in the file names.

The issue can be reproduced with the following code:

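import numpy as np
import pyarrow as pa
import pyarrow.dataset as ds

# Build a 20-row table whose "part" column is the partition key
table = pa.table(
    [
        pa.array(range(20)),
        pa.array(np.random.randn(20)),
        pa.array(np.repeat(["a", "b"], 10)),
    ],
    names=["f1", "f2", "part"],
)

part = ds.partitioning(pa.schema([("part", pa.string())]), flavor="filename")

# "test" is the directory where the partitions are written
ds.write_dataset(table, "test", format="parquet", partitioning=part)

# Read the dataset back using the same partitioning
result = ds.dataset("test", format="parquet", partitioning=part).to_table()
print(result)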

This results in something like this:

pyarrow.Table
f1: int64
f2: double
part: string
----
f1: [[0,1,2,3,4,5,6,7,8,9],[10,11,12,13,14,15,16,17,18,19]]
f2: [[-1.1753280347394899,-0.9640239222827617,0.7907247451009602,1.3667778347936321,0.005079832420686733,0.9024313772071855,-1.01618656608383,-1.1459911861999188,-0.7407261867306765,-0.012823499364722428],[0.29893685698088185,1.3907720928021299,-0.48826416913435605,-1.3436821154932153,-0.5492388164165941,-0.07093280675027104,0.009918818541272493,-1.05561750529359,-2.0209000426858927,-0.28081085330210676]]
part: [[null,null,null,null,null,null,null,null,null,null],[null,null,null,null,null,null,null,null,null,null]] 
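For comparison, the expected result would have `part` filled with `a` and `b` rather than nulls. The sketch below illustrates the idea behind filename partitioning, where each partition value is a leading segment of the file name separated by `_`. It is not Arrow's implementation: `parse_filename_partition` is a hypothetical helper, and the example file name `a_part-0.parquet` is an assumption about what the writer produces.

```python
import posixpath


def parse_filename_partition(path, field_names, sep="_"):
    """Hypothetical helper: map leading file-name segments to partition fields.

    Illustrative only; this is not how Arrow parses file names internally.
    """
    basename = posixpath.basename(path)
    segments = basename.split(sep)
    # One leading segment is needed per field; the rest is the file's own name.
    if len(segments) <= len(field_names):
        raise ValueError(f"{basename!r} has too few segments for {field_names}")
    return dict(zip(field_names, segments[: len(field_names)]))


# Assumed file name for the "a" partition of the dataset above.
print(parse_filename_partition("test/a_part-0.parquet", ["part"]))
```

With values recovered this way, every row read from a given file would carry that file's `part` value, which is what the null-filled column above is missing.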

Reporter: Sanjiban Sengupta / @sanjibansg
Assignee: Sanjiban Sengupta / @sanjibansg

Related issues:

PRs and other links:

Note: This issue was originally created as ARROW-16302. Please see the migration documentation for further details.
