[C++][Dataset] Discovery of partition field as dictionary type segfaulting with HivePartitioning

Found while testing the new feature from ARROW-8647. The following Python test reproduces it:

```python
import pyarrow as pa
import pyarrow.dataset as ds
import pytest


@pytest.mark.parquet
@pytest.mark.parametrize('partitioning', ["directory", "hive"])
def test_open_dataset_partitioned_dictionary_type(tempdir, partitioning):
    import pyarrow.parquet as pq
    table = pa.table({'a': range(9), 'b': [0.] * 4 + [1.] * 5})

    path = tempdir / "dataset"
    path.mkdir()

    # Write the same table into one directory per partition value.
    for part in ["A", "B", "C"]:
        fmt = "{}" if partitioning == "directory" else "part={}"
        part = path / fmt.format(part)
        part.mkdir()
        pq.write_table(table, part / "test.parquet")

    # Discover the partition field as a dictionary type.
    if partitioning == "directory":
        part = ds.DirectoryPartitioning.discover(
            ["part"], max_partition_dictionary_size=-1)
    else:
        part = ds.HivePartitioning.discover(max_partition_dictionary_size=-1)

    dataset = ds.dataset(str(path), partitioning=part)
    expected_schema = table.schema.append(
        pa.field("part", pa.dictionary(pa.int32(), pa.string()))
    )
    assert dataset.schema.equals(expected_schema)
```

This test fails (segfaults) for HivePartitioning, but works for DirectoryPartitioning.
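For readers unfamiliar with the two layouts the test exercises, here is a rough pure-Python sketch of how a path segment maps to the `part` field in each case, and what dictionary-encoding the discovered values amounts to. This is illustrative only and is not Arrow's implementation; the helper names `parse_directory_segments`, `parse_hive_segments`, and `dictionary_encode` are hypothetical.

```python
# Illustrative sketch only; none of these helpers exist in pyarrow.

def parse_directory_segments(segments, field_names):
    """Directory layout: the i-th path segment is the value of field_names[i]."""
    return dict(zip(field_names, segments))


def parse_hive_segments(segments):
    """Hive layout: each path segment has the form 'name=value'."""
    pairs = {}
    for segment in segments:
        name, sep, value = segment.partition("=")
        if sep:  # ignore segments that are not of the form name=value
            pairs[name] = value
    return pairs


def dictionary_encode(values):
    """Return (dictionary, indices), with the dictionary in first-seen order."""
    dictionary = []
    positions = {}
    indices = []
    for value in values:
        if value not in positions:
            positions[value] = len(dictionary)
            dictionary.append(value)
        indices.append(positions[value])
    return dictionary, indices


# The same three partition values, expressed in both layouts.
directory_values = [
    parse_directory_segments([name], ["part"])["part"] for name in ["A", "B", "C"]
]
hive_values = [
    parse_hive_segments(["part={}".format(name)])["part"] for name in ["A", "B", "C"]
]
```

Both layouts yield the same values for `part`, which is why the test expects the same `dictionary<int32, string>` field in the schema for either partitioning.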

**Reporter**: [Joris Van den Bossche](https://issues.apache.org/jira/browse/ARROW-9288) / @jorisvandenbossche
**Assignee**: [Joris Van den Bossche](https://issues.apache.org/jira/browse/ARROW-9288) / @jorisvandenbossche
#### Related issues:
- [[C++][Dataset] Optionally encode partition field values as dictionary type](https://github.com/apache/arrow/issues/24808) (is related to)
#### PRs and other links:
- [GitHub Pull Request #7608](https://github.com/apache/arrow/pull/7608)

<sub>**Note**: *This issue was originally created as [ARROW-9288](https://issues.apache.org/jira/browse/ARROW-9288). Please see the [migration documentation](https://github.com/apache/arrow/issues/14542) for further details.*</sub>
