
[Python] pa.array() doesn't respect specified dictionary type #23469


Description


This might be related to ARROW-6548 and other issues dealing with all-NaN columns. When creating a dictionary array, even when the desired type is fully specified, that type is not respected if the data contains only NaNs:

import pandas as pd
import pyarrow as pa

# This may look a little artificial, but it easily occurs when processing categorical data in batches and a particular batch contains only NaNs
ser = pd.Series([None, None]).astype('object').astype('category')
typ = pa.dictionary(index_type=pa.int8(), value_type=pa.string(), ordered=False)
pa.array(ser, type=typ).type

results in


>> DictionaryType(dictionary<values=null, indices=int8, ordered=0>)

which means that one cannot, for example, serialize batches of categoricals when all-NaN batches are possible, even when trying to enforce that each batch has the same schema (because the specified schema is not respected).
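For illustration, the array's actual type no longer matches a schema built from the fully specified type, so any check that batches conform to a common schema fails for the all-NaN batch (a small sketch, continuing from the reproduction above):

# The array's actual type differs from the requested one, so it cannot
# satisfy a schema built from `typ`.
schema = pa.schema([pa.field("cat", typ)])
arr = pa.array(ser, type=typ)
arr.type.equals(typ)             # False: value type is null instead of string
arr.type.equals(schema[0].type)  # False for the same reason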

I understand that inferring the type would be difficult here, but I'd expect a fully specified type to be respected in this case?

In the meantime, is there a workaround to manually create a dictionary array of the desired type containing only NaNs?
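One possible workaround (a sketch added for reference, not confirmed in the original report) is to construct the array manually with pa.DictionaryArray.from_arrays, passing null indices of the desired index type and an explicitly typed empty dictionary:

# Build the all-null dictionary array by hand so the value type is preserved.
indices = pa.array([None, None], type=pa.int8())   # one null index per row
dictionary = pa.array([], type=pa.string())        # empty, but explicitly string-typed
arr = pa.DictionaryArray.from_arrays(indices, dictionary, ordered=False)
arr.type  # DictionaryType(dictionary<values=string, indices=int8, ordered=0>)

Whether this round-trips cleanly back through pandas is a separate question, but it at least yields an array whose type matches the intended schema.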

Reporter: Thomas Buhrmann / @buhrmann
Assignee: Joris Van den Bossche / @jorisvandenbossche

PRs and other links:

Note: This issue was originally created as ARROW-7168. Please see the migration documentation for further details.
