[Python] NaN values silently cast to int64 when passing an explicit schema to Table.from_pandas #15664

@asfimport

Description

If you create a Table from a DataFrame of ints containing a NaN, the NaN is improperly cast. Pandas stores such a column as float64 (it cannot represent null-valued integer data), so when the frame is converted to a Table with an explicit int64 schema, the NaN is reinterpreted as an integer. This seems like a bug: a known pandas limitation is taking precedence over Arrow's ability to store the values as an Int64Array with nulls.
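The garbage value in the output below is INT64_MIN. A plausible explanation (a sketch, not confirmed from the Arrow source): the float64 NaN is cast to int64 the way NumPy does it, which is undefined behavior but on most x86-64 platforms produces INT64_MIN:

```python
import numpy as np

# Casting NaN from float64 to int64 is undefined behavior in C/NumPy.
# On most platforms (e.g. x86-64) it yields INT64_MIN, matching the
# value reported in this issue.
nan_as_int = np.array([np.nan]).astype(np.int64)[0]
print(int(nan_as_int))          # typically -9223372036854775808
print(np.iinfo(np.int64).min)   # -9223372036854775808 (INT64_MIN)
```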

```python
import numpy as np
import pandas as pd
import pyarrow as pa

# pandas promotes the int column to float64 to hold the NaN
df = pd.DataFrame({"a": [1, 2, np.nan]})
schema = pa.schema([pa.field("a", pa.int64(), nullable=True)])
table = pa.Table.from_pandas(df, schema=schema)
table[0]
```


<pyarrow.lib.Column object at 0x7f2151d19c90>
chunk 0: <pyarrow.lib.Int64Array object at 0x7f213bf356d8>
[
  1,
  2,
  -9223372036854775808
]

 

Reporter: Matthew Gilbert
Assignee: Antoine Pitrou / @pitrou

Note: This issue was originally created as ARROW-2135. Please see the migration documentation for further details.
