[Python] from_pandas gives incorrect results when converting floating point to bool #19753

@asfimport

When converting pandas data that contains floating-point values to boolean with `pa.Array.from_pandas`, the results are incorrect: every element comes out as False.


In [2]: import pyarrow as pa
   ...: import pandas as pd
   ...: a = [0.0, 1.0, 2.0, None, float('NaN')]
   ...: 

In [3]: s = pd.Series(a)

In [4]: pa.Array.from_pandas(s, type=pa.bool_())
Out[4]: 
<pyarrow.lib.BooleanArray object at 0x7f1bfd099e68>
[
  False,
  False,
  False,
  False,
  False
]

Expected output: each element should be True when its value != 0.
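
As a minimal sketch of the expected semantics, the pure-Python helper below maps each value to `value != 0`. The name `expected_bools` is hypothetical, and treating `None` and `NaN` as missing (null) is an assumption; the report only states the rule for nonzero values.

```python
import math


def expected_bools(values):
    """Return value != 0 for each entry, keeping missing entries as None."""
    result = []
    for v in values:
        # Assumption: None and NaN count as missing; the report does not
        # say how these two inputs should be converted.
        if v is None or (isinstance(v, float) and math.isnan(v)):
            result.append(None)
        else:
            result.append(v != 0)
    return result


a = [0.0, 1.0, 2.0, None, float('NaN')]
print(expected_bools(a))  # [False, True, True, None, None]
```

As a possible workaround, a list like this could be passed to `pa.array(..., type=pa.bool_())` to build the boolean array explicitly rather than relying on the float-to-bool conversion.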

This issue originated from SPARK-25461.

Reporter: Bryan Cutler / @BryanCutler
Assignee: Bryan Cutler / @BryanCutler

Note: This issue was originally created as ARROW-3428. Please see the migration documentation for further details.
