Description
Describe the enhancement requested
See #34901 for a longer discussion, but summarizing: the pyarrow.Scalar object has a cast() method, but in contrast with the other cast methods in pyarrow it does an unsafe cast by default. We should probably change this to do a safe cast by default, and at the same time allow passing CastOptions (so a user can still choose to do an unsafe cast).
Example:
>>> import pyarrow as pa
# scalar behaviour
>>> pa.scalar(1.5)
<pyarrow.DoubleScalar: 1.5>
>>> pa.scalar(1.5).cast(pa.int64())
<pyarrow.Int64Scalar: 1>
# vs array behaviour
>>> pa.array([1.5]).cast(pa.int64())
...
ArrowInvalid: Float value 1.5 was truncated converting to int64
The Python cast() method calls the C++ Scalar::CastTo:
Lines 99 to 100 in e488942
// TODO(bkietz) add compute::CastOptions
Result<std::shared_ptr<Scalar>> CastTo(std::shared_ptr<DataType> to) const;
which currently indeed doesn't have the option to pass CastOptions.
In addition, it seems that casting Scalars has a somewhat custom implementation: it doesn't use the generic Cast implementation (from the compute kernels), but a custom CastImpl in scalar.cc. Not fully sure about the reason for this, but maybe historically we wanted to have scalar casting without relying on the optional compute module? (cf. #25025)
Component(s)
C++