Closed
3 changes: 1 addition & 2 deletions python/pyspark/serializers.py
@@ -296,8 +296,7 @@ def create_array(s, t):
             mask = s.isnull()
             # Ensure timestamp series are in expected form for Spark internal representation
             if t is not None and pa.types.is_timestamp(t):
-                s = _check_series_convert_timestamps_internal(s.fillna(0), self._timezone)
-
+                s = _check_series_convert_timestamps_internal(s, self._timezone)
Contributor Author

Removing this doesn't fail any existing tests. @BryanCutler do you remember why we are doing fillna(0) here?

Member

I believe it was due to a Pandas error, most likely because we were testing with 0.19.2 at the time. Can you manually run some tests with different Pandas versions? It would be best to test with older versions, though it might be hard to get 0.19.2 working with pyarrow 0.12.1.

Contributor Author

icexelloss, Jun 12, 2019

Yeah, pandas 0.19.2 doesn't work with pyarrow 0.12. I cannot run the Arrow tests with pandas 0.19.2 anymore.

Since we require a minimum Arrow version of 0.12, pandas 0.19.2 is effectively unsupported for any user who wants to use Arrow.

             try:
                 array = pa.Array.from_pandas(s, mask=mask, type=t, safe=self._safecheck)
             except pa.ArrowException as e:
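The version constraint raised in the thread above (a minimum pyarrow version that in turn rules out pandas 0.19.2) could be enforced with a guard along these lines. This is a minimal sketch using only the standard library; `parse_version`, `meets_minimum`, and the `MIN_PYARROW_VERSION` constant are hypothetical names for illustration, not PySpark's actual helpers, and the `0.12.1` value is simply the version mentioned in the discussion.

```python
# Hypothetical constant: the pyarrow version mentioned in the review thread.
MIN_PYARROW_VERSION = "0.12.1"


def parse_version(version):
    """Turn '0.12.1' into (0, 12, 1); stops at the first non-numeric component."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if digits == "":
            break  # e.g. a 'dev0' component: ignore it and everything after
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if `installed` is at least `minimum`, padding with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b
```

A caller would compare the installed library's version string against the constant and raise an ImportError with a clear message when `meets_minimum` returns False.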
9 changes: 9 additions & 0 deletions python/pyspark/sql/tests/test_arrow.py
@@ -383,6 +383,15 @@ def test_timestamp_dst(self):
         assert_frame_equal(pdf, df_from_python.toPandas())
         assert_frame_equal(pdf, df_from_pandas.toPandas())

+    # Regression test for SPARK-28003
+    def test_timestamp_nat(self):
+        dt = [pd.NaT, pd.Timestamp('2019-06-11'), None] * 100
+        pdf = pd.DataFrame({'time': dt})
+        df_no_arrow, df_arrow = self._createDataFrame_toggle(pdf)
+
+        assert_frame_equal(pdf, df_no_arrow.toPandas())
+        assert_frame_equal(pdf, df_arrow.toPandas())
+
     def test_toPandas_batch_order(self):

         def delay_first_part(partition_index, iterator):
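To see what the regression test's input looks like before it reaches Spark, the following sketch builds the same frame and computes the null mask with `s.isnull()`, the same call `create_array` uses. It assumes only that pandas is installed; `null_mask_summary` is a hypothetical helper name, and nothing here exercises Spark or Arrow.

```python
import pandas as pd


def null_mask_summary(series):
    """Return (is_datetime, null_count, distinct_valid_values) for a series."""
    mask = series.isnull()  # same call create_array uses to build its mask
    is_datetime = pd.api.types.is_datetime64_any_dtype(series)
    # Collect the distinct non-null values as pandas Timestamps.
    valid = [pd.Timestamp(v) for v in series[~mask].unique()]
    return is_datetime, int(mask.sum()), valid


# Same input as test_timestamp_nat: NaT, a real timestamp, and None, repeated.
dt = [pd.NaT, pd.Timestamp('2019-06-11'), None] * 100
pdf = pd.DataFrame({'time': dt})
is_datetime, null_count, valid = null_mask_summary(pdf['time'])
```

Both `pd.NaT` and `None` are flagged in the mask, and that mask is what the diff passes to `pa.Array.from_pandas` via `mask=mask`.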