clib.conversion: Remove the unused array_to_datetime function#3507
clib.conversion: Remove the unused array_to_datetime function#3507
Conversation
e5cf4f1 to
705aa4a
Compare
705aa4a to
d7503d1
Compare
d7503d1 to
20b9215
Compare
pygmt/clib/session.py
Outdated
| # latitude, or datetime string types. | ||
| for col, array in enumerate(arrays[2:]): | ||
| if pd.api.types.is_string_dtype(array.dtype): | ||
| if array.dtype.type == np.str_: |
There was a problem hiding this comment.
I think we'll need to check if this can handle pandas.StringDtype and pyarrow.StringArray (xref #2933).
There was a problem hiding this comment.
Both can be converted to the numpy string dtype by the vectors_to_arrays method, so in virtualfile_from_vectors, we only need to check np.str_:
In [1]: import numpy as np
In [2]: import pandas as pd
In [3]: import pyarrow as pa
In [4]: x = pd.Series(["abc", "defghi"], dtype="string")
In [5]: np.asarray(x)
Out[5]: array(['abc', 'defghi'], dtype=object)
In [6]: np.asarray(x, dtype=str)
Out[6]: array(['abc', 'defghi'], dtype='<U6')
In [7]: y = pa.array(["abc", "defghi"])
In [8]: np.asarray(y)
Out[8]: array(['abc', 'defghi'], dtype=object)
In [9]: np.asarray(y, dtype=str)
Out[9]: array(['abc', 'defghi'], dtype='<U6')
There was a problem hiding this comment.
The main idea of this PR is to let vectors_to_arrays do the conversion work from any dtypes (including pd.StringDtype and pa.StringArray) into the numpy dtypes, so that the virtualfile_from_vectors only need to care about how to map numpy dtypes into GMT data types.
For any special dtypes that we know how to convert it to numpy dtype, we can maintain a mapping dictionary, just like what you did to support pyarrow's date32[day] and date64[ms] in #2845:
pygmt/pygmt/clib/conversion.py
Lines 208 to 211 in c2e429c
There was a problem hiding this comment.
In 83673cf, I've moved most of the doctests into a separate test file pygmt/tests/test_clib_vectors_to_arrays.py. A test test_vectors_to_arrays_pandas_string is added to check that the function can handle pd.StringDtype correctly.
There was a problem hiding this comment.
For any special dtypes that we know how to convert it to numpy dtype, we can maintain a mapping dictionary, just like what you did to support pyarrow's date32[day] and date64[ms] in #2845:
pygmt/pygmt/clib/conversion.py
Lines 208 to 211 in c2e429c
Based on the tests below, I think we should add the entry "string": np.str_ to the dictionary:
In [1]: import pandas as pd
In [2]: x = pd.Series(["abc", "12345"])
In [3]: x.dtype
Out[3]: dtype('O')
In [4]: str(x.dtype)
Out[4]: 'object'
In [5]: x = pd.Series(["abc", "12345"], dtype="string")
In [6]: x.dtype
Out[6]: string[python]
In [7]: str(x.dtype)
Out[7]: 'string'
In [8]: x = pd.Series(["abc", "12345"], dtype="string[pyarrow]")
In [9]: x.dtype
Out[9]: string[pyarrow]
In [10]: str(x.dtype)
Out[10]: 'string'
In [11]: import pyarrow as pa
In [12]: x = pa.array(["abc", "defghi"])
In [13]: x.type
Out[13]: DataType(string)
In [14]: str(x.type)
Out[14]: 'string'There was a problem hiding this comment.
In PR #2774 and #2774, we only checked if PyGMT supports pandas with the pyarrow backend, but didn't check if the original pyarrow arrays works. For example, for a pyarrow date32 array, we need to check array.type rather than array.dtype:
In [1]: import datetime
In [2]: import pyarrow as pa
In [3]: x = pa.array([datetime.date(2020, 1, 1), datetime.date(2021, 12, 31)])
In [4]: str(x.type)
Out[4]: 'date32[day]'
There was a problem hiding this comment.
For example, for a pyarrow
date32array, we need to checkarray.type
Yes, raw pyarrow arrays of date32/date64 type are not supported yet. That's why it's marked 🚧 in #2800 (I was planning to modify array_to_datetime to handle it).
aa3ffc3 to
6338cde
Compare
a029191 to
54160bf
Compare
| # For 1-D arrays, try to convert unknown object type to np.datetime64. | ||
| if ndim == 1 and array.dtype.type is np.object_: | ||
| with contextlib.suppress(ValueError): | ||
| array = array_to_datetime(array) |
| If the datetime string is invalid. | ||
|
|
||
| Examples | ||
| -------- |
There was a problem hiding this comment.
These doctests are already covered by the tests in test_clib_to_numpy.py
|
Ping @GenericMappingTools/pygmt-maintainers for final reviews. I plan to merge this PR on 2025/02/08 if there are no objections. |
After a series of PRs (especially #3760), the
_to_numpyfunction takes the responsibility to convert any array-like objects into numpy arrays with recognized dtypes. So, if_to_numpyreturns anp.objectdtype string, it means the original array cannot to converted to a numpy array that can be passed to GMT.After #3760, the
array_to_datetimefunction is no longer used and can be removed.