Implement DataArray.to_dask_dataframe() #7635
xarray/core/dataarray.py
Outdated
| Examples | ||
| -------- | ||
|
|
||
| da=xr.DataArray(np.random.rand(4,3,2), |
This example is not correctly formatted, see other functions as reference.
Also, don't use random values in examples, simply use np.ones(...)
np.ones can hide errors when dealing with tricky shapes so something like np.arange(4*3*2).reshape(4,3,2) is a little better.
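A minimal sketch of that point, using plain NumPy (the variable names are illustrative and not taken from the PR): an accidental axis swap goes unnoticed with constant data but is caught once every element is distinct.

```python
import numpy as np

shape = (4, 3, 2)

# Constant data: swapping the first two axes and reshaping back gives an
# array identical to the original, so the mistake is invisible.
ones = np.ones(shape)
ones_scrambled = ones.transpose(1, 0, 2).reshape(shape)
ones_hides_bug = np.array_equal(ones, ones_scrambled)

# Distinct values: the same mistake moves elements to different positions.
seq = np.arange(4 * 3 * 2).reshape(shape)
seq_scrambled = seq.transpose(1, 0, 2).reshape(shape)
arange_hides_bug = np.array_equal(seq, seq_scrambled)

print(ones_hides_bug, arange_hides_bug)
```

With distinct values, a comparison against the expected result fails as soon as the element order is wrong.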
xarray/tests/test_dataarray.py
Outdated
| assert_array_equal(actual.index.names, list("ABC")) | ||
|
|
||
| def test_to_dask_dataframe(self) -> None: | ||
| arr_np = np.random.randn(3, 4) |
Even though it doesn't really matter in most cases, we try to avoid random values in tests.
Maybe use np.arange(3*4).reshape(3,4).
xarray/core/dataarray.py
Outdated
| if self.ndim == 0: | ||
| raise ValueError("Cannot convert a scalar to a dataframe") | ||
|
|
||
| tmp_dataset = Dataset({name: self}) |
Normally we use the _to_temp_dataset method here, but since we only use it to construct the dataframe and don't round-trip, it doesn't actually matter?
xarray/core/dataarray.py
Outdated
| dim_order: Sequence of Hashable or None , optional | ||
| Hierarchical dimension order for the resulting dataframe. |
| dim_order: Sequence of Hashable or None , optional | |
| Hierarchical dimension order for the resulting dataframe. | |
| dim_order: Sequence of Hashable or None , optional | |
| Hierarchical dimension order for the resulting dataframe. |
Follow NumPy's docstring conventions. There are more errors above and below.
xarray/core/dataarray.py
Outdated
| if name is None: | ||
| name = self.name | ||
|
|
||
| if name is None: | ||
| raise ValueError( | ||
| "Cannot convert an unnamed DataArray to a " | ||
| "dask dataframe : use the ``name`` parameter" | ||
| ) |
| if name is None: | |
| name = self.name | |
| if name is None: | |
| raise ValueError( | |
| "Cannot convert an unnamed DataArray to a " | |
| "dask dataframe : use the ``name`` parameter" | |
| ) |
Not needed when using self._to_dataset_whole.
Would it be better to keep this error message? When I removed it, the error shown was 'unable to convert unnamed DataArray to a Dataset without providing an explicit name'. Keeping these lines gives an error message specific to the DataArray-to-dask-dataframe conversion.
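The trade-off can be sketched in isolation. The helper below is hypothetical (it is not part of xarray, and its message wording is an assumption); it only shows how an early check produces an error that names the conversion the user actually asked for.

```python
from collections.abc import Hashable
from typing import Optional


def resolve_column_name(array_name: Optional[Hashable]) -> Hashable:
    """Hypothetical helper: return the column name or fail with a specific message."""
    if array_name is None:
        # Raising here, before any generic Dataset conversion runs, keeps the
        # message tied to the dask dataframe conversion.
        raise ValueError(
            "Cannot convert an unnamed DataArray to a dask dataframe: "
            "assign a name first."
        )
    return array_name


try:
    resolve_column_name(None)
except ValueError as err:
    message = str(err)

print(message)
```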
I have made the changes as suggested. Please review them. Thanks.
xarray/core/dataarray.py
Outdated
| if name is None: | ||
| name = self.name | ||
|
|
||
| if name is None: | ||
| raise ValueError( | ||
| "Cannot convert an unnamed DataArray to a " | ||
| "dask dataframe : use the ``name`` parameter ." | ||
| ) | ||
| ds = self._to_dataset_whole(name) | ||
| return ds.to_dask_dataframe(dim_order, set_index) |
| if name is None: | |
| name = self.name | |
| if name is None: | |
| raise ValueError( | |
| "Cannot convert an unnamed DataArray to a " | |
| "dask dataframe : use the ``name`` parameter ." | |
| ) | |
| ds = self._to_dataset_whole(name) | |
| return ds.to_dask_dataframe(dim_order, set_index) | |
| name = self.name if self.name is not None else _THIS_ARRAY | |
| ds = self._to_dataset_whole(name, shallow_copy=False) | |
| return ds.to_dask_dataframe(dim_order, set_index) |
I think we should go with this. I don't think it should be necessary to name the DataArray, which is more in line with how self._to_temp_dataset works, and setting dataarray.name = "new_name" is easy enough.
xarray/core/dataarray.py
Outdated
|
|
||
| name : Hashable or None, optional | ||
| Name given to this array(required if unnamed). | ||
| It is a keyword-only argument. A keyword-only argument can only be passed | ||
| to the function using its name as a keyword argument , and not as a | ||
| positional argument. | ||
|
|
| name : Hashable or None, optional | |
| Name given to this array(required if unnamed). | |
| It is a keyword-only argument. A keyword-only argument can only be passed | |
| to the function using its name as a keyword argument , and not as a | |
| positional argument. |
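For readers unfamiliar with the term used in the docstring text above, here is a small illustration of a keyword-only parameter. The function `convert` is a hypothetical stand-in, not the method under review.

```python
from typing import Optional, Sequence


def convert(
    dim_order: Optional[Sequence[str]] = None,
    *,
    name: Optional[str] = None,
) -> dict:
    """Hypothetical stand-in: parameters after the bare ``*`` are keyword-only."""
    return {"dim_order": dim_order, "name": name}


# Passing ``name`` by keyword works.
ok = convert(["x", "y"], name="temperature")

# Passing it positionally raises TypeError.
try:
    convert(["x", "y"], "temperature")
except TypeError as err:
    positional_error = str(err)

print(ok)
print(positional_error)
```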
xarray/core/dataarray.py
Outdated
| *, | ||
| name: Hashable | None = None, |
| *, | |
| name: Hashable | None = None, |
I have made the changes. Please review them.
…n2/xarray into method-dataarray-to-daskdataframe Updating branch doc/whats-new.rst
Illviljan left a comment
There's still an issue with the docstring example, probably a whitespace mismatch somewhere. The output should simply be copied and pasted from the IPython console.
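Why whitespace matters can be shown with the standard-library doctest module, which compares the printed output with the text in the docstring character by character. This is only a sketch: `double` and `count_failures` are illustrative names, and the project's own doctest setup may apply different options.

```python
import doctest


def double(values):
    return [2 * v for v in values]


def count_failures(example: str) -> int:
    """Run one doctest-style snippet and return the number of failed examples."""
    test = doctest.DocTestParser().get_doctest(
        example, {"double": double}, "example", None, 0
    )
    runner = doctest.DocTestRunner(verbose=False)
    # Discard the failure report; only the counts matter here.
    result = runner.run(test, out=lambda text: None)
    return result.failed


# Output copied verbatim from an interactive session.
pasted_from_console = """
>>> double([1, 2, 3])
[2, 4, 6]
"""

# Output retyped by hand, with the spaces after the commas dropped.
retyped_by_hand = """
>>> double([1, 2, 3])
[2,4,6]
"""

print(count_failures(pasted_from_console))
print(count_failures(retyped_by_hand))
```

The first snippet passes and the second fails, even though both describe the same list.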
doc/whats-new.rst
Outdated
| By `Deepak Cherian <https://github.com/dcherian>`_. | ||
| - Improved performance in ``open_dataset`` for datasets with large object arrays (:issue:`7484`, :pull:`7494`). | ||
| By `Alex Goodman <https://github.com/agoodm>`_ and `Deepak Cherian <https://github.com/dcherian>`_. | ||
| - Added new method :py:meth:`DataArray.to_dask_dataframe`,convert a dataarray into a dask dataframe (:issue:`7409`). |
xarray/core/dataarray.py
Outdated
| vectors in contiguous order , so the last dimension in this list | ||
| will be contiguous in the resulting DataFrame. This has a major influence | ||
| on which operations are efficient on the resulting dask dataframe. | ||
|
|
I have made the changes. Please review them. Thanks.
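The dim_order wording quoted in the diff above (the last dimension in the list ends up contiguous) can be illustrated with plain NumPy. The `flatten` helper is hypothetical and only mimics the idea of transposing to the requested order and then flattening; it is not xarray's implementation.

```python
import numpy as np

dims = ("x", "y")
data = np.arange(6).reshape(2, 3)  # axis 0 is "x", axis 1 is "y"


def flatten(data, dims, dim_order):
    """Hypothetical helper: transpose to dim_order, then flatten row-major."""
    axes = [dims.index(d) for d in dim_order]
    return data.transpose(axes).reshape(-1)


# With ["x", "y"], "y" is last, so consecutive rows step through "y" first.
xy = flatten(data, dims, ["x", "y"])

# With ["y", "x"], "x" is last, so consecutive rows step through "x" first.
yx = flatten(data, dims, ["y", "x"])

print(xy.tolist())  # [0, 1, 2, 3, 4, 5]
print(yx.tolist())  # [0, 3, 1, 4, 2, 5]
```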
xarray/core/dataarray.py
Outdated
| if self.name is None: | ||
| raise ValueError( | ||
| "Cannot convert an unnamed DataArray to a " | ||
| "dask dataframe : use the ``name`` parameter ." |
| "dask dataframe : use the ``name`` parameter ." | |
| "dask dataframe : use the ``.rename`` method to assign a name." |
for more information, see https://pre-commit.ci
Thanks for your patience here @dsgreen2. This is a nice contribution. Welcome to Xarray!

Adds a method to_dask_dataframe() to convert a dataarray to a dask dataframe.
Related issue: DataArray.to_dask_dataframe() #7409
Checklist items: whats-new.rst, api.rst
I have added the function to_dask_dataframe() in dataarray.py. This implementation is as suggested in issue #7409. The function first converts the data array to a temporary dataset and then calls the Dataset.to_dask_dataframe() method.
Could you please review it? Thank you.