Implement multiple axis for dropna#13
Conversation
kunalgosar
left a comment
There was a problem hiding this comment.
Thanks for the addition! A few comments though.
| cp = ray_df.copy() | ||
| result = ray_df.dropna(how='all', axis=[0, 1]) | ||
| result2 = ray_df.dropna(how='all', axis=(0, 1)) | ||
| expected = pd_df.dropna(how='all').dropna(how='all', axis=1) |
There was a problem hiding this comment.
You don't need this. For now, just comparing ray and pandas is fine.
|
|
||
| ray_df_equals_pandas(ray_df.dropna(axis=1, how='any'), | ||
| pd_df.dropna(axis=1, how='any')) | ||
| assert ray_df_equals_pandas(ray_df.dropna(axis=1, how='any'), |
python/ray/dataframe/dataframe.py
Outdated
| raise NotImplementedError( | ||
| "To contribute to Pandas on Ray, please visit " | ||
| "github.com/ray-project/ray.") | ||
| axis = set([pd.DataFrame()._get_axis_number(ax) for ax in axis]) |
| "To contribute to Pandas on Ray, please visit " | ||
| "github.com/ray-project/ray.") | ||
| axis = set([pd.DataFrame()._get_axis_number(ax) for ax in axis]) | ||
| result = self |
There was a problem hiding this comment.
I think it is necessary in this case b/c the for loop makes reference to result. this is consistent with the pandas source code.
| "github.com/ray-project/ray.") | ||
| axis = set([pd.DataFrame()._get_axis_number(ax) for ax in axis]) | ||
| result = self | ||
| for ax in axis: |
There was a problem hiding this comment.
Add a comment here that this is inefficient since it forces the DataFrame to be built as an intermediate and should be fixed later.
| if not inplace: | ||
| return result | ||
|
|
||
| return self._update_inplace( |
There was a problem hiding this comment.
Pass _block_partitions in instead, it's more efficient.
| assert ray_df_equals_pandas(result, expected) | ||
| assert ray_df_equals_pandas(result2, expected) | ||
|
|
||
| assert ray_df_equals_pandas(result, expected) |
| assert ray_df_equals_pandas(result2, expected) | ||
| assert ray_df_equals(ray_df, cp) | ||
|
|
||
| inp = ray_df.copy() |
There was a problem hiding this comment.
You already make a copy above, use it since none of those operations mutated the copy.
|
|
||
| inp = ray_df.copy() | ||
| inp.dropna(how='all', axis=(0, 1), inplace=True) | ||
| assert ray_df_equals_pandas(inp, expected) |
There was a problem hiding this comment.
Split inplace test from other test and model after the other dropna tests.
|
|
||
| @pytest.fixture | ||
| def test_dropna_multiple_axes_inplace(ray_df, pd_df): | ||
| ray_df = ray_df.copy() |
There was a problem hiding this comment.
change the name here to ray_df_copy
|
|
||
| assert ray_df_equals_pandas(ray_df, pd_df) | ||
|
|
||
| ray_df.dropna(how='all', axis=(0, 1), inplace=True) |
There was a problem hiding this comment.
create a new copy of the original dfs first, else this operates on the result of the previous dropna call.
python/ray/dataframe/dataframe.py
Outdated
|
|
||
| return self._update_inplace( | ||
| block_partitions=result._block_partitions, | ||
| columns=result._col_metadata.index, |
python/ray/dataframe/dataframe.py
Outdated
| return self._update_inplace( | ||
| block_partitions=result._block_partitions, | ||
| columns=result._col_metadata.index, | ||
| index=result._row_metadata.index |
python/ray/dataframe/dataframe.py
Outdated
| "To contribute to Pandas on Ray, please visit " | ||
| "github.com/ray-project/ray.") | ||
| result = self | ||
| for ax in axis: # TODO: inefficient, df built as intermediate |
There was a problem hiding this comment.
prefer comment: # TODO(kunalgosar): this builds an intermediate dataframe, which does unnecessary computation
Also, put comment on its own line.
* implement filter * begin implementation of dropna * implement dropna * docs and tests * resolving comments * resolving merge * add error checking to dropna * fix update inplace call * Implement multiple axis for dropna (#13) * Implement multiple axis for dropna * Add multiple axis dropna test * Fix using dummy_frame in dropna * Clean up dropna multiple axis tests * remove unnecessary axis modification * Clean up dropna tests * resolve comments * fix lint
What do these changes do?
Implement dropna when axis is tuple or list.