Implement multiple axis for dropna by SaladRaider · Pull Request #13 · kunalgosar/ray

SaladRaider · 2018-05-03T01:10:11Z

What do these changes do?

Implement dropna when axis is tuple or list.

kunalgosar

Thanks for the addition! A few comments though.

kunalgosar · 2018-05-04T02:11:45Z

python/ray/dataframe/test/test_dataframe.py

+    cp = ray_df.copy()
+    result = ray_df.dropna(how='all', axis=[0, 1])
+    result2 = ray_df.dropna(how='all', axis=(0, 1))
+    expected = pd_df.dropna(how='all').dropna(how='all', axis=1)


You don't need this. For now, just comparing ray and pandas is fine.
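For context, the `expected` line in the hunk above treats a multi-axis drop as single-axis drops applied one after another. Below is a minimal sketch of that idea in plain pandas, assuming the axes are applied in the order given. The helper `dropna_multi_axis` is a hypothetical name for illustration, not part of pandas or of this PR.

```python
import numpy as np
import pandas as pd


def dropna_multi_axis(df, axes, how="any"):
    """Hypothetical helper: apply a single-axis dropna once per requested axis."""
    result = df
    for ax in axes:
        # Each pass works on the output of the previous pass.
        result = result.dropna(axis=ax, how=how)
    return result


df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, np.nan, np.nan],
    "c": [4.0, np.nan, 6.0],
})

# Drop all-NaN rows first, then all-NaN columns.
result = dropna_multi_axis(df, [0, 1], how="all")

# Same chained form as the `expected` value in the test.
expected = df.dropna(how="all").dropna(how="all", axis=1)
```

With this sample frame, the all-NaN row and the all-NaN column `b` are both removed, and the two forms agree.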

kunalgosar · 2018-05-04T02:11:53Z

python/ray/dataframe/test/test_dataframe.py


-    ray_df_equals_pandas(ray_df.dropna(axis=1, how='any'),
-                         pd_df.dropna(axis=1, how='any'))
+    assert ray_df_equals_pandas(ray_df.dropna(axis=1, how='any'),


Great catch, thanks!
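The fix above matters because a comparison helper that returns a boolean checks nothing unless its result is asserted. A small sketch of the difference, using a hypothetical `frames_equal` helper in place of `ray_df_equals_pandas` and plain lists in place of DataFrames:

```python
def frames_equal(left, right):
    """Hypothetical comparison helper that returns a bool instead of raising."""
    return left == right


def check_without_assert():
    # The False result is discarded, so this check can never fail.
    frames_equal([1, 2], [1, 3])
    return "passed"


def check_with_assert():
    # Asserting the result turns a mismatch into a test failure.
    assert frames_equal([1, 2], [1, 3])
    return "passed"
```

Calling `check_without_assert()` returns normally even though the inputs differ, while `check_with_assert()` raises `AssertionError`.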

kunalgosar · 2018-05-04T02:13:57Z

python/ray/dataframe/dataframe.py

-            raise NotImplementedError(
-                "To contribute to Pandas on Ray, please visit "
-                "github.com/ray-project/ray.")
+            axis = set([pd.DataFrame()._get_axis_number(ax) for ax in axis])


Don't put it in a set here.
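The review does not state why the set should go. One observable difference, which may or may not be the motivation, is that a list keeps the caller's order and any duplicates while a set does not. A sketch with a hypothetical `axis_number` lookup standing in for the private pandas call in the diff:

```python
# Hypothetical stand-in for pandas' private axis lookup; not the real API.
_AXIS_NUMBERS = {0: 0, 1: 1, "index": 0, "columns": 1}


def axis_number(axis):
    """Map an axis label or number to 0 (rows) or 1 (columns)."""
    return _AXIS_NUMBERS[axis]


requested = ("columns", "index", 1)

# A list preserves the order in which the caller named the axes.
as_list = [axis_number(ax) for ax in requested]

# A set collapses duplicates and does not promise any ordering.
as_set = set(as_list)
```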

kunalgosar · 2018-05-04T02:14:11Z

python/ray/dataframe/dataframe.py

-                "To contribute to Pandas on Ray, please visit "
-                "github.com/ray-project/ray.")
+            axis = set([pd.DataFrame()._get_axis_number(ax) for ax in axis])
+            result = self


unnecessary

I think it is necessary in this case because the for loop refers to result. This is consistent with the pandas source code.

kunalgosar · 2018-05-04T02:14:44Z

python/ray/dataframe/dataframe.py

-                "github.com/ray-project/ray.")
+            axis = set([pd.DataFrame()._get_axis_number(ax) for ax in axis])
+            result = self
+            for ax in axis:


Add a comment here that this is inefficient since it forces the DataFrame to be built as an intermediate and should be fixed later.

kunalgosar · 2018-05-04T02:15:07Z

python/ray/dataframe/dataframe.py

+            if not inplace:
+                return result
+
+            return self._update_inplace(


Pass _block_partitions in instead, it's more efficient.

kunalgosar · 2018-05-04T02:15:43Z

python/ray/dataframe/test/test_dataframe.py

+    assert ray_df_equals_pandas(result, expected)
+    assert ray_df_equals_pandas(result2, expected)
+
+    assert ray_df_equals_pandas(result, expected)


Why is this code duplicated?

kunalgosar · 2018-05-04T02:16:23Z

python/ray/dataframe/test/test_dataframe.py

+    assert ray_df_equals_pandas(result2, expected)
+    assert ray_df_equals(ray_df, cp)
+
+    inp = ray_df.copy()


You already make a copy above. Use it, since none of those operations mutated the copy.

kunalgosar · 2018-05-04T02:16:41Z

python/ray/dataframe/test/test_dataframe.py

+
+    inp = ray_df.copy()
+    inp.dropna(how='all', axis=(0, 1), inplace=True)
+    assert ray_df_equals_pandas(inp, expected)


Split inplace test from other test and model after the other dropna tests.

kunalgosar

Few more!

kunalgosar · 2018-05-04T02:29:24Z

python/ray/dataframe/test/test_dataframe.py

+
+@pytest.fixture
+def test_dropna_multiple_axes_inplace(ray_df, pd_df):
+    ray_df = ray_df.copy()


change the name here to ray_df_copy

kunalgosar · 2018-05-04T02:30:09Z

python/ray/dataframe/test/test_dataframe.py

+
+    assert ray_df_equals_pandas(ray_df, pd_df)
+
+    ray_df.dropna(how='all', axis=(0, 1), inplace=True)


Create a new copy of the original dfs first; otherwise this operates on the result of the previous dropna call.
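A short sketch of the problem this comment describes, using plain pandas and a made-up two-column frame: reusing one object across in-place calls means the second call sees the output of the first, while a fresh copy starts from the original data.

```python
import numpy as np
import pandas as pd

original = pd.DataFrame({"a": [1.0, np.nan], "b": [np.nan, np.nan]})

# Reusing one frame: the second call operates on the first call's output.
shared = original.copy()
shared.dropna(how="all", axis=1, inplace=True)  # removes all-NaN column "b"
shared.dropna(how="any", axis=0, inplace=True)  # now only column "a" is checked

# Fresh copy: the same second call sees the untouched data.
fresh = original.copy()
fresh.dropna(how="any", axis=0, inplace=True)   # every row has a NaN in "b"
```

Here `shared` ends up with one row and one column, while `fresh` ends up with no rows, so the two checks are not testing the same thing.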

kunalgosar · 2018-05-04T02:30:36Z

python/ray/dataframe/dataframe.py

+
+            return self._update_inplace(
+                block_partitions=result._block_partitions,
+                columns=result._col_metadata.index,


self.columns here

kunalgosar · 2018-05-04T02:30:45Z

python/ray/dataframe/dataframe.py

+            return self._update_inplace(
+                block_partitions=result._block_partitions,
+                columns=result._col_metadata.index,
+                index=result._row_metadata.index


self.index here

kunalgosar · 2018-05-04T02:32:13Z

python/ray/dataframe/dataframe.py

-                "To contribute to Pandas on Ray, please visit "
-                "github.com/ray-project/ray.")
+            result = self
+            for ax in axis:  # TODO: inefficient, df built as intermediate


prefer comment: # TODO(kunalgosar): this builds an intermediate dataframe, which does unnecessary computation

Also, put comment on its own line.

kunalgosar

Looks great!! Thanks

* implement filter
* begin implementation of dropna
* implement dropna
* docs and tests
* resolving comments
* resolving merge
* add error checking to dropna
* fix update inplace call
* Implement multiple axis for dropna (#13)
* Implement multiple axis for dropna
* Add multiple axis dropna test
* Fix using dummy_frame in dropna
* Clean up dropna multiple axis tests
* remove unnecessary axis modification
* Clean up dropna tests
* resolve comments
* fix lint

kunalgosar force-pushed the filter branch from ed6c9a3 to f082a5d Compare May 4, 2018 01:22

SaladRaider added 3 commits May 3, 2018 19:03

Implement multiple axis for dropna

9c4e2b1

Add multiple axis dropna test

b6403f4

Fix using dummy_frame in dropna

666860c

SaladRaider force-pushed the peter/filter branch from 685ceae to 666860c Compare May 4, 2018 02:05

kunalgosar requested changes May 4, 2018


kunalgosar mentioned this pull request May 4, 2018

[DataFrame] Implements filter and dropna ray-project/ray#1959

Merged

SaladRaider added 2 commits May 3, 2018 19:26

Clean up dropna multiple axis tests

582f628

remove unnecessary axis modification

f9e14e7

kunalgosar requested changes May 4, 2018


Clean up dropna tests

fed275e

kunalgosar approved these changes May 4, 2018


kunalgosar merged commit 16faf64 into kunalgosar:filter May 4, 2018


