[DataFrame] Implements filter and dropna#1959
[DataFrame] Implements filter and dropna#1959devin-petersohn merged 11 commits intoray-project:masterfrom
Conversation
python/ray/dataframe/dataframe.py
Outdated
There was a problem hiding this comment.
subset = [item for item in self.index if item in subset]Is slightly faster. Also you can probably refactor a bit by putting the top level if statement just be if subset is not None
python/ray/dataframe/dataframe.py
Outdated
There was a problem hiding this comment.
This doesn't return another _IndexMetadata
python/ray/dataframe/dataframe.py
Outdated
There was a problem hiding this comment.
Any reason this import is local to filter?
python/ray/dataframe/dataframe.py
Outdated
There was a problem hiding this comment.
Possible subtle errors here with empty strings/lists or other funny falsy values
python/ray/dataframe/dataframe.py
Outdated
|
Test PASSed. |
|
Test FAILed. |
|
Test PASSed. |
|
Test FAILed. |
* Implement multiple axis for dropna * Add multiple axis dropna test * Fix using dummy_frame in dropna * Clean up dropna multiple axis tests * remove unnecessary axis modification * Clean up dropna tests
|
Test PASSed. |
devin-petersohn
left a comment
There was a problem hiding this comment.
Looks good, just a few minor comments.
python/ray/dataframe/dataframe.py
Outdated
| if not inplace: | ||
| return result | ||
|
|
||
| return self._update_inplace( |
There was a problem hiding this comment.
Don't return self._update_inplace, it will return None.
There was a problem hiding this comment.
Based on the docs, when inplace=True, the function should return false. I made this a bit clearer in the code that this is intentional.
python/ray/dataframe/dataframe.py
Outdated
| ) | ||
|
|
||
| axis = pd.DataFrame()._get_axis_number(axis) | ||
| inplace = validate_bool_kwarg(inplace, "inplace") |
There was a problem hiding this comment.
You should move this up higher, before the is_list_like block.
|
|
||
| if axis == 1: | ||
| new_vals = [self._col_metadata.get_global_indices(i, vals) | ||
| for i, vals in enumerate(ray.get(new_vals))] |
There was a problem hiding this comment.
You can build the columns in a remote task. It might be better
There was a problem hiding this comment.
If I move this code to a remote task, it would require serializing self._col_metadata._lengths and self.columns, which could potentially be expensive. The code would later block on a ray.get on the new columns generated remotely.
Given there would be a blocking operation either way, I think this is fine for now but can be revisited.
There was a problem hiding this comment.
Where the blocking operation occurs does matter. Here the main thread will be waiting until the entire dropna operation is complete. This happens whether you need the columns immediately or not. We should block when the user needs it, not when the user calls an operation. Given this can be an iteratively executed operation, we should not block on each call.
|
|
||
| else: | ||
| new_vals = [self._row_metadata.get_global_indices(i, vals) | ||
| for i, vals in enumerate(ray.get(new_vals))] |
There was a problem hiding this comment.
Same here with the index.
|
Test FAILed. |
|
Jenkins, retest this please |
|
Test PASSed. |
|
Test FAILed. |
|
Jenkins, retest this please |
|
Test PASSed. |
|
Passes private-travis. Merged, thanks @kunalgosar ! |
* master: (21 commits) Expand local_dir in Trial init (ray-project#2013) Fixing ascii error for Python2 (ray-project#2009) [DataFrame] Implements df.update (ray-project#1997) [DataFrame] Implements df.as_matrix (ray-project#2001) [DataFrame] Implement quantile (ray-project#1992) [DataFrame] Impement sort_values and sort_index (ray-project#1977) [DataFrame] Implement rank (ray-project#1991) [DataFrame] Implemented prod, product, added test suite (ray-project#1994) [DataFrame] Implemented __setitem__, select_dtypes, and astype (ray-project#1941) [DataFrame] Implement diff (ray-project#1996) [DataFrame] Implemented nunique, skew (ray-project#1995) [DataFrame] Implements filter and dropna (ray-project#1959) [DataFrame] Implements df.pipe (ray-project#1999) [DataFrame] Apply() for Lists and Dicts (ray-project#1973) Clean up syntax for supported Python versions. (ray-project#1963) [DataFrame] Implements mode, to_datetime, and get_dummies (ray-project#1956) [DataFrame] Fix dtypes (ray-project#1930) keep_dims -> keepdims (ray-project#1980) add pthread linking (ray-project#1986) [DataFrame] Add layer of abstraction to allow OID instantiation (ray-project#1984) ...
* master: (25 commits) [DataFrame] Add direct pandas imports for MVP (ray-project#1960) Make ActorHandles pickleable, also make proper ActorHandle and ActorC… (ray-project#2007) Expand local_dir in Trial init (ray-project#2013) Fixing ascii error for Python2 (ray-project#2009) [DataFrame] Implements df.update (ray-project#1997) [DataFrame] Implements df.as_matrix (ray-project#2001) [DataFrame] Implement quantile (ray-project#1992) [DataFrame] Impement sort_values and sort_index (ray-project#1977) [DataFrame] Implement rank (ray-project#1991) [DataFrame] Implemented prod, product, added test suite (ray-project#1994) [DataFrame] Implemented __setitem__, select_dtypes, and astype (ray-project#1941) [DataFrame] Implement diff (ray-project#1996) [DataFrame] Implemented nunique, skew (ray-project#1995) [DataFrame] Implements filter and dropna (ray-project#1959) [DataFrame] Implements df.pipe (ray-project#1999) [DataFrame] Apply() for Lists and Dicts (ray-project#1973) Clean up syntax for supported Python versions. (ray-project#1963) [DataFrame] Implements mode, to_datetime, and get_dummies (ray-project#1956) [DataFrame] Fix dtypes (ray-project#1930) keep_dims -> keepdims (ray-project#1980) ...
* master: [DataFrame] Add direct pandas imports for MVP (ray-project#1960) Make ActorHandles pickleable, also make proper ActorHandle and ActorC… (ray-project#2007) Expand local_dir in Trial init (ray-project#2013) Fixing ascii error for Python2 (ray-project#2009) [DataFrame] Implements df.update (ray-project#1997) [DataFrame] Implements df.as_matrix (ray-project#2001) [DataFrame] Implement quantile (ray-project#1992) [DataFrame] Impement sort_values and sort_index (ray-project#1977) [DataFrame] Implement rank (ray-project#1991) [DataFrame] Implemented prod, product, added test suite (ray-project#1994) [DataFrame] Implemented __setitem__, select_dtypes, and astype (ray-project#1941) [DataFrame] Implement diff (ray-project#1996) [DataFrame] Implemented nunique, skew (ray-project#1995) [DataFrame] Implements filter and dropna (ray-project#1959) [DataFrame] Implements df.pipe (ray-project#1999) [DataFrame] Apply() for Lists and Dicts (ray-project#1973)
This PR implements
df.filteranddf.dropna.