[DataFrame] Implement where #1989
Test PASSed.
kunalgosar left a comment
Looks good! A few comments.
python/ray/dataframe/utils.py
Outdated
    """
    if df._row_partitions is not None:
        pd_df = pd.concat(ray.get(df._row_partitions))
        print("Yes")
python/ray/dataframe/dataframe.py
Outdated
        other_zipped = (v for k, v in self._copartition(other,
                                                        self.index))

        new_partitions = [where_helper.remote(k, v, next(other_zipped),
Can k, v be converted to lists and passed in by reference? Ray will automatically deserialize them.
Not without merging them together and then also passing the length of the left. Performance-wise, it's not much different.
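One reading of this exchange: Ray resolves object references that are passed as top-level arguments to a remote call, but not references nested inside a list. To pass both groups "by reference", they would have to be flattened into one argument list along with the length of the left group. The sketch below shows only that splitting logic in plain Python. `where_helper_flat` is a hypothetical name, `k` and `v` are assumed to be collections of blocks, and no Ray call is made here.

```python
def where_helper_flat(n_left, *blocks):
    """Hypothetical helper: split a flat argument list back into two groups.

    With Ray, this function would be decorated as a remote function and the
    blocks would arrive already resolved because they are top-level arguments.
    """
    left_blocks = list(blocks[:n_left])
    cond_blocks = list(blocks[n_left:])
    return left_blocks, cond_blocks


# Plain lists stand in for the blocks of each partition (assumption).
k = [[1, 2], [3, 4]]
v = [[True, False]]

# Merge both groups into one flat argument list and pass the left length.
left_blocks, cond_blocks = where_helper_flat(len(k), *k, *v)
```

The extra length argument is the cost the reply refers to: the callee cannot tell where the left blocks end without it.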
python/ray/dataframe/dataframe.py
Outdated
        # from blocks and the axes are set according to the blocks. We have
        # already correctly copartitioned everything, so there's no
        # correctness problems with doing this.
        left.reset_index(inplace=True, drop=True)
Since everything is concatenated into row partitions, can you reset only the column index?
We have to reset the index here because that's what other is relying on.
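To illustrate why the row index matters here: pandas aligns `cond` and `other` to the caller by label, so blocks that carry a default positional index only line up with `left` once `left` has the same index. The following is a minimal sketch in plain pandas, not the `ray.dataframe` code. The frames and their values are made up for illustration.

```python
import pandas as pd

# left keeps its original labels; cond and other carry a default 0..n-1 index,
# as blocks rebuilt from partitions might (assumption for this sketch).
left = pd.DataFrame({"a": [1, 2, 3]}, index=[10, 20, 30])
cond = pd.DataFrame({"a": [True, False, True]})
other = pd.DataFrame({"a": [100, 200, 300]})

# Remember the real labels, then switch left to a positional index so that
# all three frames share the same row labels.
original_index = left.index
left_positional = left.reset_index(drop=True)

result = left_positional.where(cond, other)

# Restore the original labels on the result.
result.index = original_index
```

Resetting only the columns would leave the row labels `10, 20, 30` in place, and they would not match the positional labels on `other`.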
Test FAILed.
Test PASSed.
python/ray/dataframe/dataframe.py
Outdated
        args = (False, axis, level, errors, try_cast, raise_on_error)

        @ray.remote
        def where_helper(left, cond, other, left_columns, cond_columns,
It's dangerous defining a remote function inside of a method call, because this will define a new remote function every time the method is called. Currently this is a bit heavyweight. We probably want to move this outside.
Oh yeah, thanks. I had this in during development and forgot to move it. Sorry about that!
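The concern can be sketched without Ray. A decorator applied inside a method body runs on every call, while one applied at module level runs once at import. In the sketch below, `fake_remote` is a hypothetical stand-in that only counts how often it is applied. It is not Ray's API, and the helper bodies are simplified to plain lists.

```python
from collections import Counter

registrations = Counter()


def fake_remote(func):
    """Hypothetical stand-in for a remote-function decorator: counts uses."""
    registrations[func.__name__] += 1
    return func


@fake_remote
def where_helper_module(left, cond, other):
    # Defined once, at import time.
    return [l if c else o for l, c, o in zip(left, cond, other)]


def where_nested(left, cond, other):
    # The decorator runs again on every call to where_nested.
    @fake_remote
    def where_helper_nested(left, cond, other):
        return [l if c else o for l, c, o in zip(left, cond, other)]

    return where_helper_nested(left, cond, other)


def where_hoisted(left, cond, other):
    # Reuses the single module-level definition.
    return where_helper_module(left, cond, other)


for _ in range(3):
    where_nested([1, 2], [True, False], [9, 9])
    where_hoisted([1, 2], [True, False], [9, 9])
```

After three calls each, the nested helper has been registered three times and the module-level helper once, which is the difference the review comment points at.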
Test PASSed.
* master:
  [DataFrame] Implement where (ray-project#1989)
  [DataFrame] Add direct pandas imports for MVP (ray-project#1960)
  Make ActorHandles pickleable, also make proper ActorHandle and ActorC… (ray-project#2007)
* master:
  [xray] Fix UniqueID hashing for object and task IDs. (ray-project#2017)
  [DataFrame] Fixing bugs in groupby (ray-project#2031)
  [DataFrame] Fixes dropna subset bug (ray-project#2018)
  [DataFrame] Implement where (ray-project#1989)
Implement DataFrame.where.
Still needs:
* where when Series objects are passed in as other.
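For reference, the pandas behaviour that this PR targets can be sketched as follows. The example uses plain pandas with a DataFrame as `other`. It does not exercise the `ray.dataframe` implementation or the Series case listed as outstanding, and the data is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# Boolean frame of the same shape: True means "keep the original value".
cond = df > 2

# Replacement values, taken wherever cond is False.
other = -df

result = df.where(cond, other)
```

Here only the first two entries of column `a` fail the condition, so they are replaced by the matching entries of `other`. Every other value is kept.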