[Datasets] Update docs for drop_columns and fix typos #26317
clarkzinzow merged 2 commits into ray-project:master from
- Similarly, you can pass in a filter to ``ray.data.read_parquet()`` (selection pushdown)
+ Similarly, you can pass in a filter to ``ray.data.read_parquet()`` (filter pushdown)
"Selection pushdown" is confusing in the data world; to me it normally means projection. Other systems (such as Spark, Presto, and Parquet) consistently use the term "filter pushdown". We also use "filter pushdown" in other places, e.g. here.
Yeah, "selection" means different things in SQL vs. relational algebra. Using "filter" seems like a good choice.
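To make the terminology point concrete, here is a minimal pure-Python sketch (not Ray code) over a hypothetical in-memory table. SQL's SELECT list corresponds to relational-algebra *projection* (keeping columns), while relational-algebra *selection* corresponds to SQL's WHERE clause, i.e. a filter on rows. Passing a filter to the Parquet reader is the row-level operation, which is why "filter pushdown" is the less ambiguous name. The table, column names, and helper functions are illustrative assumptions only.

```python
# Hypothetical toy table: a list of rows, each row a dict of column -> value.
rows = [
    {"vendor_id": 1, "trip_distance": 0.8, "fare": 5.0},
    {"vendor_id": 2, "trip_distance": 3.2, "fare": 14.5},
    {"vendor_id": 1, "trip_distance": 7.1, "fare": 26.0},
]


def project(table, columns):
    """Projection (pi): keep only the named columns of every row.
    SQL expresses this with the SELECT list, hence the ambiguity of "selection"."""
    return [{c: row[c] for c in columns} for row in table]


def filter_rows(table, predicate):
    """Selection (sigma) in relational algebra, i.e. a filter: keep only the
    rows for which the predicate is true. SQL expresses this with WHERE."""
    return [row for row in table if predicate(row)]


long_trips = filter_rows(rows, lambda r: r["trip_distance"] > 2.0)
fares_only = project(long_trips, ["fare"])
print(fares_only)  # [{'fare': 14.5}, {'fare': 26.0}]
```

A pushdown moves either operation into the file reader so that unneeded columns (projection) or rows (filter) are skipped before the data reaches the rest of the pipeline.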
jianoaix left a comment
nit: there seems to be a typo in an API docstring (closing bracket): https://sourcegraph.com/github.com/ray-project/ray@master/-/blob/python/ray/data/dataset.py?L564
@jianoaix Ah, good catch, fixed it.
@clarkzinzow Could you review or merge?
clarkzinzow left a comment
LGTM, and many thanks for the drive-bys!
  :ref:`tensor data guide <datasets_tensor_support>` for more information on working
  with tensors in Datasets. Although this simple example demonstrates reading a single
- file, note that Datasets can also read directories of JSON files, with one tensor
+ file, note that Datasets can also read directories of NumPy files, with one tensor
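The corrected sentence refers to a directory of NumPy files holding one tensor each. The sketch below builds such a layout locally with NumPy and loads it back in sorted filename order, just to show what that directory looks like. It does not call Ray; the file names, shapes, and temporary directory are assumptions for illustration, and Datasets would be pointed at a directory like this instead of loading it by hand.

```python
import os
import tempfile

import numpy as np

# Hypothetical layout: a directory containing one .npy file per tensor.
tmpdir = tempfile.mkdtemp()
for i in range(3):
    # Each file stores a single 2x2 tensor filled with its index.
    np.save(os.path.join(tmpdir, f"tensor_{i}.npy"), np.full((2, 2), i, dtype=np.int64))

# Read the directory back: one tensor per .npy file, in sorted filename order.
paths = sorted(
    os.path.join(tmpdir, name) for name in os.listdir(tmpdir) if name.endswith(".npy")
)
tensors = [np.load(path) for path in paths]
print(len(tensors), tensors[0].shape)  # 3 (2, 2)
```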
Thank you @clarkzinzow and @jianoaix for the review!
Why are these changes needed?
We added the drop_columns() API to Datasets in #26200, so this PR updates the documentation in doc/source/data/examples/nyc_taxi_basic_processing.ipynb to use the new API. It also fixes some minor typos found while proofreading the Datasets documentation.

Related issue number
Closes #26113
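As a rough illustration of what dropping columns means for tabular data, here is a pure-Python sketch over a list of row dicts. The helper only mirrors the name of the Datasets method (on a Ray Dataset the call would be ds.drop_columns([...])); it is a hypothetical stand-in, not Ray's implementation, and the column names are made up for the example.

```python
from typing import Any, Dict, List


def drop_columns(rows: List[Dict[str, Any]], cols: List[str]) -> List[Dict[str, Any]]:
    """Hypothetical stand-in: return new rows without the named columns.

    The input rows are left unmodified. Names that are not present are
    silently ignored here; that is an assumption of this sketch, not
    documented behavior of the Ray API.
    """
    to_drop = set(cols)
    return [{k: v for k, v in row.items() if k not in to_drop} for row in rows]


trips = [
    {"vendor_id": 1, "store_and_fwd_flag": "N", "fare": 5.0},
    {"vendor_id": 2, "store_and_fwd_flag": "Y", "fare": 14.5},
]
slim = drop_columns(trips, ["store_and_fwd_flag"])
print(slim)  # [{'vendor_id': 1, 'fare': 5.0}, {'vendor_id': 2, 'fare': 14.5}]
```

Returning new rows rather than mutating the input matches the general style of Datasets transformations, which produce a new Dataset instead of changing the existing one.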
Checks
I've run scripts/format.sh to lint the changes in this PR.