From 6d7c7af467c033fa4407e889b6a7bd14621fea3e Mon Sep 17 00:00:00 2001
From: Alenka Frim
Date: Thu, 9 Dec 2021 13:57:58 +0100
Subject: [PATCH 1/9] "we" to "you"

---
 r/vignettes/python.Rmd | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/r/vignettes/python.Rmd b/r/vignettes/python.Rmd
index c05ee7dc7c3..6c4508fb2bd 100644
--- a/r/vignettes/python.Rmd
+++ b/r/vignettes/python.Rmd
@@ -62,8 +62,8 @@ a
 ## ]
 ```
 
-`a` is now an `Array` object in our R session, even though we created it in Python.
-We can apply R methods on it:
+`a` is now an `Array` object in our R session, even though you created it in Python.
+You can apply R methods on it:
 
 ```r
 a[a > 1]
@@ -76,10 +76,10 @@ a[a > 1]
 ## ]
 ```
 
-We can send data both ways. One reason we might want to use `pyarrow` in R is
+You can send data both ways. One reason you might want to use `pyarrow` in R is
 to take advantage of functionality that is better supported in Python than in R.
 For example, `pyarrow` has a `concat_arrays` function, but as of 0.17, this
-function is not implemented in the `arrow` R package. We can use `reticulate`
+function is not implemented in the `arrow` R package. You can use `reticulate`
 to use it efficiently.
 
 ```r
@@ -101,7 +101,7 @@ a_and_b
 ## ]
 ```
 
-Now we have a single `Array` in R.
+Now you have a single `Array` in R.
 
 "Send", however, isn't the correct word.
Internally, we're passing pointers to
 the data between the R and Python interpreters running together in the same

From 2ae9429c2ee53c2d43a3e995a472236996e7ab99 Mon Sep 17 00:00:00 2001
From: Alenka Frim
Date: Thu, 9 Dec 2021 14:25:06 +0100
Subject: [PATCH 2/9] Add link to the arrow vignette

---
 r/vignettes/python.Rmd | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/r/vignettes/python.Rmd b/r/vignettes/python.Rmd
index 6c4508fb2bd..05a1a486e06 100644
--- a/r/vignettes/python.Rmd
+++ b/r/vignettes/python.Rmd
@@ -108,6 +108,15 @@ the data between the R and Python interpreters running together in the same
 process, without copying anything. Nothing is being sent: we're sharing and
 accessing the same internal Arrow memory buffers.
 
+## Arrow object types
+
+For more information about Arrow object types see the Internals section of
+the Arrow vignette by calling:
+
+```r
+vignette("arrow", package = "arrow")
+```
+
 ## Troubleshooting
 
 If you get an error like

From c47a38023c1ceb10011de0dbc03b3769273a1f39 Mon Sep 17 00:00:00 2001
From: Alenka Frim
Date: Thu, 9 Dec 2021 14:31:57 +0100
Subject: [PATCH 3/9] Add info about virtualenv

---
 r/vignettes/python.Rmd | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/r/vignettes/python.Rmd b/r/vignettes/python.Rmd
index 05a1a486e06..a51273c910d 100644
--- a/r/vignettes/python.Rmd
+++ b/r/vignettes/python.Rmd
@@ -28,6 +28,11 @@ add `nightly = TRUE`:
 install_pyarrow("arrow-env", nightly = TRUE)
 ```
 
+A virtualenv or a virtual environment is a specific Python installation
+created for one project or purpose. It is a good practice to use
+specific environments in Python so that updating a package doesn't
+impact packages in other projects.
+
 `install_pyarrow()` also works with `conda` environments
 (`conda_create()` instead of `virtualenv_create()`).
From c931775409e38ac04b2f049731c8f84f73969d81 Mon Sep 17 00:00:00 2001
From: Alenka Frim
Date: Thu, 9 Dec 2021 14:36:39 +0100
Subject: [PATCH 4/9] Make clear the package is pyarrow

---
 r/vignettes/python.Rmd | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/r/vignettes/python.Rmd b/r/vignettes/python.Rmd
index a51273c910d..7b583e86c07 100644
--- a/r/vignettes/python.Rmd
+++ b/r/vignettes/python.Rmd
@@ -50,7 +50,7 @@ use_virtualenv("arrow-env")
 pa <- import("pyarrow")
 ```
 
-The package includes support for sharing Arrow `Array` and `RecordBatch`
+The `pyarrow` package includes support for sharing Arrow `Array` and `RecordBatch`
 objects in-process between R and Python. For example, let's create an `Array`
 in `pyarrow`.
 

From b9b4dc4f6941f2a09fe3aced83335fefaab1a56f Mon Sep 17 00:00:00 2001
From: Alenka Frim
Date: Thu, 9 Dec 2021 14:45:47 +0100
Subject: [PATCH 5/9] Add a list of reasons to use pyarrow in R

---
 r/vignettes/python.Rmd | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/r/vignettes/python.Rmd b/r/vignettes/python.Rmd
index 7b583e86c07..c2268d18cf4 100644
--- a/r/vignettes/python.Rmd
+++ b/r/vignettes/python.Rmd
@@ -10,6 +10,11 @@ vignette: >
 The `arrow` package provides `reticulate` methods for passing data between
 R and Python in the same process. This document provides a brief overview.
 
+Why you might want to use `pyarrow`?
+* To use a Python functionality, for example: `concat_arrays` function.
+* To transfer Python objects into R, for example Pandas datatrame into R
+ arrow array.
+
 ## Installing
 
 To use `arrow` in Python, at a minimum you'll need the `pyarrow` library.
From 433774f1eed26b49ad78bea2c10a47e9ea09b1fe Mon Sep 17 00:00:00 2001
From: Alenka Frim
Date: Thu, 9 Dec 2021 14:48:11 +0100
Subject: [PATCH 6/9] Add How this works section

---
 r/vignettes/python.Rmd | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/r/vignettes/python.Rmd b/r/vignettes/python.Rmd
index c2268d18cf4..8b174ead14e 100644
--- a/r/vignettes/python.Rmd
+++ b/r/vignettes/python.Rmd
@@ -113,6 +113,8 @@ a_and_b
 
 Now you have a single `Array` in R.
 
+## How this works
+
 "Send", however, isn't the correct word. Internally, we're passing pointers to
 the data between the R and Python interpreters running together in the same
 process, without copying anything. Nothing is being sent: we're sharing and

From 5424518649494a181a810c75cdf14ac89132dffc Mon Sep 17 00:00:00 2001
From: Alenka Frim
Date: Thu, 9 Dec 2021 15:22:52 +0100
Subject: [PATCH 7/9] Apply suggestions from code review

Co-authored-by: Nic Crane
---
 r/vignettes/python.Rmd | 25 ++++++++++++-------------
 1 file changed, 12 insertions(+), 13 deletions(-)

diff --git a/r/vignettes/python.Rmd b/r/vignettes/python.Rmd
index 8b174ead14e..5710834aac3 100644
--- a/r/vignettes/python.Rmd
+++ b/r/vignettes/python.Rmd
@@ -7,13 +7,12 @@ vignette: >
   %\VignetteEncoding{UTF-8}
 ---
 
-The `arrow` package provides `reticulate` methods for passing data between
+The arrow package provides [reticulate](https://rstudio.github.io/reticulate/) methods for passing data between
 R and Python in the same process. This document provides a brief overview.
 
 Why you might want to use `pyarrow`?
-* To use a Python functionality, for example: `concat_arrays` function.
-* To transfer Python objects into R, for example Pandas datatrame into R
- arrow array.
+* To use some Python functionality that is not yet implemented in R, for example, the `concat_arrays` function.
+* To transfer Python objects into R, for example, a Pandas dataframe into an R Arrow Array.
 ## Installing
 
@@ -55,9 +54,9 @@ use_virtualenv("arrow-env")
 pa <- import("pyarrow")
 ```
 
-The `pyarrow` package includes support for sharing Arrow `Array` and `RecordBatch`
+The pyarrow package includes support for sharing Arrow `Array` and `RecordBatch`
 objects in-process between R and Python. For example, let's create an `Array`
-in `pyarrow`.
+in pyarrow.
 
 ```r
 a <- pa$array(c(1, 2, 3))
@@ -72,7 +71,7 @@ a
 ## ]
 ```
 
-`a` is now an `Array` object in our R session, even though you created it in Python.
+`a` is now an `Array` object in your R session, even though you created it in Python.
 You can apply R methods on it:
 
 ```r
@@ -86,10 +85,10 @@ a[a > 1]
 ## ]
 ```
 
-You can send data both ways. One reason you might want to use `pyarrow` in R is
+You can send data both ways. One reason you might want to use pyarrow in R is
 to take advantage of functionality that is better supported in Python than in R.
-For example, `pyarrow` has a `concat_arrays` function, but as of 0.17, this
-function is not implemented in the `arrow` R package. You can use `reticulate`
+For example, pyarrow has a `concat_arrays()` function, but as of 0.17, this
+function is not implemented in the arrow R package. You can use reticulate
 to use it efficiently.
 
 ```r
@@ -111,7 +110,7 @@ a_and_b
 ## ]
 ```
 
-Now you have a single `Array` in R.
+Now you have a single Array in R.
 
 ## How this works
 
@@ -122,8 +121,8 @@ accessing the same internal Arrow memory buffers.
 ## Arrow object types
 
-For more information about Arrow object types see the Internals section of
-the Arrow vignette by calling:
+For more information about Arrow object types see the "Internals" section of
+the "arrow" vignette:
 
 ```r
 vignette("arrow", package = "arrow")

From 9fdc08d119c0d9ddca9ad8e77f95f6f4587cb9a3 Mon Sep 17 00:00:00 2001
From: Alenka Frim
Date: Thu, 9 Dec 2021 15:31:06 +0100
Subject: [PATCH 8/9] Update r/vignettes/python.Rmd

Co-authored-by: Nic Crane
---
 r/vignettes/python.Rmd | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/r/vignettes/python.Rmd b/r/vignettes/python.Rmd
index 5710834aac3..891ac81cb87 100644
--- a/r/vignettes/python.Rmd
+++ b/r/vignettes/python.Rmd
@@ -54,7 +54,7 @@ use_virtualenv("arrow-env")
 pa <- import("pyarrow")
 ```
 
-The pyarrow package includes support for sharing Arrow `Array` and `RecordBatch`
+The arrow R package include support for sharing Arrow `Array` and `RecordBatch`
 objects in-process between R and Python. For example, let's create an `Array`
 in pyarrow.
 

From e5ac76ed9cbb31abe9c84c2a7f2d5edb1a57fc6d Mon Sep 17 00:00:00 2001
From: Alenka Frim
Date: Thu, 9 Dec 2021 15:42:00 +0100
Subject: [PATCH 9/9] Update r/vignettes/python.Rmd

Co-authored-by: Nic Crane
---
 r/vignettes/python.Rmd | 1 +
 1 file changed, 1 insertion(+)

diff --git a/r/vignettes/python.Rmd b/r/vignettes/python.Rmd
index 891ac81cb87..9600cdf5ffd 100644
--- a/r/vignettes/python.Rmd
+++ b/r/vignettes/python.Rmd
@@ -11,6 +11,7 @@ The arrow package provides [reticulate](https://rstudio.github.io/reticulate/) m
 R and Python in the same process. This document provides a brief overview.
 
 Why you might want to use `pyarrow`?
+
 * To use some Python functionality that is not yet implemented in R, for example, the `concat_arrays` function.
 * To transfer Python objects into R, for example, a Pandas dataframe into an R Arrow Array.
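
Note on the "How this works" section added in patch 6: the vignette text says that nothing is copied and that R and Python access the same internal Arrow memory buffers. The sketch below is only an analogy for that idea, written with the Python standard library; it does not use pyarrow, arrow, or reticulate, and it is not the mechanism those packages implement. The names `buffer`, `shared`, and `copied` are hypothetical and chosen for illustration.

```python
from array import array

# Hypothetical stand-in for an Arrow buffer: a contiguous block of doubles.
buffer = array("d", [1.0, 2.0, 3.0])

# A memoryview is a second handle onto the same memory; nothing is copied.
shared = memoryview(buffer)

# A full slice of the array, by contrast, is an independent copy.
copied = buffer[:]

# Write through the original handle.
buffer[0] = 10.0

# The change is visible through the shared view but not through the copy.
print("shared view sees:", shared[0])
print("copy still holds:", copied[0])
```

The point of the comparison is that a shared handle costs nothing to create regardless of the data size, while a copy duplicates every element. That is the distinction the vignette draws when it says "send" is not the right word.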