From 1d52b749f9ce63f66254efeaea7be6f84f0a769a Mon Sep 17 00:00:00 2001
From: Nic Crane
Date: Tue, 21 Sep 2021 16:56:11 +0100
Subject: [PATCH 1/3] Re-order conversion tables, restart numbering, add explanation of float64 == double

---
 r/vignettes/arrow.Rmd | 66 ++++++++++++++++++++++++++++---------------
 1 file changed, 43 insertions(+), 23 deletions(-)

diff --git a/r/vignettes/arrow.Rmd b/r/vignettes/arrow.Rmd
index 4c5da501435..270287816f4 100644
--- a/r/vignettes/arrow.Rmd
+++ b/r/vignettes/arrow.Rmd
@@ -125,22 +125,31 @@ In the tables, entries with a `-` are not currently implemented.
 |--------------------------|------------|
 | logical | boolean |
 | integer | int32 |
-| double ("numeric") | float64 |
-| character | utf8^1^ |
+| double ("numeric") | float64^1^ |
+| character | utf8^2^ |
 | factor | dictionary |
 | raw | uint8 |
 | Date | date32 |
 | POSIXct | timestamp |
 | POSIXlt | struct |
 | data.frame | struct |
-| list^2^ | list |
+| list^3^ | list |
 | bit64::integer64 | int64 |
 | difftime | time32 |
 | vctrs::vctrs_unspecified | null |
 
 
-^1^: If the character vector exceeds 2GB of strings, it will be converted to a `large_utf8` Arrow type
-^2^: Only lists where all elements are the same type are able to be translated to Arrow list type (which is a "list of" some type).
+
+^1^: `float64` and `double` are the same concept and data type in Arrow C++;
+however, only `float64()` is used in arrow as the function `double()` already
+exists in base R
+
+^2^: If the character vector exceeds 2GB of strings, it will be converted to a
+`large_utf8` Arrow type
+
+^3^: Only lists where all elements are the same type are able to be translated
+to Arrow list type (which is a "list of" some type).
+
 
 ### Arrow to R
 
@@ -150,42 +159,53 @@ In the tables, entries with a `-` are not currently implemented.
 | int8 | integer |
 | int16 | integer |
 | int32 | integer |
-| int64 | integer^3^ |
+| int64 | integer^1^ |
 | uint8 | integer |
 | uint16 | integer |
-| uint32 | integer^3^ |
-| uint64 | integer^3^ |
-| float16 | - |
+| uint32 | integer^1^ |
+| uint64 | integer^1^ |
+| float16 | -^2^ |
 | float32 | double |
 | float64 | double |
 | utf8 | character |
-| binary | arrow_binary ^5^ |
-| fixed_size_binary | arrow_fixed_size_binary ^5^ |
+| large_utf8 | character |
+| binary | arrow_binary ^3^ |
+| large_binary | arrow_large_binary ^3^ |
+| fixed_size_binary | arrow_fixed_size_binary ^3^ |
 | date32 | Date |
 | date64 | POSIXct |
 | time32 | hms::difftime |
 | time64 | hms::difftime |
 | timestamp | POSIXct |
-| duration | - |
+| duration | -^2^ |
 | decimal | double |
 | dictionary | factor^4^ |
-| list | arrow_list ^6^ |
-| fixed_size_list | arrow_fixed_size_list ^6^ |
+| list | arrow_list ^5^ |
+| large_list | arrow_large_list ^5^ |
+| fixed_size_list | arrow_fixed_size_list ^5^ |
 | struct | data.frame |
 | null | vctrs::vctrs_unspecified |
-| map | - |
-| union | - |
-| large_utf8 | character |
-| large_binary | arrow_large_binary ^5^ |
-| large_list | arrow_large_list ^6^ |
+| map | -^2^ |
+| union | -^2^ |
+
+^1^: These integer types may contain values that exceed the range of R's
+`integer` type (32-bit signed integer). When they do, `uint32` and `uint64` are
+converted to `double` ("numeric") and `int64` is converted to
+`bit64::integer64`. This conversion can be disabled (so that `int64` always
+yields a `bit64::integer64` vector) by setting `options(arrow.int64_downcast = FALSE)`.
+
+^2^: Some Arrow data types do not have an R equivalent and will raise an error
+if cast to or mapped to via a schema. See
+[this discussion section](#no-compat-type) for an example of this.
 
-^3^: These integer types may contain values that exceed the range of R's `integer` type (32-bit signed integer). When they do, `uint32` and `uint64` are converted to `double` ("numeric") and `int64` is converted to `bit64::integer64`. This conversion can be disabled (so that `int64` always yields a `bit64::integer64` vector) by setting `options(arrow.int64_downcast = FALSE)`.
+^3^: `arrow*_binary` classes are implemented as lists of raw vectors.
 
-^4^: Due to the limitation of R `factor`s, Arrow `dictionary` values are coerced to string when translated to R if they are not already strings.
+^4^: Due to the limitation of R factors, Arrow `dictionary` values are coerced
+to string when translated to R if they are not already strings.
 
-^5^: `arrow*_binary` classes are implemented as lists of raw vectors.
+^5^: `arrow*_list` classes are implemented as subclasses of `vctrs_list_of`
+with a `ptype` attribute set to what an empty Array of the value type converts to.
 
-^6^: `arrow*_list` classes are implemented as subclasses of `vctrs_list_of` with a `ptype` attribute set to what an empty Array of the value type converts to.
 
 ### R object attributes
 

From c367f046746791c4d6a930eb5aacd0da984ad0d0 Mon Sep 17 00:00:00 2001
From: Nic Crane
Date: Mon, 4 Oct 2021 08:51:23 +0100
Subject: [PATCH 2/3] Remove extraneous reference

---
 r/vignettes/arrow.Rmd | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/r/vignettes/arrow.Rmd b/r/vignettes/arrow.Rmd
index 270287816f4..3271462cf8e 100644
--- a/r/vignettes/arrow.Rmd
+++ b/r/vignettes/arrow.Rmd
@@ -195,8 +195,7 @@ converted to `double` ("numeric") and `int64` is converted to
 yields a `bit64::integer64` vector) by setting `options(arrow.int64_downcast = FALSE)`.
 
 ^2^: Some Arrow data types do not have an R equivalent and will raise an error
-if cast to or mapped to via a schema. See
-[this discussion section](#no-compat-type) for an example of this.
+if cast to or mapped to via a schema.
 
 ^3^: `arrow*_binary` classes are implemented as lists of raw vectors.
 
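Footnote ^1^ of the reworked "Arrow to R" table describes a data-dependent rule for `int64`, `uint32` and `uint64`. Below is a minimal Python sketch of that rule as the footnote states it, meant only as a reading aid; it is not arrow's C++ or R implementation. The names `r_type_for_integer_array` and `int64_downcast` (a stand-in for `options(arrow.int64_downcast = ...)`) are hypothetical, and treating R's usable integer range as -(2^31 - 1) to 2^31 - 1 (R reserves -2^31 for `NA_integer_`) is an assumption about where arrow draws the boundary.

```python
from typing import Iterable

# Assumption: R's integer is 32-bit signed, but -2**31 is reserved for
# NA_integer_, so the usable range is taken to be symmetric here.
R_INT_MAX = 2**31 - 1
R_INT_MIN = -R_INT_MAX


def fits_r_integer(values: Iterable[int]) -> bool:
    """Return True if every value lies inside the assumed R integer range."""
    return all(R_INT_MIN <= v <= R_INT_MAX for v in values)


def r_type_for_integer_array(arrow_type: str, values: list[int],
                             int64_downcast: bool = True) -> str:
    """Hypothetical model of footnote ^1^ (not arrow's real converter).

    int64, uint32 and uint64 become R `integer` only when all values fit;
    otherwise uint32/uint64 become `double` and int64 becomes
    `bit64::integer64`. `int64_downcast` mimics the R option of that name.
    """
    # These types can never exceed R's integer range (NA edge case ignored).
    if arrow_type in ("int8", "int16", "int32", "uint8", "uint16"):
        return "integer"
    if arrow_type == "int64":
        # With downcasting disabled, int64 always yields bit64::integer64.
        if int64_downcast and fits_r_integer(values):
            return "integer"
        return "bit64::integer64"
    if arrow_type in ("uint32", "uint64"):
        # The footnote only mentions the option for int64, so it is not
        # consulted for the unsigned types.
        return "integer" if fits_r_integer(values) else "double"
    raise ValueError(f"not an Arrow integer type: {arrow_type}")
```

Under this model, `r_type_for_integer_array("uint32", [2**32 - 1])` returns `"double"`, while the same call on `[7]` returns `"integer"`.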
From 2d90eefdf8c7aa259e6ca2d54a9e1fee11aafb86 Mon Sep 17 00:00:00 2001
From: Nic Crane
Date: Wed, 6 Oct 2021 12:51:51 +0100
Subject: [PATCH 3/3] add "currently"

---
 r/vignettes/arrow.Rmd | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/r/vignettes/arrow.Rmd b/r/vignettes/arrow.Rmd
index 3271462cf8e..2451a3ec497 100644
--- a/r/vignettes/arrow.Rmd
+++ b/r/vignettes/arrow.Rmd
@@ -194,7 +194,7 @@ converted to `double` ("numeric") and `int64` is converted to
 `bit64::integer64`. This conversion can be disabled (so that `int64` always
 yields a `bit64::integer64` vector) by setting `options(arrow.int64_downcast = FALSE)`.
 
-^2^: Some Arrow data types do not have an R equivalent and will raise an error
+^2^: Some Arrow data types do not currently have an R equivalent and will raise an error
 if cast to or mapped to via a schema.
 
 ^3^: `arrow*_binary` classes are implemented as lists of raw vectors.
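The final wording of footnote ^2^ says some Arrow types currently have no R equivalent and raise an error. The sketch below is a hypothetical Python lookup built from the rows of the "Arrow to R" table that appear in the first patch's hunk (rows outside the hunk are omitted); the four entries marked `-^2^` map to `None` and raise. `ARROW_TO_R`, `r_type_for` and `NoRTypeError` are invented names, arrow's real error type and message are not modeled, and the footnoted caveats for integers, dictionaries, binary and list types are ignored.

```python
# Arrow type -> R type, transcribed from the "Arrow to R" rows shown in the
# patch. None marks types that currently have no R equivalent (footnote ^2^).
ARROW_TO_R = {
    "int8": "integer",
    "int16": "integer",
    "int32": "integer",
    "int64": "integer",      # may become bit64::integer64 (footnote ^1^)
    "uint8": "integer",
    "uint16": "integer",
    "uint32": "integer",     # may become double (footnote ^1^)
    "uint64": "integer",     # may become double (footnote ^1^)
    "float16": None,
    "float32": "double",
    "float64": "double",
    "utf8": "character",
    "large_utf8": "character",
    "binary": "arrow_binary",
    "large_binary": "arrow_large_binary",
    "fixed_size_binary": "arrow_fixed_size_binary",
    "date32": "Date",
    "date64": "POSIXct",
    "time32": "hms::difftime",
    "time64": "hms::difftime",
    "timestamp": "POSIXct",
    "duration": None,
    "decimal": "double",
    "dictionary": "factor",
    "list": "arrow_list",
    "large_list": "arrow_large_list",
    "fixed_size_list": "arrow_fixed_size_list",
    "struct": "data.frame",
    "null": "vctrs::vctrs_unspecified",
    "map": None,
    "union": None,
}


class NoRTypeError(TypeError):
    """Hypothetical error for Arrow types with no R equivalent."""


def r_type_for(arrow_type: str) -> str:
    """Look up the R type for an Arrow type, raising if none exists."""
    try:
        r_type = ARROW_TO_R[arrow_type]
    except KeyError:
        raise KeyError(f"{arrow_type!r} is not in this excerpt of the table") from None
    if r_type is None:
        raise NoRTypeError(f"Arrow type {arrow_type!r} currently has no R equivalent")
    return r_type
```

Keeping the unsupported types in the table as explicit `None` entries, rather than leaving them out, mirrors the vignette's choice to list them with a `-` so readers can tell "not supported" apart from "not listed".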