@romainfrancois
Contributor

Struct arrays become data frame columns when converted to R, e.g.

library(arrow, warn.conflicts = FALSE)
library(tibble)

tf <- tempfile()
writeLines('
    { "hello": 3.5, "world": false, "yo": "thing", "nuf": {} }
    { "hello": 3.25, "world": null, "nuf": null }
    { "hello": 3.125, "world": null, "yo": "\u5fcd", "nuf": { "ps": 78.0, "house": "Gryffindor"} }
    { "hello": 0.0, "world": true, "yo": null, "nuf": { "ps": 90.0, "house": "Slytherin" } }
  ', tf)

tab1 <- read_json_arrow(tf, as_tibble = FALSE)
array <- tab1$column(3)$data()$chunk(0)
array$field(0)
#> arrow::Array 
#> [
#>   null,
#>   null,
#>   78,
#>   90
#> ]
array$as_vector()
#>   ps      house
#> 1 NA       <NA>
#> 2 NA       <NA>
#> 3 78 Gryffindor
#> 4 90  Slytherin
as.data.frame(tab1)
#> # A tibble: 4 x 4
#>   hello world yo    nuf$ps $house    
#>   <dbl> <lgl> <chr>  <dbl> <chr>     
#> 1  3.5  FALSE thing     NA <NA>      
#> 2  3.25 NA    <NA>      NA <NA>      
#> 3  3.12 NA    忍        78 Gryffindor
#> 4  0    TRUE  <NA>      90 Slytherin

Created on 2019-06-17 by the reprex package (v0.3.0.9000)
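
For readers unfamiliar with data frame columns (a data frame stored as a single column of another data frame), here is a minimal base R sketch, independent of arrow, of the shape that `array$as_vector()` and `as.data.frame(tab1)` return in the example above. The values are copied from that example. Building the column by hand is only an illustration of the resulting structure, not a description of how the conversion is implemented.

```r
# Outer data frame with an ordinary atomic column (values from the example above).
df <- data.frame(hello = c(3.5, 3.25, 3.125, 0))

# Assign a whole data frame as one column: this is the "data frame column"
# a struct array maps to. Each struct field becomes an inner column.
df$nuf <- data.frame(
  ps = c(NA, NA, 78, 90),
  house = c(NA, NA, "Gryffindor", "Slytherin"),
  stringsAsFactors = FALSE
)

ncol(df)               # counts top-level columns only: hello and nuf
is.data.frame(df$nuf)  # the nuf column is itself a data frame
df$nuf$house[3]        # inner fields are reached with a second `$`
```

A tibble prints such a column with the `nuf$ps $house` header seen in the output above, while base `print()` shows the inner columns as `nuf.ps` and `nuf.house`.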

@romainfrancois
Contributor Author

This is similar to #4575. @fsaintjacques, you might want to adopt this one too.

{ "hello": 3.125, "world": null, "yo": "\u5fcd", "arr": [], "nuf": { "ps": 78 } }
{ "hello": 0.0, "world": true, "yo": null, "arr": null, "nuf": { "ps": 90 } }
{ "hello": 3.125, "world": null, "yo": "\u5fcd", "arr": [], "nuf": { "ps": 78.0 } }
{ "hello": 0.0, "world": true, "yo": null, "arr": null, "nuf": { "ps": 90.0 } }
Contributor

What about omitting fields, e.g. {"hello":"hi"}?

Contributor Author

library(arrow, warn.conflicts = FALSE)
library(tibble)

tf <- tempfile()
writeLines('
    { "hello": 3.5, "world": false, "yo": "thing" }
    { "hello": 2.3}
  ', tf)

read_json_arrow(tf, as_tibble = TRUE)
#> # A tibble: 2 x 3
#>   hello world yo   
#>   <dbl> <lgl> <chr>
#> 1   3.5 FALSE thing
#> 2   2.3 NA    <NA>

Maybe this read_json_arrow() needs more tests.

library(arrow, warn.conflicts = FALSE)
library(tibble)

tf <- tempfile()
writeLines('
    { "hello": 3.5, "world": false, "yo": "thing" }
    { "hello": "hi"}
  ', tf)

read_json_arrow(tf, as_tibble = TRUE)
#> Error in json___TableReader__Read(self): Invalid: Empty JSON file

But that is outside the scope of this pull request. I'm using read_json_arrow() here only because we currently have no way in R to create list arrays or struct arrays directly, i.e. we need to be able to go in the other direction with e.g.

library(arrow, warn.conflicts = FALSE)
library(tibble)

array(list(1:3, 4:5), type = list_of(int32()))
#> Error in Array__from_vector(x, type): NotImplemented: type not implemented

which is the purpose of this issue: https://issues.apache.org/jira/browse/ARROW-3809?filter=12344983

Contributor

That was my mistake: I meant {"hello": 1.2}, not a value of the wrong type. Still, good if it uncovered an error :)

@codecov-io

Codecov Report

Merging #4593 into master will decrease coverage by 12.76%.
The diff coverage is 92.47%.

@@            Coverage Diff             @@
##           master   #4593       +/-   ##
==========================================
- Coverage   88.57%   75.8%   -12.77%     
==========================================
  Files         860      56      -804     
  Lines      108022    3278   -104744     
  Branches     1253       0     -1253     
==========================================
- Hits        95678    2485    -93193     
+ Misses      12065     793    -11272     
+ Partials      279       0      -279

Impacted Files               Coverage Δ
r/src/arrow_types.h          96% <ø> (ø) ⬆️
r/src/datatype.cpp           74.48% <100%> (+1.08%) ⬆️
r/src/symbols.cpp            54.54% <100%> (+4.54%) ⬆️
r/src/array.cpp              79.16% <100%> (+4.8%) ⬆️
r/R/ChunkedArray.R           100% <100%> (ø) ⬆️
r/R/arrowExports.R           73.73% <100%> (+0.61%) ⬆️
r/R/Struct.R                 100% <100%> (ø) ⬆️
r/src/arrowExports.cpp       73.9% <100%> (+0.67%) ⬆️
r/R/array.R                  75% <60%> (+3%) ⬆️
r/src/array__to_vector.cpp   76.63% <86.48%> (+1.16%) ⬆️
... and 804 more

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 9425831...b1f087e.
