@romainfrancois commented Jun 14, 2019

We don't have the reverse operation yet (converting an R data structure to a list array), so list arrays aren't easy to construct, but we can get one from the JSON reader, for example:

library(arrow, warn.conflicts = FALSE)

tf <- tempfile()
writeLines('
    { "hello": 3.5, "world": false, "yo": "thing", "arr": [1, 2, 3], "nuf": {} }
    { "hello": 3.25, "world": null, "arr": [2], "nuf": null }
    { "hello": 3.125, "world": null, "yo": "\u5fcd", "arr": [], "nuf": { "ps": 78 } }
    { "hello": 0.0, "world": true, "yo": null, "arr": null, "nuf": { "ps": 90 } }
  ', tf)

tab1 <- read_json_arrow(tf, as_tibble = FALSE)
list_array <- tab1$column(3L)$data()$chunk(0)
list_array
#> arrow::ListArray 
#> [
#>   [
#>     1,
#>     2,
#>     3
#>   ],
#>   [
#>     2
#>   ],
#>   [],
#>   null
#> ]
list_array$values
#> function() `arrow::Array`$dispatch(ListArray__values(self))
#> <environment: 0x7f93c3b5cf88>
list_array$value_length(0)
#> [1] 3
list_array$value_offset(0)
#> [1] 0
list_array$raw_value_offsets()
#> [1] 0 3 4 4

list_array$as_vector()
#> [[1]]
#> integer64
#> [1] 1 2 3
#> 
#> [[2]]
#> integer64
#> [1] 2
#> 
#> [[3]]
#> integer64
#> character(0)
#> 
#> [[4]]
#> NULL

Created on 2019-06-14 by the reprex package (v0.3.0.9000)

Contributor

This is a relatively costly thing to do, and I'm not sure how we can avoid it. Do you keep a reference to the generated Slice/Array, or does it get consumed and transformed into something else? See #4366 (comment) for reference.

Contributor Author

::Allocate() makes an R list, and the various ::Ingest.*() fill that list.

I'll look into whether and how I can do this without actually calling Slice(); it might be easy enough to just scan through values_array.
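
For illustration, scanning the flat values with the offsets instead of slicing could look roughly like the sketch below. It uses plain std::vector stand-ins rather than Arrow types, and split_by_offsets is a hypothetical name, not a function from this PR. It assumes the offsets hold n + 1 entries for n lists and ignores nulls.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of Arrow or of this PR.
// `values` stands in for the flat child array and `offsets` for the list
// offsets: list i covers values[offsets[i]] up to values[offsets[i + 1]].
std::vector<std::vector<int64_t>> split_by_offsets(
    const std::vector<int64_t>& values, const std::vector<int32_t>& offsets) {
  std::vector<std::vector<int64_t>> out;
  if (offsets.empty()) return out;  // no lists at all
  assert(static_cast<std::size_t>(offsets.back()) <= values.size());
  out.reserve(offsets.size() - 1);
  for (std::size_t i = 0; i + 1 < offsets.size(); ++i) {
    // Copy the range directly; no intermediate slice object is created.
    out.emplace_back(values.begin() + offsets[i],
                     values.begin() + offsets[i + 1]);
  }
  return out;
}
```

With values {1, 2, 3, 2} and offsets {0, 3, 4, 4, 4} this yields four lists of lengths 3, 1, 0 and 0. Telling an empty list apart from a null one would additionally need the validity bitmap.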

Contributor

The challenging part is dealing with the bitmap and null_count.
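
To make the difficulty concrete: once a range of the values is read without Slice(), the null count for that range has to be recomputed from the validity bitmap at the right bit offset. A minimal sketch, assuming the Arrow convention that bit i lives in byte i / 8 at position i % 8 and that a set bit means "valid"; count_nulls and bit_is_set are hypothetical names, not Arrow functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// True when bit i is set (least-significant bit first within each byte).
inline bool bit_is_set(const std::vector<uint8_t>& bitmap, std::size_t i) {
  return ((bitmap[i / 8] >> (i % 8)) & 1) != 0;
}

// Hypothetical helper: counts the cleared (null) bits in the half-open
// range [start, start + length) of a validity bitmap.
std::size_t count_nulls(const std::vector<uint8_t>& bitmap,
                        std::size_t start, std::size_t length) {
  assert(start + length <= bitmap.size() * 8);  // range must fit the bitmap
  std::size_t nulls = 0;
  for (std::size_t i = start; i < start + length; ++i) {
    if (!bit_is_set(bitmap, i)) ++nulls;
  }
  return nulls;
}
```

The loop is linear in the range length, which is fine for a sketch; a production version would more likely count whole bytes or words at a time.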

Contributor

This can be a huge time sink. I think it's OK for now, but be aware that it will be a problem on large Arrays.

Contributor Author

I see. I don't think it's too much trouble to skip Slice and use the bitmap of the values_array, etc. I'll have a look.

Contributor Author

🤔 Yeah, the issue is Array__as_vector(slice), which needs slice to be a proper array already. The Converter API is designed to ingest all of an array, so I guess it would need some extra methods to ingest only a slice of an array without first materializing it. I'll open another issue for that.

Contributor

I find this file very error-prone to read. I'd recommend going the selective-import route:

arrowExports.cpp # with ifdef import of impl/mock depending on ARROW_R_WITH_ARROW
arrowExports_impl.cpp
arrowExports_mock.cpp

Don't do this in this PR; open a follow-up ticket.

Contributor Author

Not that I would expect anyone to actually read that file (which is why it is supposed to be hidden in pull requests), but I've opened that issue:

https://issues.apache.org/jira/browse/ARROW-5627

Contributor

Is it worth adding support for FixedSizeList?

Contributor Author

This is new to me; that's probably another issue/PR.

Contributor

Yep, but the code will be similar, almost identical: FixedSizeList also exports value_offset.
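
As a rough sketch of why the code would be so similar: with a fixed list size the offset of list i can be computed instead of read from an offsets buffer. FixedSizeListView below is a hypothetical stand-in, not Arrow's class, and it assumes the array itself starts at offset 0.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a fixed-size list array: every list has exactly
// `list_size` elements, so no offsets buffer is stored.
struct FixedSizeListView {
  int32_t list_size;  // number of child elements per list
  int64_t length;     // number of lists
};

// Start of list i in the flat child values (assumes an array offset of 0).
inline int64_t value_offset(const FixedSizeListView& l, int64_t i) {
  return i * static_cast<int64_t>(l.list_size);
}

// Every list has the same length, so the index is not needed.
inline int32_t value_length(const FixedSizeListView& l, int64_t /*i*/) {
  return l.list_size;
}
```

The conversion loop can then stay as it is for variable-size lists; only the source of the offset and length changes.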

r/src/array.cpp Outdated
Contributor

These are "low-level" functions: they don't perform any bounds checking, so you can segfault with the wrong i. Do they need to be exported?
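
If they do stay exported, one option is a thin checked wrapper in front of the unchecked accessor. A minimal sketch over a plain offsets vector; checked_value_length is a hypothetical name, and in the R package the error would presumably be raised as an R error rather than a C++ exception.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical checked accessor. `offsets` holds n + 1 entries for n lists.
// Throws std::out_of_range instead of reading past the buffer.
int32_t checked_value_length(const std::vector<int32_t>& offsets, int64_t i) {
  const int64_t n =
      offsets.empty() ? 0 : static_cast<int64_t>(offsets.size()) - 1;
  if (i < 0 || i >= n) {
    throw std::out_of_range("list index " + std::to_string(i) +
                            " is outside [0, " + std::to_string(n) + ")");
  }
  const std::size_t k = static_cast<std::size_t>(i);
  return offsets[k + 1] - offsets[k];
}
```

The check costs one comparison per call, which matters little for an accessor called interactively from R.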

Contributor Author

🤷‍♂ What's the rule about what is export-worthy?

codecov-io commented Jun 17, 2019

Codecov Report

Merging #4575 into master will decrease coverage by 8.43%.
The diff coverage is 42.68%.

@@            Coverage Diff             @@
##           master    #4575      +/-   ##
==========================================
- Coverage   83.42%   74.98%   -8.44%     
==========================================
  Files         209       56     -153     
  Lines       16062     3358   -12704     
  Branches     1253        0    -1253     
==========================================
- Hits        13399     2518   -10881     
+ Misses       2384      840    -1544     
+ Partials      279        0     -279

Impacted Files                          Coverage Δ
r/R/array.R                             63.63% <0%> (-11.37%) ⬇️
r/R/List.R                              100% <100%> (ø) ⬆️
r/src/datatype.cpp                      75.49% <100%> (+1%) ⬆️
r/src/arrowExports.cpp                  72.46% <26.66%> (-1.44%) ⬇️
r/R/arrowExports.R                      72.32% <28.57%> (-1.42%) ⬇️
r/src/array.cpp                         64.4% <8.33%> (-14.76%) ⬇️
r/src/array__to_vector.cpp              76.88% <80.95%> (+0.24%) ⬆️
go/arrow/ipc/writer.go
go/arrow/math/uint64_amd64.go
go/arrow/memory/memory_avx2_amd64.go
... and 150 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 0631357...300f897.
