ARROW-13852: [R] Handle Dataset schema metadata in ExecPlan #11183

nealrichardson · 2021-09-18T14:49:13Z

No description provided.

github-actions · 2021-09-18T14:49:30Z

https://issues.apache.org/jira/browse/ARROW-13852

jonkeane

This looks good. One comment about messaging: if we do encounter non-R metadata, it would be nice to at least flag that something is getting lost.

jonkeane · 2021-09-20T19:53:59Z

r/R/query-engine.R

+  if (ncol(tab)) {
+    # Apply any column metadata from the original schema, where appropriate
+    original_schema <- source_data(.data)$schema
+    # TODO: do we care about other (non-R) metadata preservation?


Could we detect that it is there and warn/message that we've discarded it? It might be a little bit confusing, but I think that would be preferable to "Arrow lost all my metadata!"

We already discard other metadata when you convert from Arrow to R. Check out the example parquet file in inst/, for example: it has "pandas" metadata. I suspect it would be more surprising/alarming if we started messaging about this metadata all of a sudden.

Hmm, Arrow to R conversion losing the metadata sounds like what I expected. TBH I am a little surprised that the following happens silently, since the tab below does have the pandas metadata there. Though that's been in Arrow for a while and if no one has complained, I guess it's not that important.

    > tab <- read_parquet(system.file("v0.7.1.parquet", package = "arrow"), as_data_frame = FALSE)
    >
    > dplyr::select(tab, carat)$metadata
    NULL

Actually, on 5.0:

    new <- tab %>% select(carat) %>% compute()
    new$metadata
    $pandas
    [1] "{\"index_columns\": [\"__index_level_0__\"], \"column_indexes\": [{\"name\": ...

I still don't think this is necessarily meaningful, especially when you're dealing with multiple files with potentially different metadata.
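For reference, the "detect and warn" idea raised earlier in this thread could look roughly like the sketch below. It is illustrative only and is not what this PR does. Both helper names are hypothetical, and the sketch assumes that schema metadata is available as a named list (as `tab$metadata` appears to be in the output above) and that R-specific metadata lives under the key `"r"`.

```r
# Hypothetical helpers, not part of the arrow package.
# `metadata` is assumed to be a named list of schema-level key/value
# metadata; `r_key` is the key assumed to hold R-specific metadata.

# Return the names of all metadata entries other than the R one.
non_r_metadata_keys <- function(metadata, r_key = "r") {
  keys <- names(metadata)
  if (is.null(keys)) {
    return(character(0))
  }
  setdiff(keys, r_key)
}

# Warn once, listing every non-R key that would be discarded.
warn_if_metadata_dropped <- function(metadata, r_key = "r") {
  dropped <- non_r_metadata_keys(metadata, r_key)
  if (length(dropped) > 0) {
    warning(
      "Discarding non-R schema metadata: ",
      paste(dropped, collapse = ", "),
      call. = FALSE
    )
  }
  invisible(dropped)
}

# Stand-in for metadata read from a pandas-written file
meta <- list(pandas = "{...}", r = "...")
dropped <- non_r_metadata_keys(meta)
```

Calling `warn_if_metadata_dropped(meta)` on the stand-in list would raise a single warning naming `pandas`. With multiple files carrying different metadata, as noted above, it is less clear which keys such a check should report.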

Closes apache#11183 from nealrichardson/exec-plan-metadata

Authored-by: Neal Richardson <neal.p.richardson@gmail.com>
Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>

nealrichardson requested a review from jonkeane September 18, 2021 14:49

github-actions bot added the Component: R label Sep 18, 2021

nealrichardson added 2 commits September 20, 2021 15:08

Preserve original R schema metadata where appropriate

4a2b61d

Fix a couple of test failures and update metadata tests to testthat 3e

6f013f6

nealrichardson force-pushed the exec-plan-metadata branch from ea1cd73 to 6f013f6 Compare September 20, 2021 19:37

jonkeane approved these changes Sep 20, 2021


nealrichardson closed this in d0a4263 Sep 21, 2021

nealrichardson deleted the exec-plan-metadata branch September 21, 2021 12:43

asfimport mentioned this pull request Sep 21, 2021

[R] Handle Dataset schema metadata in ExecPlan #29473

Closed
