-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-10623: [CI][R] Version 1.0.1 breaks data.frame attributes when reading file written by 2.0.0 #9118
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
Closed
ARROW-10623: [CI][R] Version 1.0.1 breaks data.frame attributes when reading file written by 2.0.0 #9118
Changes from all commits
Commits
Show all changes
38 commits
Select commit
Hold shift + click to select a range
babe5b7
Basic forwards compatibility testing
jonkeane f0ce0f1
Trying on GHA
jonkeane 8ef0c8d
run on all pushes
jonkeane 2dba6c7
hardcode some crossbowery
jonkeane 90ff23f
oops
jonkeane ba73af7
oh and needs too
jonkeane c792d4b
Don't cd
jonkeane 4102ec5
Try running directly on GHA
jonkeane fcf56cd
Make sure the dir exists
jonkeane bfb96a4
Add tests
jonkeane cdf162d
try a different reference
jonkeane ac71865
oops!
jonkeane a426014
clean up
jonkeane 596d94d
Can we skip known-bad versions?
jonkeane 83d2c07
What happens in 1.0.1?
jonkeane 6696029
oops
jonkeane 3f55255
Some more cleanup
jonkeane 0961b2b
Cleanup
jonkeane afb0aee
oops
jonkeane 350c483
Oy
jonkeane b4c23df
still more changes
jonkeane d9e7e61
Move workflow to crossbox
jonkeane 8b92050
Add a crossbow task
jonkeane 5cd59d5
Escape jinja
jonkeane 4814d4a
More indirection?
jonkeane faa9f37
Oops extra typo
jonkeane b6ec09f
A slightly different way of loading the crossbow...
jonkeane 3959ed3
Add feather tests
jonkeane afed5b6
More filetypes, will probably fail on 1.0.1 for streams + metadata.
jonkeane 71e74ea
Helpers need some copying too
jonkeane 54d8d2d
Add backwards compatibility testing
jonkeane 9871abc
Add skips if snappy isn't available
jonkeane 5cab1b2
Add feather files
jonkeane 70abb3e
Add apache licensing where it should be.
jonkeane 2323e8f
PR comments, add 0.17.0 fixture that exhibits the case mis-match on c…
jonkeane 8bcb580
Add skip for upcoming compatibility change.
jonkeane 5c9fe3f
Remove the skip now that ARROW-11163 is merged
jonkeane e134e60
Make a helper, test some (but not all) of the 0.17.0 reading.
jonkeane File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,109 @@ | ||
| # Licensed to the Apache Software Foundation (ASF) under one | ||
| # or more contributor license agreements. See the NOTICE file | ||
| # distributed with this work for additional information | ||
| # regarding copyright ownership. The ASF licenses this file | ||
| # to you under the Apache License, Version 2.0 (the | ||
| # "License"); you may not use this file except in compliance | ||
| # with the License. You may obtain a copy of the License at | ||
| # | ||
| # http://www.apache.org/licenses/LICENSE-2.0 | ||
| # | ||
| # Unless required by applicable law or agreed to in writing, | ||
| # software distributed under the License is distributed on an | ||
| # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
| # KIND, either express or implied. See the License for the | ||
| # specific language governing permissions and limitations | ||
| # under the License. | ||
|
|
||
| # NOTE: must set "Crossbow" as name to have the badge links working in the | ||
| # github comment reports! | ||
| name: Crossbow | ||
|
|
||
| on: | ||
| push | ||
|
|
||
| jobs: | ||
| write-files: | ||
| name: "Write files" | ||
| runs-on: ubuntu-20.04 | ||
| strategy: | ||
| fail-fast: false | ||
| env: | ||
| ARROW_R_DEV: "TRUE" | ||
| RSPM: "https://packagemanager.rstudio.com/cran/__linux__/focal/latest" | ||
| steps: | ||
| - name: Checkout Arrow | ||
| run: | | ||
| git clone --no-checkout {{ arrow.remote }} arrow | ||
| git -C arrow fetch -t {{ arrow.remote }} {{ arrow.branch }} | ||
| git -C arrow checkout FETCH_HEAD | ||
| git -C arrow submodule update --init --recursive | ||
| - name: Free Up Disk Space | ||
| shell: bash | ||
| run: arrow/ci/scripts/util_cleanup.sh | ||
| - name: Fetch Submodules and Tags | ||
| shell: bash | ||
| run: cd arrow && ci/scripts/util_checkout.sh | ||
| - uses: r-lib/actions/setup-r@v1 | ||
| - name: Install dependencies | ||
| run: | | ||
| install.packages(c("remotes", "glue", "sys")) | ||
| remotes::install_deps("arrow/r", dependencies = TRUE) | ||
| shell: Rscript {0} | ||
| - name: Install Arrow | ||
| run: | | ||
| cd arrow/r | ||
| R CMD INSTALL . | ||
| shell: bash | ||
| - name: Write files | ||
| run: | | ||
| cd arrow/r | ||
| R -f extra-tests/write-files.R | ||
| shell: bash | ||
|
|
||
| - name: Upload the parquet artifacts | ||
| uses: actions/upload-artifact@v2 | ||
| with: | ||
| name: files | ||
| path: arrow/r/extra-tests/files | ||
|
|
||
| read-files: | ||
| name: "Read files with Arrow {{ '${{ matrix.old_arrow_version }}' }}" | ||
| needs: [write-files] | ||
| runs-on: ubuntu-20.04 | ||
| strategy: | ||
| fail-fast: false | ||
| matrix: | ||
| old_arrow_version: | ||
| - "2.0.0" | ||
| - "1.0.1" | ||
| env: | ||
| RSPM: "https://packagemanager.rstudio.com/cran/__linux__/focal/latest" | ||
| OLD_ARROW_VERSION: {{ '${{ matrix.old_arrow_version }}' }} | ||
| steps: | ||
| - name: Checkout Arrow | ||
| run: | | ||
| git clone --no-checkout {{ arrow.remote }} arrow | ||
| git -C arrow fetch -t {{ arrow.remote }} {{ arrow.branch }} | ||
| git -C arrow checkout FETCH_HEAD | ||
| git -C arrow submodule update --init --recursive | ||
| - uses: r-lib/actions/setup-r@v1 | ||
| - name: Install old Arrow | ||
| run: | | ||
| install.packages(c("remotes", "testthat")) | ||
| remotes::install_version("arrow", "{{ '${{ matrix.old_arrow_version }}' }}") | ||
| shell: Rscript {0} | ||
| - name: Setup our testing directory, copy only the tests to it. | ||
| run: | | ||
| mkdir -p extra-tests/files | ||
| cp arrow/r/extra-tests/helper*.R extra-tests/ | ||
| cp arrow/r/extra-tests/test-*.R extra-tests/ | ||
| - name: Download artifacts | ||
| uses: actions/download-artifact@v2 | ||
| with: | ||
| name: files | ||
| path: extra-tests/files | ||
| - name: Test reading | ||
| run: | | ||
| testthat::test_dir("extra-tests") | ||
| shell: Rscript {0} |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -23,3 +23,4 @@ clang_format.sh | |
| ^autobrew$ | ||
| ^apache-arrow.rb$ | ||
| ^.*\.Rhistory$ | ||
| ^extra-tests | ||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -17,3 +17,4 @@ revdep/ | |
| vignettes/nyc-taxi/ | ||
| arrow_*.tar.gz | ||
| arrow_*.tgz | ||
| extra-tests/files | ||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,32 @@ | ||
| # Licensed to the Apache Software Foundation (ASF) under one | ||
| # or more contributor license agreements. See the NOTICE file | ||
| # distributed with this work for additional information | ||
| # regarding copyright ownership. The ASF licenses this file | ||
| # to you under the Apache License, Version 2.0 (the | ||
| # "License"); you may not use this file except in compliance | ||
| # with the License. You may obtain a copy of the License at | ||
| # | ||
| # http://www.apache.org/licenses/LICENSE-2.0 | ||
| # | ||
| # Unless required by applicable law or agreed to in writing, | ||
| # software distributed under the License is distributed on an | ||
| # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
| # KIND, either express or implied. See the License for the | ||
| # specific language governing permissions and limitations | ||
| # under the License. | ||
|
|
||
| if_version <- function(version, op = `==`) { | ||
| op(packageVersion("arrow"), version) | ||
| } | ||
|
|
||
| skip_if_version_less_than <- function(version, msg) { | ||
| if(if_version(version, `<`)) { | ||
| skip(msg) | ||
| } | ||
| } | ||
|
|
||
| skip_if_version_equals <- function(version, msg) { | ||
| if(if_version(version, `==`)) { | ||
| skip(msg) | ||
| } | ||
| } |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,165 @@ | ||
| # Licensed to the Apache Software Foundation (ASF) under one | ||
| # or more contributor license agreements. See the NOTICE file | ||
| # distributed with this work for additional information | ||
| # regarding copyright ownership. The ASF licenses this file | ||
| # to you under the Apache License, Version 2.0 (the | ||
| # "License"); you may not use this file except in compliance | ||
| # with the License. You may obtain a copy of the License at | ||
| # | ||
| # http://www.apache.org/licenses/LICENSE-2.0 | ||
| # | ||
| # Unless required by applicable law or agreed to in writing, | ||
| # software distributed under the License is distributed on an | ||
| # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
| # KIND, either express or implied. See the License for the | ||
| # specific language governing permissions and limitations | ||
| # under the License. | ||
|
|
||
| library(arrow) | ||
| library(testthat) | ||
|
|
||
| pq_file <- "files/ex_data.parquet" | ||
|
|
||
| test_that("Can read the file (parquet)", { | ||
| # We can read with no error, we assert metadata below | ||
| expect_error( | ||
| df <- read_parquet(pq_file), | ||
| NA | ||
| ) | ||
| }) | ||
|
|
||
| ### Parquet | ||
| test_that("Can see the metadata (parquet)", { | ||
| skip_if_version_less_than("2.0.0", "Version 1.0.1 can't read new version metadata.") | ||
|
|
||
| df <- read_parquet(pq_file) | ||
| expect_s3_class(df, "tbl") | ||
|
|
||
| expect_equal( | ||
| attributes(df), | ||
| list( | ||
| names = letters[1:4], | ||
| row.names = 1L, | ||
| top_level = list( | ||
| field_one = 12, | ||
| field_two = "more stuff" | ||
| ), | ||
| class = c("tbl_df", "tbl", "data.frame") | ||
| ) | ||
| ) | ||
|
|
||
| # column-level attributes | ||
| expect_equal(attributes(df$a), list(class = "special_string")) | ||
| expect_equal( | ||
| attributes(df$c), | ||
| list( | ||
| row.names = 1L, | ||
| names = c("c1", "c2", "c3"), | ||
| class = c("tbl_df", "tbl", "data.frame") | ||
| ) | ||
| ) | ||
| }) | ||
|
|
||
| ### Feather | ||
| for (comp in c("lz4", "uncompressed", "zstd")) { | ||
| feather_file <- paste0("files/ex_data_", comp, ".feather") | ||
|
|
||
| test_that(paste0("Can read the file (feather ", comp, ")"), { | ||
| # We can read with no error, we assert metadata below | ||
| expect_error( | ||
| df <- read_feather(feather_file), | ||
| NA | ||
| ) | ||
| }) | ||
|
|
||
| test_that(paste0("Can see the metadata (feather ", comp, ")"), { | ||
| skip_if_version_less_than("2.0.0", "Version 1.0.1 can't read new version metadata.") | ||
|
|
||
| df <- read_feather(feather_file) | ||
| expect_s3_class(df, "tbl") | ||
|
|
||
| expect_equal( | ||
| attributes(df), | ||
| list( | ||
| names = letters[1:4], | ||
| row.names = 1L, | ||
| top_level = list( | ||
| field_one = 12, | ||
| field_two = "more stuff" | ||
| ), | ||
| class = c("tbl_df", "tbl", "data.frame") | ||
| ) | ||
| ) | ||
|
|
||
| # column-level attributes | ||
| expect_equal(attributes(df$a), list(class = "special_string")) | ||
| expect_equal( | ||
| attributes(df$c), | ||
| list( | ||
| row.names = 1L, | ||
| names = c("c1", "c2", "c3"), | ||
| class = c("tbl_df", "tbl", "data.frame") | ||
| ) | ||
| ) | ||
| }) | ||
| } | ||
|
|
||
| test_that("Can read feather version 1", { | ||
| feather_v1_file <- "files/ex_data_v1.feather" | ||
|
|
||
| df <- read_feather(feather_v1_file) | ||
| expect_s3_class(df, "tbl") | ||
|
|
||
| expect_equal( | ||
| attributes(df), | ||
| list( | ||
| names = c("a", "b", "d"), | ||
| class = c("tbl_df", "tbl", "data.frame"), | ||
| row.names = 1L | ||
| ) | ||
| ) | ||
| }) | ||
|
|
||
| ### IPC Stream | ||
| stream_file <- "files/ex_data.stream" | ||
|
|
||
| test_that("Can read the file (parquet)", { | ||
| # We can read with no error, we assert metadata below | ||
| expect_error( | ||
| df <- read_ipc_stream(stream_file), | ||
| NA | ||
| ) | ||
| }) | ||
|
|
||
| test_that("Can see the metadata (stream)", { | ||
| skip_if_version_less_than("2.0.0", "Version 1.0.1 can't read new version metadata.") | ||
| df <- read_ipc_stream(stream_file) | ||
|
|
||
| expect_s3_class(df, "tbl") | ||
|
|
||
| expect_equal( | ||
| attributes(df), | ||
| list( | ||
| names = letters[1:4], | ||
| row.names = 1L, | ||
| top_level = list( | ||
| field_one = 12, | ||
| field_two = "more stuff" | ||
| ), | ||
| class = c("tbl_df", "tbl", "data.frame") | ||
| ) | ||
| ) | ||
|
|
||
| # column-level attributes | ||
| expect_equal(attributes(df$a), list(class = "special_string")) | ||
| expect_equal( | ||
| attributes(df$c), | ||
| list( | ||
| row.names = 1L, | ||
| names = c("c1", "c2", "c3"), | ||
| class = c("tbl_df", "tbl", "data.frame") | ||
| ) | ||
| ) | ||
| }) | ||
|
|
||
|
|
||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,39 @@ | ||
| # Licensed to the Apache Software Foundation (ASF) under one | ||
| # or more contributor license agreements. See the NOTICE file | ||
| # distributed with this work for additional information | ||
| # regarding copyright ownership. The ASF licenses this file | ||
| # to you under the Apache License, Version 2.0 (the | ||
| # "License"); you may not use this file except in compliance | ||
| # with the License. You may obtain a copy of the License at | ||
| # | ||
| # http://www.apache.org/licenses/LICENSE-2.0 | ||
| # | ||
| # Unless required by applicable law or agreed to in writing, | ||
| # software distributed under the License is distributed on an | ||
| # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
| # KIND, either express or implied. See the License for the | ||
| # specific language governing permissions and limitations | ||
| # under the License. | ||
|
|
||
| library(arrow) | ||
|
|
||
| if (!dir.exists("extra-tests/files")) { | ||
| dir.create("extra-tests/files") | ||
| } | ||
|
|
||
| source("tests/testthat/helper-data.R") | ||
|
|
||
| write_parquet(example_with_metadata, "extra-tests/files/ex_data.parquet") | ||
|
|
||
| for (comp in c("lz4", "uncompressed", "zstd")) { | ||
| if(!codec_is_available(comp)) break | ||
|
|
||
| name <- paste0("extra-tests/files/ex_data_", comp, ".feather") | ||
| write_feather(example_with_metadata, name, compression = comp) | ||
| } | ||
|
|
||
| example_with_metadata_v1 <- example_with_metadata | ||
| example_with_metadata_v1$c <- NULL | ||
| write_feather(example_with_metadata_v1, "extra-tests/files/ex_data_v1.feather", version = 1) | ||
|
|
||
| write_ipc_stream(example_with_metadata, "extra-tests/files/ex_data.stream") |
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about factoring this out to an
expect_something()helper in this file, such that your tests here are justexpect_something(df)?Of course, the simplest one would be
expect_identical(df, example_with_metadata), right? Though there are tradeoffs with that.