ARROW-11787: [R] Implement write csv #10141
…ll WriteCSV functions
Force-pushed: 3401d83 to 3239575
r/R/csv.R
Outdated
    #' @docType class
    #' @usage NULL
    #' @format NULL
    #' @description `CsvReadOptions`, `CsvParseOptions`, `CsvConvertOptions`,
This description doesn't look quite correct.
An alternative to documenting this here (and cleaning up the bad copy-paste) would be to document it with CsvReadOptions et al.
Updated now
nealrichardson
left a comment
Nicely done. Some suggestions/leading questions.
r/R/csv.R
Outdated
    #' }
    #' @include arrow-package.R
    write_csv_arrow <- function(x,
    sink,
Looks like indentation is slightly off here
Updated now
r/R/csv.R
Outdated
    assert_that(length(include_header) == 1)
    assert_that(is.logical(include_header))
What happens if you remove these--will the C++ static typing validate this enough?
What happens if include_header = NA?
Removed those, since the C++ layer already raises perfectly sensible errors, as you say. With the assert_that calls removed, include_header = NA results in no header being written.
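For reference, if an R-side guard against `NA` were ever wanted here, it could look something like the sketch below. `check_flag` is a hypothetical helper written in base R for illustration; it is not part of the arrow package and is not what this PR does.

```r
# Hypothetical helper (not in arrow): accept only a single, non-missing
# logical value, and fail with a readable message otherwise.
check_flag <- function(x, name = "include_header") {
  if (!is.logical(x) || length(x) != 1L || is.na(x)) {
    stop(sprintf("`%s` must be a single TRUE or FALSE", name), call. = FALSE)
  }
  invisible(x)
}

check_flag(TRUE)  # passes silently

# NA is a length-1 logical, so only the is.na() test rejects it.
msg <- tryCatch(check_flag(NA), error = function(e) conditionMessage(e))
print(msg)
```

The point of the sketch is that `NA` passes both `is.logical()` and a length-one check, so the two original assertions would not have rejected it either.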
r/R/csv.R
Outdated
    assert_that(length(include_header) == 1)
    assert_that(is.logical(include_header))

    write_options = CsvWriteOptions$create(include_header, batch_size)
Suggested change:
    - write_options = CsvWriteOptions$create(include_header, batch_size)
    + write_options <- CsvWriteOptions$create(include_header, batch_size)
Updated
r/R/csv.R
Outdated
    x <- Table$create(x)
    }

    assert_is(x, c("Table", "RecordBatch"))
Suggested change:
    - assert_is(x, c("Table", "RecordBatch"))
    + assert_is(x, "ArrowTabular")
Updated
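As a rough illustration of why checking against a single parent class is enough, here is a base R sketch. The objects are toy S3 stand-ins with assumed class vectors, not the real arrow R6 classes; they only mimic the idea that `Table` and `RecordBatch` share the `ArrowTabular` parent.

```r
# Toy stand-ins (assumption): each class vector lists the concrete class
# first and the shared parent second.
tbl <- structure(list(), class = c("Table", "ArrowTabular"))
rb  <- structure(list(), class = c("RecordBatch", "ArrowTabular"))
df  <- data.frame(a = 1)

# One check on the parent class covers both concrete classes.
inherits(tbl, "ArrowTabular")  # TRUE
inherits(rb, "ArrowTabular")   # TRUE
inherits(df, "ArrowTabular")   # FALSE
```

A check on the parent also keeps working if another tabular subclass is added later, whereas the explicit `c("Table", "RecordBatch")` list would need updating.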
r/tests/testthat/test-csv.R
Outdated
    })
You should add tests for handling bad inputs too. Also might make more sense to put the writing tests at the bottom of the test file instead of the top.
Done
r/tests/testthat/test-csv.R
Outdated
    expect_identical(tbl_in, tbl_expected)

    skip("Doesn't yet work with date columns due to ARROW-12540")
I don't think you need to test the file-with-dates in every combination of parameters; just the first one is sufficient.
Agreed - updated.
r/tests/testthat/test-csv.R
Outdated
    expect_identical(tbl_in, tbl_expected)
    })

    test_that("Write a CSV file with different batch sizes", {
What is this testing? What does batch_size do? It doesn't look like there is an observable difference in the output.
batch_size dictates how much data is buffered at a time when translating to CSV.
So the output will be the same, but what's happening internally will be different. I included it as I wanted to make sure I could pass through the param, but I guess it's C++ functionality. Should I remove the tests for the different batch sizes and just make sure I can pass through the param once?
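To make the "same output, different internals" point concrete, here is a minimal base R sketch of a chunked CSV writer. `write_csv_chunked` is a hypothetical illustration built on `write.table`; it is not the Arrow C++ writer and assumes a data frame with at least one row.

```r
# Hypothetical chunked writer: emits the data frame `batch_size` rows at a
# time, writing the header only with the first chunk.
write_csv_chunked <- function(df, path, batch_size) {
  starts <- seq(1L, nrow(df), by = batch_size)
  for (i in seq_along(starts)) {
    rows <- starts[i]:min(starts[i] + batch_size - 1L, nrow(df))
    write.table(df[rows, , drop = FALSE], path, sep = ",",
                row.names = FALSE, col.names = (i == 1L),
                append = (i > 1L), qmethod = "double")
  }
  invisible(path)
}

df <- data.frame(x = 1:10, y = letters[1:10])
small <- tempfile(fileext = ".csv")
large <- tempfile(fileext = ".csv")
write_csv_chunked(df, small, batch_size = 3L)    # four chunks
write_csv_chunked(df, large, batch_size = 100L)  # one chunk

# The files match line for line regardless of chunk size.
same <- identical(readLines(small), readLines(large))
print(same)
```

In this sketch the chunk size changes only how many rows are handled per write, which is why a test comparing output across batch sizes cannot observe any difference.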
nealrichardson
left a comment
Looks good! Let's try moving the validation like this, and assuming the tests pass I'll merge (or someone else can)