Skip to content

Conversation

@thisisnic
Copy link
Member

No description provided.

@github-actions
Copy link

@thisisnic thisisnic force-pushed the arrow-11787-write_csv branch from 3401d83 to 3239575 Compare April 27, 2021 14:13
@thisisnic thisisnic changed the title ARROW-11787: [R] Implement write csv [WIP] ARROW-11787: [R] Implement write csv Apr 27, 2021
@thisisnic thisisnic marked this pull request as ready for review April 27, 2021 14:13
r/R/csv.R Outdated
#' @docType class
#' @usage NULL
#' @format NULL
#' @description `CsvReadOptions`, `CsvParseOptions`, `CsvConvertOptions`,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This description doesn't look quite correct.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An alternative to documenting this here (and cleaning up the bad copy-paste) would be to document it with CsvReadOptions et al.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated now

Copy link
Member

@nealrichardson nealrichardson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nicely done. Some suggestions/leading questions.

r/R/csv.R Outdated
#' @docType class
#' @usage NULL
#' @format NULL
#' @description `CsvReadOptions`, `CsvParseOptions`, `CsvConvertOptions`,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An alternative to documenting this here (and cleaning up the bad copy-paste) would be to document it with CsvReadOptions et al.

r/R/csv.R Outdated
#' }
#' @include arrow-package.R
write_csv_arrow <- function(x,
sink,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like indentation is slightly off here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated now

r/R/csv.R Outdated
Comment on lines 643 to 644
assert_that(length(include_header) == 1)
assert_that(is.logical(include_header))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if you remove these--will the C++ static typing validate this enough?

What happens if include_header = NA?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed those as totally sensible errors from the C++ as you say. If include_header = NA with the assert_that removed, no header is written.

r/R/csv.R Outdated
assert_that(length(include_header) == 1)
assert_that(is.logical(include_header))

write_options = CsvWriteOptions$create(include_header, batch_size)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
write_options = CsvWriteOptions$create(include_header, batch_size)
write_options <- CsvWriteOptions$create(include_header, batch_size)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

r/R/csv.R Outdated
x <- Table$create(x)
}

assert_is(x, c("Table", "RecordBatch"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assert_is(x, c("Table", "RecordBatch"))
assert_is(x, "ArrowTabular")

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated


})


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should add tests for handling bad inputs too. Also might make more sense to put the writing tests at the bottom of the test file instead of the top.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


expect_identical(tbl_in, tbl_expected)

skip("Doesn't yet work with date columns due to ARROW-12540")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you need to test the file-with-dates in every combination of parameters, just the first one is sufficient.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed - updated.

expect_identical(tbl_in, tbl_expected)
})

test_that("Write a CSV file with different batch sizes", {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this testing? What does batch_size do? It doesn't look like there is an observable difference in the output.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

batch size dictates how much data is buffered when translating to CSV

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the output will be the same, but what's happening internally will be different. I included it as I wanted to make sure I could pass through the param, but I guess it's C++ functionality. Should I remove the tests for the different batch sizes and just make sure I can pass through the param once?

Copy link
Member

@nealrichardson nealrichardson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Let's try moving the validation like this, and assuming the tests pass I'll merge (or someone else can)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants