allow to read non-standard CSV #326
alamb left a comment
Thank you for the contribution @kazuk
This code looks really nice and clean. 👍
The only thing I think is missing from this PR is a set of basic tests -- specifically, tests showing that everything is hooked together properly and that the options reach the csv reader.
I think such tests would be most valuable in ensuring that, as we change this code in the future, we don't accidentally break this new functionality. Given that we are delegating to the implementation in the csv crate, I don't think we need to test a large number of corner cases -- just that we can read a CSV file with a non-default escape, quote and terminator character.
Thanks again!
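To make the requested tests concrete, the following stdlib-only Rust sketch shows what each case checks: a non-default quote, an escape byte, and a custom record terminator, alongside the default behavior. It deliberately avoids the arrow and csv crates, so CsvOptions and parse are hypothetical names, and the parser is a simplified stand-in for the delegated csv reader that assumes the delimiter, quote, escape and terminator bytes are all distinct.

```rust
/// Hypothetical options for this sketch only; not the arrow or csv crate API.
#[derive(Clone, Copy, Debug)]
pub struct CsvOptions {
    pub delimiter: u8,
    pub quote: u8,
    /// None: a doubled quote ("") inside a quoted field means a literal quote.
    pub escape: Option<u8>,
    /// None: records end at '\n', and '\r' outside quotes is skipped (CRLF).
    pub terminator: Option<u8>,
}

impl Default for CsvOptions {
    fn default() -> Self {
        CsvOptions { delimiter: b',', quote: b'"', escape: None, terminator: None }
    }
}

/// Splits `input` into records of string fields according to `opts`.
/// Simplified stand-in for a real CSV reader; assumes all option bytes differ.
pub fn parse(input: &[u8], opts: CsvOptions) -> Vec<Vec<String>> {
    let mut records: Vec<Vec<String>> = Vec::new();
    let mut record: Vec<String> = Vec::new();
    let mut field: Vec<u8> = Vec::new();
    let mut in_quotes = false;
    let mut i = 0;
    while i < input.len() {
        let b = input[i];
        if in_quotes {
            if Some(b) == opts.escape && i + 1 < input.len() {
                // Escape byte: keep the following byte literally.
                field.push(input[i + 1]);
                i += 2;
                continue;
            }
            if b == opts.quote {
                if opts.escape.is_none() && input.get(i + 1) == Some(&opts.quote) {
                    // No escape byte configured: "" is a literal quote.
                    field.push(opts.quote);
                    i += 2;
                    continue;
                }
                in_quotes = false; // closing quote
            } else {
                field.push(b);
            }
        } else if b == opts.quote && field.is_empty() {
            in_quotes = true; // a quote only opens at the start of a field
        } else if b == opts.delimiter {
            record.push(String::from_utf8_lossy(&field).into_owned());
            field.clear();
        } else if opts.terminator.map_or(b == b'\n', |t| b == t) {
            record.push(String::from_utf8_lossy(&field).into_owned());
            field.clear();
            records.push(std::mem::take(&mut record));
        } else if opts.terminator.is_none() && b == b'\r' {
            // Default mode: tolerate CRLF line endings by dropping '\r'.
        } else {
            field.push(b);
        }
        i += 1;
    }
    // Flush a final record that has no trailing terminator.
    if !field.is_empty() || !record.is_empty() {
        record.push(String::from_utf8_lossy(&field).into_owned());
        records.push(record);
    }
    records
}
```

For example, parse(b"a,b;c,d;", CsvOptions { terminator: Some(b';'), ..CsvOptions::default() }) yields two records, ["a", "b"] and ["c", "d"]. A test against arrow itself would presumably configure these options through ReaderBuilder rather than a hand-rolled parser.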
Codecov Report
@@ Coverage Diff @@
## master #326 +/- ##
==========================================
- Coverage 82.52% 82.50% -0.03%
==========================================
Files 162 162
Lines 44007 44036 +29
==========================================
+ Hits 36316 36331 +15
- Misses 7691 7705 +14
Thank you for the review.
I added some basic tests to detect if the feature breaks.
alamb left a comment
Looks great to me -- thank you @kazuk
The MIRI CI check is not related to this PR (edit: #345). However, the lint failure seems to be: https://github.com/apache/arrow-rs/pull/326/checks?check_run_id=2658268180 I think it can be resolved by running
Oh! Thanks. I ran
The MIRI failure is unrelated to this PR: #345
Thanks again @kazuk
* refactor Reader::from_reader
  split into build_csv_reader, from_csv_reader
  add escape, quote, terminator arg to build_csv_reader
* add escape, quote, terminator field to ReaderBuilder
  schema inference support for non-standard CSV
  add fn infer_file_schema_with_csv_options
  add fn infer_reader_schema_with_csv_options
  ReaderBuilder support for non-standard CSV
  add escape, quote, terminator field
  add fn with_escape, with_quote, with_terminator
  change ReaderBuilder::build for non-standard CSV
* minimize API change
* add tests
  add #[test] fn test_non_std_quote
  add #[test] fn test_non_std_escape
  add #[test] fn test_non_std_terminator
* apply cargo fmt
Co-authored-by: kazuhiko kikuchi <kazuk.dll@kazuk.jp>
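The commit message above describes adding *_with_csv_options variants for schema inference while the existing entry points pass default options. A minimal sketch of that delegation pattern, under loud assumptions: ColType is a hypothetical stand-in for arrow's DataType, the type inference is a toy (Boolean, then Int64, then Float64, else Utf8), and infer_schema / infer_schema_with_csv_options are illustrative names whose signatures are not arrow's.

```rust
/// Hypothetical column type; a stand-in for arrow's DataType in this sketch.
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum ColType { Boolean, Int64, Float64, Utf8 }

/// Old-style entry point: parameters unchanged, delegates with default options.
pub fn infer_schema(input: &[u8], delimiter: u8) -> Vec<(String, ColType)> {
    infer_schema_with_csv_options(input, delimiter, None, None, None)
}

/// Options-aware variant (hypothetical signature). `None` means: no escape
/// byte, '"' as quote, and '\n' as terminator (with '\r' before it tolerated).
pub fn infer_schema_with_csv_options(
    input: &[u8],
    delimiter: u8,
    escape: Option<u8>,
    quote: Option<u8>,
    terminator: Option<u8>,
) -> Vec<(String, ColType)> {
    let quote = quote.unwrap_or(b'"');
    let terminator = terminator.unwrap_or(b'\n');
    let records = split_records(input, delimiter, escape, quote, terminator);
    // First record is the header; the remaining records are sampled for types.
    let (header, rows) = match records.split_first() {
        Some(parts) => parts,
        None => return Vec::new(),
    };
    header
        .iter()
        .enumerate()
        .map(|(col, name)| {
            let values: Vec<&str> =
                rows.iter().filter_map(|row| row.get(col).map(|s| s.as_str())).collect();
            (name.clone(), infer_type(&values))
        })
        .collect()
}

/// Narrowest toy type that every sampled value parses as; Utf8 otherwise.
fn infer_type(values: &[&str]) -> ColType {
    if values.is_empty() {
        ColType::Utf8
    } else if values.iter().all(|v| v.parse::<bool>().is_ok()) {
        ColType::Boolean
    } else if values.iter().all(|v| v.parse::<i64>().is_ok()) {
        ColType::Int64
    } else if values.iter().all(|v| v.parse::<f64>().is_ok()) {
        ColType::Float64
    } else {
        ColType::Utf8
    }
}

/// Minimal record splitter honoring delimiter, quote, escape and terminator.
fn split_records(input: &[u8], delimiter: u8, escape: Option<u8>, quote: u8, terminator: u8) -> Vec<Vec<String>> {
    let mut records: Vec<Vec<String>> = Vec::new();
    let mut record: Vec<String> = Vec::new();
    let mut field: Vec<u8> = Vec::new();
    let mut in_quotes = false;
    let mut bytes = input.iter().copied().peekable();
    while let Some(b) = bytes.next() {
        if in_quotes {
            if Some(b) == escape {
                // Escape byte: keep the next byte literally.
                if let Some(next) = bytes.next() { field.push(next); }
            } else if b == quote && escape.is_none() && bytes.peek() == Some(&quote) {
                bytes.next(); // doubled quote is a literal quote
                field.push(quote);
            } else if b == quote {
                in_quotes = false;
            } else {
                field.push(b);
            }
        } else if b == quote && field.is_empty() {
            in_quotes = true;
        } else if b == delimiter || b == terminator {
            record.push(String::from_utf8_lossy(&field).into_owned());
            field.clear();
            if b == terminator { records.push(std::mem::take(&mut record)); }
        } else if !(b == b'\r' && terminator == b'\n') {
            field.push(b); // '\r' is dropped only with the default '\n' terminator
        }
    }
    if !field.is_empty() || !record.is_empty() {
        record.push(String::from_utf8_lossy(&field).into_owned());
        records.push(record);
    }
    records
}
```

Because infer_schema keeps its old parameters and only forwards None for the new options, existing callers are unaffected, which is the same idea as the "minimize API change" step in the commit.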
Which issue does this PR close?
Closes #315.
Rationale for this change
What changes are included in this PR?
* Reader::from_reader is split into Reader::build_csv_reader and Reader::from_csv_reader.
  * Reader::build_csv_reader builds a csv_crate::Reader with all CSV reader options.
  * Reader::from_csv_reader builds an arrow::csv::reader::Reader from a csv_crate::Reader.
* Add fn infer_file_schema_with_csv_options.
* Change infer_file_schema to call infer_file_schema_with_csv_options with default options.
* Add fn infer_reader_schema_with_csv_options.
* Change infer_reader_schema to call infer_reader_schema_with_csv_options with default options.
* Add escape, quote and terminator to ReaderBuilder; ReaderBuilder::build uses the added options.
Are there any user-facing changes?
The API change is currently minimized:
* ReaderBuilder::escape
* ReaderBuilder::quote
* ReaderBuilder::terminator
Please consider making the following functions public: infer_file_schema_with_csv_options, infer_reader_schema_with_csv_options, Reader::build_csv_reader, Reader::from_csv_reader.