[R] Arrow to R fails with embedded nuls in strings

Apologies if this issue isn't categorized or documented appropriately.  Please be gentle! :)

As a heavy R user who normally interacts with Parquet files through sparklyr, I recently tried arrow::read_parquet() on a few Parquet files stored on my local machine rather than in Hadoop.  After several attempts, I could not get past an error about embedded nuls.  For example:

try({df <- read_parquet('out_2019-09_data_1.snappy.parquet') })
Error in Table__to_dataframe(x, use_threads = option_use_threads()) : 
  embedded nul in string: 'INSTALL BOTH LEFT FRONT AND RIGHT FRONT  TORQUE ARMS\0 ARMS'

Is there a solution to this?

I have also hit roadblocks with embedded nuls in CSVs when using data.table::fread(), but readr::read_delim() seems to handle them gracefully, proceeding with just a warning.

Apologies that I do not have a handy reprex. I don't know if I can even recreate a parquet file with embedded nuls using arrow if it won't let me read one in, and I can't share this file due to company restrictions.
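
In place of a full reprex, here is a minimal base R sketch of the underlying limitation. It assumes nothing about arrow internals. It only shows that an R character string cannot hold a nul byte, which is the same class of error reported above. The byte values and the `strip_nul()` helper are made up for illustration and are not part of arrow or any other package.

```r
# Bytes for "ARMS", one nul byte (0x00), then " ARMS".
# These values are an assumption chosen to resemble the tail of the
# string in the error message above.
bytes <- as.raw(c(0x41, 0x52, 0x4d, 0x53, 0x00, 0x20, 0x41, 0x52, 0x4d, 0x53))

# Converting directly fails because R strings cannot contain a nul.
result <- tryCatch(
  rawToChar(bytes),
  error = function(e) conditionMessage(e)
)
print(result)

# Hypothetical helper: drop every nul byte, then convert to a string.
strip_nul <- function(x) rawToChar(x[x != as.raw(0)])

cleaned <- strip_nul(bytes)
print(cleaned)
```

Dropping the nul bytes is only one possible policy. Whether silently removing them is acceptable depends on the data.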

Please let me know how I can be of any more help!

**Environment**: Windows 10
R 3.4.4

**Reporter**: [John Cassil](https://issues.apache.org/jira/browse/ARROW-6582)
**Assignee**: [Neal Richardson](https://issues.apache.org/jira/browse/ARROW-6582) / @nealrichardson
#### Related issues:
- [function read_parquet(*,as_data_frame=TRUE) fails when embedded nuls present. ](https://github.com/apache/arrow/issues/26314) (is duplicated by)
- [[R] Regression from 2.0.0 -> 3.0.0:  Null character in string prevents dataset from loading](https://github.com/apache/arrow/issues/27545) (is related to)
- [[R] Consider ways to make arrow.skip_nul option more user-friendly](https://github.com/apache/arrow/issues/27361) (is related to)
#### Original Issue Attachments:
- [embedded_nul.parquet](https://issues.apache.org/jira/secure/attachment/13019596/embedded_nul.parquet)
#### PRs and other links:
- [GitHub Pull Request #8365](https://github.com/apache/arrow/pull/8365)

<sub>**Note**: *This issue was originally created as [ARROW-6582](https://issues.apache.org/jira/browse/ARROW-6582). Please see the [migration documentation](https://github.com/apache/arrow/issues/14542) for further details.*</sub>

[R] Arrow to R fails with embedded nuls in strings #22939
