Irreversible empty string handling by fread() and fwrite() #2214

@ezwelty

Description

First off, thank you for this fantastic package. It effortlessly powers many of my data caving adventures.

Sometimes it's necessary to distinguish between null (NA) and empty ("") strings, and I'm trying to establish a pipeline that preserves this distinction with minimal markup. This doesn't currently work. For it to work, fread() would need to distinguish between an unquoted empty field and "", and fwrite() would ideally quote empty strings when quote = "auto".

Consider this data.table:

dt <- data.table::data.table(chr = c(NA, "", "a"), num = c(NA, NA, 1))

Here is the fwrite output with quote="auto":

csv <- paste(
  capture.output(
    data.table::fwrite(dt, quote = "auto")
  ),
  collapse = "\n"
)
cat(csv)
chr,num
,
,
a,1

The empty string is not quoted, and thus indistinguishable from the null string. They are both read back in as empty strings:

data.table::fread(csv)
   chr num
1:      NA
2:      NA
3:   a   1
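
For illustration, here is a rough base R sketch of the write behaviour I have in mind: NA is written as an unquoted empty field and "" as a quoted one. The helper format_field() is hypothetical and not part of data.table. It assumes simple values with no embedded commas, quotes, or newlines, so it is not a general CSV writer.

```r
# Hypothetical helper, not part of data.table.
# Formats one column so that NA and "" stay distinguishable in the CSV:
#   NA -> empty unquoted field, "" -> quoted empty field ("").
# Assumes values contain no commas, quotes, or newlines.
format_field <- function(x) {
  out <- as.character(x)
  out[!is.na(x) & out == ""] <- '""'  # quote only the empty strings
  out[is.na(x)] <- ""                 # NA becomes an empty field
  out
}

df <- data.frame(
  chr = c(NA, "", "a"),
  num = c(NA, NA, 1),
  stringsAsFactors = FALSE
)

cols <- lapply(df, format_field)
csv_lines <- c(
  paste(names(df), collapse = ","),    # header row
  do.call(paste, c(cols, sep = ","))   # one string per data row
)
cat(csv_lines, sep = "\n")
```

With this sketch the second data row comes out as `"",` rather than `,`, so the empty string in chr is no longer written the same way as the NA in the row above it.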

If instead we force quotes, the distinction is kept between the null and empty strings:

csv_quoted <- paste(
  capture.output(
    data.table::fwrite(dt, quote = TRUE)
  ),
  collapse = "\n"
)
cat(csv_quoted)
"chr","num"
,
"",
"a",1

However, there is no way to read them back in as such. Either they are both empty:

data.table::fread(csv_quoted)
   chr num
1:      NA
2:      NA
3:   a   1

Or both null:

data.table::fread(csv_quoted, na.strings = "")
   chr num
1:  NA  NA
2:  NA  NA
3:   a   1
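
To make the desired read behaviour concrete, here is a minimal base R sketch that treats an unquoted empty field as NA and a quoted empty field as "". The helpers split_line(), parse_field(), and read_distinct() are hypothetical names and not part of data.table. The sketch assumes fields contain no embedded commas, quotes, or newlines, and it leaves every column as character.

```r
# Hypothetical helpers, not part of data.table.
# Assumes simple CSV: no embedded commas, quotes, or newlines inside fields.

split_line <- function(line) {
  # Append a separator so strsplit() does not drop a trailing empty field.
  strsplit(paste0(line, ","), ",", fixed = TRUE)[[1]]
}

parse_field <- function(x) {
  if (x == "") return(NA_character_)  # unquoted empty field -> NA
  if (grepl('^".*"$', x)) {
    return(substr(x, 2, nchar(x) - 1))  # strip quotes; "" -> empty string
  }
  x
}

read_distinct <- function(csv) {
  lines <- strsplit(csv, "\n", fixed = TRUE)[[1]]
  rows <- lapply(lines, function(l) {
    vapply(split_line(l), parse_field, character(1), USE.NAMES = FALSE)
  })
  body <- do.call(rbind, rows[-1])  # character matrix of data rows
  colnames(body) <- rows[[1]]       # first row is the header
  as.data.frame(body, stringsAsFactors = FALSE)
}

# Same content as csv_quoted above, written out literally.
csv_quoted <- '"chr","num"\n,\n"",\n"a",1'
res <- read_distinct(csv_quoted)
is.na(res$chr)
```

Here only the first element of chr is NA, while the second is a genuine empty string, which is the round trip I would like fread() to be able to offer.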
