
fread: quotes in quoted string fields #1299

@berndbischl

Hi,

How do I properly encode and import string fields that contain double quotes?
The docs say:

character columns can be quoted (...,2,"Joe Bloggs",3.14,...) or not quoted (...,2,Joe Bloggs,3.14,...).

Because of the restrictions on unquoted character columns, my columns are always quoted (a separator can appear in them). The docs continue:

Thus, unescaped quotes may be present in a quoted field (...,2,"Joe, "Bloggs"",3.14,...) as well as
escaped quotes (...,2,"Joe \",Bloggs\"",3.14,...). If an embedded quote is followed by the separator
inside a quoted field, the embedded quotes up to that point in that field must be balanced; e.g.
...,2,"www.blah?x="one",y="two"",3.14,....

Because of these restrictions on plain double quotes inside a quoted string, I have to escape them.

The problem is this. A file with unescaped inner quotes is imported the way I want:

"a", "b"
"x",  "my name is "joe""

fread call and output:

d = fread("test_dt.csv", header = FALSE, sep = ",", stringsAsFactors = FALSE, data.table = FALSE)
  V1                   V2
1  a                  "b"
2  x   "my name is "joe""

But the escaped form below is what I actually have to use, and with it the backslashes that escape the inner double quotes come out doubled.

File:

"a", "b"
"x",  "my name is \"joe\""

fread output:

  V1                       V2
1  a                      "b"
2  x   "my name is \\"joe\\""
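One thing that may be worth ruling out, offered as a guess rather than a diagnosis: R's `print` displays every backslash in a string as `\\`, even with `quote = FALSE`, so a doubled backslash in printed output does not by itself show that the stored value contains two. The sketch below uses a hand-built string (an assumption, not the actual `fread` result) and base R only.

```r
# Hypothetical value: the field as it would look if the backslashes
# from the file were kept verbatim (one backslash before each inner quote).
s <- "my name is \\\"joe\\\""

# print() escapes backslashes for display, even with quote = FALSE.
print(s, quote = FALSE)

# writeLines() shows the raw characters that are actually stored.
writeLines(s)

# Count the backslashes really present in the string.
n_backslashes <- lengths(regmatches(s, gregexpr("\\", s, fixed = TRUE)))
n_backslashes
```

If `writeLines()` on the imported column shows single backslashes, the doubling is only in the display; the backslashes themselves are still part of the value either way.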

Note: I have some control over the CSV files, since I already preprocess them a bit. But I need a routine that works on general files, so I have to expect arbitrary input in my string columns.
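For the preprocessing step, one convention that avoids backslashes altogether is the common CSV habit of writing an embedded quote as `""` inside a quoted field. A minimal sketch in base R, assuming that convention is acceptable here; whether `fread` in the version discussed in this issue reads such a file as intended is not established in this thread, so the read-back uses `read.csv`. The data frame and temp file are made up for illustration.

```r
# Made-up data with an embedded separator and embedded double quotes.
d <- data.frame(
  id   = c("a", "x"),
  text = c("b", "my name is \"joe\", ok"),
  stringsAsFactors = FALSE
)

tmp <- tempfile(fileext = ".csv")

# qmethod = "double" writes an embedded quote as "" inside the quoted field.
write.table(d, tmp, sep = ",", quote = TRUE, qmethod = "double",
            row.names = FALSE, col.names = TRUE)

# Inspect the raw lines that ended up in the file.
raw_lines <- readLines(tmp)
writeLines(raw_lines)

# read.csv understands the doubled-quote convention.
back <- read.csv(tmp, stringsAsFactors = FALSE)
identical(back$text, d$text)
```

The same file can then be tried with `fread` to see whether the installed version returns the field with the inner quotes intact.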
